Hi! I am a PhD student in the Department of Computer Science at Duke University and a member of the Computational Evolutionary Intelligence (CEI) Lab, under the supervision of Prof. Yiran Chen. Before joining Duke, I received my M.S. degree from Peking University (北京大学) in 2024, under the supervision of Prof. Hailong Jiao, and my B.Eng. degree from Southern University of Science and Technology (南方科技大学) in 2021, advised by Prof. Fengwei An. My current research focuses on algorithm-hardware co-design. Please don’t hesitate to reach out if you have any questions or shared interests🍀.
E-mail: yuzhe.fu@duke.edu
📖 Education
- 2024.08 - Present, Ph.D. student in the Department of Computer Science, Duke University.
- 2021.09 - 2024.07, Master of Science, Peking University.
- 2017.09 - 2021.06, Bachelor of Engineering, Southern University of Science and Technology.
- 2019.08 - 2019.12, Global Access Program, University of California, Berkeley.
🔥 News
- 2025.03: 🎉🎉 Our paper “SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval” has been accepted by IEEE ICME! In this work, we propose a new benchmark for the long-form speech information retrieval task in Speech LLMs, along with a training-free token pruning strategy for this task.
- 2025.02: 🎉🎉 Our paper “GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices” has been accepted by AAAI SSS! In this work, I provide an overview of notable recent architectures and chip designs for LLMs and diffusion networks. I hope you find it insightful and helpful!
- 2024.10: 🎉🎉 Our paper “Nebula: A 28-nm 109.8 TOPS/W 3D PNN Accelerator Featuring Adaptive Partition, Multi-Skipping, and Block-Wise Aggregation” has been accepted by IEEE ISSCC!
- 2024.08: Starting my PhD journey — wish me luck🍀!
- 2024.04: 🎉🎉 Our paper “SoftAct: A High-Precision Softmax Architecture for Transformers with Nonlinear Functions Support” has been accepted by IEEE TCSVT!
- 2024.02: 🎉🎉 Our paper “Adjustable Multi-Stream Block-Wise Farthest Point Sampling Acceleration in Point Cloud Analysis” has been accepted by IEEE TCAS-II!
- 2023.10: 🎉🎉 I presented our 3D point cloud neural network accelerator at IEEE/ACM ICCAD 2023!
- 2023.08: 🎉🎉 Our paper “Sagitta: An Energy-Efficient Sparse 3D-CNN Accelerator for Real-Time 3D Understanding” has been accepted by IEEE IOTJ!
- 2023.07: 🎉🎉 Our paper “An Energy-Efficient 3D Point Cloud Neural Network Accelerator with Efficient Filter Pruning, MLP Fusion, and Dual-Stream Sampling” has been accepted by IEEE/ACM ICCAD 2023!
📝 Publications

SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
Yueqian Lin#, Yuzhe Fu#, J. Zhang, Y. Liu, J. Zhang, J. Sun, Hai “Helen” Li, Yiran Chen.
2025, IEEE International Conference on Multimedia & Expo (Accepted) (# equal contribution) [pdf]
Abstract
While current Speech Large Language Models (Speech LLMs) excel at short-form tasks, they struggle with the computational and representational demands of longer audio clips. To advance the model's capabilities with long-form speech, we introduce Speech Information Retrieval (SIR), a long-context task for Speech LLMs, and present SPIRAL, a 1,012-sample benchmark testing models’ ability to extract critical details from long spoken inputs. To overcome the challenges of processing long speech sequences, we propose SpeechPrune, a training-free token pruning strategy that uses speech-text similarity and approximated attention scores to efficiently discard irrelevant tokens. In SPIRAL, SpeechPrune achieves accuracy improvements of 29% and up to 47% over the original model and the random pruning model at a pruning rate of 20%, respectively. SpeechPrune can maintain network performance even at a pruning level of 80%. This highlights the potential of token-level pruning for efficient and scalable long-form speech understanding.
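The core idea of a training-free, score-based token pruning step can be sketched in a few lines. This is a minimal illustration only, not the paper's actual implementation: the function name and the use of a generic per-token importance score (standing in for the speech-text similarity and approximated attention scores described above) are my assumptions.

```python
import numpy as np

def prune_speech_tokens(speech_tokens, scores, keep_ratio=0.2):
    """Keep the top `keep_ratio` fraction of speech tokens, ranked by a
    per-token importance score (illustrative stand-in for the paper's
    speech-text similarity / approximated attention scores)."""
    n_keep = max(1, int(len(speech_tokens) * keep_ratio))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of highest-scoring tokens
    keep_idx.sort()                          # preserve temporal order
    return [speech_tokens[i] for i in keep_idx]

tokens = ["t0", "t1", "t2", "t3", "t4"]
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2])
print(prune_speech_tokens(tokens, scores, keep_ratio=0.4))  # ['t1', 't3']
```

Because pruning is a pure post-hoc filter on the input token sequence, no retraining or fine-tuning of the Speech LLM is required.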
SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions
Yuzhe Fu, C. Zhou, T. Huang, E. Han, Y. He, Hailong Jiao.
2024, IEEE Transactions on Circuits and Systems for Video Technology [pdf]
Abstract
Transformer-based deep learning networks are revolutionizing our society. Convolution and attention co-designed (CAC) Transformers have demonstrated superior performance compared to conventional Transformer-based networks. However, CAC Transformer networks contain various nonlinear functions, such as softmax and complex activation functions, which require high-precision hardware designs that typically incur significant area and power costs. To address these challenges, SoftAct, a compact and high-precision algorithm-hardware co-designed architecture, is proposed to implement both softmax and nonlinear activation functions in CAC Transformer accelerators. An improved softmax algorithm with penalties is proposed to maintain precision in hardware. A stage-wise full zero detection method is developed to skip redundant computation in softmax. A compact and reconfigurable architecture with a symmetrically designed linear fitting module is proposed to realize the nonlinear functions. The SoftAct architecture is designed in an industrial 28-nm CMOS technology with the MobileViT-xxs network as the benchmark. Compared with the state of the art, SoftAct improves network accuracy by up to 5.87%, area efficiency by 153.2×, and overall efficiency by 1435×.
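For context on why softmax precision is a hardware concern: the reference softmax that any hardware approximation must match is usually computed with max-subtraction, which bounds the exponent inputs to a fixed non-positive range. The sketch below shows only this standard numerically stable baseline, not SoftAct's penalty-based algorithm.

```python
import numpy as np

def stable_softmax(x):
    """Reference softmax with max-subtraction: every exponent input is <= 0,
    bounding the dynamic range -- the property fixed-point hardware relies on."""
    z = x - np.max(x)          # shift so the largest exponent input is 0
    e = np.exp(z)
    return e / e.sum()

# Large inputs that would overflow a naive exp() are handled gracefully.
p = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))
print(p.sum())  # 1.0 (up to floating-point rounding)
```

A hardware implementation additionally replaces `exp` with fixed-point approximations, which is where the precision-versus-area trade-off addressed above arises.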
Adjustable Multi-Stream Block-Wise Farthest Point Sampling Acceleration in Point Cloud Analysis
Changchun Zhou#, Yuzhe Fu#, Y. Ma, E. Han, Y. He, Hailong Jiao.
2024, IEEE Transactions on Circuits and Systems II: Express Briefs (# with equal contribution) [pdf]
Abstract
Point clouds are increasingly used in a variety of applications. In point cloud analysis, Farthest Point Sampling (FPS) is typically employed for down-sampling, reducing the point cloud size while enhancing representational capability by preserving contour points. However, its low parallelism and high computational complexity lead to high energy consumption and long latency, making FPS a bottleneck for hardware acceleration. In this brief, we propose an adjustable multi-stream block-wise FPS algorithm, tuned by four configurable parameters according to hardware and accuracy requirements. A unified hardware architecture with one additional parameter is designed to implement the adjustable multi-stream block-wise FPS algorithm. Furthermore, we present a rapid search algorithm to select the optimal configuration of the five parameters. Designed in an industrial 28-nm CMOS technology, the proposed hardware architecture achieves a latency of 0.005 (1.401) ms and an energy consumption of 0.09 (27.265) µJ/frame for 1 k (24 k) input points at 200 MHz and 0.9 V supply voltage. Compared to the state of the art, the proposed hardware architecture reduces latency by up to 99.9%, reduces energy consumption by up to 99.5%, and improves network accuracy by up to 9.34%.

📝 Collaborative Publications

GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices
M. Navardi, R. Aalishah, Yuzhe Fu, Y. Lin, Hai Li, Yiran Chen, and Tinoosh Mohsenin
2025, AAAI Spring Symposium Series (AAAI SSS) (Accepted)

Nebula: A 28-nm 109.8 TOPS/W 3D PNN Accelerator Featuring Adaptive Partition, Multi-Skipping, and Block-Wise Aggregation
C. Zhou, T. Huang, Y. Ma, Yuzhe Fu, S. Qiu, X. Song, J. Sun, M. Liu, Y. Yang, G. Li, Y. He, Hailong Jiao.
2025, International Solid-State Circuits Conference (ISSCC)
Abstract
Three-dimensional (3D) point clouds are increasingly deployed across various emerging fields, such as autonomous driving, robotics, drones, and virtual reality (VR) [1]–[6]. Point-based point-cloud neural networks (PNNs) [3]–[6] have demonstrated superior performance in point-cloud analysis, compared to both sparse 3D convolution-based networks [7], [8] and graph-based convolutional neural networks [9], [10]. However, due to high computational complexity, low parallelism, and frequent irregular external memory accesses, deploying PNNs in hardware remains a great challenge. Several PNN hardware accelerators have been developed [11]–[20], yet three key challenges remain unsolved, as illustrated in Fig. 23.4.1. 1) The inherent farthest point sampling (FPS) is serial in computation and suffers from quadratic growth in inference latency as the point count rises. Existing uniform block-wise FPS techniques [13], [21] fail to achieve well-balanced block segmentation, due to the typically non-uniform point distribution. 2) A large number of redundant operations exist for both discarded points (DPs) and retained points (RPs) in FPS. These operations arise in the sampling of RPs ① as well as the grouping ②, convolution ③, and aggregation ④ of DPs, introducing unnecessary energy and latency costs. 3) The irregular memory accesses in the aggregation operation cause significant latency penalties. Channel-wise aggregation in [11] relieves the irregularity yet is unsuitable for large-scale point clouds, as the external memory access for features and the neighbor index table (NIT) increases quadratically due to iterative loading of the features or the NIT.
An Energy-Efficient 3D Point Cloud Neural Network Accelerator with Efficient Filter Pruning, MLP Fusion, and Dual-Stream Sampling
C. Zhou, Yuzhe Fu, M. Liu, S. Qiu, G. Li, Y. He, Hailong Jiao.
2023, IEEE/ACM International Conference On Computer Aided Design [pdf][YouTube]
Abstract
Three-dimensional (3D) point clouds have recently been employed in a wide range of applications. As a powerful tool for point cloud analysis, point-based point cloud neural networks (PNNs) have demonstrated superior performance with lower computational complexity and fewer parameters, compared to sparse 3D convolution-based networks and graph-based convolutional neural networks. However, point-based PNNs still suffer from high computational redundancy, large off-chip memory access, and low parallelism in hardware implementation, hindering their deployment on edge devices. In this paper, to address these challenges, an energy-efficient 3D point cloud neural network accelerator is proposed for on-chip edge computing. An efficient filter pruning scheme is used to skip the redundant convolution of pruned filters and zero-value feature channels. A block-wise multi-layer perceptron (MLP) fusion method is proposed to increase the on-chip reuse of features, thereby reducing off-chip memory access. A dual-stream blocking technique is proposed for higher parallelism while maintaining inference accuracy. Implemented in an industrial 28-nm CMOS technology, the proposed accelerator achieves an effective energy efficiency of 12.65 TOPS/W and 0.13 mJ/frame energy consumption for PointNeXt-S at 100 MHz, 0.9 V supply voltage, and 8-bit data width. Compared to state-of-the-art point cloud neural network accelerators, the proposed accelerator enhances energy efficiency by up to 66.6× and reduces energy consumption per frame by up to 70.2×.
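The zero-channel skipping idea behind the filter pruning scheme can be illustrated with a software analogue. This is a minimal sketch under my own assumptions (function name, pointwise-convolution framing), not the accelerator's datapath: channels whose feature map is entirely zero contribute nothing to the output, so their multiply-accumulates can be skipped.

```python
import numpy as np

def conv1x1_skip_zero_channels(features, weights):
    """Pointwise (1x1) convolution that skips all-zero input channels,
    mirroring the idea of avoiding redundant MACs. Illustrative only.
    features: (C_in, N) per-point features; weights: (C_out, C_in)."""
    c_in, n = features.shape
    out = np.zeros((weights.shape[0], n))
    for c in range(c_in):
        if not features[c].any():   # all-zero channel: its MACs are redundant
            continue
        out += np.outer(weights[:, c], features[c])
    return out
```

The result is bit-identical to the dense computation `weights @ features`; only the work for zero channels is elided, which is exactly why such pruning is lossless for inference accuracy.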
Sagitta: An Energy-Efficient Sparse 3D-CNN Accelerator for Real-Time 3D Understanding
C. Zhou, M. Liu, S. Qiu, X. Cao, Yuzhe Fu, Y. He, Hailong Jiao.
2023, IEEE Internet of Things Journal [pdf]

P. Dong, Z. Chen, Z. Li, Yuzhe Fu, L. Chen, Fengwei An.
2021, IEEE Transactions on Circuits and Systems I: Regular Papers [pdf]
📃 Patent
- A high-precision approximate computation device for the softmax function, 2024, CN Patent.
- Low-power-consumption stereo matching system and method for acquiring depth information, 2020, CN Patent, CN112070821A / WO2022021912A1
💻 Reviewer
- TCSVT-IEEE Transactions on Circuits and Systems for Video Technology (2024)
- ELL-Electronics Letters (2022, 2025)
- AICAS-IEEE International Conference on Artificial Intelligence Circuits and Systems (2023, 2025)
- ICME-IEEE International Conference on Multimedia & Expo (2025)
🍀 Tape Out
- An energy-efficient, pipelined, and configurable 3D point cloud neural network accelerator was designed in TSMC 28-nm HPC technology with an area of 2.0 mm × 1.5 mm and taped out in July 2023.
- A 4.5 TOPS/W sparse 3D-CNN accelerator for real-time 3D understanding was fabricated in UMC 55-nm low-power CMOS technology with an area of 4.2 mm×3.6 mm in August 2020.
🎖 Honors and Awards
- 2021 Excellent Graduate Award, Southern University of Science and Technology
- 2021 Best Presentation Award in IEEE CASS Shanghai and Shenzhen Joint Workshop
- 2020 National Scholarship, Ministry of Education of the PRC (The highest scholarship for Chinese undergraduates)
- 2018, 2019 The First Prize of Outstanding Students in SUSTech (Top 5% in SUSTech)
About Me:
- I am an easy-going and spirited person with a passion for life. My enthusiasm not only drives my own life but also positively influences those around me.
- Interests and Hobbies: fitness, jogging, swimming, photography, traveling. (Here is a short 📸 video about my graduation trip in 2021.)