I am currently a Master's student in the School of Electronic and Computer Engineering at Peking University, working in the VLSI Lab under the supervision of Prof. Hailong Jiao. I obtained my BEng degree in Microelectronic Science and Engineering from Southern University of Science and Technology in 2021, advised by Prof. Fengwei An. My research currently focuses on algorithm-hardware co-design.
E-mail: fuyz@stu.pku.edu.cn

📖 Education

  • 2021.09 - Present, Master in School of Electronic and Computer Engineering, Peking University.
  • 2019.08 - 2019.12, Global Access Program, University of California, Berkeley.
  • 2017.09 - 2021.06, BEng in School of Microelectronics, Southern University of Science and Technology.

🎖 Honors and Awards

  • 2021 Excellent Graduate Award, Southern University of Science and Technology
  • 2021 Best Presentation Award in IEEE CASS Shanghai and Shenzhen Joint Workshop
  • 2020 National Scholarship, Ministry of Education of the PRC (The highest scholarship for Chinese undergraduates)
  • 2020 Shenzhen Longsys Electronics Company Award (Top 2% in School of Microelectronics)
  • 2019 The First Prize of Outstanding Students in SUSTech (Top 5% in SUSTech)
  • 2018 The First Prize of Outstanding Students in SUSTech (Top 5% in SUSTech)

🔥 News

  • 2024.04: 🎉🎉 Our paper “SoftAct: A High-Precision Softmax Architecture for Transformers with Nonlinear Functions Support” has been accepted by IEEE TCSVT!
  • 2024.02: 🎉🎉 Our paper “Adjustable Multi-Stream Block-Wise Farthest Point Sampling Acceleration in Point Cloud Analysis” has been accepted by IEEE TCAS-II!
  • 2023.10: 🎉🎉 Yuzhe Fu presents the work on 3D Point Cloud Neural Network Accelerator at IEEE/ACM ICCAD 2023!
  • 2023.08: 🎉🎉 Our paper “Sagitta: An Energy-Efficient Sparse 3D-CNN Accelerator for Real-Time 3D Understanding” has been accepted by IEEE IOTJ!
  • 2023.07: 🎉🎉 Our paper “An Energy-Efficient 3D Point Cloud Neural Network Accelerator with Efficient Filter Pruning, MLP Fusion, and Dual-Stream Sampling” has been accepted by IEEE/ACM ICCAD 2023!

๐Ÿ“ Publications

TCSVT

SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions

Yuzhe Fu, Changchun Zhou, Tianling Huang, Eryi Han, Yifan He, Hailong Jiao.

2024, IEEE Transactions on Circuits and Systems for Video Technology [pdf]

Abstract: Transformer-based deep learning networks are revolutionizing our society. The convolution and attention co-designed (CAC) Transformers have demonstrated superior performance compared to the conventional Transformer-based networks. However, CAC Transformer networks contain various nonlinear functions, such as softmax and complex activation functions, which require high precision hardware design yet typically with significant cost in area and power consumption. To address these challenges, SoftAct, a compact and high-precision algorithm-hardware co-designed architecture, is proposed to implement both softmax and nonlinear activation functions in CAC Transformer accelerators. An improved softmax algorithm with penalties is proposed to maintain precision in hardware. A stage-wise full zero detection method is developed to skip redundant computation in softmax. A compact and reconfigurable architecture with a symmetrically designed linear fitting module is proposed to achieve nonlinear functions. The SoftAct architecture is designed in an industrial 28-nm CMOS technology with the MobileViT-xxs network as the benchmark. Compared with the state of the art, SoftAct improves up to 5.87% network accuracy, 153.2× area efficiency, and 1435× overall efficiency.
TCAS-II

Adjustable Multi-Stream Block-Wise Farthest Point Sampling Acceleration in Point Cloud Analysis

Changchun Zhou#, Yuzhe Fu#, Yanzhe Ma, Eryi Han, Yifan He, Hailong Jiao.

2024, IEEE Transactions on Circuits and Systems II: Express Briefs (# with equal contribution) [pdf]

Abstract: Point cloud is increasingly used in a variety of applications. Farthest Point Sampling (FPS) is typically employed for down-sampling to reduce the size of point cloud and enhance the representational capability by preserving contour points in point cloud analysis. However, due to low parallelism and high computational complexity, high energy consumption and long latency are caused, which becomes a bottleneck of hardware acceleration. In this brief, we propose an adjustable multi-stream block-wise FPS algorithm, adjusted by four configurable parameters, according to hardware and accuracy requirements. A unified hardware architecture with one parameter is designed to implement the adjustable multi-stream block-wise FPS algorithm. Furthermore, we present a rapid searching algorithm to select the optimal configuration of the five parameters. Designed in an industrial 28-nm CMOS technology, the proposed hardware architecture achieves a latency of 0.005 (1.401) ms and a frame energy consumption of 0.09 (27.265) µJ/frame for 1 k (24 k) input points at 200 MHz and 0.9 V supply voltage. Compared to the state of the art, the proposed hardware architecture reduces the latency by up to 99.9%, saves the energy consumption by up to 99.5%, and improves the network accuracy by up to 9.34%.
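For readers unfamiliar with FPS, the following is a minimal sketch of the conventional sequential algorithm that the abstract takes as its starting point. It is not the adjustable multi-stream block-wise variant proposed in the paper. The function name `farthest_point_sampling`, the `start_index` parameter, and the toy point cloud are assumptions made for illustration only.

```python
import math


def farthest_point_sampling(points, num_samples, start_index=0):
    """Conventional sequential FPS: return the indices of the sampled points."""
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.inf] * len(points)
    selected = [start_index]
    for _ in range(num_samples - 1):
        last = points[selected[-1]]
        # Only the most recent pick can shrink a point's nearest-selected distance.
        for i, p in enumerate(points):
            d = math.dist(p, last)
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point farthest from everything selected so far.
        selected.append(max(range(len(points)), key=min_dist.__getitem__))
    return selected


# Hypothetical toy point cloud, for illustration only.
cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (5.0, 5.0, 5.0),
    (0.1, 0.1, 0.0),
]
sampled = farthest_point_sampling(cloud, 3)
```

Each pick depends on the distances updated by the previous pick, so the outer loop cannot be parallelized directly, and the total work grows with the number of input points times the number of samples. This matches the low parallelism and high computational complexity that the abstract identifies as the bottleneck.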
ICCAD

An Energy-Efficient 3D Point Cloud Neural Network Accelerator with Efficient Filter Pruning, MLP Fusion, and Dual-Stream Sampling

Changchun Zhou, Yuzhe Fu, Min Liu, Siyuan Qiu, Ge Li, Yifan He, Hailong Jiao.

2023, IEEE/ACM International Conference On Computer Aided Design [pdf][YouTube]

Abstract: Three-dimensional (3D) point cloud has been employed in a wide range of applications recently. As a powerful weapon for point cloud analysis, point-based point cloud neural networks (PNNs) have demonstrated superior performance with less computation complexity and parameters, compared to sparse 3D convolution-based networks and graph-based convolutional neural networks. However, point-based PNNs still suffer from high computational redundancy, large off-chip memory access, and low parallelism in hardware implementation, thereby hindering the applications on edge devices. In this paper, to address these challenges, an energy-efficient 3D point cloud neural network accelerator is proposed for on-chip edge computing. An efficient filter pruning scheme is used to skip the redundant convolution of pruned filters and zero-value feature channels. A block-wise multi-layer perceptron (MLP) fusion method is proposed to increase the on-chip reuse of features, thereby reducing off-chip memory access. A dual-stream blocking technique is proposed for higher parallelism while maintaining inference accuracy. Implemented in an industrial 28-nm CMOS technology, the proposed accelerator achieves an effective energy efficiency of 12.65 TOPS/W and 0.13 mJ/frame energy consumption for PointNeXt-S at 100 MHz, 0.9 V supply voltage, and 8-bit data width. Compared to the state-of-the-art point cloud neural network accelerators, the proposed accelerator enhances the energy efficiency by up to 66.6× and reduces the energy consumption per frame by up to 70.2×.
IOTJ

Sagitta: An Energy-Efficient Sparse 3D-CNN Accelerator for Real-Time 3D Understanding

Changchun Zhou, Min Liu, Siyuan Qiu, Xugang Cao, Yuzhe Fu, Yifan He, Hailong Jiao.

2023, IEEE Internet of Things Journal [pdf]

TCAS-I

A 4.29 nJ/pixel stereo depth coprocessor with pixel level pipeline and region optimized semi-global matching for IoT application

Pingcheng Dong, Zhuoyu Chen, Zhuoao Li, Yuzhe Fu, Lei Chen, Fengwei An.

2021, IEEE Transactions on Circuits and Systems I: Regular Papers [pdf]

📃 Patent

🚀 Skills

  • Familiar with Verilog HDL, PyTorch, Intel Distiller (Model Compression), Cadence (Genus and NCSim), Vivado.
  • Knowledgeable in Python, Java, MATLAB, Shell, Makefile.

About Me:

  • I am an easy-going and spirited individual with a passion for life. My enthusiasm not only drives my own life but also positively influences those around me.
  • Interests and Hobbies: fitness, jogging, swimming, photography, traveling. (Here is a short 📸 video about my graduation trip in 2021.)