集成电路与嵌入式系统 (Integrated Circuits and Embedded Systems), 2026, Vol. 26, Issue (3): 81-89, 9. DOI: 10.20193/j.ices2097-4191.2025.0109
一种多精度可重构张量计算单元的设计
Design of multi-precision reconfigurable tensor computing unit
Abstract
With the rapid development of artificial intelligence and deep learning applications, tensor computing urgently demands high-efficiency, multi-precision hardware accelerators. Traditional general-purpose processors face energy-efficiency bottlenecks when processing large-scale matrix multiplication, while existing dedicated accelerators often lack flexibility in supporting diverse data precisions and hybrid computing modes. This paper presents a multi-precision and mixed-precision tensor processing unit (TPU), designed on a reconfigurable architecture, which supports five data formats (INT4, INT8, FP16, BF16, FP32) and two hybrid modes (FP16+FP32, BF16+FP32). It efficiently performs matrix multiply-accumulate operations in three different dimensions (m16n16k16, m32n8k16, m8n32k16). By incorporating a reconfigurable computing array, dynamic data flow control, a multi-mode buffer design, and a unified floating-point processing unit, the design achieves high hardware reuse and significantly improved computational efficiency. Synthesized on the VCU118 FPGA platform at 251.13 MHz, it delivers a peak theoretical performance of 257.16 GOPS/GFLOPS (INT4/INT8/FP16/BF16) and 64.29 GFLOPS (FP32). This design is well suited to applications such as deep learning inference, autonomous driving, and medical imaging, where both computational efficiency and flexibility are critical.
Keywords
tensor processing unit / multi-precision computation / reconfigurable architecture / matrix multiplication / hardware reutilization
Classification
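Information Technology and Security Science
Cite this article
胡湘宏, 梁克龙, 尹飞跃, 冯兆樟, 林元妙, 蔡述庭, 熊晓明. Design of multi-precision reconfigurable tensor computing unit [J]. 集成电路与嵌入式系统 (Integrated Circuits and Embedded Systems), 2026, 26(3): 81-89, 9.
Funding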
National Natural Science Foundation of China (62301165)
Guangzhou Science and Technology Plan Project (2023B01J0007)
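
The peak figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes the usual convention that one multiply-accumulate (MAC) counts as two operations, and it assumes full utilization of the array. All names in it (`ops_per_cycle`, `SHAPES`, and so on) are illustrative. It converts the stated peak throughput and clock frequency into operations per cycle, and checks that the three supported tile shapes contain the same number of MACs.

```python
# Figures quoted in the abstract.
CLOCK_MHZ = 251.13
PEAK_LOW_PRECISION_GOPS = 257.16  # INT4 / INT8 / FP16 / BF16
PEAK_FP32_GFLOPS = 64.29

# The three supported tile shapes as (m, n, k).
SHAPES = {
    "m16n16k16": (16, 16, 16),
    "m32n8k16": (32, 8, 16),
    "m8n32k16": (8, 32, 16),
}

OPS_PER_MAC = 2  # Assumption: one MAC = one multiply + one add.


def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations per clock cycle implied by a peak rate and a clock."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


def macs_per_tile(m: int, n: int, k: int) -> int:
    """MACs needed for one (m x k) by (k x n) matrix product."""
    return m * n * k


low_ops = round(ops_per_cycle(PEAK_LOW_PRECISION_GOPS, CLOCK_MHZ))
fp32_ops = round(ops_per_cycle(PEAK_FP32_GFLOPS, CLOCK_MHZ))
tile_macs = {name: macs_per_tile(*dims) for name, dims in SHAPES.items()}

print("low-precision ops/cycle:", low_ops)
print("FP32 ops/cycle:", fp32_ops)
print("MACs per tile:", tile_macs)

# Ideal cycles per tile under the assumptions above (full utilization).
for name, macs in tile_macs.items():
    print(name, "low-precision cycles:", macs * OPS_PER_MAC / low_ops)
```

Under these assumptions the low-precision modes work out to 1024 operations per cycle and FP32 to 256, a 4:1 ratio that matches the 257.16 to 64.29 ratio in the abstract. All three tile shapes contain 4096 MACs, which is consistent with one compute array being reconfigured across shapes rather than resized.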