Electronics & Packaging, 2025, Vol. 25, Issue 9: 48-55, 8. DOI: 10.16257/j.cnki.1681-1070.2025.0103
Design and Implementation of High Efficiency LSTM Hardware Accelerator
Abstract
Compared with traditional recurrent neural networks (RNNs), long short-term memory (LSTM) networks add multiple gating units and a memory cell, effectively addressing the vanishing- and exploding-gradient problems encountered by traditional RNNs. Because of their advantage in handling complex sequential dependencies, LSTM networks have been widely applied to natural language processing (NLP) tasks such as machine translation, sentiment analysis, and text classification. As intelligent applications grow more complex and LSTM networks gain more layers and hidden-layer nodes, the demands on the storage capacity, memory access bandwidth, and processing performance of edge-side processing devices have risen sharply. This paper analyzes the characteristics of the LSTM algorithm and designs a highly parallel pipelined computation unit. It proposes a multi-level shared data path method and optimizes and controls the process of implementing the LSTM algorithm in hardware. The resulting LSTM hardware accelerator achieves a peak computing power of 2.144 TOPS and is physically implemented in fin field-effect transistor (FinFET) technology. Chip-level test results after tape-out show that the accelerator achieves an operational efficiency exceeding 95%, and that its processing performance per TOPS is more than 2.8 times that of the NVIDIA GTX 1080 Ti GPU.
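To make the gating units and the workload behind the quoted figures concrete, the sketch below implements one time step of the standard textbook LSTM cell (input, forget, and output gates plus a candidate cell state) in plain Python. It is not the paper's hardware design. The helper names (`lstm_step`, `macs_per_step`, `operational_efficiency`, `relative_perf_per_tops`) are hypothetical, and the efficiency and per-TOPS formulas assume that "operational efficiency" means sustained throughput divided by peak throughput and that "performance per TOPS" means measured performance divided by peak TOPS; the paper's exact definitions may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
GATES = ("i", "f", "o", "g")  # input, forget, output gates; candidate cell state


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    # Each output element is a chain of multiply-accumulate (MAC) operations,
    # the kind of work a parallel pipelined computation unit would target.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def lstm_step(
    x: Sequence[float],
    h_prev: Sequence[float],
    c_prev: Sequence[float],
    W: Dict[str, Matrix],
    U: Dict[str, Matrix],
    b: Dict[str, Vector],
) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (h_t, c_t)."""
    # Pre-activations: W[k] @ x + U[k] @ h_prev + b[k] for every gate k.
    pre = {
        k: [wx + uh + bk for wx, uh, bk in zip(matvec(W[k], x), matvec(U[k], h_prev), b[k])]
        for k in GATES
    }
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    # Memory cell: keep a gated part of the old state, add a gated candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def macs_per_step(input_size: int, hidden_size: int) -> int:
    # Four gates, each with a (hidden x input) and a (hidden x hidden) product.
    return 4 * hidden_size * (input_size + hidden_size)


def operational_efficiency(sustained_tops: float, peak_tops: float) -> float:
    # Assumed definition: sustained throughput as a fraction of peak.
    return sustained_tops / peak_tops


def relative_perf_per_tops(perf_a: float, peak_a: float, perf_b: float, peak_b: float) -> float:
    # Assumed definition: (performance / peak TOPS) of A over the same ratio for B.
    return (perf_a / peak_a) / (perf_b / peak_b)
```

As a usage example, a layer with 512 inputs and 512 hidden units needs `macs_per_step(512, 512)` = 4 × 512 × 1024 = 2,097,152 MACs per time step, almost all of it in the eight matrix-vector products. Under the assumed efficiency definition, an efficiency above 95% of the 2.144 TOPS peak corresponds to a sustained throughput above roughly 2.04 TOPS.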
Key words
long short-term memory network/parallel pipeline/hardware acceleration/computational cluster
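Classification
Information Technology and Security Science
Cite this article
CHEN Kai, HE Bang, TENG Ziheng, FU Yuxiang, LI Shiping. Design and Implementation of High Efficiency LSTM Hardware Accelerator[J]. Electronics & Packaging, 2025, 25(9): 48-55, 8.
Funding
Enterprise Innovation and Development Joint Fund of the National Natural Science Foundation of China (U21B2032)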