Integrated Circuits and Embedded Systems, 2025, Vol. 25, Issue 6: 1-13. DOI: 10.20193/j.ices2097-4191.2025.0023
A Survey of FPGA Inference Implementation Techniques for Large Models and Future Challenges
A survey on hardware accelerator for large model inference on FPGA
Abstract
In recent years, with the widespread deployment of large models (such as GPT, LLaMA, and DeepSeek), the compute demands and energy-efficiency problems of the inference stage have become increasingly prominent. Although conventional GPU solutions deliver high throughput, they face challenges in power consumption, real-time performance, and cost. With their customizable architecture, deterministic low latency, and high energy efficiency, FPGAs have become an important alternative for deploying large model inference. This paper systematically reviews the network structure of large models and the techniques for implementing large model inference on FPGAs, covering three major directions: hardware architecture adaptation, algorithm-hardware co-optimization, and system-level challenges. At the hardware level, the focus is on the design of compute units and on memory-hierarchy optimization strategies. At the algorithm level, key techniques such as model compression, dynamic quantization, and compiler optimization are analyzed. At the system level, challenges such as multi-FPGA scaling, thermal management, and emerging integrated storage-and-computing (compute-in-memory) architectures are discussed. In addition, this paper summarizes the limitations of the current FPGA inference ecosystem (such as insufficient toolchain maturity) and looks ahead to future trends, including heterogeneous chiplet integration, integration with photonic computing, and the establishment of a standardized evaluation framework. The review indicates that the architectural flexibility of FPGAs gives them a unique advantage for efficient large model inference, but interdisciplinary collaboration is still needed to bring the technology into practical use.
Keywords
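large language models / hardware accelerator / FPGA / integrated storage and computing architecture / Transformer
Classification
Electronic information engineering
Cite this article
黄思晓, 彭皓翔, 施旭, 苏志锋, 黄明强, 余浩. A survey on hardware accelerator for large model inference on FPGA [J]. Integrated Circuits and Embedded Systems, 2025, 25(6): 1-13.
Funding
National Key Research and Development Program of China (2021YFE0204000)
Shenzhen Science and Technology Innovation Commission, 2021 Shenzhen High-Level Talent Peacock Team Project (KQTD20200820113051096)
Shenzhen Science and Technology Innovation Commission, Key Basic Research Project (JCYJ20220818100217038)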