高技术通讯, 2025, Vol. 35, Issue 7: 698-710, 13. DOI: 10.3772/j.issn.1002-0470.2025.07.003
面向多模型工作负载的弹性计算加速器架构研究
An elastic computing accelerator architecture for multi-model workloads
Abstract
When multi-model workloads are deployed on current deep neural network (DNN) accelerators, their quality of service degrades. To tackle this problem, this paper proposes a new accelerator architecture, EnsBooster, which provides a cost-effective parallel execution mode for efficient ensemble-model inference. First, an elastic systolic array is designed: a larger systolic array is divided into several smaller systolic sub-arrays to meet the flexibility and scalability requirements of executing an ensemble model in parallel. Second, a spatial-temporal reuse resource allocation strategy is proposed, which makes full use of spatial-temporal sharing to improve the efficiency of the underlying computing resources. Finally, a hierarchical scheduling mechanism is proposed: at the coarse-grained level, early-exit scheduling reduces the computational burden of ensemble-model inference; at the fine-grained level, a preemptive scheduling mechanism exploits the complementarity and data locality of the ensemble's models to preempt idle computing resources, maximizing the utilization of hardware resources and bandwidth. Evaluation on a set of different workload benchmarks shows that EnsBooster significantly improves throughput and energy efficiency.
Key words
deep neural network (DNN) accelerator/ensemble learning/multi-model workloads/elastic computing/systolic array/preemptive scheduling
Cite this article
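张军, 王兴宾, 苏玉兰. An elastic computing accelerator architecture for multi-model workloads [J]. 高技术通讯, 2025, 35(7): 698-710, 13.
Funding
Supported by the General Program of the Natural Science Foundation of Hubei Province (2022CFB325) and the General Program of the National Natural Science Foundation of China (62272459).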