面向多模型工作负载的弹性计算加速器架构研究

张军 王兴宾 苏玉兰

高技术通讯, 2025, Vol. 35, Issue 7: 698-710, 13. DOI: 10.3772/j.issn.1002-0470.2025.07.003

An elastic computing accelerator architecture for multi-model workloads

张军 1, 王兴宾 2, 苏玉兰 2

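Author information

  • 1. Institute of Intelligent Transportation, Hubei University of Arts and Science, Xiangyang 441053
  • 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093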

Abstract

When multi-model workloads are deployed on current deep neural network (DNN) accelerators, their quality of service degrades. To tackle this problem, this paper proposes a new accelerator architecture, EnsBooster, which provides a cost-effective parallel execution mode for efficient inference of ensemble models. First, an elastic systolic array is designed in which a large systolic array is divided into several smaller systolic sub-arrays, meeting the flexibility and scalability requirements of executing ensemble models in parallel. Second, a spatial-temporal reuse resource allocation strategy is proposed, which exploits spatial-temporal sharing to improve the efficiency of the underlying computing resources. Finally, a hierarchical scheduling mechanism is proposed: at the coarse-grained level, early-exit scheduling reduces the computational burden of ensemble inference; at the fine-grained level, a preemptive scheduling mechanism claims idle computing resources by exploiting the complementarity and data locality of the ensemble models, maximizing the utilization of hardware resources and bandwidth. Evaluation on a set of different workload benchmarks shows that EnsBooster significantly improves throughput and energy efficiency.
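The abstract does not give implementation details, so the following is only a rough sketch of two of the ideas it names: splitting one large systolic array into smaller sub-arrays, and coarse-grained early exit during ensemble inference. The function names `partition_array` and `early_exit_vote`, the column-wise split, and the majority-vote stopping rule are all assumptions made for illustration, not the paper's actual design.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def partition_array(rows: int, cols: int, parts: int) -> List[Tuple[int, int, int]]:
    """Split a rows x cols array into `parts` column-wise sub-arrays.

    Returns one (rows, first_col, width) tuple per sub-array.
    Hypothetical layout: the abstract does not say how EnsBooster partitions.
    """
    if not 1 <= parts <= cols:
        raise ValueError("parts must be between 1 and cols")
    base, extra = divmod(cols, parts)
    subarrays = []
    start = 0
    for i in range(parts):
        # The first `extra` sub-arrays take one additional column.
        width = base + (1 if i < extra else 0)
        subarrays.append((rows, start, width))
        start += width
    return subarrays


def early_exit_vote(
    models: Sequence[Callable[[object], Hashable]], x: object
) -> Tuple[Hashable, int]:
    """Majority vote that stops once the leading label cannot be overtaken.

    Returns (label, number_of_models_actually_run).
    Assumed stopping rule; the paper's early-exit criterion may differ.
    """
    if not models:
        raise ValueError("at least one model is required")
    votes: Counter = Counter()
    for used, model in enumerate(models, start=1):
        votes[model(x)] += 1
        remaining = len(models) - used
        ranked = votes.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Exit early if the remaining models cannot change the winner.
        if lead - runner_up > remaining:
            return ranked[0][0], used
    return votes.most_common(1)[0][0], len(models)
```

For example, with five ensemble members that all agree, `early_exit_vote` returns after the third member, since the remaining two votes can no longer change the majority. The skipped members represent compute that a scheduler could hand to another workload.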

Key words

deep neural network (DNN) accelerator / ensemble learning / multi-model workloads / elastic computing / systolic array / preemptive scheduling

Cite this article

张军, 王兴宾, 苏玉兰. 面向多模型工作负载的弹性计算加速器架构研究[J]. 高技术通讯, 2025, 35(7): 698-710, 13.

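Funding

Supported by the General Program of the Natural Science Foundation of Hubei Province (2022CFB325) and the General Program of the National Natural Science Foundation of China (62272459).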

高技术通讯, ISSN 1002-0470, open access (OA), Peking University Core Journal (北大核心)