Convolutional neural network inference and training vectorization method for multicore vector accelerators (OA | Peking University Core Journal | CSTPCD)

With the widespread application of deep learning, represented by convolutional neural networks (CNNs), the computational requirements of neural network models have increased rapidly, driving the development of deep learning accelerators. How to accelerate and optimize the performance of neural network models based on the architectural characteristics of accelerators has become a research hotspot. For the VGG network model inference and training algorithms on the independently designed multicore vector accelerator FT-M7004, vectorized mapping methods for core operators such as convolution, pooling, and fully connected layers are proposed. Optimization strategies, including SIMD vectorization, DMA double-buffered transfer, and weight sharing, are employed to fully exploit the architectural advantages of the vector accelerator, achieving high computational efficiency. Experimental results indicate that, on the FT-M7004 platform, the average computational efficiency for convolution layer inference and training reaches 86.62% and 69.63%, respectively; for fully connected layer inference and training, it reaches 93.17% and 81.98%, respectively. The inference computational efficiency of the VGG network model on FT-M7004 exceeds that on the GPU platform by over 20%.
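The DMA double-buffered transfer named in the abstract is a classic ping-pong scheme: while the vector cores compute on one on-chip buffer, the DMA engine fills the other, so transfer latency hides behind compute. Below is a minimal C sketch of that pattern under stated assumptions; the paper does not publish the FT-M7004 programming interface, so dma_load_async, dma_wait, conv_tile_simd, and the tile size are hypothetical names used only for illustration.

    #include <stddef.h>

    #define TILE_FLOATS 1024  /* floats per input tile; illustrative size */

    /* Hypothetical platform primitives: start an asynchronous DMA transfer,
     * and block until the most recently issued transfer completes. */
    extern void dma_load_async(float *dst, const float *src, size_t nbytes);
    extern void dma_wait(void);
    /* Hypothetical SIMD convolution kernel over one resident tile. */
    extern void conv_tile_simd(const float *tile);

    static float buf[2][TILE_FLOATS]; /* two alternating on-chip buffers */

    void conv_layer(const float *ifm, size_t num_tiles)
    {
        /* Prologue: prefetch tile 0 into buffer 0. */
        dma_load_async(buf[0], ifm, TILE_FLOATS * sizeof(float));

        for (size_t i = 0; i < num_tiles; i++) {
            dma_wait(); /* tile i is now resident in buf[i & 1] */

            /* Overlap: start loading tile i+1 into the other buffer
             * while the vector units work on tile i. */
            if (i + 1 < num_tiles)
                dma_load_async(buf[(i + 1) & 1],
                               ifm + (i + 1) * TILE_FLOATS,
                               TILE_FLOATS * sizeof(float));

            conv_tile_simd(buf[i & 1]); /* vectorized compute on current tile */
        }
    }

With transfers and compute overlapped this way, each loop iteration is bound by the slower of the two stages rather than their sum, which is the kind of overlap that lets kernels approach the compute efficiencies the paper reports.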

CHEN Jie (陈杰); LI Cheng (李程); LIU Zhong (刘仲)

College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China

Subject category: Computer and Automation

Keywords: multicore vector accelerator; convolutional neural network; inference algorithm; training algorithm

Computer Engineering & Science (《计算机工程与科学》), 2024(04)

Pages 580-589 (10 pages)

Supported by the National Key Laboratory of Parallel and Distributed Processing Fund (2021-KJWPDL-11)

DOI: 10.3969/j.issn.1007-130X.2024.04.002
