Journal of National University of Defense Technology (国防科技大学学报), 2024, Vol. 46, Issue (1): 103-112, 10. DOI: 10.11887/j.cn.202401011
Parallel optimization of convolution algorithm on multi-core DSP (original title: 多核数字信号处理卷积算法并行优化)
Abstract
Based on the characteristics of the heterogeneous multi-core digital signal processing (DSP) chip independently developed by the National University of Defense Technology and on those of the convolution algorithm, a high-performance multi-core parallel convolution implementation scheme for the multi-core DSP architecture is proposed. For 1×1 convolution, a feature-map-level multi-core parallel scheme is proposed. For convolutions with kernels larger than 1×1, a window-level multi-core parallel optimization design is proposed, together with an intra-core parallel optimization based on element-wise vectorization. Experimental results show that the proposed parallel optimization method reaches a maximum single-core computing efficiency of 64.95%. Even when bandwidth is limited, the multi-core parallel scaling efficiency still reaches 48.36% to 88.52%. Compared with the E5-2640 CPU, execution of the typical network ResNet50 achieves a 5.39x speedup.
Keywords
multi-core DSP / convolutional neural networks (CNNs) / convolution algorithms / parallel optimization
Classification
Information Technology and Security Science
Cite this article
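Xu Jinwei, Wang Qinglin, Li Yalin, Jiang Jingfei, Gao Lei, Li Rongchun, Li Dongsheng. Parallel optimization of convolution algorithm on multi-core DSP [J]. Journal of National University of Defense Technology, 2024, 46(1): 103-112, 10.
Funding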
National Natural Science Foundation of China (61732018)