
NM-SpMM: A semi-structured sparse matrix multiplication algorithm for domestic heterogeneous vector processors

Jiang Jingfei (姜晶菲), He Yuanhong (何源宏), Xu Jinwei (许金伟), Xu Shiyao (许诗瑶), Qian Xifu (钱希福)

Computer Engineering & Science, 2024, Vol. 46, Issue 7: 1141-1150, 10. DOI: 10.3969/j.issn.1007-130X.2024.07.001

NM-SpMM: A semi-structured sparse matrix multiplication algorithm for domestic heterogeneous vector processors

Jiang Jingfei¹, He Yuanhong¹, Xu Jinwei¹, Xu Shiyao¹, Qian Xifu¹

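Author information

  • 1. National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China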

Abstract

Deep neural networks have achieved excellent results in natural language processing, computer vision, and other fields. As the scale of data processed by intelligent applications grows and large models develop rapidly, the demands on the inference performance of deep neural networks keep increasing. The N:M semi-structured sparsity scheme has become one of the popular techniques for balancing computing power demand against application effectiveness. The domestic heterogeneous vector processor FT-M7032 offers more room for exploiting data parallelism and instruction parallelism in intelligent model processing. To address the challenges of computing N:M semi-structured sparse models with varied sparsity patterns, a flexible, configurable sparse matrix multiplication algorithm, NM-SpMM, is proposed for FT-M7032. NM-SpMM designs COA, an efficient sparse encoding format based on compressed offset addresses, which avoids the impact of the semi-structured parameter configuration on sparse data access. Based on COA, NM-SpMM performs fine-grained optimization of sparse matrix multiplication along different dimensions. Experimental results on a single FT-M7032 core show that NM-SpMM achieves a 1.73~21.00 times speedup over dense matrix multiplication, and a 0.04~1.04 times speedup relative to an NVIDIA V100 GPU with cuSPARSE.

Key words

deep neural network/graphics processing unit/vector processor/sparse matrix multiplication/pipeline

Classification

Information technology and security science

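Cite this article

Jiang Jingfei, He Yuanhong, Xu Jinwei, Xu Shiyao, Qian Xifu. NM-SpMM: A semi-structured sparse matrix multiplication algorithm for domestic heterogeneous vector processors[J]. Computer Engineering & Science, 2024, 46(7): 1141-1150, 10.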

Computer Engineering & Science

OA / Peking University Core Journals (北大核心) / CSTPCD

ISSN 1007-130X
