|国家科技期刊平台
首页|期刊导航|计算机工程与科学|一种不规则稀疏矩阵的SpMV方法

一种不规则稀疏矩阵的SpMV方法OA北大核心CSTPCD

An irregular sparse matrix SpMV method

中文摘要英文摘要

稀疏矩阵-向量乘法SpMV是高性能计算领域的关键算子之一,在新兴的深度学习领域中有着重要应用.现有SpMV算子通常采用行列相等的稀疏矩阵,而对于不规则形状稀疏矩阵(行数与列数不等)的研究仍存在空缺,值得进一步深入探讨.相比于行列相等的稀疏矩阵,不规则形状稀疏矩阵凭借其行数与列数不对等的稀疏特点具有进一步优化的空间.因此,针对这种行数与列数不对等的不规则形状稀疏矩阵建立SpMV性能模型,分析得到其出现性能瓶颈的原因在于缓存和内存之间数据交互的带宽不足.同时做了以下2个方面的优化工作:(1)基于常用稀疏矩阵CSR存储格式,提出新型RCSR存储格式,其针对CSR存储格式中一个制约性能的数组进行了变换和压缩,使得SpMV更加高效;(2)结合国产处理器的SIMD指令扩展设计了基于RCSR格式的SpMV优化算法.在国产飞腾处理器上分别使用规则和不规则稀疏矩阵进行测试,在规则稀疏矩阵的情况下,通过采用RCSR存储格式和SIMD加速指令集,以GFLOPS为性能指标,实现了平均83.35%的性能提升;在不规则稀疏矩阵的情况下,性能提升与行列比相关,在行列不对等加剧时,具有更为明显的优化效果.

Sparse matrix-vector multiplication(SpMV)is one of the key operators in the field of high performance computing and also has significant applications in emerging deep learning domains.Existing research on SpMV often focuses on square sparse matrices,while there is still a lack of in-depth explora-tion for irregularly shaped sparse matrices(with unequal numbers of rows and columns).The character-istic of unequal numbers of rows and columns results in different storage features for these sparse matri-ces compared to square sparse matrices,leaving room for further optimization.Therefore,this paper establishes an SpMV performance model for irregularly shaped sparse matrices with unequal rows and columns,and analyzes that the performance bottleneck is caused by insufficient bandwidth for data exchange between cache and memory.At the same time,this paper carried out the following two opti-mization tasks:(1)Based on the commonly used CSR storage format for sparse matrices,a new RCSR storage format is proposed,which transforms and compresses a performance-limiting array in the CSR storage format,making SpMV more efficient;(2)An optimized SpMV algorithm based on the RCSR format is designed in conjunction with the SIMD instruction set extension of domestic processors.This paper tests regular and irregular sparse matrices on domestic Phytium processors.For regular sparse matrices,the comprehensive application of RCSR storage format,SIMD instructions,and OpenMP par-allelization technology increases GFLOPS by 83.35%on average.For irregular sparse matrices,the performance improvement is related to the row-to-column ratio,and when the row-to-column ratio is not equal,the optimization effect is more obvious.

施禹;董攀;张利军

国防科技大学计算机学院,湖南 长沙 410073中国人民解放军32228部队24分队,福建 福州 350101

计算机与自动化

稀疏矩阵不规则矩阵向量乘法多核性能性能优化

sparse matrixirregular matrixvector multiplicationmulticore performanceperform-ance optimization

《计算机工程与科学》 2024 (007)

1175-1184 / 10

国防科技重点实验室稳定支持基金(WDZC20235250111);国家自然科学基金(62002371);国防科技大学基金(ZK21-17)

10.3969/j.issn.1007-130X.2024.07.005

评论