Computer Engineering & Science (计算机工程与科学), 2012, Vol. 34, Issue 7: 78-83, 6. DOI: 10.3969/j.issn.1007-130X.2012.07.014
Optimization of Sparse Diagonal Matrix-Vector Multiplication Based on the CUDA Program Model
Abstract
Sparse matrix-vector multiplication is an important computational kernel in many scientific applications. This paper targets n-diagonal sparse matrices under the CUDA programming model. It describes CDIA, a new compressed storage format derived from the DIA format, and assigns work to threads at a fine granularity. To satisfy CUDA's requirements for aligned memory access, we transpose the compressed matrix, design a fine-grained algorithm on top of it, and apply further optimizations to the implementation. In our experiments, the best implementation achieves up to 39.6 Gflop/s in single precision and 19.6 Gflop/s in double precision, which is about 7.6% and 17.4% higher, respectively, than the implementation of Nathan Bell and Michael Garland.
Key words
GPU/CDIA/CUDA/sparse matrix-vector multiplication
Classification
Information Technology and Security Science
Cite this article
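Qin Jin (秦晋), Gong Chunye (龚春叶), Hu Qingfeng (胡庆丰), Liu Jie (刘杰). Optimization of Sparse Diagonal Matrix-Vector Multiplication Based on the CUDA Program Model [J]. Computer Engineering & Science, 2012, 34(7): 78-83, 6.
Funding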
National Natural Science Foundation of China (60673150, 60970033)
National 863 Program of China (2008AA01Z137)
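Illustrative sketch
The abstract refers to the DIA compressed format and to assigning fine-grained work to each thread. The following is a minimal CPU sketch in Python of plain DIA storage and a row-wise product, not the paper's CDIA format or its CUDA kernel, whose details the abstract does not give. The function names (`dense_to_dia`, `dia_spmv_row`, `dia_spmv`) and the flat layout `data[d * n + i]` are assumptions made for illustration. The layout is chosen so that entries of one diagonal for consecutive rows sit next to each other, which is the kind of arrangement that would let neighbouring GPU threads read neighbouring addresses.

```python
# Sketch of y = A @ x for a square matrix stored in DIA (diagonal) format.
# Assumed layout (not taken from the paper): data[d * n + i] holds
# A[i][i + offsets[d]], padded with 0.0 where the diagonal leaves the matrix.


def dense_to_dia(a):
    """Convert a square dense matrix (list of lists) to (offsets, data, n)."""
    n = len(a)
    # A diagonal is identified by its offset k = column - row.
    offsets = sorted({j - i for i in range(n) for j in range(n) if a[i][j] != 0})
    data = [0.0] * (len(offsets) * n)
    for d, k in enumerate(offsets):
        for i in range(n):
            j = i + k
            if 0 <= j < n:
                data[d * n + i] = a[i][j]
    return offsets, data, n


def dia_spmv_row(i, offsets, data, n, x):
    """Compute one entry y[i]; on a GPU this would be one thread's work."""
    acc = 0.0
    for d, k in enumerate(offsets):
        j = i + k
        if 0 <= j < n:  # skip padding outside the matrix
            acc += data[d * n + i] * x[j]
    return acc


def dia_spmv(offsets, data, n, x):
    """Serial stand-in for launching one thread per row."""
    return [dia_spmv_row(i, offsets, data, n, x) for i in range(n)]


if __name__ == "__main__":
    # Tridiagonal example matrix.
    a = [
        [2.0, -1.0, 0.0, 0.0],
        [-1.0, 2.0, -1.0, 0.0],
        [0.0, -1.0, 2.0, -1.0],
        [0.0, 0.0, -1.0, 2.0],
    ]
    offsets, data, n = dense_to_dia(a)
    print(offsets)                                        # [-1, 0, 1]
    print(dia_spmv(offsets, data, n, [1.0, 2.0, 3.0, 4.0]))  # [0.0, 0.0, 0.0, 5.0]
```

In this sketch the loop over `i` in `dia_spmv` plays the role of the thread index. For a fixed diagonal `d`, rows `i` and `i + 1` read `data[d * n + i]` and `data[d * n + i + 1]`, so the reads of adjacent rows are contiguous.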