
GPU Performance Modeling and Algorithm Optimization for Sparse Matrix-Vector Multiplication

MA Chengyu, LI Suolan, LIU Yinuo, ZHAO Wenzhe, REN Pengju, XIA Tian

集成电路与嵌入式系统 (Integrated Circuits and Embedded Systems), 2026, Vol. 26, Issue (1): 5-11, 7. DOI: 10.20193/j.ices2097-4191.2025.0081

GPU performance modeling and algorithm optimization for SpMV

MA Chengyu 1, LI Suolan 2, LIU Yinuo 1, ZHAO Wenzhe 1, REN Pengju 1, XIA Tian 3

Author information

  • 1. College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China
  • 2. China Academy of Launch Vehicle Technology, Beijing 100076, China
  • 3. College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China; Laboratory for Advanced Computing and Intelligence Engineering, Wuxi 214083, China

Abstract

To address the performance bottleneck of sparse matrix-vector multiplication (SpMV) on GPU platforms, this paper proposes an optimization algorithm based on row re-segmentation, together with an accompanying performance evaluation model. The method first establishes a quantitative mapping between matrix row lengths and computational resource allocation. Using dynamic thresholds, it partitions the original matrix into long-row and short-row submatrices, which are then computed using thread-level and thread-block-level parallel strategies, respectively. This approach alleviates the inherent conflict between the GPU's SIMT execution characteristics and the irregular data distribution of sparse matrices. To quantify the additional overhead introduced during preprocessing, performance penalty models for atomic conflict and padding are developed, which turn the extra memory accesses and computation into computable cost functions. Building on these models, a parameter space search algorithm is constructed that rapidly identifies the optimal preprocessing parameters within predefined parameter sets, using pre-acquired hardware performance metrics and the matrix's non-zero element distribution. Experimental results show that the proposed algorithm outperforms cuSPARSE, the conventional GPU sparse computation library, on multiple benchmark sparse matrix datasets, achieving performance improvements of up to 1.26× and 1.17× in specific scenarios. Furthermore, the parameter search incurs low overhead, and the method generalizes well, adapting to diverse input matrices and GPU hardware architectures.
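The row re-segmentation and parameter search described in the abstract can be illustrated with a small CPU-side sketch in plain Python. This is only a hedged illustration, not the paper's implementation: the function names (`split_rows`, `spmv_rows`, `split_spmv`, `toy_cost`, `search_threshold`), the CSR layout, the `chunk` parameter, and above all the cost function are assumptions made here. The toy cost counts padded slots on the short-row side and partial-sum merges on the long-row side as stand-ins for the padding and atomic-conflict penalties; the paper's actual models and its mapping of submatrices to GPU kernels are not reproduced.

```python
def split_rows(row_ptr, threshold):
    """Partition CSR row indices into (short_rows, long_rows) by non-zero count."""
    short_rows, long_rows = [], []
    for r in range(len(row_ptr) - 1):
        nnz = row_ptr[r + 1] - row_ptr[r]
        (long_rows if nnz > threshold else short_rows).append(r)
    return short_rows, long_rows


def spmv_rows(row_ptr, col_idx, vals, x, rows, y):
    """Compute y[r] = sum_k A[r, k] * x[k] for the selected rows only."""
    for r in rows:
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[r] = acc


def split_spmv(row_ptr, col_idx, vals, x, threshold):
    """SpMV computed as two passes, one per submatrix (sequential stand-in for two kernels)."""
    short_rows, long_rows = split_rows(row_ptr, threshold)
    y = [0.0] * (len(row_ptr) - 1)
    spmv_rows(row_ptr, col_idx, vals, x, short_rows, y)
    spmv_rows(row_ptr, col_idx, vals, x, long_rows, y)
    return y


def toy_cost(row_ptr, threshold, chunk):
    """Hypothetical cost: padded slots (short side) + partial-sum merges (long side)."""
    short_rows, long_rows = split_rows(row_ptr, threshold)
    lens = [row_ptr[r + 1] - row_ptr[r] for r in range(len(row_ptr) - 1)]
    # Assumption: short rows are padded up to the longest short row.
    max_short = max((lens[r] for r in short_rows), default=0)
    padding = sum(max_short - lens[r] for r in short_rows)
    # Assumption: each long row is cut into chunks, one merge per chunk.
    merges = sum(-(-lens[r] // chunk) for r in long_rows)
    return padding + merges


def search_threshold(row_ptr, candidates, chunk):
    """Exhaustively pick the candidate threshold with the lowest toy cost."""
    return min(candidates, key=lambda t: toy_cost(row_ptr, t, chunk))


# Example: a 4 x 8 matrix in CSR form with row lengths 1, 2, 8, 1.
row_ptr = [0, 1, 3, 11, 12]
col_idx = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 7]
vals = [1.0, 1.0, 2.0] + [1.0] * 8 + [3.0]
x = [1.0] * 8

best = search_threshold(row_ptr, candidates=[1, 2, 8], chunk=4)
y = split_spmv(row_ptr, col_idx, vals, x, best)
```

Because the candidate set is small and the cost depends only on row lengths, the search touches `row_ptr` but never the matrix values, which is one plausible reason a search of this kind can stay cheap relative to the SpMV itself.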

Key words

GPU performance modeling; parallel algorithm optimization; sparse matrix; SpMV

Classification

Information Technology and Security Science

Cite this article

MA Chengyu, LI Suolan, LIU Yinuo, ZHAO Wenzhe, REN Pengju, XIA Tian. GPU performance modeling and algorithm optimization for SpMV [J]. 集成电路与嵌入式系统, 2026, 26(1): 5-11, 7.

集成电路与嵌入式系统 (Integrated Circuits and Embedded Systems), ISSN 1009-623X
