Computer Engineering & Science, 2025, Vol. 47, Issue (7): 1170-1180, 11. DOI: 10.3969/j.issn.1007-130X.2025.07.004
High-performance Cholesky factorization on emerging GPU architectures using Tensor Cores
Shi Lu¹, Zou Gaoyuan¹, Wu Siqi¹, Zhang Shaoshuai¹
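Author information
- 1. School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China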
Abstract
General matrix-matrix multiplications (GEMMs) can achieve highly optimized performance on Tensor Cores. However, because Cholesky factorization exposes only limited parallelism, existing implementations fail to reach most of the peak performance of Tensor Cores. This paper studies a recursive Cholesky factorization algorithm that recursively subdivides the diagonal blocks, generating a large number of GEMM operations between off-diagonal blocks. As a result, the internal symmetric rank-k update (SYRK) and triangular solve with multiple right-hand sides (TRSM) operations can extract a higher proportion of the peak performance of Tensor Cores. Experimental results show that the proposed recursive Cholesky factorization algorithm achieves speedups of 1.72× and 1.62× over MAGMA/cuSOLVER in FP32 and FP16, respectively.
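The recursion described in the abstract can be illustrated with a small pure-Python sketch, assuming a dense symmetric positive definite matrix stored as a list of lists. The diagonal block is split in half, the top-left half is factored recursively, the off-diagonal block is obtained with a TRSM, the trailing block is updated with a SYRK, and the bottom-right half is factored recursively. The names `recursive_cholesky`, `_trsm`, `_syrk` and the `base` cutoff are hypothetical, and the plain loops only stand in for GPU kernels; this is not the authors' Tensor Core implementation.

```python
import math


def _unblocked_chol(A, lo, hi):
    """Factor the diagonal block A[lo:hi, lo:hi] in place (lower triangle only).

    Contributions from columns left of `lo` were already subtracted by SYRK.
    """
    for j in range(lo, hi):
        s = A[j][j] - sum(A[j][k] ** 2 for k in range(lo, j))
        if s <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        A[j][j] = math.sqrt(s)
        for i in range(j + 1, hi):
            t = A[i][j] - sum(A[i][k] * A[j][k] for k in range(lo, j))
            A[i][j] = t / A[j][j]


def _trsm(A, lo, mid, hi):
    """L21 := A21 * L11^{-T} for rows [mid, hi) and columns [lo, mid)."""
    for i in range(mid, hi):
        for j in range(lo, mid):
            t = A[i][j] - sum(A[i][k] * A[j][k] for k in range(lo, j))
            A[i][j] = t / A[j][j]


def _syrk(A, lo, mid, hi):
    """A22 := A22 - L21 * L21^T on the lower triangle of [mid, hi)."""
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            A[i][j] -= sum(A[i][k] * A[j][k] for k in range(lo, mid))


def _rec_chol(A, lo, hi, base):
    n = hi - lo
    if n <= base:
        _unblocked_chol(A, lo, hi)  # small diagonal block: direct kernel
        return
    mid = lo + n // 2
    _rec_chol(A, lo, mid, base)  # L11 = chol(A11), recursively
    _trsm(A, lo, mid, hi)        # L21 = A21 * L11^{-T}
    _syrk(A, lo, mid, hi)        # A22 -= L21 * L21^T (GEMM-shaped bulk work)
    _rec_chol(A, mid, hi, base)  # L22 = chol(A22), recursively


def recursive_cholesky(A, base=2):
    """Return lower-triangular L with A = L * L^T."""
    if base < 1:
        raise ValueError("base must be >= 1")
    n = len(A)
    W = [list(map(float, row)) for row in A]  # work on a copy
    _rec_chol(W, 0, n, base)
    # Keep only the lower triangle; the upper triangle of W is never updated.
    return [[W[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


print(recursive_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]]))
```

For this 3×3 example the printed factor is `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`. Splitting in half rather than peeling off fixed-width panels makes the off-diagonal TRSM and SYRK blocks grow with the matrix size, which is the property the abstract relies on to turn most of the work into large GEMM-like operations.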
Key words
Cholesky factorization/high performance computing/numerical linear algebra/general-purpose computing on graphics processing units (GPGPU)
Classification
Information Technology and Security Science
Cite this article
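SHI Lu, ZOU Gaoyuan, WU Siqi, ZHANG Shaoshuai. High-performance Cholesky factorization on emerging GPU architectures using Tensor Cores[J]. Computer Engineering & Science, 2025, 47(7): 1170-1180, 11.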