
High performance Cholesky factorization on emerging GPU architectures using Tensor Cores

SHI Lu, ZOU Gaoyuan, WU Siqi, ZHANG Shaoshuai

Computer Engineering & Science, 2025, Vol. 47, Issue 7: 1170-1180, 11. DOI: 10.3969/j.issn.1007-130X.2025.07.004

High performance Cholesky factorization on emerging GPU architectures using Tensor Cores

SHI Lu¹, ZOU Gaoyuan¹, WU Siqi¹, ZHANG Shaoshuai¹

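Author information

  • 1. School of Computer Science and Engineering (School of Cyberspace Security), University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China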

Abstract

General matrix-matrix multiplications (GEMMs) achieve highly optimized performance on Tensor Cores. However, because of their limited parallelism, existing implementations of Cholesky factorization fail to reach most of the peak performance of Tensor Cores. This paper studies a recursive Cholesky factorization algorithm that recursively subdivides the diagonal blocks, generating a large number of GEMM operations between off-diagonal blocks. The algorithm extracts a higher proportion of the peak performance of Tensor Cores for the internal symmetric rank-k update (SYRK) and triangular solve with matrix right-hand side (TRSM) operations. Experimental results show that the recursive Cholesky factorization algorithm proposed in this paper achieves speedups of 1.72× and 1.62× over the MAGMA/cuSOLVER implementations in FP32 and FP16, respectively.
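
The recursion described in the abstract can be illustrated with a minimal sketch. It assumes the textbook recursive formulation (factor the leading diagonal block, apply TRSM to the panel below it, apply SYRK to the trailing block, then recurse), with TRSM and SYRK themselves split recursively so that their off-diagonal work becomes GEMM calls. This is not the authors' GPU implementation: the function names (`gemm_nt`, `trsm`, `syrk`, `potrf`, `recursive_cholesky`) and the cutoff `BASE` are hypothetical, and pure Python is used only for readability. On a GPU, each `gemm_nt` call would presumably map to a Tensor Core GEMM.

```python
import math

BASE = 2  # hypothetical cutoff: blocks this small use unblocked loops


def gemm_nt(M, ci, cj, ai, aj, bi, bj, m, n, k):
    """C -= A @ B^T on sub-blocks of M located by (row, col) offsets."""
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += M[ai + i][aj + p] * M[bi + j][bj + p]
            M[ci + i][cj + j] -= s


def trsm(M, r, c, m, n):
    """Overwrite B = M[r:r+m, c:c+n] with X solving X @ L^T = B,
    where L is the lower-triangular diagonal block M[c:c+n, c:c+n]."""
    if n <= BASE:
        for i in range(m):
            for j in range(n):
                s = M[r + i][c + j]
                for p in range(j):
                    s -= M[r + i][c + p] * M[c + j][c + p]
                M[r + i][c + j] = s / M[c + j][c + j]
        return
    n1 = n // 2
    trsm(M, r, c, m, n1)                                   # X1 from L11
    gemm_nt(M, r, c + n1, r, c, c + n1, c, m, n - n1, n1)  # B2 -= X1 @ L21^T
    trsm(M, r, c + n1, m, n - n1)                          # X2 from L22


def syrk(M, r, c, n, k):
    """Lower triangle of M[r:r+n, r:r+n] -= P @ P^T, P = M[r:r+n, c:c+k]."""
    if n <= BASE:
        for i in range(n):
            for j in range(i + 1):
                for p in range(k):
                    M[r + i][r + j] -= M[r + i][c + p] * M[r + j][c + p]
        return
    n1 = n // 2
    syrk(M, r, c, n1, k)                                   # C11 update
    gemm_nt(M, r + n1, r, r + n1, c, r, c, n - n1, n1, k)  # C21 -= P2 @ P1^T
    syrk(M, r + n1, c, n - n1, k)                          # C22 update


def potrf(M, o, n):
    """Factor the diagonal block M[o:o+n, o:o+n] in place (lower triangle)."""
    if n <= BASE:
        for j in range(n):
            for i in range(j, n):
                s = M[o + i][o + j]
                for p in range(j):
                    s -= M[o + i][o + p] * M[o + j][o + p]
                M[o + i][o + j] = math.sqrt(s) if i == j else s / M[o + j][o + j]
        return
    n1 = n // 2
    potrf(M, o, n1)                 # L11 = chol(A11)
    trsm(M, o + n1, o, n - n1, n1)  # L21 = A21 @ L11^-T
    syrk(M, o + n1, o, n - n1, n1)  # A22 -= L21 @ L21^T
    potrf(M, o + n1, n - n1)        # L22 = chol(updated A22)


def recursive_cholesky(A):
    """Return lower-triangular L with L @ L^T = A for a symmetric positive definite A."""
    n = len(A)
    M = [[float(x) for x in row] for row in A]
    potrf(M, 0, n)
    return [[M[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

For instance, `recursive_cholesky([[4, 2], [2, 5]])` returns `[[2.0, 0.0], [1.0, 2.0]]`. In this sketch the only non-GEMM arithmetic happens in blocks of size at most `BASE`, which is the property the abstract attributes to the recursive approach: as the matrix grows, nearly all floating-point work lands in GEMM calls.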

Keywords

Cholesky factorization / high performance computing / numerical linear algebra / general-purpose computing on graphics processing units (GPGPU)

Classification

Information Technology and Security Science

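Cite this article

SHI Lu, ZOU Gaoyuan, WU Siqi, ZHANG Shaoshuai. High performance Cholesky factorization on emerging GPU architectures using Tensor Cores[J]. Computer Engineering & Science, 2025, 47(7): 1170-1180, 11.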

Computer Engineering & Science

Open access (OA); Peking University Core Journal

ISSN 1007-130X
