Design and Optimization of High-Performance Multi-Dimensional FFT Based on Matrix Core

Lu Lu (陆璐), Zhu Songxiang (祝松祥), Tian Qingyan (田卿燕), Lin Haishan (林海山), Guo Yijie (郭逸劼)

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 3: 20-30, 11. DOI: 10.12141/j.issn.1000-565X.240035

Design and Optimization of High-Performance Multi-Dimensional FFT Based on Matrix Core

Lu Lu¹, Zhu Songxiang², Tian Qingyan³, Lin Haishan³, Guo Yijie²

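Author information

  • 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; Pengcheng Laboratory, Shenzhen 518000, Guangdong, China
  • 2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • 3. Guangdong Provincial Enterprise Key Laboratory of Tunnel Engineering Safety and Emergency Support Technology and Equipment, Guangzhou 510440, Guangdong, China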

Abstract

The fast Fourier transform (FFT) is widely used in scientific computing and related fields. To fully exploit the computational power of the GPU and further improve FFT performance, this paper proposes a high-performance multi-dimensional FFT computation scheme based on Matrix Core, built on the matrix form of the Stockham FFT. For computational optimization, the scheme uses Matrix Core to accelerate the matrix multiplications in the FFT computation and uses compiler intrinsic instructions to perform fine-grained matrix multiply-accumulate operations, which lets Matrix Core support FFTs of more sizes. To minimize memory access, the scheme performs element-wise matrix multiplication directly in registers, following the pattern in which Matrix Core distributes data across thread registers. It also mitigates bank conflicts by reordering data in shared memory, adopts a double-buffering strategy to alleviate memory access bottlenecks, and proposes an efficient matrix transposition strategy to accelerate multi-dimensional FFT computation. The scheme was compared with the well-known high-performance FFT libraries rocFFT and VkFFT on the AMD MI250 GPU platform. The results show that it outperforms both rocFFT and VkFFT in average computational performance for 1-dimensional, 2-dimensional, and 3-dimensional FFTs. For 3D FFTs, its average performance is 1.5 times faster than rocFFT and 2.0 times faster than VkFFT, a significant improvement.
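To make the phrase "matrix form" concrete, the sketch below shows how a length-N DFT with N = n1 × n2 can be written as two dense matrix multiplications separated by an element-wise twiddle multiplication. This is a generic CPU illustration in plain Python, assuming a textbook Cooley-Tukey style factorization; it is not the paper's Stockham formulation or its Matrix Core kernel, and the function names (`dft_matrix`, `matmul`, `fft_matrix_form`, `naive_dft`) are hypothetical.

```python
import cmath


def dft_matrix(n):
    """Dense n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    """Plain dense matrix product; on a GPU this is the part a matrix unit would run."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def fft_matrix_form(x, n1, n2):
    """DFT of x (length n1*n2) as matmul -> element-wise twiddle -> matmul."""
    n = n1 * n2
    assert len(x) == n
    # View the input as an n1 x n2 matrix in row-major order.
    a = [[x[n2 * r + c] for c in range(n2)] for r in range(n1)]
    # Step 1: length-n1 DFTs down every column, as one matrix product.
    b = matmul(dft_matrix(n1), a)
    # Step 2: element-wise multiplication by twiddle factors exp(-2*pi*i*k1*j/n).
    c = [[b[k1][j] * cmath.exp(-2j * cmath.pi * k1 * j / n) for j in range(n2)]
         for k1 in range(n1)]
    # Step 3: length-n2 DFTs along every row, as a second matrix product.
    d = matmul(c, dft_matrix(n2))
    # Output index k = k1 + n1*k2, i.e. the result matrix is read transposed.
    return [d[k % n1][k // n1] for k in range(n)]


def naive_dft(x):
    """Direct O(n^2) DFT used only as a reference for checking."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

For example, `fft_matrix_form(x, 3, 4)` on a length-12 input should agree with `naive_dft(x)` up to floating-point rounding. The three steps map loosely onto the abstract's vocabulary: the two products are the matrix multiplications, step 2 is the element-wise multiplication, and the final index remapping is a transposition.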

Key words

graphics processing unit; Matrix Core; fast Fourier transform; matrix multiplication

Classification

Information Technology and Security Science

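Cite this article

Lu Lu, Zhu Songxiang, Tian Qingyan, Lin Haishan, Guo Yijie. Design and Optimization of High-Performance Multi-Dimensional FFT Based on Matrix Core[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(3): 20-30, 11.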

Funding

Supported by the Key-Area R&D Program of Guangdong Province (2022B0101070001)

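Journal of South China University of Technology (Natural Science Edition)

Open access; Peking University Core Journal

ISSN 1000-565X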
