
Performance optimization of matrix multiplication and FFT in GPU

Li Xiaowen (李晓雯), Cui Xiang (崔翔)

Modern Electronics Technique (现代电子技术), 2013, Vol. 36, Issue 4: 80-84, 5.


Li Xiaowen¹, Cui Xiang²

Author information

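1. Department of Command and Control, Air Defense Forces Academy, Zhengzhou, Henan 450000
2. School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475003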


Abstract

This paper investigates optimization techniques for GPU programs in order to identify general methods for designing high-performance programs on many-core GPUs. The authors discuss their experience in improving the performance of two key algorithms implemented in CUDA: the single-precision general matrix-matrix multiplication subprogram (SGEMM in BLAS) and the single-precision fast Fourier transform (FFT). The former is computation-intensive, while the latter is memory-bandwidth- or communication-intensive. For SGEMM, a peak speed of 393 Gflops was achieved on an NVIDIA GeForce GTX 280 GPU, about 5% faster than the CUBLAS 2.0 library. For the FFT, improved performance was obtained for a range of dimensions. Some common principles for the design and implementation of many-core algorithms are also discussed.
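The contrast between compute-intensive SGEMM and bandwidth-intensive FFT can be made concrete with a roofline-style estimate of arithmetic intensity (flops per byte of DRAM traffic). The Python sketch below is an illustration, not the paper's method. Its assumptions are: ideal data reuse for SGEMM (A and B read once, C written once, with alpha/beta scaling ignored); the conventional 5N log2 N flop count for a complex FFT whose data is read once and written once; and illustrative device figures (622 Gflop/s, 141.7 GB/s) roughly matching a GTX 280. All names and numbers in the code are hypothetical choices for this example.

```python
import math


def sgemm_intensity(n: int) -> float:
    """Flops per DRAM byte for C = A @ B with n x n float32 matrices.

    Assumes ideal reuse: A and B are read once and C is written once,
    4 bytes per element (alpha/beta scaling and reading C are ignored).
    """
    flops = 2.0 * n ** 3            # n^3 multiply-add pairs
    dram_bytes = 3 * n * n * 4      # A, B read once; C written once
    return flops / dram_bytes       # simplifies to n / 6


def fft_intensity(n: int) -> float:
    """Flops per DRAM byte for one complex float32 FFT of length n.

    Uses the conventional 5 * n * log2(n) flop count and assumes the
    data (8 bytes per complex value) is read once and written once.
    """
    flops = 5.0 * n * math.log2(n)
    dram_bytes = 2 * n * 8
    return flops / dram_bytes       # simplifies to 5 * log2(n) / 16


def attainable_gflops(intensity: float, peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed device figures, roughly those of a GeForce GTX 280.
PEAK_GFLOPS = 622.0    # single-precision multiply-add peak (assumption)
BANDWIDTH_GBS = 141.7  # DRAM bandwidth in GB/s (assumption)

if __name__ == "__main__":
    ridge = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity needed to become compute-bound
    print(f"ridge point: {ridge:.2f} flop/byte")
    for n in (1024, 4096):
        for name, f in (("SGEMM", sgemm_intensity), ("FFT", fft_intensity)):
            i = f(n)
            bound = attainable_gflops(i, PEAK_GFLOPS, BANDWIDTH_GBS)
            print(f"{name:5s} n={n}: {i:7.2f} flop/byte, bound {bound:6.1f} Gflop/s")
```

Under these assumptions SGEMM's intensity grows as n/6 and quickly exceeds the ridge point, so its speed is limited by arithmetic throughput. The FFT's intensity grows only as 5 log2(N)/16 and stays below the ridge point, so memory traffic sets the limit. This is consistent with the abstract's classification of the two kernels. Real kernels move more data than these ideal counts (for example, large FFTs that need several passes over DRAM), so the bounds are optimistic.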


Keywords

GPU programming; matrix multiplication; FFT; performance optimization technique

Classification

Information Technology and Security Science

Cite this article

Li Xiaowen, Cui Xiang. Performance optimization of matrix multiplication and FFT in GPU[J]. Modern Electronics Technique, 2013, 36(4): 80-84, 5.

Funding

National High Technology Research and Development Program of China ("863" Program) (2012AA010902)

National Natural Science Foundation of China (61240045, 10571178)

Modern Electronics Technique

OA, CSTPCD

ISSN: 1004-373X
