Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue 10: 1654-1663, 10.
Parallel Tridiagonal Solver on Sunway Many-Core Processors*
Abstract
The tridiagonal solver is an important numerical kernel that is widely used in scientific and engineering applications. Many highly optimized parallel algorithms have been proposed for mainstream hardware platforms such as CPUs and GPUs. However, for the Chinese domestically made Sunway 26010 many-core processor, no such algorithm exists that exploits its unique hardware characteristics to maximize performance. This paper proposes a Sunway-oriented distributive cyclic reduction algorithm (swDCR) for solving a large number of small tridiagonal systems. swDCR uses multiple CPEs (computation processing elements) to solve each system in parallel, combines the caches of multiple CPEs so that all intermediate data stay in cache, and transfers data among CPEs using register communication. A carefully designed thread-level data partition maximizes the benefit of vectorization. swDCR outperforms the Thomas algorithm on the MPE (management processing element) by a factor of 43.9 in single precision and 36.7 in double precision, and outperforms the Thomas algorithm on the CPEs by a factor of 2.07 in both single and double precision. It achieves an effective bandwidth of 24 GB/s on one core group of the Sunway 26010 processor.
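For readers unfamiliar with the two methods the abstract compares, the following is a minimal serial sketch in Python of the textbook Thomas algorithm and of textbook cyclic reduction (CR). It is not the paper's swDCR implementation: it has no CPE partitioning, register communication, or vectorization, and the function names `thomas` and `cyclic_reduction` are chosen here for illustration only. Each system is assumed to be given as four lists with `a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]`, `a[0] = 0`, `c[n-1] = 0`, and to be diagonally dominant so that no pivoting is needed.

```python
def thomas(a, b, c, d):
    """Textbook Thomas algorithm: forward elimination, then back substitution."""
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Textbook cyclic reduction, written recursively for clarity.

    Each level eliminates the even-indexed unknowns, leaving a tridiagonal
    system of about half the size in the odd-indexed unknowns. The updates
    within one level are independent of each other, which is what makes
    the method attractive for parallel hardware.
    """
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]

    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        # Use equation i-1 to eliminate x[i-1] from equation i.
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_c = 0.0
        new_d = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            # Use equation i+1 to eliminate x[i+1] from equation i.
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # Solve the reduced system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitute to recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The Thomas recurrence is inherently sequential along a single system, whereas every reduced row in one CR level can be computed independently. That independence is the property a parallel variant such as the one described in the abstract can distribute across processing elements.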
Key words
tridiagonal/Sunway many-core processor/cyclic reduction (CR) algorithm
Classification
Information technology and security science
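Cite this article
LIU Kan, WANG Xinliang, XU Ping, XUE Wei. Parallel Tridiagonal Solver on Sunway Many-Core Processors[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(10): 1654-1663, 10.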
Funding
The National Key Research and Development Program of China under Grant Nos. 2017YFA0604500, 2016YFA0602100; the National Natural Science Foundation of China under Grant Nos. 91530323, 41776010.