Frontiers of Data and Computing (数据与计算发展前沿), 2025, Vol. 7, Issue (5): 16-27, 12. DOI: 10.11871/jfdc.issn.2096-742X.2025.05.002

Parallel Implementation of Three-Dimensional Lattice Boltzmann Method on Multi-GPU Platforms

Xiang Xing ¹, Sun Peijie ², Zhang Huahai ¹, Wang Limin ³

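Author information

1. School of Chemical Engineering, University of Chinese Academy of Sciences, Beijing 100049
2. School of Chemical Engineering, University of Chinese Academy of Sciences, Beijing 100049; State Key Laboratory of Mesoscience and Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190
3. School of Chemical Engineering, University of Chinese Academy of Sciences, Beijing 100049; College of Chemical Engineering and Environment, China University of Petroleum (Beijing), Beijing 102249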

Abstract

[Objective] The shift in computational paradigms driven by large-scale scientific computing problems has propelled the development of general-purpose graphics processing units (GPGPU). The emerging lattice Boltzmann method in computational fluid dynamics (CFD) demonstrates significant advantages in computational efficiency and parallel scalability when coupled with advanced physical models.

[Methods] This study designs and optimizes a parallel algorithm for the three-dimensional lattice Boltzmann method (D3Q19), considering three-dimensional domain decomposition and distributed data communication.

[Results] Numerical verification and accuracy tests were conducted on three-dimensional flow benchmark cases with different grid scales on a domestic heterogeneous acceleration computing platform. High-fidelity transient simulations were achieved, capturing the unsteady evolution of three-dimensional vortex structures at different time steps. In performance tests with a single GPU at different grid scales, the impact of data communication on parallel performance was discussed. In strong/weak scalability tests, two sets of control experiments were conducted, single-node single-GPU and single-node four-GPU setups, to investigate the differences between inter-node and intra-node data communication. The single-node single-GPU setup achieved a maximum computational grid scale of approximately 2.15 billion, using a total of 128 GPUs across 128 nodes, with a runtime of 262.119 seconds, parallel performance of 81.927 GLUPS (Giga Lattice Updates Per Second, 1 GLUPS = 10³ MLUPS), and parallel efficiency of 94.76%. The single-node four-GPU setup reached a maximum computational grid scale of approximately 8.59 billion, using 512 GPUs across 128 nodes, with parallel performance of 241.185 GLUPS and parallel efficiency of 69.71%.

[Conclusions] The parallel implementation method proposed in this study achieves linear speedup and good parallel scalability, demonstrating the potential for efficient simulation on exascale supercomputing systems.
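
The GLUPS figure quoted in the abstract can be related to grid size and runtime with a short calculation. The following is a minimal sketch, not the authors' code: the abstract gives neither the number of time steps nor the per-GPU grid size, so `steps = 10_000` and 256³ nodes per GPU are assumptions chosen here because they are consistent with the reported totals. The efficiency formula is likewise an assumed weak-scaling definition (measured throughput divided by ideal linear scaling of a single-GPU baseline); the abstract does not state which definition was used.

```python
def glups(lattice_nodes: int, time_steps: int, runtime_s: float) -> float:
    """Giga lattice updates per second: total node updates / wall time / 1e9."""
    return lattice_nodes * time_steps / runtime_s / 1e9


def weak_scaling_efficiency(perf_n: float, n_gpus: int, perf_single: float) -> float:
    """Throughput on n_gpus relative to n_gpus times the single-GPU throughput.

    Assumed definition; the abstract does not spell out its formula.
    """
    return perf_n / (n_gpus * perf_single)


# Single-node single-GPU case from the abstract: 128 GPUs, 262.119 s runtime.
nodes = 128 * 256**3   # assumption: 256^3 nodes per GPU, about 2.15 billion in total
steps = 10_000         # assumption: the step count is not given in the abstract

print(f"grid nodes: {nodes:,}")
print(f"throughput: {glups(nodes, steps, 262.119):.3f} GLUPS")
```

Under these assumptions the computed throughput lands within about 0.001 GLUPS of the reported 81.927 GLUPS, which suggests, but does not confirm, that the run used roughly ten thousand time steps.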

Key words

graphics processing unit / lattice Boltzmann method / scalability testing / large-scale parallel computing / three-dimensional Taylor-Green vortex flow

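Cite this article

Xiang Xing, Sun Peijie, Zhang Huahai, Wang Limin. Parallel Implementation of Three-Dimensional Lattice Boltzmann Method on Multi-GPU Platforms[J]. Frontiers of Data and Computing, 2025, 7(5): 16-27, 12.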

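Funding

National Natural Science Foundation of China (52476162)

Guanghe Fund (光合基金), Category A Project (202302015420)

Strategic Priority Research Program of the Chinese Academy of Sciences (XDA0390501)

Frontier Basic Research Project of the Institute of Process Engineering (QYJC-2023-01)

Key Program of the National Natural Science Foundation of China (T2394501)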

Frontiers of Data and Computing

ISSN: 2096-742X
