Frontiers of Data and Computing (数据与计算发展前沿), 2025, Vol. 7, Issue 5: 65-87, 23. DOI: 10.11871/jfdc.issn.2096-742X.2025.05.006
FlowAware: A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks
Abstract
[Objective] AI-for-Science tasks often run inefficiently because distributed parallel computing strategies for deep learning models are difficult to design and implement, and the resulting strategies execute inefficiently. This study aims to address that problem. [Methods] We propose FlowAware, an automatic distributed parallelization method for AI-for-Science tasks. Built on the AI-for-Science framework JAX, the method analyzes the task characteristics, operator structures, and data flow properties of deep learning models. It then incorporates cluster topology information to construct a search space of distributed parallel computing strategies. Guided by load balancing and communication optimization objectives, FlowAware automatically identifies optimal distributed parallel computing strategies for AI models. [Results] Comparative experiments on both GPU-like accelerator clusters and GPU clusters show that FlowAware achieves a throughput improvement of up to 7.8× over Alpa. [Conclusions] FlowAware improves the search efficiency of distributed parallel computing strategies for AI models in scientific computing tasks and significantly improves their computational performance.
Keywords
AI for Science / deep learning / distributed parallel computing
Cite this article
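曾艳, 梁迦隽, 李欣, 吴宝福, 易广政, 黄成创, 邱扬, 陈越, 万健, 胡帆, 金思聪. FlowAware: A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks [J]. Frontiers of Data and Computing (数据与计算发展前沿), 2025, 7(5): 65-87, 23.
Funding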
This work was supported by the National Key Research and Development Program of China (2023YFB3001501), the National Natural Science Foundation of China (NSFC) (62302133), the Key Research and Development Program of Zhejiang Province (2024C01026), the Yangtze River Delta Project (2023ZY1068), the Hangzhou Key Research Plan Project (2024SZD1A02), and the GHfund A (202302019816).