FlowAware: A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks

Frontiers of Data and Computing (数据与计算发展前沿), 2025, Vol. 7, Issue (5): 65-87, 23. DOI: 10.11871/jfdc.issn.2096-742X.2025.05.006

FlowAware:A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks

曾艳¹, 梁迦隽¹, 李欣¹, 吴宝福¹, 易广政¹, 黄成创¹, 邱扬¹, 陈越¹, 万健², 胡帆³, 金思聪¹

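Author information

  • 1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
  • 2. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018; Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310023
  • 3. Zhejiang Sugon Information Technology Co., Ltd., Hangzhou, Zhejiang 310013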

Abstract

[Objective] This study addresses the inefficiency of AI-for-Science tasks, which stems from the difficulty of designing and implementing distributed parallel computing strategies for deep learning models and from the inefficient execution of those strategies. [Methods] We propose FlowAware, an automatic distributed parallelization method for AI-for-Science tasks. Built on the AI-for-Science framework JAX, the approach analyzes the task characteristics, operator structures, and data flow properties of deep learning models. It then incorporates cluster topology information to construct a search space of distributed parallel computing strategies. Guided by load-balancing and communication-optimization objectives, FlowAware automatically identifies the best distributed parallel computing strategy in that space for a given AI model. [Results] Comparative experiments on both GPU-like accelerator clusters and GPU clusters show that FlowAware improves throughput by up to 7.8× compared with Alpa. [Conclusions] FlowAware improves the efficiency of searching for distributed parallel computing strategies for AI models in scientific computing tasks and significantly improves the computational performance of those models.
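To make the idea of a topology-aware strategy search concrete, the following is a minimal sketch in plain Python. It is not FlowAware's algorithm. The search space (pairs of data-parallel and tensor-parallel degrees), the cost model (a ring all-reduce approximation), and every name and parameter (`factor_pairs`, `estimate_cost`, `search`, `intra_bw`, `inter_bw`, `devices_per_node`) are assumptions introduced for illustration only. Load balancing is not modeled: compute is assumed to divide evenly across devices.

```python
def factor_pairs(n):
    """Hypothetical search space: all (data_parallel, tensor_parallel)
    degrees whose product equals the number of devices."""
    return [(d, n // d) for d in range(1, n + 1) if n % d == 0]


def estimate_cost(dp, tp, *, flops, param_bytes, act_bytes,
                  device_flops, intra_bw, inter_bw, devices_per_node):
    """Toy per-step time estimate (seconds) for one (dp, tp) choice.

    Assumed model, not taken from the paper:
      - compute is split evenly over dp * tp devices;
      - each all-reduce over k devices moves 2 * (k - 1) / k of the payload;
      - a group uses the fast intra-node link only if it fits in one node.
    """
    compute = flops / (dp * tp * device_flops)

    # Gradient all-reduce across data-parallel replicas; parameters are
    # already sharded tp ways, so each replica holds param_bytes / tp.
    dp_bw = intra_bw if dp * tp <= devices_per_node else inter_bw
    dp_comm = 2 * (dp - 1) / dp * (param_bytes / tp) / dp_bw

    # Activation all-reduce inside each tensor-parallel group; the batch is
    # split dp ways, so each group handles act_bytes / dp.
    tp_bw = intra_bw if tp <= devices_per_node else inter_bw
    tp_comm = 2 * (tp - 1) / tp * (act_bytes / dp) / tp_bw

    return compute + dp_comm + tp_comm


def search(num_devices, **model_and_cluster):
    """Return the (dp, tp) pair with the lowest estimated cost."""
    return min(
        factor_pairs(num_devices),
        key=lambda s: estimate_cost(s[0], s[1], **model_and_cluster),
    )
```

Under these assumptions, calling `search(8, flops=..., param_bytes=..., act_bytes=..., device_flops=..., intra_bw=..., inter_bw=..., devices_per_node=4)` scores four candidate strategies and returns the cheapest. A real system such as the one described in the abstract would search a far richer space (per-operator shardings, pipeline stages) with a cost model calibrated to the actual cluster.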

Keywords

AI for Science; deep learning; distributed parallel computing

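Cite this article

曾艳, 梁迦隽, 李欣, 吴宝福, 易广政, 黄成创, 邱扬, 陈越, 万健, 胡帆, 金思聪. FlowAware: A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks [J]. Frontiers of Data and Computing (数据与计算发展前沿), 2025, 7(5): 65-87, 23.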

Funding

This work was supported by the National Key Research and Development Program of China (2023YFB3001501), the National Natural Science Foundation of China (NSFC) (62302133), the Key Research and Development Program of Zhejiang Province (2024C01026), the Yangtze River Delta Project (2023ZY1068), the Hangzhou Key Research Plan Project (2024SZD1A02), and the GHfund A (202302019816).

Frontiers of Data and Computing (数据与计算发展前沿)

ISSN 2096-742X
