Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 1: 111-126 (16 pages). DOI: 10.3778/j.issn.1673-9418.2209026
Deep Learning Compiler Load Balancing Optimization Method for Model Training
Abstract
For compute-intensive artificial intelligence (AI) training tasks, the computational graph is more complex, and data loading, task partitioning of the computational graph, and load balancing of task scheduling have become key factors affecting computing performance. This paper proposes three optimization methods that bring the task scheduling of model training in deep learning compilers to a load-balanced state. First, load balance between the CPU and back-end computing devices is achieved by automatically establishing an efficient pipeline for data loading and model training, which improves the overall energy efficiency of the system. Second, layered optimization of the computational graph is used to balance the computational graph load when it is scheduled on back-end devices. Finally, the resource utilization of back-end devices is improved by automatically establishing an efficient pipeline between layers. Experimental results show that the proposed optimization methods achieve system load balancing when training tasks are automatically mapped to the underlying hardware devices. Compared with traditional deep learning frameworks and compilers such as TensorFlow and nGraph, the proposed methods achieve a 2% to 10% performance improvement in training different AI models and reduce the overall power consumption of the training system by more than 10%.
Keywords
model training/compiler optimization/load balancing/hierarchical scheduling/automatic pipelining
Classification
Information Technology and Security Science
Cite this article
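WANG Li, GAO Kai, ZHAO Yaqian, LI Rengang, CAO Fang, GUO Zhenhua (王丽, 高开, 赵雅倩, 李仁刚, 曹芳, 郭振华). Deep Learning Compiler Load Balancing Optimization Method for Model Training[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 111-126.
Funding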
This work was supported by the New Generation Artificial Intelligence Key Project of Science and Technology Innovation 2030, funded by the Ministry of Science and Technology of China (2021ZD0113000).