Journal of Hubei Minzu University (Natural Science Edition), 2025, Vol.43, Issue(1): 34-40, 7. DOI: 10.13501/j.cnki.42-1908/n.2025.03.005
Distributed Training Communication Optimization Method Based on Adaptive Hierarchical Gradient Compression
Abstract
In distributed machine learning, frequent transmission of parameters and gradients between multiple computing nodes and parameter-server nodes causes high communication overhead and low model training efficiency. To address this, a communication optimization method based on adaptive layered gradient compression (ALGC) was proposed. First, a compression threshold was set for each layer of the neural network, and only layers exceeding this threshold were selected for compression. Second, a sparsification threshold was set separately for each selected layer and adjusted dynamically, achieving adaptive compression of each layer's gradient transmission. Finally, computation and communication were overlapped, and the parameter server aggregated the gradients and gradient residuals of each layer to update the global model. Experimental results showed that the training accuracy of the ALGC method reached up to 95.07%, while it achieved the shortest convergence time and the largest speedup ratio. The ALGC method thus significantly improved training speed and reduced communication overhead while preserving model training accuracy.
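The three steps described in the abstract can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch-style illustration of per-layer selective sparsification with residual (error-feedback) accumulation; the function `algc_compress`, its parameters, and the caller-driven dynamic-adjustment policy are assumptions for illustration only, not the authors' implementation.

```python
import torch

def algc_compress(layer_grads, size_threshold, sparsity, residuals):
    """One ALGC-style compression step (illustrative sketch, not the paper's code).

    layer_grads    : dict mapping layer name -> gradient tensor
    size_threshold : layers with more elements than this are sparsified
    sparsity       : dict mapping layer name -> fraction of entries to drop;
                     assumed to be adjusted dynamically by the caller
    residuals      : dict mapping layer name -> accumulated residual tensor
    """
    compressed = {}
    for name, grad in layer_grads.items():
        # Small layers are sent uncompressed: sparsifying them saves little
        # bandwidth relative to the index overhead.
        if grad.numel() <= size_threshold:
            compressed[name] = grad
            continue

        # Add the residual left over from previous rounds (error feedback),
        # so dropped gradient mass is delayed rather than lost.
        grad = grad + residuals.get(name, torch.zeros_like(grad))

        # Per-layer sparsification: keep only the largest-magnitude entries.
        k = max(1, int(grad.numel() * (1.0 - sparsity[name])))
        flat = grad.flatten()
        _, idx = torch.topk(flat.abs(), k)
        mask = torch.zeros_like(flat, dtype=torch.bool)
        mask[idx] = True

        sparse = torch.where(mask, flat, torch.zeros_like(flat)).view_as(grad)
        residuals[name] = grad - sparse   # carry dropped entries forward
        compressed[name] = sparse.to_sparse()
    return compressed

# Example usage with hypothetical layer names and sizes: the large weight
# matrix exceeds the threshold and is sparsified; the small bias is sent as-is.
grads = {"fc1.weight": torch.randn(512, 1024), "fc2.bias": torch.randn(10)}
sparsity = {"fc1.weight": 0.99}   # drop 99% of entries for the large layer
residuals = {}
out = algc_compress(grads, size_threshold=10_000,
                    sparsity=sparsity, residuals=residuals)
```

Keeping the residuals on the worker and folding them into later rounds is what lets aggressive per-layer sparsification coexist with the accuracy results reported above.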
Key words
distributed machine learning / gradient compression / parameter server / sparsification / communication optimization
Classification
Computer and Automation
Citation
WANG Xiaoxiao, ZHU Xiaojuan. Distributed Training Communication Optimization Method Based on Adaptive Hierarchical Gradient Compression[J]. Journal of Hubei Minzu University (Natural Science Edition), 2025, 43(1): 34-40, 7.
Funding
Key Natural Science Research Project of Anhui Provincial Universities (KJ2020A0300).