Journal on Communications (通信学报), 2025, Vol. 46, Issue 2: 1-17, 17. DOI: 10.11959/j.issn.1000-436x.2025027
DPU-empowered intelligent congestion control mechanism for the intelligent computing center network
(Original title: 数据处理单元赋能的智算中心网络拥塞控制机制)
Abstract
To address the frequent network congestion caused by high-frequency interactions between intelligent computing center clusters, which compromises the real-time performance of intelligent services, a congestion control model driven by a deep reinforcement learning algorithm was constructed with the data processing unit (DPU). The model was made lightweight by integrating pruning and quantization. Moreover, the model was transformed into an efficient gradient-boosted decision tree through knowledge distillation, allowing control actions to be matched precisely with real-time network conditions. Simulation results show that the proposed mechanism outperforms existing methods in generalization capability and control effectiveness. Across the experimental scenarios, the network's effective throughput and Jain's fairness index increase by more than 10.8% and 8.9%, respectively, while P99 end-to-end latency and packet loss rate decrease by more than 17.31% and 11.47%, respectively. The completion time of data flow transfer tasks in parallel computing scenarios decreases by more than 11.23%. The mechanism also responds rapidly to sudden changes in network status.
Keywords
congestion control / multi-agent deep reinforcement learning / intelligent computing center network / remote direct memory access network / data processing unit
Category
Computer science and automation
Cite this article
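陈锦前, 郭少勇, 刘畅, 亓峰, 邱雪松. DPU-empowered intelligent congestion control mechanism for the intelligent computing center network [J]. Journal on Communications (通信学报), 2025, 46(2): 1-17, 17.
Funding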
The National Natural Science Foundation of China (No. 62322103)
The Natural Science Foundation of Beijing (No. 4232009)
The Foundation of Central University Basic Research Projects (No. 2023ZCTH11)