Research on the Development Status of High-Speed Interconnection Technologies and Topologies of Multi-GPU Systems
Multi-GPU systems achieve performance improvements through scaling out, in order to meet the ever-increasing computational demands posed by increasingly complex artificial-intelligence algorithms and continuously growing data volumes. For a multi-GPU system, the interconnection bandwidth between processors and the system topology are the key factors that determine performance. In traditional PCIe-based multi-GPU systems, PCIe bandwidth is the bottleneck that limits system performance. GPU-oriented high-speed interconnection technologies have become an effective way to overcome this bandwidth limitation. This article first introduces the PCIe interconnection technology and the typical topologies used in traditional multi-GPU systems. Then, taking Nvidia NVLink, AMD Infinity Fabric Link, Intel Xe Link, and Biren Technology BLink as examples, it reviews and analyzes the GPU-oriented high-speed interconnection technologies and topologies of representative GPU vendors in China and abroad. Finally, research implications regarding interconnection technologies are discussed.
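The abstract's central claim is that inter-GPU link bandwidth bounds multi-GPU performance for communication-heavy workloads. A minimal sketch of that reasoning is the bandwidth-only cost model for a ring all-reduce, where each GPU moves roughly 2*(N-1)/N of the payload over its link. The bandwidth figures below are illustrative assumptions, not measurements; real values depend on the PCIe/NVLink generation and lane or link count.

```python
def allreduce_time_s(size_gb: float, n_gpus: int, link_gb_s: float) -> float:
    """Bandwidth-only estimate of ring all-reduce time.

    Each GPU sends and receives about 2*(N-1)/N * S bytes over its
    link; latency and computation/communication overlap are ignored.
    """
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * size_gb
    return traffic_gb / link_gb_s

# Illustrative per-direction bandwidths in GB/s (assumed values).
links = {"PCIe 3.0 x16 (~16 GB/s)": 16.0, "NVLink, multi-link (~100 GB/s)": 100.0}
for name, bw in links.items():
    t = allreduce_time_s(size_gb=1.0, n_gpus=8, link_gb_s=bw)
    print(f"{name}: {t * 1e3:.1f} ms per 1 GB all-reduce")
```

Under these assumptions, the higher-bandwidth link shortens the same all-reduce proportionally, which is the bottleneck argument the article develops in detail.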
CUI Chen; WU Di; TAO Yerong; ZHAO Yanli
Unit 63891 of the PLA, Luoyang 471003, Henan, China
Ordnance Industry
Keywords: multi-GPU system; high-speed interconnection technology; topology; interconnection bandwidth; data center
Aero Weaponry (《航空兵器》), 2024, No. 1
Pages 23-31 (9 pages)