Computer Engineering & Science, 2025, Vol. 47, Issue (3): 392-399, 8. DOI: 10.3969/j.issn.1007-130X.2025.03.002
Parallel file system network driver based on Tianhe interconnection system
Abstract
The parallel file system is an essential component of the software stack in high performance computing systems. A driver designed for the high-speed network is crucial for a parallel file system to provide efficient data access. A parallel file system network driver based on the Tianhe high-speed interconnect network (TH-Express), named GLND, has been designed and implemented. GLND is optimized in three areas: parallelization, communication protocol, and fault tolerance. It achieves high throughput through VP-level parallelism combined with appropriately balanced pipeline partitioning. It adaptively selects the underlying communication protocol based on factors such as message size, and implements a NUMA-aware memory management mechanism. Additionally, an adaptively adjustable timeout mechanism keeps abnormal timeouts at the software layer from affecting the completion of communication operations. Experimental results show that, under the same hardware conditions, GLND improves write bandwidth by an average of 23.69% and read bandwidth by an average of 79.25% compared to TCP.
Keywords
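parallel file system / interconnect / network programming interface
Classification
Computer and Automation
Cite this article
DONG Yong, WU Huijun, YANG Lihua, ZHANG Wei, WANG Ruibo, ZHOU Enqiang. Parallel file system network driver based on Tianhe interconnection system[J]. Computer Engineering & Science, 2025, 47(3): 392-399, 8.
Funding
National Key Research and Development Program of China (2021YFB0300101)
Project of the HPCL Key Laboratory, National University of Defense Technology (202101-03)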
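
Illustrative sketch
The abstract mentions two adaptive mechanisms without giving details: choosing the underlying communication protocol from the message size, and adjusting the timeout so that slow but healthy operations are not aborted at the software layer. The Python sketch below illustrates the general idea only and is not GLND's implementation. The protocol names ("eager", "rendezvous"), the 4096-byte threshold, and all class and function names are hypothetical. The timeout estimator is borrowed from the smoothed round-trip scheme used for TCP retransmission timers (RFC 6298), not from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-over point; the paper's abstract gives no threshold.
EAGER_LIMIT_BYTES = 4096


def select_protocol(message_size: int, eager_limit: int = EAGER_LIMIT_BYTES) -> str:
    """Pick a transfer path from the message size (illustrative names only)."""
    if message_size < 0:
        raise ValueError("message_size must be non-negative")
    # Small messages are sent directly; large ones negotiate a bulk transfer first.
    return "eager" if message_size <= eager_limit else "rendezvous"


@dataclass
class AdaptiveTimeout:
    """Timeout that follows observed completion latency (RFC 6298-style estimator)."""
    min_timeout: float = 0.05   # seconds, assumed lower bound
    max_timeout: float = 60.0   # seconds, assumed upper bound
    alpha: float = 0.125        # gain for the smoothed latency
    beta: float = 0.25          # gain for the latency variation
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def observe(self, latency: float) -> None:
        """Feed the measured latency of one completed operation."""
        if self.srtt is None:
            self.srtt = latency
            self.rttvar = latency / 2
        else:
            # Update the variation first, using the previous smoothed value.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - latency)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * latency

    def current(self) -> float:
        """Return the timeout to apply to the next operation."""
        if self.srtt is None:
            return self.max_timeout  # no samples yet: stay conservative
        raw = self.srtt + 4 * self.rttvar
        return min(self.max_timeout, max(self.min_timeout, raw))


if __name__ == "__main__":
    print(select_protocol(1024), select_protocol(1 << 20))
    timeout = AdaptiveTimeout()
    for sample in (1.0, 5.0):
        timeout.observe(sample)
        print(timeout.current())
```

In this sketch a single slow completion raises the timeout instead of triggering an abort, which is the behaviour the abstract attributes to the adjustable timeout. How GLND actually measures latency and bounds the timeout is not stated in the abstract.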