
Multilevel Structure Parallel IO Algorithm Based on HDF5


For application scenarios involving large-scale data input and output, a multi-level parallel IO (Input/Output) scheme based on the hierarchical storage format HDF5 (Hierarchical Data Format 5) is proposed. The scheme has two layers, inter-node and intra-node: at the inter-node layer, data is read and written with the node as the unit, while the processes inside a node are allowed to work either cooperatively or independently. According to the intra-node working mode, a multi-level parallel IO algorithm and a multi-level sentinel parallel IO algorithm are proposed, respectively, to improve IO efficiency and avoid redundant output files. Considering two typical application scenarios, heterogeneous computing and pure CPU computing, several groups of experiments with up to 4096 cores and up to 256 GB of data were carried out on the Shuguang platform and the Intel platform, respectively. The results show that the IO efficiency of the multi-level parallel IO algorithm increased by 1.97 to 25.87 times and that of the multi-level sentinel parallel IO algorithm by 6.53 to 9.36 times, and that the number of output files was reduced to 1/4 and 1/32 of the number produced by the multi-zone parallel IO algorithm.
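The abstract gives no implementation details, so the following is only a rough sketch of the file-count bookkeeping behind a node-level IO scheme, not the authors' algorithm. It assumes (hypothetically) that the multi-zone baseline writes one file per process, that the node-level scheme writes one file per node, and that ranks are placed on nodes in contiguous blocks. It makes no HDF5 or MPI calls, and the function names `group_ranks_by_node` and `file_reduction_ratio` are illustrative only. Under these assumptions, 4 and 32 processes per node would yield the 1/4 and 1/32 ratios quoted above.

```python
from collections import defaultdict
from fractions import Fraction


def group_ranks_by_node(num_ranks, ranks_per_node):
    """Map node index -> list of ranks on that node.

    Assumption: ranks are assigned to nodes in contiguous blocks,
    i.e. ranks 0..k-1 on node 0, k..2k-1 on node 1, and so on.
    """
    nodes = defaultdict(list)
    for rank in range(num_ranks):
        nodes[rank // ranks_per_node].append(rank)
    return dict(nodes)


def file_reduction_ratio(num_ranks, ranks_per_node):
    """Ratio of output files: one file per node vs. one file per process.

    Assumption: the baseline writes one file per process and the
    node-level scheme writes exactly one file per node.
    """
    files_per_node_scheme = len(group_ranks_by_node(num_ranks, ranks_per_node))
    files_per_process_scheme = num_ranks
    return Fraction(files_per_node_scheme, files_per_process_scheme)


if __name__ == "__main__":
    # Hypothetical layouts: 4 processes per node and 32 processes per node.
    print(file_reduction_ratio(64, 4))
    print(file_reduction_ratio(4096, 32))
```

In a real implementation, the grouping step would typically be done with an MPI communicator split per node rather than by integer division on rank numbers; the arithmetic on file counts stays the same.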

Ma Wenpeng (马文鹏); Zhai Huanxin (翟环欣); Li Ruiying (李瑞莹); Yuan Wu (袁武)

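School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, Henan, China
School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, Henan, China
School of Information and Blockchain Technology, Xinyang Vocational College of Art, Xinyang 464000, Henan, China
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; University of Chinese Academy of Sciences, Beijing 100049, China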

Computer Science and Automation

Hierarchical Data Format (HDF5); massively parallel computing; parallel IO; data storage

Journal of Xinyang Normal University (Natural Science Edition), 2024 (4): 433-441 (9 pages)

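National Key Research and Development Program of China (2020YFB1709500); Henan Province Key Research and Development and Promotion Special Project (Science and Technology Research) (222102210162)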

10.3969/j.issn.1003-0972.2024.04.003
