Computer Engineering & Science (计算机工程与科学), 2018, Vol. 40, Issue 1: 10-14, 5. DOI: 10.3969/j.issn.1007-130X.2018.01.002
面向国产异构系统的HPL异构协同设计
Orchestrating HPL between CPU and China accelerator
Abstract
HPL is the Linpack benchmark package widely used to test high-performance computing systems. In the traditional HPL algorithm, the matrix is divided into sub-matrices that are distributed across the computing elements. This approach is ineffective on the China Accelerator, however, because the accelerator exposes a specific built-in interface for matrix multiplication. We therefore propose dPEM (delicate Partition and Encapsulation on Matrix), which provides a friendly test configuration environment. We further propose OA4MM (Orchestrating Algorithm for Matrix Multiplication) for heterogeneous systems composed of CPUs and the China Accelerator. Experimental results validate dPEM and OA4MM on CPU + China Accelerator systems: OA4MM improves productivity by up to 10% compared with heterogeneous HPL.
Key words
HPL / China accelerator / delicate partition and encapsulation on matrix / orchestrating algorithm for matrix multiplication
Classification
Information Technology and Security Science
Cite this article
Gan Xinbiao, Sun Liaoyuan, Liu Jie, Xiong Chengwei, Huang Jiakun. Orchestrating HPL between CPU and China accelerator [J]. Computer Engineering & Science, 2018, 40(1): 10-14, 5.
Funding
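National Key Research and Development Program of China (2017YFB0202104)
National Natural Science Foundation of China (61602495, 61402039, 11401580, 11665012)
Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2016B25)
Pre-research Program of the National University of Defense Technology (ZK16-03-06)
Special Fund of the State Key Laboratory (Y62612A87S)
Open Fund of the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences (LIST201602D)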