Chinese High Technology Letters (高技术通讯), 2025, Vol. 35, Issue 12: 1263-1276 (14 pages). DOI: 10.3772/j.issn.1002-0470.2025.12.001
Analytical model-driven matrix multiplication optimization on the Hygon deep compute unit
(Original title: 海光深度计算处理器上分析模型驱动的矩阵乘性能优化)
Abstract
This paper presents an analytical model-based method for optimizing dense matrix multiplication on the domestic Hygon deep compute unit (DCU). High-performance algorithm implementations require software optimizations to be mapped precisely onto hardware characteristics. Across various central processing unit (CPU) architectures, analytical models have proven effective: they determine software parameters from architecture parameters and achieve performance comparable to expert-tuned implementations. The Hygon DCU accelerator is one of the successful representatives of domestic high-performance chips and is of great significance for the autonomy and controllability of domestic chips. However, algorithm optimization on the DCU lacks guidance and faces challenges such as difficulty in determining key algorithm parameters, low performance, and excessive reliance on experience. Taking matrix multiplication as a case study, this paper proposes a matrix multiplication analytical model for the Hygon DCU architecture. First, the general architectural features of the Hygon DCU and the matrix multiplication algorithm are modeled from the hardware and algorithm perspectives, respectively. On this basis, the proposed approach connects algorithm parameter selection to the underlying hardware architecture through three analyses: bandwidth, latency, and resources. This enables key algorithm parameters to be determined quickly for different types of matrix multiplication on DCUs of different architectures. Experimental results show that the algorithm parameters derived from the analytical model are consistent with those selected by experts, and that the model-driven optimized matrix multiplication performs comparably to the expert implementation. This research not only provides a reference for optimizing other dense computations on the Hygon DCU but also offers a feasible approach to systematizing implicit optimization experience.
Keywords
matrix multiplication optimization / analytical model / Hygon deep compute unit
Cite this article
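SHUI Chaoyang (水超洋), TAN Guangming (谭光明). Analytical model-driven matrix multiplication optimization on the Hygon deep compute unit [J]. Chinese High Technology Letters (高技术通讯), 2025, 35(12): 1263-1276.
Funding
Supported by the National Key Research and Development Program of China (2018YFB0204400, 2016YFB0201305, 2016YFB0200803, 2016YFB0200300), the National Natural Science Foundation of China (61972377, 61432018, 61702483), the Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030000), and the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences (QYZDJ-SSW-JSC035).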