Integrated Circuits and Embedded Systems, 2025, Vol. 25, Issue 11: 15-23, 9. DOI: 10.20193/j.ices2097-4191.2025.0073
Performance Optimization of GEMM Based on Matrix Extension Instructions
Abstract
The rapid development of Generative Artificial Intelligence (GenAI), driven by breakthroughs in Deep Learning (DL) and Large Language Model (LLM) technologies, has imposed increasingly stringent requirements on the performance and energy efficiency of the underlying computing hardware and of the algorithms that run on it. General Matrix Multiplication (GEMM) is a fundamental operation that underpins the vast majority of computation in the training and inference of deep neural networks. The efficiency of matrix multiplication kernels therefore directly affects core indicators such as training time, inference latency, and the associated operational costs, all of which are crucial to the practical deployment and scalability of AI solutions. At present, the use of matrix extension instructions in artificial intelligence workloads still leaves room for optimization, and optimizing matrix algorithms for domestic microprocessors is of great significance. This paper focuses on optimizing GEMM performance with matrix extension instructions on domestic microprocessors. The efficiency of GEMM is improved through instruction optimization, pipeline adjustment, outer-product extension, and related techniques, and the correctness and feasibility of the optimization scheme are verified by testing. Experimental results show that the method improves the efficiency of single-precision floating-point matrix multiplication by more than 10%.
Keywords
floating-point matrix multiplication / matrix extension instructions / pipeline optimization / number of clock cycles / artificial intelligence
Classification
Computer Science and Automation
Cite this article
ZHANG Wenyuan, DENG Quan, XIE Zhanmei, TAO Jing, LEI Guoqing, WANG Yongwen. Performance optimization of GEMM based on matrix extension instructions [J]. Integrated Circuits and Embedded Systems, 2025, 25(11): 15-23, 9.
Funding
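Huxiang Young Talents Project, Science and Technology Innovation category (2024RC3116)
National Natural Science Foundation of China, Young Scientists Fund (NSFC-62202481)
National University of Defense Technology Research Program (ZK22-05)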