Computer Engineering and Applications (计算机工程与应用), 2026, Vol. 62, Issue 8: 34-47, 14. DOI: 10.3778/j.issn.1002-8331.2506-0314
Review of Attention Mechanism in Era of Large Language Models (大模型时代下的注意力机制优化综述)
Abstract
Since the attention mechanism was proposed, the Transformer architecture built on it has quickly become the core of large models. Large language models have opened a new direction of development and produced fruitful results in fields such as natural language processing and computer vision. In recent years, as large models have developed rapidly, their parameter scale has continued to grow, and the traditional Transformer architecture has struggled to meet the demands of large-model training. Beyond scaling up computing power, adjusting the model architecture and further exploring the attention mechanism are effective ways to address this challenge and have become active research topics. This paper first introduces the traditional Transformer architecture and recent research on the Transformer and its variants, and analyzes the principles of its core self-attention mechanism and the bottlenecks it faces. It then analyzes and summarizes recent improvements to the attention module. Next, taking DeepSeek as an example, it explores the core technical path behind DeepSeek's rapid rise: the Transformer-based MoE architecture and the multi-head latent attention (MLA) mechanism. Finally, it summarizes the current state of research on optimizing the attention mechanism and outlines future research directions.
Keywords
large language model (LLM) / attention mechanism / multi-head latent attention mechanism (MLA) / MoE architecture
Classification
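Information Technology and Security Science
Cite this article
Shi Denghui (史登辉), Xi Xuefeng (奚雪峰), Cui Zhiming (崔志明), Zhu Run (朱润), Wang Jian (王坚). Review of Attention Mechanism in Era of Large Language Models [J]. Computer Engineering and Applications, 2026, 62(8): 34-47, 14.
Funding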
National Natural Science Foundation of China (62176175, 62372318)
Suzhou Water Conservancy and Water Affairs Science and Technology Project (2025004)