Review of Attention Mechanism Optimization in the Era of Large Models (大模型时代下的注意力机制优化综述)

Shi Denghui (史登辉), Xi Xuefeng (奚雪峰), Cui Zhiming (崔志明), Zhu Run (朱润), Wang Jian (王坚)

Computer Engineering and Applications (计算机工程与应用), 2026, Vol. 62, Issue 8: 34-47, 14. DOI: 10.3778/j.issn.1002-8331.2506-0314

Review of Attention Mechanism in Era of Large Language Models

Shi Denghui 1, Xi Xuefeng 2, Cui Zhiming 2, Zhu Run 3, Wang Jian 4

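Author information

  • 1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000
  • 2. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000; Smart City Research Institute, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000
  • 3. Kunshan Data Bureau, Kunshan, Jiangsu 215301
  • 4. Kunshan Public Security Bureau, Kunshan, Jiangsu 215301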

Abstract

Since the attention mechanism was proposed, the Transformer architecture built on it has quickly become the core of large models. Large language models have opened a new direction of development and produced fruitful results in many fields, such as natural language processing and computer vision. In recent years, as large models have developed rapidly and their parameter counts have continued to grow, the traditional Transformer architecture has struggled to meet the requirements of large-model training. Beyond adding more computing power, adjusting the model architecture and further exploring the attention mechanism are effective ways to address this challenge, and both have become active research topics. This paper first introduces the traditional Transformer architecture and recent research on the Transformer and its variants, and analyzes the principles of its core self-attention mechanism and the bottlenecks it faces. It then analyzes and summarizes recent improvements to the attention module. Next, taking DeepSeek as an example, it explores the core technical path behind DeepSeek's surge in popularity: the Transformer-based MoE architecture and the multi-head latent attention mechanism (MLA). Finally, it summarizes the current state of research on optimizing the attention mechanism and outlines future research directions.

Key words

large language model (LLM) / attention mechanism / multi-head latent attention mechanism (MLA) / MoE architecture

Classification

Information Technology and Security Science

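Cite this article

Shi Denghui, Xi Xuefeng, Cui Zhiming, Zhu Run, Wang Jian. Review of Attention Mechanism in Era of Large Language Models [J]. Computer Engineering and Applications, 2026, 62(8): 34-47, 14.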

Funding

National Natural Science Foundation of China (62176175, 62372318)

Suzhou Water Resources and Water Affairs Science and Technology Project (2025004)

Computer Engineering and Applications, ISSN 1002-8331
