
大语言模型压缩综述

郭晋阳 贺昌义 杨戈 刘祥龙

计算机科学与探索, 2026, Vol.20, Issue(1): 1-20, 20. DOI: 10.3778/j.issn.1673-9418.2504069


Survey of Model Compression for Large Language Model

郭晋阳 1, 贺昌义 2, 杨戈 1, 刘祥龙 2

Author Information

  • 1. State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100191, China; School of Artificial Intelligence, Beihang University, Beijing 100191, China
  • 2. State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100191, China


Abstract

Large language models (LLMs) have attracted considerable attention in recent years due to their strong cognitive capabilities and widespread applications across fields. However, their enormous demands for computation and memory make them difficult to deploy in resource-constrained scenarios. Model compression and acceleration techniques have therefore emerged as critical approaches for reducing computational complexity and memory usage while maintaining model performance. This paper presents a comprehensive survey of recent advances in LLM compression and acceleration methods, aiming to capture the current state and future trends of the field, to promote the advancement of these technologies, and to facilitate their adoption in both industry and academia. It begins by outlining the challenges LLMs face in terms of computational and storage overhead. It then categorizes and reviews the main technical approaches, including model pruning, quantization, knowledge distillation, and low-rank decomposition, highlighting their core principles, representative methods, and cutting-edge developments. In addition, the paper provides a detailed discussion of evaluation metrics such as inference latency, accuracy retention, and deployment cost, establishing a multidimensional evaluation framework. Finally, it explores promising future directions for LLM compression, aiming to guide future research and the industrial deployment of compressed LLMs.
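As a minimal illustration of one technique family the survey covers, the sketch below shows post-training weight quantization via symmetric round-to-nearest INT8 with a single per-tensor scale. This is a generic textbook scheme, not a method from the paper itself; the function names and the NumPy-based setup are hypothetical choices for the example.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to int8 with one symmetric per-tensor scale."""
    scale = float(np.abs(w).max()) / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one LLM layer's weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)

q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-element rounding error is bounded by half a quantization step.
print(q.dtype, float(np.abs(w - w_hat).max()) <= s / 2 + 1e-6)
```

Storing `q` plus one scale cuts the weight memory of an FP32 layer roughly fourfold; real systems refine this with per-channel or per-group scales and calibration, which is where the surveyed methods differ.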


Keywords

artificial intelligence; large language model; model compression and acceleration

Category

Information Technology and Security Science

Cite This Article

郭晋阳, 贺昌义, 杨戈, 刘祥龙. 大语言模型压缩综述[J]. 计算机科学与探索, 2026, 20(1): 1-20, 20.

Funding

This work was supported by the Beijing Municipal Science and Technology Project (Z231100010323002), the National Natural Science Foundation of China (62306025, 92367204), and the CCF-Baidu Open Fund.

计算机科学与探索 (ISSN 1673-9418)
