
Long-context inference optimization for large language models: a survey

Big Data Research (大数据), 2025, Vol. 11, Issue 6: 72-94, 23. DOI: 10.11959/j.issn.2096-0271.2025067

Tao Wei (陶伟)¹, Wang Jianzong (王健宗)², Zhang Xulong (张旭龙)², Qu Xiaoyang (瞿晓阳)²

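Author information

1. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063; Huazhong University of Science and Technology, Wuhan, Hubei 430070
2. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063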

Abstract

With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and high latency. To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques was conducted. The study first identifies three key factors that affect efficiency: the huge model size, the attention mechanism with quadratic computational complexity, and the autoregressive decoding strategy. Together, these factors limit the overall performance of the model. A taxonomy is then proposed that categorizes optimization techniques into model optimization, computation optimization, and system optimization, with detailed introductions to key technologies such as quantization, sparse attention, and operator fusion. The review indicates that these optimization techniques can effectively improve long-text inference performance. Finally, future research directions are outlined, emphasizing the importance of further optimizing LLMs for long-text inference to meet the growing demand for longer context lengths.

Key words

large language model / long-context inference / model optimization / computation optimization / system optimization

Classification

Information technology and security science

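Cite this article

Tao Wei, Wang Jianzong, Zhang Xulong, Qu Xiaoyang. Long-context inference optimization for large language models: a survey[J]. Big Data Research (大数据), 2025, 11(6): 72-94, 23.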

Funding

The Key Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Special Project (No. 2021B0101400003)

Big Data Research (大数据)

ISSN: 2096-0271
