大数据 (Big Data), 2025, Vol. 11, Issue (6): 72-94, 23. DOI: 10.11959/j.issn.2096-0271.2025067
Long-context inference optimization for large language models: a survey
Abstract
With the rapid development of large language model (LLM) technology, the demand for processing long-context inputs has been increasing. However, long-context inference faces challenges such as high memory consumption and high latency. To improve the efficiency of LLMs in long-context inference, a comprehensive review and analysis of existing optimization techniques was conducted. The study first identified three key factors that constrain efficiency: the enormous model size, the attention mechanism with quadratic computational complexity, and the autoregressive decoding strategy. Together, these factors limit overall model performance. A taxonomy was then proposed that categorizes optimization techniques into model optimization, computation optimization, and system optimization, with detailed introductions to key technologies such as quantization, sparse attention, and operator fusion. The results demonstrate that these optimization techniques can effectively enhance the performance of long-context inference. Finally, future research directions were outlined, emphasizing the importance of further optimizing LLMs for long-context inference to meet the growing demands of context length.
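The quadratic attention cost named in the abstract can be illustrated with a minimal NumPy sketch (an illustration, not code from the paper): scaled dot-product attention materializes one score per token pair, so the score matrix is n × n for a length-n context.

```python
import numpy as np

def naive_attention(q, k, v):
    """Single-head scaled dot-product attention (no masking).

    The score matrix is (n, n) for a length-n sequence, so both
    memory and compute grow quadratically with context length.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                          # (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ v                                     # (n, d)

n, d = 1024, 64
x = np.random.randn(n, d).astype(np.float32)
out = naive_attention(x, x, x)
print(out.shape)   # (1024, 64)
# Doubling the context quadruples the score matrix:
print((2 * n) ** 2 // n ** 2)   # 4
```

Sparse attention and KV-cache techniques surveyed in the paper target exactly this (n, n) term.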
Keywords
large language model / long-context inference / model optimization / computation optimization / system optimization
Classification
Computer and Automation
Citation: TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang. Long-context inference optimization for large language models: a survey[J]. Big Data, 2025, 11(6): 72-94, 23.
Funding: The Key Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Project (No. 2021B0101400003)