Computer Engineering, 2025, Vol. 51, Issue (10): 27-36, 10. DOI: 10.19678/j.issn.1000-3428.0070644
Inference Optimization for Large Models Based on Adaptive Tensor Swapping and Recomputation
Abstract
Large Language Models (LLM) have demonstrated outstanding performance in natural language processing tasks. However, their extremely large parameter scales pose a significant challenge: the limited capacity of GPU memory becomes a performance bottleneck for inference tasks. To address this issue in the context of LLM inference services, this study proposes AdaptiveLLM, which adaptively selects between two offloading strategies, tensor swapping and tensor recomputation, based on the characteristics of the inference workload. To characterize the workload, AdaptiveLLM builds a black-box Machine Learning (ML) model from an operator-level computational complexity analysis to predict the overhead of tensor recomputation, and it predicts the overhead of tensor swapping through a fine-grained analysis of KV Cache memory usage. To select offloading strategies adaptively, AdaptiveLLM uses a cost-aware memory optimization strategy in the preemption scheduling phase: when GPU memory is insufficient, it chooses the offloading approach with the lower overhead. For the startup scheduling phase, it uses a fairness-based user-request scheduling strategy: when GPU memory is available, it schedules additional user requests according to the principle of fairness. Experimental results show that, compared with widely used LLM inference benchmark frameworks, AdaptiveLLM increases overall throughput while reducing the average weighted turnaround time, thereby achieving fair scheduling.
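The two scheduling decisions described above can be illustrated with a minimal Python sketch. It rests on a simple analytical model of our own choosing, not on AdaptiveLLM's actual predictors: swap cost is the KV cache of a sequence copied out to host memory and back over a link of given bandwidth, and recomputation cost is predicted from an operator-level FLOP count by a least-squares linear fit that stands in for the paper's black-box ML model. The fairness-based admission uses a highest-response-ratio-next (HRRN) priority, chosen here because the response ratio equals a request's weighted turnaround time if it starts now; this is one plausible reading of "fairness", not necessarily the paper's policy. All names (`ModelSpec`, `choose_offload`, `admit_fair`, etc.) and all numbers are hypothetical.

```python
from dataclasses import dataclass

# All model/hardware numbers and the linear cost model below are hypothetical
# stand-ins chosen for illustration; they are not taken from AdaptiveLLM.


@dataclass(frozen=True)
class ModelSpec:
    num_layers: int
    num_heads: int
    head_dim: int
    num_params: float
    bytes_per_elem: int = 2  # fp16 KV cache

    @property
    def hidden(self) -> int:
        return self.num_heads * self.head_dim


def kv_cache_bytes(spec: ModelSpec, seq_len: int) -> int:
    """Bytes of KV cache held by one sequence: a K and a V vector per layer per token."""
    return 2 * spec.num_layers * spec.hidden * spec.bytes_per_elem * seq_len


def swap_cost_s(spec: ModelSpec, seq_len: int, link_bytes_per_s: float) -> float:
    """Swap overhead: copy the KV cache to host memory and later copy it back."""
    return 2 * kv_cache_bytes(spec, seq_len) / link_bytes_per_s


def prefill_flops(spec: ModelSpec, seq_len: int) -> float:
    """Rough operator-level FLOP count for re-running prefill over seq_len tokens."""
    dense = 2 * spec.num_params * seq_len  # linear layers (GEMMs)
    attention = 4 * spec.num_layers * seq_len * seq_len * spec.hidden  # QK^T and AV
    return dense + attention


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of time = a * flops + b (stand-in for a black-box ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def choose_offload(spec, seq_len, link_bytes_per_s, cost_model):
    """Preemption phase: pick whichever offloading strategy is predicted to be cheaper."""
    a, b = cost_model
    recompute = a * prefill_flops(spec, seq_len) + b
    swap = swap_cost_s(spec, seq_len, link_bytes_per_s)
    return ("swap", swap) if swap < recompute else ("recompute", recompute)


@dataclass
class Request:
    rid: str
    arrival: float      # arrival time (s)
    est_service: float  # predicted service time (s)
    kv_blocks: int      # KV cache blocks needed to start the request


def admit_fair(waiting: list[Request], now: float, free_blocks: int) -> list[Request]:
    """Startup phase: admit requests in descending response-ratio order while memory lasts.

    Response ratio = (waiting time + service time) / service time, i.e. the weighted
    turnaround time the request would have if it started now (HRRN-style priority).
    Admission stops at the first request that does not fit, so large requests are not
    repeatedly bypassed by smaller ones.
    """
    def response_ratio(r: Request) -> float:
        return (now - r.arrival + r.est_service) / r.est_service

    admitted = []
    for r in sorted(waiting, key=response_ratio, reverse=True):
        if r.kv_blocks > free_blocks:
            break
        admitted.append(r)
        free_blocks -= r.kv_blocks
    return admitted
```

For example, with a hypothetical 7B-class configuration `ModelSpec(num_layers=32, num_heads=32, head_dim=128, num_params=7e9)` and a cost model fitted as `fit_linear([1e12, 1e13, 2e13], [0.012, 0.102, 0.202])`, `choose_offload(spec, 1024, 25e9, model)` selects `"swap"`, while on a heavily contended link `choose_offload(spec, 16, 2e9, model)` selects `"recompute"`. The point of the sketch is only that the cheaper strategy depends on both the workload (sequence length) and the hardware, which is why a per-decision cost comparison is useful.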
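Key words
Large Language Models (LLM) / inference / tensor swapping / tensor recomputation / throughput / fairness
Classification
Computer and Automation
Cite this article
LIANG Xuning, WANG Siqi, YANG Hailong, LUAN Zhongzhi, LIU Yi, QIAN Depei. Inference optimization for large models based on adaptive tensor swapping and recomputation[J]. Computer Engineering, 2025, 51(10): 27-36, 10.
Funding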
National Key Research and Development Program of China (2023YFB3001801)
National Natural Science Foundation of China (62322201, 62072018, U23B2020)
Fundamental Research Funds for the Central Universities (YWF-23-L-1121, JKF-20240198)
Project of the State Key Laboratory of Complex & Critical Software Environment (SKLSDE-2023ZX-05)