Acta Electronica Sinica, 2025, Vol. 53, Issue (11): pp. 3880-3893 (14 pages). DOI: 10.12263/DZXB.20250411
Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios
Abstract
Large language models (LLMs) have exhibited exceptional inference performance. However, achieving real-time, high-efficiency inference in complex industrial scenarios remains a significant challenge. Traditional centralized cloud-based inference architectures are constrained by the latency of long chain-of-thought (CoT) reasoning and by transmission bottlenecks, which leaves them unable to meet the stringent low-latency requirements of complex industrial inference. Conversely, although lightweight LLMs deployed at the edge can respond rapidly, their limited inference capabilities compromise inference quality. Edge-cloud collaborative inference therefore emerges as an inevitable choice. However, single-modal LLMs struggle to accommodate modality-specific characteristics and diverse task requirements, while the high computational cost of multimodal LLMs limits their widespread applicability. Moreover, directly employing an LLM for complex inference often leads to hallucinations, undermining inference reliability. To address these issues, this paper proposes a fine-grained LLM inference task offloading framework based on edge-cloud collaboration. Specifically, lightweight, modality-specialized LLMs are deployed at the edge to process simple tasks efficiently with minimal latency, while a powerful multimodal deep LLM resides in the cloud to execute complex logical reasoning tasks and ensure inference quality. Complex LLM inference is decomposed into three stages and modeled as a directed acyclic graph (DAG). Based on this representation, communication and inference models are constructed, and LLM inference is formulated as the minimization of a weighted sum of overall inference latency and cost. After proving that this problem can be transformed into a discrete Markov decision process (MDP), and taking into account the complex interactions between subtask features and dynamic system resource states, a solution named UCB-COMA is designed. It integrates an upper confidence bound (UCB)-based action selection mechanism with counterfactual multi-agent policy gradients (COMA) to jointly optimize the scheduling order and execution location of inference subtasks. Experimental results demonstrate that UCB-COMA outperforms the compared schemes.
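The abstract does not give the paper's actual DAG, cost model, or reward, so the following is only a toy sketch under loudly stated assumptions. It assumes a hypothetical three-stage chain (`perceive`, `reason`, `respond`), made-up per-subtask latency and cost figures for edge and cloud execution, a fixed edge-cloud transfer delay `TRANSFER_S`, and a weight `w` that trades latency against cost. It evaluates the weighted-sum objective for one placement of subtasks, then applies the textbook UCB1 rule, treating each full placement as a bandit arm. It is not UCB-COMA: there is no COMA critic, no counterfactual baseline, and no multi-agent policy here, only the UCB exploration bonus that the paper's action-selection mechanism is named after.

```python
import math
from itertools import product

# Hypothetical three-stage inference DAG (subtask -> predecessors). The stage
# names and every number below are illustrative assumptions, not paper values.
DAG = {"perceive": [], "reason": ["perceive"], "respond": ["reason"]}
ORDER = ["perceive", "reason", "respond"]  # a topological order of DAG
SITES = ("edge", "cloud")

# (latency in seconds, monetary cost) for running each subtask at each site.
PROFILE = {
    "perceive": {"edge": (0.2, 0.0), "cloud": (0.1, 0.5)},
    "reason": {"edge": (3.0, 0.0), "cloud": (0.4, 1.0)},
    "respond": {"edge": (0.3, 0.0), "cloud": (0.1, 0.8)},
}
TRANSFER_S = 0.25  # assumed delay whenever a result crosses the edge-cloud link


def weighted_objective(placement, w=0.5):
    """w * end-to-end latency + (1 - w) * total cost of one placement.

    Assumes a subtask starts once all predecessors have finished (plus a
    transfer delay if a predecessor ran at the other site); ignores queueing.
    """
    finish, total_cost = {}, 0.0
    for task in ORDER:
        site = placement[task]
        latency, cost = PROFILE[task][site]
        ready = max(
            (finish[p] + (TRANSFER_S if placement[p] != site else 0.0) for p in DAG[task]),
            default=0.0,
        )
        finish[task] = ready + latency
        total_cost += cost
    return w * max(finish.values()) + (1 - w) * total_cost


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Standard UCB1: play every arm once, then maximize mean + c*sqrt(ln N / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(rounds=200, w=0.5):
    """Treat each full placement as one arm; the reward is the negated objective."""
    arms = [dict(zip(ORDER, sites)) for sites in product(SITES, repeat=len(ORDER))]
    counts, means = [0] * len(arms), [0.0] * len(arms)
    for _ in range(rounds):
        arm = ucb1_select(counts, means)
        reward = -weighted_objective(arms[arm], w)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return arms, counts


if __name__ == "__main__":
    arms, counts = run_bandit()
    best = arms[counts.index(max(counts))]
    print(best, round(weighted_objective(best), 3))
```

With these invented numbers and `w = 0.5`, the cheapest placement keeps perception and response on the edge and sends only reasoning to the cloud, mirroring the division of labor the abstract describes. Real scheduling would also have to choose subtask order and react to changing resource states, which is what the paper's MDP formulation and COMA component address.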
Key words
large language model/edge-cloud collaboration/task offloading/deep reinforcement learning/industrial internet of things
Classification
Information Technology and Security Science
Cite this article
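LIAO Lingling, TAO Ming, XIE Renping, ZHANG Yin, YUAN Huaqiang. Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios[J]. Acta Electronica Sinica, 2025, 53(11): 3880-3893.
Funding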
National Natural Science Foundation of China (No. 62572122, No. 62572099)