Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios

Acta Electronica Sinica, 2025, Vol. 53, Issue (11): 3880-3893, 14. DOI: 10.12263/DZXB.20250411

Liao Lingling (廖玲玲)¹, Tao Ming (陶铭)¹, Xie Renping (谢仁平)¹, Zhang Yin (张引)², Yuan Huaqiang (袁华强)¹

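Author information

1. School of Computer Science and Technology (School of Cyberspace Security), Dongguan University of Technology, Dongguan, Guangdong 523808, China
2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China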
Abstract

Large language models (LLMs) have exhibited exceptional performance in inference. However, achieving real-time, high-efficiency inference in complex industrial scenarios remains a significant challenge. Traditional centralized cloud-based inference architectures are constrained by the latency of long chain-of-thought (CoT) reasoning and by transmission bottlenecks, rendering them inadequate for the stringent low-latency requirements of complex industrial inference. Conversely, although lightweight LLMs deployed on the edge can respond rapidly, their limited inference capabilities compromise inference quality. Edge-cloud collaborative inference therefore emerges as an inevitable choice. However, single-modal LLMs struggle to accommodate modality-specific characteristics and diverse task requirements, while the widespread applicability of multimodal LLMs is limited by their high computational costs. Moreover, directly employing an LLM for complex inference often leads to hallucinations, undermining inference reliability. To address these issues, a fine-grained LLM inference task offloading framework based on edge-cloud collaboration is proposed in this paper. Specifically, lightweight, modality-specialized LLMs are deployed on the edge to process simple tasks efficiently with minimal latency, while a powerful multimodal deep LLM resides in the cloud to execute complex logical reasoning tasks, ensuring inference quality. Complex LLM inference is decomposed into three stages and modeled as a directed acyclic graph (DAG). With this representation, the communication and inference models are constructed, and LLM inference is formulated as the problem of minimizing the weighted sum of overall inference latency and cost. After proving that the investigated problem can be transformed into a discrete Markov decision process (MDP), and considering the complex interactions between subtask features and dynamic system resource states, a solution named UCB-COMA is designed, which integrates an upper confidence bound (UCB)-based action selection mechanism with counterfactual multi-agent policy gradient (COMA) to jointly optimize the subtask scheduling order and the execution location of each inference subtask. Experimental results demonstrate that UCB-COMA outperforms the comparison schemes.
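
To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the paper's UCB-COMA algorithm. It assumes a hypothetical four-subtask DAG, invented latency and cost figures, an assumed trade-off weight `w`, and independent UCB1 bandits per subtask in place of the COMA critic and the dynamic resource states. Each subtask is scored on its own weighted latency-plus-cost term, so the DAG critical path is ignored. All names (`DAG`, `PROFILE`, `learn_placement`) are illustrative.

```python
import math
from graphlib import TopologicalSorter

# Hypothetical inference DAG: each subtask maps to the set of its predecessors.
DAG = {
    "perceive_text": set(),
    "perceive_image": set(),
    "reason": {"perceive_text", "perceive_image"},
    "summarize": {"reason"},
}

# Invented (latency in seconds, monetary cost) per subtask and execution location.
PROFILE = {
    "perceive_text": {"edge": (0.2, 0.1), "cloud": (0.6, 0.5)},
    "perceive_image": {"edge": (0.3, 0.1), "cloud": (0.7, 0.6)},
    "reason": {"edge": (2.5, 0.3), "cloud": (0.9, 1.0)},
    "summarize": {"edge": (0.4, 0.1), "cloud": (0.5, 0.4)},
}


def weighted_objective(latency, cost, w=0.5):
    """Weighted sum of latency and cost; w is an assumed trade-off weight."""
    return w * latency + (1.0 - w) * cost


def ucb_select(mean_reward, counts, t, c=1.0):
    """UCB1-style choice: try untested actions first, then maximize mean + bonus."""
    for action, n in counts.items():
        if n == 0:
            return action
    return max(
        counts,
        key=lambda a: mean_reward[a] + c * math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def learn_placement(rounds=50, w=0.5):
    """Learn an edge/cloud placement per subtask (needs rounds >= 2)."""
    # Predecessors always come before their successors in this order.
    order = list(TopologicalSorter(DAG).static_order())
    stats = {
        s: {"mean": {"edge": 0.0, "cloud": 0.0}, "n": {"edge": 0, "cloud": 0}}
        for s in order
    }
    for t in range(1, rounds + 1):
        for s in order:
            a = ucb_select(stats[s]["mean"], stats[s]["n"], t)
            latency, cost = PROFILE[s][a]
            reward = -weighted_objective(latency, cost, w)  # lower objective is better
            stats[s]["n"][a] += 1
            # Incremental update of the running mean reward for the chosen action.
            stats[s]["mean"][a] += (reward - stats[s]["mean"][a]) / stats[s]["n"][a]
    return {s: max(stats[s]["mean"], key=stats[s]["mean"].get) for s in order}


if __name__ == "__main__":
    print(learn_placement())
```

With these invented numbers the sketch keeps the light perception and summarization subtasks on the edge and sends the heavy reasoning subtask to the cloud, which mirrors the division of labor the abstract describes. A faithful implementation would replace the independent bandits with cooperating agents trained against a centralized counterfactual critic.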

Key words

large language model/edge-cloud collaboration/task offloading/deep reinforcement learning/industrial internet of things

Classification

Information Technology and Security Science

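Cite this article

Liao Lingling, Tao Ming, Xie Renping, Zhang Yin, Yuan Huaqiang. Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios[J]. Acta Electronica Sinica, 2025, 53(11): 3880-3893, 14.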

Funding

National Natural Science Foundation of China (No. 62572122, No. 62572099)

Acta Electronica Sinica

OA, CSCD

ISSN: 0372-2112
