Radio Engineering (无线电工程), 2026, Vol. 56, Issue 2: 213-221, 9. DOI: 10.3969/j.issn.1003-3106.2026.02.003
HMAPPO: Long-term Throughput Maximization Method for Wireless Powered Edge Computing Networks
Abstract
With the continuous deployment of Internet of Things (IoT) and 5G networks, the computational load and sustainable energy demand of edge sensor devices have increased significantly. By integrating Wireless Power Transfer (WPT) and Mobile Edge Computing (MEC), Wireless-Powered MEC (WP-MEC) provides a promising solution for extending the power supply lifetime of edge devices and enhancing overall system computing capability. However, previous work has focused on single-time-slot resource optimization or single-cell network models, leading to low resource utilization efficiency and significant deviations from practical scenarios. To address this issue, this work studies a multi-cell, multi-time-slot WP-MEC network based on Non-Orthogonal Multiple Access (NOMA). It jointly optimizes energy transmission time, task offloading strategies, and power allocation, and fully exploits the energy accumulation gain to maximize the long-term system throughput. To enable efficient resource scheduling in complex and dynamic networks, a Heterogeneous Multi-Agent Proximal Policy Optimization (HMAPPO) algorithm is proposed. By introducing a hierarchical structure with a controller agent and device agents, HMAPPO achieves cooperative optimization among global energy transfer time, local task offloading, and power allocation. Unlike value-function-based approaches such as Multi-Agent Soft Actor-Critic (MASAC) or Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3), HMAPPO adopts a proximal policy optimization mechanism that constrains changes between successive policies. This makes it more suitable for multi-slot energy dynamics and continuous action spaces, thereby achieving higher training stability in WP-MEC networks. Simulation results demonstrate that the proposed algorithm achieves performance comparable to that of centralized Proximal Policy Optimization (PPO) while realizing distributed optimization, with a performance gap of less than 3.3%. Moreover, the algorithm maintains its performance across different numbers of cells, numbers of devices, and device distances, verifying its generalization and scalability.
Keywords
MEC / wireless powered network / NOMA / long-term throughput maximization / multi-agent reinforcement learning
Classification
Information Technology and Security Science
Cite this article
郭羽婕, 张志飞, 张煜, 刘彤, 熊轲. HMAPPO: Long-term Throughput Maximization Method for Wireless Powered Edge Computing Networks [J]. Radio Engineering (无线电工程), 2026, 56(2): 213-221, 9.
Funding
National Natural Science Foundation of China (62571028)