Journal on Communications (通信学报), 2024, Vol. 45, Issue 5: 101-114 (14 pages). DOI: 10.11959/j.issn.1000-436x.2024100
基于行为克隆的高通量卫星通信频谱资源分配
Spectrum resource allocation for high-throughput satellite communications based on behavior cloning
Abstract
In high-throughput multi-beam satellite systems, the dimensionality of the spectrum resource allocation problem increased drastically with the number of satellite beams and service users, which caused an exponential rise in the complexity of the solution. To address this challenge, a two-stage algorithm that combined behavior cloning (BC) with deep reinforcement learning (DRL) was proposed. In the first stage, the policy network was pretrained through behavior cloning on existing decision data from satellite operation, mimicking expert behavior to reduce blind exploration and accelerate convergence. In the second stage, the policy network was further optimized using proximal policy optimization (PPO), and a convolutional block attention module (CBAM) was employed to better extract user traffic features, thereby enhancing overall performance. Simulation results demonstrate that the proposed algorithm outperforms the benchmark algorithms in convergence speed and stability, and also delivers superior performance in system delay, average system satisfaction, and spectrum efficiency.
Key words
high-throughput satellite / behavior cloning / deep reinforcement learning / proximal policy optimization / convolutional block attention module
Classification
Information Technology and Security Science
Cite this article
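QIN Hao, LI Shuangyi, ZHAO Di, MENG Haowei, SONG Bin. Spectrum resource allocation for high-throughput satellite communications based on behavior cloning[J]. Journal on Communications, 2024, 45(5): 101-114.
Funding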
The National Natural Science Foundation of China (No.62071354, No.62201419)
The Key Research and Development Program of Shaanxi Province (No.2022ZDLGY05-08)
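Illustrative sketch

The first stage described in the abstract, pretraining a policy by behavior cloning on expert decisions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the state is a hypothetical vector of per-beam traffic demand, the "expert" is a toy rule (`toy_expert`) that assigns a spectrum block to the busiest beam, and the policy is a small linear softmax model instead of the paper's CBAM-based network. The PPO fine-tuning stage is not shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, state):
    """Action probabilities of a linear softmax policy (one weight row per action)."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(logits)


def behavior_cloning(demos, n_features, n_actions, epochs=50, lr=0.5):
    """Fit the policy to expert (state, action) pairs by SGD on cross-entropy."""
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for state, action in demos:
            probs = policy_probs(weights, state)
            for a in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit a: p_a - 1[a == expert action]
                grad = probs[a] - (1.0 if a == action else 0.0)
                for j in range(n_features):
                    weights[a][j] -= lr * grad * state[j]
    return weights


def greedy_action(weights, state):
    """Pick the most probable action under the cloned policy."""
    probs = policy_probs(weights, state)
    return max(range(len(probs)), key=probs.__getitem__)


def toy_expert(state):
    """Hypothetical expert rule: serve the beam with the highest traffic demand."""
    return max(range(len(state)), key=state.__getitem__)


# Build a small synthetic demonstration set (stand-in for operational decision data).
rng = random.Random(0)
demos = []
for _ in range(300):
    s = [rng.random() for _ in range(3)]
    demos.append((s, toy_expert(s)))

bc_weights = behavior_cloning(demos, n_features=3, n_actions=3)
print(greedy_action(bc_weights, [0.9, 0.1, 0.1]))
```

In a two-stage scheme of the kind the abstract describes, weights obtained this way would serve as the starting point for reinforcement learning, so that exploration begins near expert behavior instead of from a random policy.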