
Continuous vs. Discrete: Phase Performance Comparison of RIS-Assisted Millimeter Wave Communication Based on Deep Reinforcement Learning

深度强化学习下连续和离散相位RIS毫米波通信 (OA, CSTPCD)


This paper considers a distributed reconfigurable intelligent surface (RIS)-assisted multi-user millimeter wave (mmWave) system and uses deep reinforcement learning (DRL) to learn and adjust the transmit beamforming matrix at the base station and the phase shift matrix at the RIS, jointly optimizing the two to maximize the weighted sum-rate. In the discrete action space, a power codebook and a phase codebook are designed, and a deep Q network (DQN) algorithm is proposed to optimize the transmit beamforming matrix and the RIS phase shift matrix. In the continuous action space, the twin delayed deep deterministic policy gradient (TD3) algorithm is used for the same optimization. Simulations compare the weighted sum-rates achieved in the discrete and continuous action spaces for different numbers of codebook bits. Compared with a traditional convex optimization algorithm and with zero-forcing beamforming with random phase shifts, the DRL algorithms significantly improve the sum-rate: the continuous TD3 algorithm exceeds the convex optimization algorithm by 23.89%, and the discrete DQN algorithm also outperforms it when the number of codebook bits is 4.
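The abstract refers to a phase codebook indexed by a number of bits and to a weighted sum-rate objective, but gives no formulas. The following is a minimal sketch of both ideas, not the authors' design: it assumes a uniform codebook of 2^b phases over [0, 2π) and the standard weighted sum-rate definition, the sum over users of w_k · log2(1 + SINR_k). All function names are hypothetical.

```python
import math

def phase_codebook(bits):
    """Assumed uniform codebook: 2**bits phases evenly spaced over [0, 2*pi)."""
    levels = 2 ** bits
    return [2 * math.pi * k / levels for k in range(levels)]

def quantize_phase(theta, bits):
    """Map a continuous phase (radians) to the nearest codebook entry.

    Returns (index, quantized_phase). Phases just below 2*pi wrap to index 0.
    """
    levels = 2 ** bits
    step = 2 * math.pi / levels
    index = round((theta % (2 * math.pi)) / step) % levels
    return index, index * step

def max_quantization_error(bits):
    """Worst-case phase error of the uniform codebook: half a step."""
    return math.pi / 2 ** bits

def weighted_sum_rate(weights, sinrs):
    """Standard weighted sum-rate in bit/s/Hz: sum_k w_k * log2(1 + SINR_k).

    SINRs are linear (not dB). How the SINRs depend on the beamforming and
    RIS phase matrices is specific to the paper and is not modeled here.
    """
    return sum(w * math.log2(1.0 + s) for w, s in zip(weights, sinrs))
```

Under this assumed uniform codebook, each extra bit halves the worst-case phase error; at 4 bits it is π/16 (11.25°), so a discrete agent selects from a finer approximation of the continuous phase range as the bit count grows.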

Hu Langtao (胡浪涛); Yang Rui (杨瑞); Liu Quanjin (刘全金); Wu Jianlan (吴建岚); Ji Wen (嵇文); Wu Lei (吴磊)

School of Electronic Engineering and Intelligent Manufacturing, Anqing Normal University, Anqing 246133

Electronic Information Engineering

deep Q network (DQN); deep reinforcement learning; twin delayed deep deterministic policy gradient (TD3); millimeter wave; reconfigurable intelligent surface

Journal of University of Electronic Science and Technology of China (电子科技大学学报), 2024 (001)

Research on Multi-View Light Field Image Compression Methods for Visual Reconstruction

pp. 50-59 (10 pages)

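National Natural Science Foundation of China (62171002); Natural Science Foundation of the Anhui Provincial Department of Education (KJ2020A0497)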

10.12178/1001-0548.2022285
