A Reinforcement Learning Based Adaptive and Efficient RWA Algorithm for All-Optical Networks (Invited)
[Objective] Recent research on Routing and Wavelength Assignment (RWA) in all-optical networks has focused on Deep Reinforcement Learning (DRL) based algorithms. Most existing DRL-based RWA algorithms rely on K Shortest Paths (KSP) routing to precompute candidate paths from which the DRL agent selects its actions. Such KSP-based models lack flexibility and dynamicity, since the KSP must be recomputed for all node pairs whenever the topology changes. To address this issue, this paper proposes an Adaptive and Efficient (ADE)-RWA framework based on DRL. [Methods] The key innovation of ADE-RWA is that during training, the DRL agent selects the best single-hop connection from the current node at each step, rather than choosing among K precomputed complete paths. Because the agent's actions are not constrained to a fixed set of K paths, the routing strategy can be adjusted dynamically in real time according to the network state, even under topology changes such as link failures. Moreover, ADE-RWA records every successfully assigned route during training in a LookUp Table (LUT). Once DRL training converges, the LUT has accumulated complete routing information, and the algorithm switches to LUT lookup to find available routes, which effectively reduces computational cost and improves RWA efficiency. In addition, the DRL training phase and the LUT lookup phase are switchable in real time: when a link failure changes the topology, the algorithm returns to the DRL training phase, and switches back to LUT lookup once training converges again, keeping the LUT up to date. [Results] Experimental results show that, compared with KSP-First Fit (FF) and the Deep Reinforcement Learning Framework for Routing, Modulation and Spectrum Assignment (DeepRMSA), ADE-RWA reduces the blocking probability by 36% and 30%, respectively. When a link failure occurs, ADE-RWA quickly adapts to the change in network topology. [Conclusion] The proposed DRL-based ADE-RWA framework achieves adaptive routing and wavelength assignment under dynamic network topologies at low computational cost.
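The control flow summarized in [Methods] can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, breadth-first search (standing in for the trained DRL agent's step-by-step next-hop selection), and first-fit wavelength check are all illustrative assumptions; only the overall flow — cache successful routes in a LUT, serve requests from the LUT, and invalidate and re-search affected entries when a link fails — follows the abstract.

```python
from collections import deque

class AdeRwaSketch:
    """Hypothetical sketch of the ADE-RWA control flow: successful
    routes are cached in a lookup table (LUT), lookups fall back to a
    fresh step-by-step search (standing in for the DRL agent), and a
    link failure invalidates the affected LUT entries."""

    def __init__(self, links, num_wavelengths=4):
        # links: iterable of (u, v) undirected edges
        self.adj = {}
        self.free = {}  # per-link set of free wavelength indices
        for u, v in links:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
            self.free[frozenset((u, v))] = set(range(num_wavelengths))
        self.lut = {}  # (src, dst) -> cached path

    def _search(self, src, dst):
        # BFS stands in for the agent choosing one next-hop action at a
        # time from the current node (no precomputed K paths).
        parent, frontier = {src: None}, deque([src])
        while frontier:
            node = frontier.popleft()
            if node == dst:
                path = [node]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            for nxt in self.adj.get(node, ()):
                if nxt not in parent:
                    parent[nxt] = node
                    frontier.append(nxt)
        return None  # destination unreachable

    def _first_fit(self, path):
        # Wavelength continuity: lowest index free on every hop.
        common = set.intersection(
            *(self.free[frozenset(e)] for e in zip(path, path[1:])))
        return min(common) if common else None

    def assign(self, src, dst):
        """Return (path, wavelength), or None if the request is blocked."""
        path = self.lut.get((src, dst)) or self._search(src, dst)
        if path is None:
            return None
        wl = self._first_fit(path)
        if wl is None:
            return None
        for e in zip(path, path[1:]):
            self.free[frozenset(e)].discard(wl)
        self.lut[(src, dst)] = path  # cache the successful route
        return path, wl

    def fail_link(self, u, v):
        # Topology change: drop the link and every cached route that
        # traverses it, forcing those pairs back to fresh search.
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.free.pop(frozenset((u, v)), None)
        edge = frozenset((u, v))
        self.lut = {k: p for k, p in self.lut.items()
                    if edge not in map(frozenset, zip(p, p[1:]))}
```

On a three-node ring, a request A→C is first served over the direct link and cached; after `fail_link("A", "C")`, the stale LUT entry is dropped and the next request is rerouted via B, mirroring the abstract's switch back to the search phase after a topology change.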
LIU Zhaoyang; PAN Bitao
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Electronic Information Engineering
Keywords: wavelength-routed optical network; RWA; DRL; digital twin
《光通信研究》 (Study on Optical Communications), 2024(005)
Pages 62-69 (8 pages)
Supported by the National Natural Science Foundation of China (6220010519)