
Improvement and application of MCTS in turn-based orbital games

Open access | Peking University core journal | CSTPCD


Delays in sensing orbital maneuvers in turn-based spacecraft pursuit-evasion games make differential game approaches difficult to solve, while game algorithms based on deep reinforcement learning lack interpretability and therefore remain risky in engineering applications. For the turn-based spacecraft pursuit-evasion game, this paper proposes a predictive value accumulation Monte Carlo tree search (PVA-MCTS) algorithm. Exploiting the predictability of spacecraft orbital motion, the algorithm predicts and accumulates the value of decisions during the game, which addresses the sparse rewards and long time spans of the turn-based pursuit-evasion game; an adaptive expansion method further improves learning efficiency. The algorithm is applied to the turn-based spacecraft pursuit-evasion game and compared with results obtained by Monte Carlo tree search (MCTS). PVA-MCTS shortens the pursuit time by about 27.6% for the pursuing spacecraft and lengthens the escape time by about 6.8% for the evading spacecraft. The proposed algorithm can help accelerate the practical application of orbital game techniques in areas such as non-cooperative target approach and collision avoidance.
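The abstract does not give the algorithm's details, so the following is only a toy sketch of the general idea as we read it: UCT-style MCTS in which every simulated step adds a discounted, predicted per-step value instead of relying only on a sparse terminal reward. Everything here is a hypothetical assumption, not the paper's PVA-MCTS: the 1-D integer-line pursuit model stands in for orbital dynamics, the distance-based `predicted_value` stands in for the paper's value prediction, and the names and constants (`step`, `predicted_value`, `search`, `W_PRED`, `GAMMA`, `HORIZON`, `CAPTURE_RADIUS`) are illustrative. The adaptive expansion method mentioned in the abstract is not modeled; the sketch expands one untried action at random.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical toy model, NOT the paper's orbital dynamics: a pursuer and an
# evader on an integer line take turns; the evader's reply is deterministic,
# so its next position is predictable (a stand-in for orbital predictability).
ACTIONS = (-2, -1, 0, 1, 2)  # pursuer displacement per turn (assumed)
CAPTURE_RADIUS = 1           # capture if the pursuer lands this close (assumed)
HORIZON = 10                 # maximum number of pursuer turns (assumed)
GAMMA = 0.95                 # discount applied to accumulated values (assumed)
W_PRED = 0.05                # weight of the distance-based predicted value (assumed)


def step(p, e, a):
    """Pursuer moves by a; if not captured, the evader flees one cell away."""
    p2 = p + a
    if abs(p2 - e) <= CAPTURE_RADIUS:
        return p2, e, True
    return p2, e + (1 if e > p2 else -1), False


def predicted_value(p, e, captured):
    """Per-step value: sparse capture reward plus a dense distance penalty."""
    return (1.0 if captured else 0.0) - W_PRED * abs(p - e)


@dataclass
class Node:
    p: int
    e: int
    t: int
    terminal: bool = False
    children: dict = field(default_factory=dict)  # action -> child Node
    rewards: dict = field(default_factory=dict)   # action -> step value
    visits: int = 0
    value: float = 0.0  # sum of returns observed from this node onward

    def untried(self):
        return [a for a in ACTIONS if a not in self.children]


def rollout(p, e, t, rng):
    """Random playout that accumulates discounted predicted values."""
    total, disc = 0.0, 1.0
    while t < HORIZON:
        p, e, done = step(p, e, rng.choice(ACTIONS))
        total += disc * predicted_value(p, e, done)
        disc *= GAMMA
        t += 1
        if done:
            break
    return total


def ucb(node, a, c):
    """UCB1 score: step value plus discounted child mean, plus exploration bonus."""
    child = node.children[a]
    q = node.rewards[a] + GAMMA * child.value / child.visits
    return q + c * math.sqrt(math.log(node.visits) / child.visits)


def simulate(node, rng, c):
    """One MCTS iteration below `node`; returns the sampled return from `node`."""
    untried = node.untried()
    if node.terminal or node.t >= HORIZON:
        g = 0.0
    elif untried:  # expansion: add one untried action, then roll out
        a = rng.choice(untried)
        p, e, done = step(node.p, node.e, a)
        child = Node(p, e, node.t + 1, terminal=done)
        node.children[a] = child
        node.rewards[a] = predicted_value(p, e, done)
        g_child = 0.0 if done else rollout(p, e, node.t + 1, rng)
        child.visits, child.value = 1, g_child
        g = node.rewards[a] + GAMMA * g_child
    else:  # selection: follow the UCB1-best child
        a = max(node.children, key=lambda b: ucb(node, b, c))
        g = node.rewards[a] + GAMMA * simulate(node.children[a], rng, c)
    node.visits += 1  # backpropagation happens as the recursion unwinds
    node.value += g
    return g


def search(p, e, iterations=1000, c=0.7, seed=0):
    """Return the most-visited first action for the pursuer."""
    rng = random.Random(seed)
    root = Node(p, e, 0)
    for _ in range(iterations):
        simulate(root, rng, c)
    return max(root.children, key=lambda a: root.children[a].visits)
```

For example, `search(0, 3)` returns `2`, the only move that captures the evader on the first turn in this toy setting. Setting `W_PRED = 0` reduces the sketch to MCTS on the sparse capture reward alone, which gives a simple baseline for comparing the two variants on the toy problem.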

郑鑫宇;张轶;周杰;唐佩佳;彭升人;党朝辉

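Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094
School of Astronautics, Northwestern Polytechnical University, Xi'an 710072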

Keywords: pursuit-evasion of spacecraft; turn-based pursuit-evasion game; Monte Carlo tree search; sensing delay of orbit change; predictive value accumulation

Chinese Space Science and Technology, 2024, no. 5, pp. 75-82 (8 pages)

Funding: National Natural Science Foundation of China (12172288)

DOI: 10.16708/j.cnki.1000-758X.2024.0075
