Chinese Space Science and Technology (中国空间科学技术(中英文)), 2024, Vol. 44, Issue 5: 75-82 (8 pages). DOI: 10.16708/j.cnki.1000-758X.2024.0075
Improvement and application of MCTS in turn-based orbital games
(Original title: 回合制轨道博弈中MCTS算法的改进与应用)
Abstract
In turn-based orbital pursuit-evasion games, the delay in sensing an opponent's orbit change poses difficulties for differential-game approaches, while algorithms based on deep reinforcement learning remain risky for engineering applications because their decisions are hard to interpret. A predictive-value-accumulate Monte Carlo tree search (PVA-MCTS) algorithm is proposed for the turn-based orbital pursuit-evasion game. Exploiting the predictability of spacecraft orbital motion, the algorithm predicts and accumulates decision values during the game, which addresses the sparse rewards and long time spans of the game and improves learning efficiency. The algorithm is applied to the turn-based orbital pursuit-evasion game and compared with the Monte Carlo tree search (MCTS) algorithm. The results show that PVA-MCTS reduces the pursuit time by about 27.6% for the pursuer and increases the escape time by about 6.8% for the evader. The PVA-MCTS algorithm is of practical value for orbital-game applications such as approaching non-cooperative targets and collision avoidance.
Keywords
spacecraft pursuit-evasion / turn-based pursuit-evasion game / Monte Carlo tree search / sensing delay of orbit change / predictive value accumulation
Category
Aerospace
Cite this article
郑鑫宇, 张轶, 周杰, 唐佩佳, 彭升人, 党朝辉. 回合制轨道博弈中MCTS算法的改进与应用 [J]. 中国空间科学技术(中英文), 2024, 44(5): 75-82, 8.
Funding
National Natural Science Foundation of China (12172288)