基于行为预测和策略融合的轨道博弈决策方法

王英杰 袁利 黄煌 耿远卓

Acta Automatica Sinica (自动化学报), 2026, Vol. 52, Issue 3: 451-462, 12. DOI: 10.16383/j.aas.c250268

A Decision Method for Orbital Game Based on Behavior Prediction and Strategy Fusion

王英杰 (1), 袁利 (2), 黄煌 (1), 耿远卓 (1)

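Author information

  • 1. Beijing Institute of Control Engineering, Beijing 100094; National Key Laboratory of Space Intelligent Control, Beijing 100094
  • 2. National Key Laboratory of Space Intelligent Control, Beijing 100094; China Academy of Space Technology, Beijing 100094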

Abstract

The high uncertainty and behavioral diversity of evasion strategies in the orbital pursuit-evasion game pose significant challenges to the generalization capability of pursuit strategies. Although deep reinforcement learning can enhance the pursuer's performance, the policy network often produces suboptimal or even invalid decisions when facing evasion strategies that deviate from the training distribution. To address this issue, this paper proposes a decision method for the orbital game based on behavior prediction and strategy fusion, named predictor-actor-critic with fusion. During the training phase, a set of diverse evasion strategies is modeled using a prediction-guided approach combined with the artificial potential field method. Building on the traditional actor-critic framework, a predictor-actor-critic algorithm is developed by introducing a prediction network, and a corresponding pursuit sub-policy is trained for each type of evasion strategy. The prediction network estimates the evader's actions, and the similarity between predicted and actual actions is used to quantify the matching degree between each sub-policy and the unknown evasion strategy. During the execution phase, the fusion module takes the evader's historical actions and the pursuit sub-policies' prediction outputs as input, dynamically evaluates the matching degree, and selects the most appropriate sub-policy for decision-making. Experimental results demonstrate that the prediction network effectively evaluates the adaptability of each sub-policy to unknown evasion strategies, and that the fusion module significantly enhances the generalization capability and reliability of the pursuer when confronted with diverse evasion strategies.
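The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity measure or history window is used, so cosine similarity averaged over a short window is an assumption here, and all names (`cosine_similarity`, `matching_degree`, `select_sub_policy`, `sub_policy_a`, `sub_policy_b`) are hypothetical. Each sub-policy's prediction network is represented only by the action sequence it predicted for the evader.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two action vectors (assumed measure, not from the paper).

    Returns 0.0 when either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_degree(predicted: List[Vector], actual: List[Vector]) -> float:
    """Average similarity between predicted and observed evader actions over a window."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    total = sum(cosine_similarity(p, a) for p, a in zip(predicted, actual))
    return total / len(actual)


def select_sub_policy(
    predictions_by_policy: Dict[str, List[Vector]],
    actual_history: List[Vector],
) -> Tuple[str, Dict[str, float]]:
    """Score every sub-policy and return the best-matching name plus all scores."""
    scores = {
        name: matching_degree(predicted, actual_history)
        for name, predicted in predictions_by_policy.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hypothetical data: two observed evader actions and two sub-policies' predictions.
actual_history = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
predictions_by_policy = {
    "sub_policy_a": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],    # predicts the evader exactly
    "sub_policy_b": [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)],  # predicts the opposite motion
}
best, scores = select_sub_policy(predictions_by_policy, actual_history)
print(best, scores)
```

In this toy case the sub-policy whose predictions agree with the observed evader actions receives the higher score and is selected. A real fusion module would recompute the scores at every decision step as the action history grows.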

Keywords

orbital pursuit-evasion game; deep reinforcement learning; behavior prediction; strategy fusion

Cite this article

王英杰, 袁利, 黄煌, 耿远卓. 基于行为预测和策略融合的轨道博弈决策方法 [J]. 自动化学报, 2026, 52(3): 451-462, 12.

Funding

Supported by the National Natural Science Foundation of China (62303047, U21B6001) and the Open Fund of the National Key Laboratory of Space Intelligent Control (2024-CXPT-GF-JJ-012-05)

Acta Automatica Sinica (自动化学报), ISSN 0254-4156
