
Offline Multi-agent Reinforcement Learning Method Based on Latent State Distribution GPT

盛蕾 陈希亮 赖俊

计算机科学与探索 2024, Vol.18, Issue(8): 2169-2179,11. DOI: 10.3778/j.issn.1673-9418.2307034



盛蕾 1, 陈希亮 1, 赖俊 1

Author Information

  • 1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China


Abstract

Offline pre-training of a base model with the decision Transformer can effectively address the low sampling efficiency and poor scalability of online multi-agent reinforcement learning, but this generative pre-trained Transformer approach performs poorly in multi-agent tasks where individual rewards are difficult to define and the dataset does not cover the optimal policy. To solve this problem, a multi-agent reinforcement learning algorithm integrating offline pre-training and online fine-tuning is proposed, which improves the decision Transformer with latent state distributions. The algorithm uses an autoencoder and one-hot encoding to generate discrete latent state representations that retain important information from the original state space. The generatively pre-trained decision Transformer is improved through latent temporal abstraction, which acts like a data augmentation technique and alleviates, to a certain extent, the extrapolation error caused by offline datasets that do not fully cover the state space. Centralized training with decentralized execution is used to solve the credit assignment problem among agents during online fine-tuning. A multi-agent policy gradient algorithm that encourages exploration further discovers collaborative policies in downstream tasks. Finally, experiments on the StarCraft simulation platform show that, compared with baseline algorithms, the method achieves higher scores and stronger generalization in tasks with little or even no offline trajectory data.
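The "autoencoder + one-hot" discrete latent representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trained encoder is replaced by a fixed random projection, and the names `codebook` and `encode_onehot` are hypothetical. The idea shown is that a continuous state is projected into latent space and snapped to its nearest prototype, yielding a one-hot code.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, NUM_CODES = 8, 4, 16

# Stand-in for a trained autoencoder's encoder (here: a random linear map).
W_enc = rng.normal(size=(STATE_DIM, LATENT_DIM))
# Discrete latent prototypes; in practice these would be learned.
codebook = rng.normal(size=(NUM_CODES, LATENT_DIM))

def encode_onehot(state: np.ndarray) -> np.ndarray:
    """Map a continuous state to a one-hot code over NUM_CODES latents."""
    z = state @ W_enc                              # continuous latent vector
    dists = np.linalg.norm(codebook - z, axis=1)   # distance to each prototype
    onehot = np.zeros(NUM_CODES)
    onehot[np.argmin(dists)] = 1.0                 # nearest prototype -> one-hot
    return onehot

state = rng.normal(size=STATE_DIM)
code = encode_onehot(state)
print(int(code.sum()), code.shape)
```

Such a one-hot code keeps only the identity of the nearest prototype, discarding fine-grained state detail while preserving coarse structure of the state space, which is the kind of compression the abstract attributes to the discrete latent representation.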

Key words

offline multi-agent reinforcement learning/distributed learning/representation learning/large language model

Classification

Information Technology and Security Science

Cite This Article

盛蕾, 陈希亮, 赖俊. 基于潜在状态分布GPT的离线多智能体强化学习方法[J]. 计算机科学与探索, 2024, 18(8): 2169-2179,11.

Funding

This work was supported by the National Natural Science Foundation of China (61806221).

计算机科学与探索

OA | 北大核心 | CSTPCD

ISSN 1673-9418
