计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) 2024, Vol. 18, Issue 8: 2169-2179. DOI: 10.3778/j.issn.1673-9418.2307034
基于潜在状态分布GPT的离线多智能体强化学习方法
Offline Multi-agent Reinforcement Learning Method Based on Latent State Distribution GPT
Abstract
Offline pre-training of a base model with the decision Transformer can effectively address the low sample efficiency and poor scalability of online multi-agent reinforcement learning, but this generative pre-trained Transformer approach performs poorly in multi-agent tasks where individual rewards are difficult to define and the dataset does not cover the optimal policy. To solve this problem, a multi-agent reinforcement learning algorithm that integrates offline pre-training with online fine-tuning is proposed, improving the decision Transformer with latent state distributions. The algorithm uses an autoencoder together with one-hot encoding to generate discrete latent state representations that retain important information from the original state space. The generatively pre-trained decision Transformer is improved through latent temporal abstraction, which acts like a data augmentation technique and, to a certain extent, alleviates the extrapolation error caused by offline datasets that do not fully cover the state space. Centralized training with decentralized execution is used to solve the credit assignment problem among agents during online fine-tuning, and a multi-agent policy gradient algorithm that encourages exploration further discovers collaborative policies in downstream tasks. Finally, experiments on the StarCraft simulation platform show that, compared with baseline algorithms, the proposed method achieves higher scores and stronger generalization in tasks with little or even no offline trajectory data.
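The discrete latent state representation described in the abstract can be sketched as follows. This is an illustrative, vector-quantization-style NumPy snippet, not the authors' implementation: the codebook, dimensions, and nearest-neighbor assignment rule are assumptions; in the paper the continuous states would come from the trained autoencoder.

```python
import numpy as np

def one_hot_latent(states, codebook):
    """Map continuous (encoder-produced) states to discrete one-hot latents.

    Each state is assigned to its nearest codebook vector under Euclidean
    distance, and the resulting index is one-hot encoded.
    """
    # Pairwise distances, shape (n_states, n_codes), via broadcasting.
    d = np.linalg.norm(states[:, None, :] - codebook[None, :, :], axis=-1)
    idx = d.argmin(axis=1)                  # nearest code per state
    one_hot = np.eye(len(codebook))[idx]    # one-hot rows, shape (n_states, n_codes)
    return idx, one_hot

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 hypothetical discrete latent codes, dim 4
states = rng.normal(size=(5, 4))     # 5 hypothetical encoder outputs
idx, z = one_hot_latent(states, codebook)
print(z.shape)  # (5, 8)
```

Each row of `z` is a discrete token over the codebook, which is the kind of compact symbol sequence a GPT-style decision Transformer can be pre-trained on.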
Key words: offline multi-agent reinforcement learning / distributed learning / representation learning / large language model
Classification: Information Technology and Security Science (信息技术与安全科学)
Citation: 盛蕾, 陈希亮, 赖俊. 基于潜在状态分布GPT的离线多智能体强化学习方法[J]. 计算机科学与探索, 2024, 18(8): 2169-2179.
Funding: This work was supported by the National Natural Science Foundation of China (61806221).