Multiagent reinforcement learning through merging individually learned value functions

ZHANG Hua-xiang HUANG Shang-teng

Journal of Harbin Institute of Technology (English Edition), 2005, Vol. 12, Issue 3: 346-350, 5.

ZHANG Hua-xiang¹, HUANG Shang-teng²

Author information

  • 1. Information and Management School, Shandong Normal University, Jinan 250014, China
  • 2. Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China

Abstract

In cooperative multiagent systems, learning the optimal policies of multiple agents is very difficult. Because the numbers of states and actions grow exponentially with the number of agents, computing their action policies becomes intractable. By learning value functions, an agent can learn its optimal action policy for a task. If a task can be decomposed into several subtasks and the agents have learned the optimal value function for each subtask, this knowledge can help the agents learn the optimal action policies for the whole task when they act simultaneously. A novel multiagent online reinforcement learning algorithm, LU-Q, is proposed that merges the agents' independently learned optimal value functions. By applying a transformation to the individually learned value functions, the constraints on the optimal value functions of each subtask are loosened. In each learning iteration of LU-Q, the agents' joint action set in a state is processed: some actions of that state are pruned from the available action set according to the multiagent value function defined in LU-Q. As the available action set of each state shrinks gradually over the iterations of LU-Q, the convergence of the value functions is accelerated. The effectiveness, soundness and convergence of LU-Q are analyzed, and the experimental results show that the learning performance of LU-Q is better than that of standard Q-learning.
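The two ideas in the abstract, exponential growth of the joint action set and pruning of joint actions using merged individual value functions, can be illustrated with a minimal sketch. This is not the authors' LU-Q algorithm: the abstract does not give its transformation or its multiagent value function. The additive merge rule, the pruning threshold, and all names (`joint_actions`, `merged_bound`, `prune`) are assumptions made purely for illustration.

```python
from itertools import product


def joint_actions(actions_per_agent):
    """All joint actions: the Cartesian product of each agent's action set."""
    return list(product(*actions_per_agent))


def merged_bound(individual_q, state, joint_action):
    """Hypothetical merge rule (an assumption, not taken from the paper):
    sum each agent's individually learned Q-value for its own action."""
    return sum(q[(state, a)] for q, a in zip(individual_q, joint_action))


def prune(available, individual_q, joint_q, state):
    """Keep only joint actions whose merged bound is at least the best
    joint value estimated so far in this state (assumed pruning rule).
    Joint actions with no estimate yet default to 0.0."""
    best = max(joint_q.get((state, ja), 0.0) for ja in available)
    return [ja for ja in available
            if merged_bound(individual_q, state, ja) >= best]


# Two agents with two actions each: the joint set has 2**2 = 4 entries.
actions = ["left", "right"]
available = joint_actions([actions, actions])

# Individually learned subtask values for a single state "s0" (made-up numbers).
q_agent0 = {("s0", "left"): 1.0, ("s0", "right"): 0.2}
q_agent1 = {("s0", "left"): 0.5, ("s0", "right"): 0.9}

# Joint value learned so far for one joint action (made-up number).
joint_q = {("s0", ("left", "right")): 1.5}

remaining = prune(available, [q_agent0, q_agent1], joint_q, "s0")
```

With these made-up numbers, the two joint actions in which the first agent moves right have merged bounds below the current best joint estimate and are dropped, so later updates in `s0` only need to consider the two remaining joint actions.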

Key words

reinforcement learning/multiagent/value function

Classification

Information Technology and Safety Science

Cite this article

ZHANG Hua-xiang, HUANG Shang-teng. Multiagent reinforcement learning through merging individually learned value functions[J]. Journal of Harbin Institute of Technology (English Edition), 2005, 12(3): 346-350, 5.

Journal of Harbin Institute of Technology (English Edition)

ISSN 1005-9113
