
Recent progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero

TANG Zhentao (唐振韬), SHAO Kun (邵坤), ZHAO Dongbin (赵冬斌), ZHU Yuanheng (朱圆恒)

Control Theory & Applications, 2017, Vol. 34, Issue 12: 1529-1546 (18 pages). DOI: 10.7641/CTA.2017.70808


TANG Zhentao¹, SHAO Kun², ZHAO Dongbin³, ZHU Yuanheng³

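Author information

1. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100190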

Abstract

In early 2016, AlphaGo's defeat of Lee Sedol became a milestone in artificial intelligence. Since then, deep reinforcement learning (DRL), the core technique behind AlphaGo, has received widespread attention and has produced fruitful results in both theory and applications. Subsequently, AlphaGo Zero, a simplified version of AlphaGo, mastered the game of Go through self-play alone, without human knowledge. AlphaGo Zero completely surpasses AlphaGo and enriches our understanding of DRL. DRL combines the advantages of deep learning and reinforcement learning, so it performs well in high-dimensional state-action spaces, with an end-to-end structure that integrates perception and decision-making. In this paper, we survey the remarkable progress made by DRL from AlphaGo to AlphaGo Zero. We first review the main algorithms that have contributed to the success of DRL, including DQN, A3C, policy gradient, and other algorithms and their extensions. We then introduce and discuss AlphaGo Zero in detail and analyze its contribution to advancing artificial intelligence. We also present the progress of DRL applications in areas such as games, robotics, natural language processing, smart driving, and intelligent health care, along with related resources. Finally, we discuss the future development of DRL and its inspiration for other potential areas related to artificial intelligence.

Key words

deep reinforcement learning / AlphaGo Zero / deep learning / reinforcement learning / artificial intelligence

Classification

Information technology and security science

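Cite this article

TANG Zhentao, SHAO Kun, ZHAO Dongbin, ZHU Yuanheng. Recent progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero[J]. Control Theory & Applications, 2017, 34(12): 1529-1546 (18 pages).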

Funding

Supported by the National Natural Science Foundation of China (61603382, 61573353, 61533017).

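Control Theory & Applications

Open access (OA); indexed in the Peking University Core Journals list (北大核心), CSCD, and CSTPCD

ISSN: 1000-8152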
