结合A2C和手牌估值方法的麻将博弈研究

Journal of Chongqing University of Technology (重庆理工大学学报), 2024, Vol. 38, Issue 9: 154-161 (8 pages). DOI: 10.3969/j.issn.1674-8425(z).2024.05.020

Research on mahjong game combining A2C with hand value evaluation method

衣御寒¹, 王亚杰¹, 吴燕燕¹, 刘松¹, 张兴慧¹, 蒋传禹¹

Author information

1. Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China

Abstract

To address the underutilization of hand information in popular mahjong, this paper designs a hand valuation method and a basic mahjong program (MJE). A mahjong AI (MJE-RL) is then designed using deep reinforcement learning to further improve playing ability. First, the training data for deep learning is generated through MJE's self-play. Second, the best model is selected as the pre-trained model for reinforcement learning, based on results on the training set, the test set, and a comparison experiment. Finally, the Advantage Actor-Critic (A2C) model is employed as the main reinforcement learning framework: the trained deep learning model serves as the Actor that makes decisions, and the playing ability of the mahjong AI is continually improved through games between MJE-RL and MJE. Experimental results indicate that the winning rate of MJE-RL is 4.08% higher than that of MJE, and that its rate of Win by Discard is 3.02% lower than that of MJE. MJE-RL therefore improves markedly in both offense and defense, demonstrating greater overall strength as a mahjong AI.
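The abstract names A2C as the reinforcement learning framework but gives no formulas, so the following is only a minimal sketch of the generic A2C quantities (discounted return, advantage, Actor loss, Critic loss) for a single game. The function names, the discount factor `gamma`, and the win-only reward in the usage lines are illustrative assumptions, not the authors' reward design or implementation.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Generic A2C losses for one episode (hypothetical helper).

    log_probs: log pi(a_t | s_t) for the discards the Actor actually chose
    values:    V(s_t) predicted by the Critic for the same states
    rewards:   r_t received after each decision
    """
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the outcome was than the Critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between predicted value and observed return.
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss, advantages


# Assumed toy episode: three decisions, reward 1 only when the game is won.
actor_loss, critic_loss, adv = a2c_losses(
    log_probs=[math.log(0.5)] * 3,
    values=[0.5, 0.5, 0.5],
    rewards=[0.0, 0.0, 1.0],
    gamma=1.0,
)
```

In a full system the Actor would be the pre-trained network described above and the two losses would be minimized by gradient descent; here the advantage is treated as a constant in the Actor term, which is the usual convention.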

Key words

popular mahjong/incomplete information/deep reinforcement learning/A2C

Classification

Information Technology and Security Science

Cite this article

衣御寒, 王亚杰, 吴燕燕, 刘松, 张兴慧, 蒋传禹. 结合A2C和手牌估值方法的麻将博弈研究[J]. 重庆理工大学学报, 2024, 38(9): 154-161, 8.

Funding

Liaoning Revitalization Talents Program (辽宁省兴辽英才计划项目, XLYC1906003)

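Journal of Chongqing University of Technology (重庆理工大学学报)

Open access (OA); listed in the Peking University Core Journals index (北大核心)

ISSN: 1674-8425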
