
Efficient Board Games Algorithm with Integrated Strategy Value Network (融合策略价值网络的高效棋类游戏算法)

周毅 田永谌 邱宇峰 高华

计算机与现代化 (Computer and Modernization), 2025, Issue (1): 86-93, 8. DOI: 10.3969/j.issn.1006-2475.2025.01.014

Efficient Board Games Algorithm with Integrated Strategy Value Network

周毅¹, 田永谌¹, 邱宇峰², 高华¹

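Author information

  • 1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, Hubei, China
  • 2. Baosight Software (Wuhan) Co., Ltd., Wuhan 430080, Hubei, China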

Abstract

Board games have always been a focus of deep reinforcement learning research because their complex board configurations and rules make finding optimal solutions time-consuming. Current board game algorithms select actions during self-play by sampling from an action probability distribution, which leads to inefficient exploration and exploitation. They also require separate neural network computations for strategy and value, resulting in low sample utilization and long training times. This paper proposes an efficient board game algorithm that integrates a strategy-value network and replaces the original action selection method with the Gumbel-max method. It balances exploration and exploitation in action search using the ε-greedy and simulated annealing algorithms. Experimental results show that, in matches against various classical board game algorithms, the proposed algorithm achieves a win rate of over 90%. Moreover, when the number of Monte Carlo simulations is low, training with the Gumbel-max method yields significantly higher Elo ratings than training with traditional action selection methods. To reach an Elo rating of 3000, the proposed algorithm needs 50% less training time.
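The abstract names three ingredients for action selection: the Gumbel-max method, ε-greedy exploration, and simulated annealing. The sketch below shows one way they could fit together. It is a minimal illustration and not the authors' implementation: the function names, the geometric cooling schedule, the default parameter values, and the choice to let ε gate between uniform exploration and a Gumbel-max draw are all assumptions, since the abstract does not specify them.

```python
import math
import random


def gumbel_max_action(logits, rng):
    """Return argmax_i(logits[i] + g_i), where g_i ~ Gumbel(0, 1).

    This is the standard Gumbel-max trick: the returned index is
    distributed as softmax(logits), without computing the softmax.
    """
    best_action, best_score = None, -math.inf
    for action, logit in enumerate(logits):
        # rng.random() lies in [0, 1); clamp away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        score = logit - math.log(-math.log(u))
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def annealed_epsilon(step, eps_start=1.0, eps_end=0.05, cooling=0.995):
    """Hypothetical geometric cooling schedule in the style of simulated annealing."""
    return max(eps_end, eps_start * cooling ** step)


def select_action(logits, step, rng):
    """Assumed combination: explore uniformly with probability epsilon,
    otherwise pick the action by Gumbel-max over the network's logits."""
    if rng.random() < annealed_epsilon(step):
        return rng.randrange(len(logits))
    return gumbel_max_action(logits, rng)


rng = random.Random(0)
example_logits = [0.0, 0.0, 10.0]  # made-up policy logits for three moves
early = select_action(example_logits, step=0, rng=rng)      # epsilon = 1.0, pure exploration
late = select_action(example_logits, step=10_000, rng=rng)  # epsilon has cooled to eps_end
```

Early in training ε is high, so moves are mostly uniform random; as the step count grows ε cools toward its floor and selection is dominated by the Gumbel-max draw, which favours moves with high logits while still sampling from the full distribution.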

Key words

board games/Monte Carlo tree search/Gumbel-max method/ε-greedy algorithm/simulated annealing algorithm

Classification

Information Technology and Security Science

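Cite this article

周毅, 田永谌, 邱宇峰, 高华. 融合策略价值网络的高效棋类游戏算法 (Efficient Board Games Algorithm with Integrated Strategy Value Network) [J]. 计算机与现代化 (Computer and Modernization), 2025, (1): 86-93, 8.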

Funding

National Natural Science Foundation of China (62372343)

计算机与现代化 (Computer and Modernization)

ISSN 1006-2475
