
Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

Files in This Item:
takada_paper.pdf (725.13 kB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/77885

Title: Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex
Authors: Takada, Kei
Iizuka, Hiroyuki
Yamamoto, Masahito
Keywords: Hex; policy function; reinforcement learning; value function
Issue Date: Mar-2020
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Journal Title: IEEE Transactions on Games
Volume: 12
Issue: 1
Start Page: 63
End Page: 73
Publisher DOI: 10.1109/TG.2019.2893343
Abstract: Recently, the use of reinforcement-learning algorithms has been proposed to create value and policy functions, and their effectiveness has been demonstrated using Go, Chess, and Shogi. In previous studies, the policy function was trained to predict the search probabilities of each move output by Monte Carlo tree search; thus, a large number of simulations were required to obtain the search probabilities. We propose a reinforcement-learning algorithm based on self-play that creates value and policy functions such that the policy function is trained directly from game results, without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of policy-function accuracy, and play a tournament between the proposed computer Hex program DeepEZO and the 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all of them. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex 2.0 under the same search conditions on a 13 × 13 board. We also show that highly accurate policy functions can be created by training them to increase the number of moves to be searched in losing positions.
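The abstract's central idea, training the policy directly from game results rather than from MCTS visit counts, can be illustrated with a short sketch. The following is a hypothetical PyTorch illustration, not the authors' implementation: the network shapes, the outcome-weighted log-likelihood loss, and all names are assumptions made for this example. Note how a negative outcome lowers the probability of the played move, flattening the policy in losing positions so that more candidate moves get searched, echoing the loser-position idea in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD_CELLS = 13 * 13  # 13 x 13 Hex board, one logit per cell

# Illustrative stand-ins for the paper's value and policy networks.
policy_net = nn.Sequential(nn.Linear(BOARD_CELLS, 256), nn.ReLU(),
                           nn.Linear(256, BOARD_CELLS))
value_net = nn.Sequential(nn.Linear(BOARD_CELLS, 256), nn.ReLU(),
                          nn.Linear(256, 1), nn.Tanh())

def loss_from_game_result(state, move, result):
    """Loss computed directly from self-play outcomes (no MCTS visit counts).

    state:  (N, BOARD_CELLS) float tensor of board encodings
    move:   (N,) long tensor with the move actually played
    result: (N,) float tensor, +1 if the player to move won, -1 otherwise
    """
    log_p = F.log_softmax(policy_net(state), dim=1)
    picked = log_p.gather(1, move.unsqueeze(1)).squeeze(1)
    # Winner positions: raise the played move's probability.
    # Loser positions: lower it, spreading mass over other moves.
    policy_loss = -(result * picked).mean()
    value_loss = F.mse_loss(value_net(state).squeeze(1), result)
    return policy_loss + value_loss

# Dummy self-play batch, just to show the call shape.
states = torch.randn(8, BOARD_CELLS)
moves = torch.randint(0, BOARD_CELLS, (8,))
results = (torch.rand(8) < 0.5).float() * 2 - 1
print(loss_from_game_result(states, moves, results))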
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Type: article (author version)
URI: http://hdl.handle.net/2115/77885
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Yamamoto, Masahito
