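An Improvement of Reinforcement Learning Agent’s Policy by using a Bayesian Network (ベイジアンネットを利用した強化学習エージェントの方策改善)

Kitakoshi, Daisuke; Shioya, Hiroyuki; Kurihara, Masahito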


Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

Files in This Item:	kitakoshi2003ipsj-final.pdf (PDF, 714.42 kB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/14563

Title:	An Improvement of Reinforcement Learning Agent’s Policy by using a Bayesian Network
Other Titles:	ベイジアンネットを利用した強化学習エージェントの方策改善 (original Japanese title)
Authors:	Kitakoshi, Daisuke (北越, 大輔)
	Shioya, Hiroyuki (塩谷, 浩之)
	Kurihara, Masahito (栗原, 正仁)
Issue Date:	Nov-2003
Publisher:	Information Processing Society of Japan (IPSJ) (情報処理学会)
Journal Title:	IPSJ Journal (情報処理学会論文誌)
Volume:	44
Issue:	11
Start Page:	2884
End Page:	2894
Abstract:	Reinforcement learning, a branch of machine learning, aims to adapt an agent to its environment by using rewards to optimize the agent's policy. In this paper, we propose a method that improves the policy by using knowledge acquired by the reinforcement learning agent. We represent the agent's knowledge as a Bayesian network, a type of probabilistic model, whose structure is built by an information-theoretic model selection method that takes the agent's input-output sequences and rewards during learning as sample data. The Bayesian network constructed in this study represents the stochastic dependences among the agent's inputs, outputs, and rewards. In the proposed method, the agent's policy is improved by supervised learning that uses the structure of the Bayesian network (i.e., stochastic knowledge). Introducing this policy improvement mechanism enables reinforcement learning agents to acquire more efficient policies. To discuss the characteristics of the proposed method, we carry out computer simulations on the pursuit problem. We also discuss how the Bayesian network system represents information about agents' policies.
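The abstract names an information-theoretic model selection step but does not say which criterion is used. As a rough illustration only, the following sketch assumes an MDL (BIC-style) score and an exhaustive search over small parent sets for the reward node. The function names (`mdl_score`, `best_parents`, `make_toy_samples`), the variable names (`obs`, `action`, `noise`, `reward`), and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations, product


def mdl_score(samples, child, parents, arity):
    """MDL (BIC-style) score of `child` given `parents`; lower is better.

    Assumption: discrete variables, maximum-likelihood conditional tables.
    """
    n = len(samples)
    joint = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    parent_counts = Counter(tuple(s[p] for p in parents) for s in samples)
    # Negative log-likelihood of the child under the fitted conditional table.
    nll = -sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per joint parent configuration.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return nll + 0.5 * n_params * math.log(n)


def best_parents(samples, child, candidates, arity, max_parents=2):
    """Exhaustively search parent sets up to `max_parents` for the lowest score."""
    best = ((), mdl_score(samples, child, (), arity))
    for k in range(1, max_parents + 1):
        for parents in combinations(candidates, k):
            score = mdl_score(samples, child, parents, arity)
            if score < best[1]:
                best = (parents, score)
    return best


def make_toy_samples(repeats=10):
    """Hypothetical agent log: reward is 1 exactly when the action matches the input."""
    samples = []
    for _ in range(repeats):
        for obs, action, noise in product((0, 1), repeat=3):
            samples.append({"obs": obs, "action": action, "noise": noise,
                            "reward": int(obs == action)})
    return samples


samples = make_toy_samples()
arity = {"obs": 2, "action": 2, "noise": 2, "reward": 2}
parents, score = best_parents(samples, "reward", ("obs", "action", "noise"), arity)
print(parents)  # ('obs', 'action')
```

In this toy log the reward depends jointly on the input and the action, so the penalized score selects both as parents and leaves out the irrelevant `noise` variable. A dependency structure of this kind is what the abstract calls stochastic knowledge. How the paper turns that structure into supervised training targets is not specified in this record.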
Rights:	The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author(s) and the IPSJ. Users who wish to reproduce, make derivative works of, distribute, or make available to the public any part or the whole of this material must comply with the Copyright Law of Japan and the Code of Ethics of the IPSJ.
Relation:	http://www.ipsj.or.jp/
Relation:	http://www.ipsj.or.jp/01kyotsu/chosakuken/copyright.html
Type:	article (author version)
URI:	http://hdl.handle.net/2115/14563
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Kurihara, Masahito (栗原正仁)
