Algorithms for Adversarial Bandit Problems with Multiple Plays

Uchiya, Taishi; Nakamura, Atsuyoshi; Kudo, Mineichi

doi:10.1007/978-3-642-16108-7_30


Title:	Algorithms for Adversarial Bandit Problems with Multiple Plays
Authors:	Uchiya, Taishi
	Nakamura, Atsuyoshi
	Kudo, Mineichi
Keywords:	multi-armed bandit problem
	adversarial bandit problem
	online learning
Issue Date:	2010
Publisher:	Springer Berlin / Heidelberg
Citation:	Algorithmic Learning Theory (21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings), ed. by Marcus Hutter; Frank Stephan; Vladimir Vovk; Thomas Zeugmann, ISBN: 978-3-642-16107-0, (Lecture Notes in Computer Science; 6331/2010), pp. 375-389
Publisher DOI:	10.1007/978-3-642-16108-7_30
Abstract:	Adversarial bandit problems, studied by Auer et al. [4], are multi-armed bandit problems in which no stochastic assumption is made about the process generating the rewards for actions. In this paper, we extend their theory to the case where k (≥ 1) distinct actions are selected at each time step. To solve this problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1], in addition to two existing algorithms (Exp3, ComBand [5]), in terms of time and space efficiency and of the regret against the best fixed action set. The extension of Exp3, called Exp3.M, performs best on both measures: it runs in O(K(log k + 1)) time and O(K) space, and suffers at most O(√(kTK log(K/k))) regret, where K is the number of possible actions and T is the number of iterations. The regret upper bound we prove for Exp3.M extends the one proved by Auer et al. for Exp3.
Rights:	The original publication is available at www.springerlink.com
Type:	bookchapter (author version)
URI:	http://hdl.handle.net/2115/47057
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Nakamura, Atsuyoshi (中村篤祥)
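
The regret bound quoted in the abstract, O(√(kTK log(K/k))), can be explored numerically. The following is a minimal sketch, not taken from the paper: the function name `regret_bound_order` is hypothetical, the constant factor hidden by the O-notation is omitted (so only ratios between values are meaningful), and the natural logarithm is assumed.

```python
import math


def regret_bound_order(k: int, K: int, T: int) -> float:
    """Return sqrt(k * T * K * log(K / k)), the order of the regret bound
    quoted in the abstract for Exp3.M.

    k: number of distinct actions played per step (1 <= k <= K)
    K: number of possible actions
    T: number of iterations
    The constant factor is omitted (an assumption of this sketch).
    """
    if not 1 <= k <= K:
        raise ValueError("k must satisfy 1 <= k <= K")
    if T < 0:
        raise ValueError("T must be non-negative")
    return math.sqrt(k * T * K * math.log(K / k))


if __name__ == "__main__":
    K, T = 10, 1000
    for k in (1, 2, 5, 10):
        order = regret_bound_order(k, K, T)
        # Dividing by T gives the per-step regret order, which shrinks like 1/sqrt(T).
        print(f"k={k:2d}  order={order:9.2f}  per-step={order / T:.4f}")
```

With k = 1 the expression reduces to √(TK log K), the same form as the single-play Exp3 bound, which matches the abstract's statement that the new bound extends the earlier one. With k = K the expression is zero, as expected when every action is played at every step.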
