Algorithms for Adversarial Bandit Problems with Multiple Plays

Files in This Item:
LNCS6331_375-389.pdf (197.84 kB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/47057

Title: Algorithms for Adversarial Bandit Problems with Multiple Plays
Authors: Uchiya, Taishi
Nakamura, Atsuyoshi
Kudo, Mineichi
Keywords: multi-armed bandit problem
adversarial bandit problem
online learning
Issue Date: 2010
Publisher: Springer Berlin / Heidelberg
Citation: Algorithmic Learning Theory (21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings), ed. by Marcus Hutter; Frank Stephan; Vladimir Vovk; Thomas Zeugmann, ISBN: 978-3-642-16107-0, (Lecture Notes in Computer Science; 6331/2010), pp. 375-389
Publisher DOI: 10.1007/978-3-642-16108-7_30
Abstract: Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k (≥ 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1] in addition to two existing algorithms (Exp3, ComBand [5]) in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3.M, performs best with respect to both the measures: it runs in O(K(log k + 1)) time and O(K) space, and suffers at most O(√(kTK log(K/k))) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3.M is an extension of that proved by Auer et al. for Exp3.
Rights: The original publication is available at www.springerlink.com
Type: bookchapter (author version)
URI: http://hdl.handle.net/2115/47057
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Nakamura, Atsuyoshi (中村 篤祥)
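
The abstract measures performance by the regret against the best fixed action set, that is, the gap between the total reward of the best fixed set of k distinct actions in hindsight and the total reward the learner actually collected. The Python sketch below illustrates only that quantity; it is not the paper's Exp3.M algorithm. The function names `best_fixed_set` and `regret`, and the assumption that the full T × K reward table is available for after-the-fact evaluation, are hypothetical choices made for this illustration (in the bandit setting the learner itself observes only the rewards of the actions it plays).

```python
from typing import Sequence, Set


def best_fixed_set(rewards: Sequence[Sequence[float]], k: int) -> Set[int]:
    """Return the k actions with the largest cumulative reward in hindsight.

    rewards[t][i] is the reward of action i at time step t (assumed known
    here for evaluation only). Ties are broken in favour of lower indices.
    """
    num_actions = len(rewards[0])
    totals = [sum(step[i] for step in rewards) for i in range(num_actions)]
    ranked = sorted(range(num_actions), key=lambda i: (-totals[i], i))
    return set(ranked[:k])


def regret(rewards: Sequence[Sequence[float]],
           plays: Sequence[Set[int]],
           k: int) -> float:
    """Reward of the best fixed k-action set minus the reward of the plays.

    plays[t] is the set of k distinct actions selected at time step t.
    """
    for chosen in plays:
        if len(chosen) != k:
            raise ValueError("each play must contain exactly k distinct actions")
    best = best_fixed_set(rewards, k)
    best_total = sum(step[i] for step in rewards for i in best)
    played_total = sum(step[i] for step, chosen in zip(rewards, plays)
                       for i in chosen)
    return best_total - played_total
```

With K = 4 actions, T = 3 time steps and k = 2, the reward table `[[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.5, 0.0], [1.0, 0.0, 0.5, 0.0]]` has cumulative rewards 2.0, 1.0, 1.5 and 0.0, so the best fixed set is {0, 2} with total reward 3.5. A learner that plays {0, 1}, {2, 3} and {1, 2} collects 2.0 and therefore has regret 1.5.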
