Algorithms for Adversarial Bandit Problems with Multiple Plays
Title: | Algorithms for Adversarial Bandit Problems with Multiple Plays |
Authors: | Uchiya, Taishi | Nakamura, Atsuyoshi | Kudo, Mineichi |
Keywords: | multi-armed bandit problem | adversarial bandit problem | online learning |
Issue Date: | 2010 |
Publisher: | Springer Berlin / Heidelberg |
Citation: | Algorithmic Learning Theory (21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings), ed. by Marcus Hutter; Frank Stephan; Vladimir Vovk; Thomas Zeugmann, ISBN: 978-3-642-16107-0, (Lecture Notes in Computer Science; 6331/2010), pp. 375-389 |
Publisher DOI: | 10.1007/978-3-642-16108-7_30 |
Abstract: | Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k (≥ 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1], in addition to two existing algorithms (Exp3, ComBand [5]), in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3.M, performs best with respect to both measures: it runs in O(K(log k + 1)) time and O(K) space, and suffers at most O(√(kTK log(K/k))) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3.M is an extension of that proved by Auer et al. for Exp3. |
Rights: | The original publication is available at www.springerlink.com |
Type: | bookchapter (author version) |
URI: | http://hdl.handle.net/2115/47057 |
Appears in Collections: | Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc |
Submitter: | Nakamura, Atsuyoshi (中村 篤祥) |