An online minimax optimal algorithm for adversarial multiarmed bandit problem

Authors: Gökçesu, Kaan; Kozat, Süleyman Serdar
Dates: 2019-02-21; 2019-02-21; 2018
ISSN: 2162-237X
URI: http://hdl.handle.net/11693/50275

Abstract: We investigate the adversarial multiarmed bandit problem and introduce an online algorithm that asymptotically achieves the performance of the best switching bandit arm selection strategy. Our algorithms are truly online such that we do not use the game length or the number of switches of the best arm selection strategy in their constructions. Our results are guaranteed to hold in an individual sequence manner, since we have no statistical assumptions on the bandit arm losses. Our regret bounds, i.e., our performance bounds with respect to the best bandit arm selection strategy, are minimax optimal up to logarithmic terms. We achieve the minimax optimal regret with computational complexity only log-linear in the game length. Thus, our algorithms can be efficiently used in applications involving big data. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art switching bandit algorithms. We also introduce a general efficiently implementable bandit arm selection framework, which can be adapted to various applications.

Language: English
Keywords: Adversarial multiarmed bandit; Big data; Individual sequence manner; Minimax optimal; Switching bandit
Type: Article
DOI: 10.1109/TNNLS.2018.2806006
ISSN: 2162-2388
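
The abstract measures performance as regret against the best switching arm selection strategy. As a rough illustration of that benchmark only (this is not the algorithm proposed in the article), the sketch below computes the smallest cumulative loss achievable by any arm sequence with at most a given number of switches, and the resulting regret of a learner. The function names `best_switching_loss` and `switching_regret`, and the `max_switches` parameter, are hypothetical and chosen for this example. It assumes the full loss table is available after the fact, which a bandit learner would not observe during play.

```python
from typing import List, Sequence


def best_switching_loss(losses: Sequence[Sequence[float]], max_switches: int) -> float:
    """Least total loss of any arm sequence that changes arm at most max_switches times.

    Hypothetical helper for illustration; losses[t][a] is the loss of arm a in round t.
    """
    if not losses:
        return 0.0
    num_arms = len(losses[0])
    # best[s][a]: least loss so far, ending on arm a, using at most s switches.
    best: List[List[float]] = [
        [losses[0][a] for a in range(num_arms)] for _ in range(max_switches + 1)
    ]
    for round_losses in losses[1:]:
        new_best: List[List[float]] = []
        for s in range(max_switches + 1):
            # Cheapest way to arrive from any arm while spending one switch.
            switch_in = min(best[s - 1]) if s > 0 else float("inf")
            new_best.append(
                [round_losses[a] + min(best[s][a], switch_in) for a in range(num_arms)]
            )
        best = new_best
    return min(best[max_switches])


def switching_regret(
    losses: Sequence[Sequence[float]], chosen_arms: Sequence[int], max_switches: int
) -> float:
    """Learner's total loss minus the best loss with at most max_switches switches."""
    learner_loss = sum(losses[t][arm] for t, arm in enumerate(chosen_arms))
    return learner_loss - best_switching_loss(losses, max_switches)


# Two arms, four rounds: arm 0 is better early, arm 1 is better late.
example_losses = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
always_arm_zero = [0, 0, 0, 0]
regret_vs_fixed = switching_regret(example_losses, always_arm_zero, max_switches=0)
regret_vs_one_switch = switching_regret(example_losses, always_arm_zero, max_switches=1)
```

In this toy table, a learner that always plays arm 0 matches the best fixed arm, yet falls behind a comparator allowed a single switch. That gap is what makes the switching benchmark harder than the fixed-arm one.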