Risk-averse allocation indices for multiarmed bandit problem

Date

2021-01-25

Source Title

IEEE Transactions on Automatic Control

Print ISSN

0018-9286

Electronic ISSN

1558-2523

Publisher

IEEE

Volume

66

Issue

11

Pages

5522 - 5529

Language

English

Abstract

In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.
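
As a rough illustration of what a "risk-adjusted total discounted outcome" can mean, the sketch below evaluates a single arm, modeled as a small Markov reward chain, in two ways: under plain expectation (risk-neutral) and under a nested one-step risk measure applied at every stage. Everything in it is an assumption made for illustration and is not taken from the article: the two-state chain `P`, the rewards `r`, the discount factor `beta`, and the choice of lower-tail CVaR as the one-step risk measure.

```python
import numpy as np

def expectation(values, probs):
    """Risk-neutral one-step evaluation: the plain expected value."""
    return float(np.dot(probs, values))

def lower_cvar(values, probs, alpha=0.3):
    """Average of the worst alpha-fraction of outcomes.

    Outcomes are rewards, so "worst" means lowest. The level alpha is
    an illustrative assumption.
    """
    order = np.argsort(values)          # worst (lowest) outcomes first
    remaining = alpha                   # probability mass still to collect
    total = 0.0
    for i in order:
        mass = min(probs[i], remaining)
        total += mass * values[i]
        remaining -= mass
        if remaining <= 1e-12:
            break
    return total / alpha

def evaluate(P, r, beta, one_step, iters=500):
    """Iterate V(s) = r(s) + beta * one_step(V, P[s]) to a fixed point."""
    V = np.zeros(len(r))
    for _ in range(iters):
        V = np.array([r[s] + beta * one_step(V, P[s]) for s in range(len(r))])
    return V

# Hypothetical two-state arm: state 0 pays 1, state 1 pays 0.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
beta = 0.9

V_neutral = evaluate(P, r, beta, expectation)
V_averse = evaluate(P, r, beta, lower_cvar)
print("risk-neutral value per state:", V_neutral)
print("risk-averse value per state: ", V_averse)
```

Because the one-step risk measure is applied again at every stage, the risk-averse value of each state is never larger than its risk-neutral value. An index heuristic of the kind described in the abstract would compare arms using quantities built from such risk-adjusted evaluations; the specific index construction is in the article itself and is not reproduced here.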
