Index policy for multiarmed bandit problem with dynamic risk measures

Limited Access
This item is unavailable until:
2025-08-06

Date

2023-08-06

Source Title

European Journal of Operational Research

Print ISSN

0377-2217

Electronic ISSN

1872-6860

Publisher

Elsevier BV

Volume

312

Issue

2

Pages

627 - 640

Language

en

Abstract

The multiarmed bandit problem (MAB) is a classic problem in which a finite amount of resources must be allocated among competing choices with the aim of identifying a policy that maximizes the expected total reward. MAB has a wide range of applications, including clinical trials, portfolio design, parameter tuning, internet advertising, auction mechanisms, adaptive routing in networks, and project management. The classical MAB makes the strong assumption that the decision maker is risk-neutral and indifferent to the variability of the outcome. In many real-life applications, however, these assumptions do not hold and decision makers are risk-averse. Motivated by this, we study risk-averse control of the multiarmed bandit problem using dynamic coherent risk measures to determine a policy with the best risk-adjusted total discounted return. For this setting, we present a theoretical analysis based on Whittle's retirement problem and propose a priority-index policy that reduces to the Gittins index as the level of risk aversion converges to zero. We generalize the restart formulation of the Gittins index to compute these risk-averse allocation indices efficiently. Numerical results exhibit the excellent performance of this heuristic approach for two well-known coherent risk measures: first-order mean-semideviation and mean-AVaR. Our experimental studies suggest that an index-based optimal policy is not guaranteed to exist for the risk-averse problem. Nonetheless, our risk-averse allocation indices can achieve optimal or near-optimal policies that, in some instances, are easier to interpret than the exact optimal policy.
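The restart formulation mentioned in the abstract can be illustrated in the classical risk-neutral case (Katehakis and Veinott's restart-in-state characterization): the Gittins index of state i equals (1 - beta) times the value, at i, of an auxiliary MDP in which every step offers either "continue from the current state" or "restart in i". The sketch below is a minimal risk-neutral illustration, not the paper's risk-averse generalization; the reward vector, transition matrix, and discount factor are made-up examples.

```python
import numpy as np

def gittins_indices(r, P, beta, n_iter=10000, tol=1e-10):
    """Gittins indices of a Markov chain bandit via the restart
    formulation: for each state i, solve by value iteration the MDP
        V_i(x) = max( r(x) + beta * P[x] @ V_i,      # continue
                      r(i) + beta * P[i] @ V_i )     # restart in i
    and return nu(i) = (1 - beta) * V_i(i)."""
    n = len(r)
    nu = np.zeros(n)
    for i in range(n):
        V = np.zeros(n)
        for _ in range(n_iter):
            cont = r + beta * (P @ V)          # continue from current state
            restart = r[i] + beta * (P[i] @ V)  # restart in state i
            V_new = np.maximum(cont, restart)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        nu[i] = (1 - beta) * V[i]
    return nu

# Hypothetical 2-state example: state 0 is absorbing with reward 1,
# so its Gittins index is exactly 1; state 1 earns 0.2 and reaches
# state 0 with probability 0.3, so its index lies strictly between.
r = np.array([1.0, 0.2])
P = np.array([[1.0, 0.0],
              [0.3, 0.7]])
nu = gittins_indices(r, P, beta=0.9)
```

The index policy then simply plays, at each step, the arm whose current state has the largest index; the paper's risk-averse indices replace the expected-value recursion above with one driven by a dynamic coherent risk measure.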
