Risk-averse multi-armed bandit problem

Malekipirbazari, Milad

buir.advisor	Çavuş İyigün, Özlem
dc.contributor.author	Malekipirbazari, Milad
dc.date.accessioned	2021-08-19T11:47:29Z
dc.date.available	2021-08-19T11:47:29Z
dc.date.copyright	2021-08
dc.date.issued	2021-08
dc.date.submitted	2021-08-18
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (pages 97-102).	en_US
dc.description.abstract	In the classical multi-armed bandit (MAB) problem, the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision maker is risk-neutral. In some real-life applications, however, decision makers are risk-averse. In this study, we design a new setting for the classical MAB problem based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the MAB problem with respect to this novel setting, and propose two different priority-index heuristics giving risk-averse allocation indices with structures similar to the Gittins index. The first proposed heuristic is based on Lagrangian duality, and the indices are expressed as the Lagrangian multiplier corresponding to the activation constraint. In the second part, we present a theoretical analysis based on Whittle's retirement problem and propose a generalized version of the restart-in-state formulation of the Gittins index to compute the proposed risk-averse allocation indices. Finally, as a practical application of the proposed methods, we focus on the optimal design of clinical trials and apply our risk-averse MAB approach to perform risk-averse treatment allocation based on a Bayesian Bernoulli model. We evaluate the performance of our approach against other allocation rules, including fixed randomization.	en_US
dc.description.statementofresponsibility	by Milad Malekipirbazari	en_US
dc.embargo.release	2022-02-18
dc.format.extent	x, 109 leaves ; 30 cm.	en_US
dc.identifier.itemid	B152866
dc.identifier.uri	http://hdl.handle.net/11693/76469
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Multi-armed bandit	en_US
dc.subject	Gittins index	en_US
dc.subject	Dynamic risk-aversion	en_US
dc.subject	Coherent risk measures	en_US
dc.subject	Markov decision process	en_US
dc.subject	Clinical trials	en_US
dc.title	Risk-averse multi-armed bandit problem	en_US
dc.title.alternative	Riskten kaçınan çok kollu haydut problemi	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Industrial Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Doctoral
thesis.degree.name	Ph.D. (Doctor of Philosophy)

Files

Original bundle

Name: 10414096.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission
Description:

Collections

Graduate School of Engineering and Science