Risk-averse multi-armed bandit problem

buir.advisor: Çavuş İyigün, Özlem
dc.contributor.author: Malekipirbazari, Milad
dc.date.accessioned: 2021-08-19T11:47:29Z
dc.date.available: 2021-08-19T11:47:29Z
dc.date.copyright: 2021-08
dc.date.issued: 2021-08
dc.date.submitted: 2021-08-18
dc.department: Department of Industrial Engineering [en_US]
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (Ph.D.): Bilkent University, Department of Industrial Engineering, İhsan Doğramacı Bilkent University, 2021. [en_US]
dc.description: Includes bibliographical references (pages 97-102). [en_US]
dc.description.abstract: In the classical multi-armed bandit problem (MAB), the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision maker is risk-neutral. In some real-life applications, however, decision makers are risk-averse. In this study, we design a new setting for the classical MAB based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of MAB under this novel setting and propose two priority-index heuristics that give risk-averse allocation indices with structures similar to the Gittins index. The first heuristic is based on Lagrangian duality, and its indices are expressed as the Lagrange multiplier corresponding to the activation constraint. In the second part, we present a theoretical analysis based on Whittle’s retirement problem and propose a generalized version of the restart-in-state formulation of the Gittins index to compute the proposed risk-averse allocation indices. Finally, as a practical application of the proposed methods, we focus on the optimal design of clinical trials and apply our risk-averse MAB approach to perform risk-averse treatment allocation based on a Bayesian Bernoulli model. We evaluate the performance of our approach against other allocation rules, including fixed randomization. [en_US]
dc.description.degree: Ph.D. [en_US]
dc.description.statementofresponsibility: by Milad Malekipirbazari [en_US]
dc.embargo.release: 2022-02-18
dc.format.extent: x, 109 leaves ; 30 cm. [en_US]
dc.identifier.itemid: B152866
dc.identifier.uri: http://hdl.handle.net/11693/76469
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Multi-armed bandit [en_US]
dc.subject: Gittins index [en_US]
dc.subject: Dynamic risk-aversion [en_US]
dc.subject: Coherent risk measures [en_US]
dc.subject: Markov decision process [en_US]
dc.subject: Clinical trials [en_US]
dc.title: Risk-averse multi-armed bandit problem [en_US]
dc.title.alternative: Riskten kaçınan çok kollu haydut problemi [en_US]
dc.type: Thesis [en_US]
Files
Original bundle
Name: 10414096.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format
Description: Full printable version
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission