      Risk-averse multi-armed bandit problem

      Embargo Lift Date: 2022-02-18
      Author(s)
      Malekipirbazari, Milad
      Advisor
      Çavuş İyigün, Özlem
      Date
      2021-08
      Publisher
      Bilkent University
      Language
      English
      Type
      Thesis
      Item Usage Stats
      549 views, 82 downloads
      Abstract
      In the classical multi-armed bandit (MAB) problem, the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision maker is risk-neutral. In some real-life applications, however, decision makers are risk-averse. In this study, we design a new setting for the classical MAB problem based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the MAB problem in this novel setting and propose two priority-index heuristics that yield risk-averse allocation indices with structures similar to the Gittins index. The first heuristic is based on Lagrangian duality, and its indices are expressed as the Lagrangian multiplier corresponding to the activation constraint. In the second part, we present a theoretical analysis based on Whittle’s retirement problem and propose a generalized version of the restart-in-state formulation of the Gittins index to compute the proposed risk-averse allocation indices. Finally, as a practical application of the proposed methods, we focus on the optimal design of clinical trials and apply our risk-averse MAB approach to perform risk-averse treatment allocation based on a Bayesian Bernoulli model. We evaluate the performance of our approach against other allocation rules, including fixed randomization.
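
      To make the restart-in-state idea concrete, here is a minimal Python sketch for a single Bernoulli arm with a Beta(a, b) posterior. It is an illustration under stated assumptions, not the formulation used in the thesis: the one-step risk mapping is assumed to be a mean-lower-semideviation with weight kappa, the state space is truncated at a fixed depth, and the function names and parameters (risk_adjusted, risk_averse_restart_index, kappa, depth, tol) are hypothetical. With kappa = 0 the recursion reduces to the classical risk-neutral restart-in-state computation of the Gittins index.

```python
def risk_adjusted(p, z_success, z_failure, kappa):
    """One-step risk mapping for a two-outcome reward (assumed form).

    Mean minus kappa times the expected shortfall below the mean
    (mean-lower-semideviation of order 1). kappa = 0 gives the plain
    expectation; kappa in [0, 1] keeps the mapping monotone.
    """
    mean = p * z_success + (1.0 - p) * z_failure
    shortfall = (p * max(mean - z_success, 0.0)
                 + (1.0 - p) * max(mean - z_failure, 0.0))
    return mean - kappa * shortfall


def risk_averse_restart_index(a0, b0, beta=0.9, kappa=0.0, depth=100, tol=1e-8):
    """Approximate allocation index of a Beta(a0, b0) Bernoulli arm.

    Restart-in-state recursion: in every state the decision maker either
    continues from that state or restarts from (a0, b0). The index is
    (1 - beta) times the value of the restart state.
    """
    restart = 0.0  # current guess for the value of continuing from (a0, b0)
    while True:
        # Truncation (an approximation): after `depth` pulls, the continuation
        # value is replaced by the posterior mean reward received forever.
        level = []
        for i in range(depth + 1):
            a, b = a0 + i, b0 + depth - i
            level.append(max(a / (a + b) / (1.0 - beta), restart))
        # Backward induction; at level n, index i counts observed successes.
        for n in range(depth - 1, -1, -1):
            nxt = []
            for i in range(n + 1):
                a, b = a0 + i, b0 + n - i
                p = a / (a + b)  # posterior probability of success
                q = risk_adjusted(p, 1.0 + beta * level[i + 1],
                                  beta * level[i], kappa)
                # At the root (n == 0) keep the continuation value itself;
                # elsewhere the restart option is also available.
                nxt.append(q if n == 0 else max(q, restart))
            level = nxt
        new_restart = level[0]
        if abs(new_restart - restart) < tol:
            return (1.0 - beta) * new_restart
        restart = new_restart
```

      Calling risk_averse_restart_index(1, 1) gives the risk-neutral index for a uniform prior, and passing kappa=0.5 gives a smaller, risk-averse index for the same arm. An allocation rule in this spirit would compute the index of each treatment arm from its current posterior and assign the next patient to the arm with the largest index.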
      Keywords
      Multi-armed bandit
      Gittins index
      Dynamic risk-aversion
      Coherent risk measures
      Markov decision process
      Clinical trials
      Permalink
      http://hdl.handle.net/11693/76469
      Collections
      • Dept. of Industrial Engineering - Ph.D. / Sc.D. 50