Risk-averse allocation indices for multiarmed bandit problem

buir.contributor.author: Malekipirbazari, Milad
buir.contributor.author: Çavuş, Özlem
buir.contributor.orcid: Malekipirbazari, Milad|0000-0002-3212-6498
buir.contributor.orcid: Çavuş, Özlem|0000-0002-9901-0836
dc.citation.epage: 5529 [en_US]
dc.citation.issueNumber: 11 [en_US]
dc.citation.spage: 5522 [en_US]
dc.citation.volumeNumber: 66 [en_US]
dc.contributor.author: Malekipirbazari, Milad
dc.contributor.author: Çavuş, Özlem
dc.date.accessioned: 2022-01-27T10:30:18Z
dc.date.available: 2022-01-27T10:30:18Z
dc.date.issued: 2021-01-25
dc.department: Department of Industrial Engineering [en_US]
dc.description.abstract: In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem with respect to this novel setting and propose a priority-index heuristic that gives risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies. [en_US]
dc.description.provenance: Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2022-01-27T10:30:18Z No. of bitstreams: 1 Risk-averse_allocation_indices_for_multiarmed_bandit_problem.pdf: 801124 bytes, checksum: 5d9223286b222e332419e0feed53a727 (MD5) [en]
dc.description.provenance: Made available in DSpace on 2022-01-27T10:30:18Z (GMT). No. of bitstreams: 1 Risk-averse_allocation_indices_for_multiarmed_bandit_problem.pdf: 801124 bytes, checksum: 5d9223286b222e332419e0feed53a727 (MD5) Previous issue date: 2021-01-25 [en]
dc.identifier.doi: 10.1109/TAC.2021.3053539 [en_US]
dc.identifier.eissn: 1558-2523
dc.identifier.issn: 0018-9286
dc.identifier.uri: http://hdl.handle.net/11693/76828
dc.language.iso: English [en_US]
dc.publisher: IEEE [en_US]
dc.relation.isversionof: https://doi.org/10.1109/TAC.2021.3053539 [en_US]
dc.source.title: IEEE Transactions on Automatic Control [en_US]
dc.subject: Coherent risk measures [en_US]
dc.subject: Dynamic allocation index [en_US]
dc.subject: Dynamic risk-aversion [en_US]
dc.subject: Gittins index [en_US]
dc.subject: Multiarmed bandit (MAB) [en_US]
dc.title: Risk-averse allocation indices for multiarmed bandit problem [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: Risk-averse_allocation_indices_for_multiarmed_bandit_problem.pdf
Size: 782.35 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission