Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback
buir.advisor | Tekin, Cem | |
dc.contributor.author | Bozgan, Kerem | |
dc.date.accessioned | 2022-06-14T09:15:34Z | |
dc.date.available | 2022-06-14T09:15:34Z | |
dc.date.copyright | 2022-06 | |
dc.date.issued | 2022-06 | |
dc.date.submitted | 2022-06-13 | |
dc.description | Cataloged from PDF version of article. | en_US |
dc.description | Thesis (Master's): Bilkent University, Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2022. | en_US |
dc.description | Includes bibliographical references (leaves 70-74). | en_US |
dc.description.abstract | Multi-objective multi-armed bandits (MO-MAB) is an important extension of the standard MAB problem that has found a wide variety of applications ranging from clinical trials to online recommender systems. We consider the Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0,1/2), an adversary corrupts the reward samples by replacing the true samples with samples from an arbitrary distribution of its choosing. Existing MO-MAB methods in the literature cannot handle such attacks unless strict restrictions are placed on the contamination distributions. As a result, these methods perform poorly in practice, where such restrictions on the adversary do not hold in general. To fill this gap in the literature, we propose two different robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O((1/α^2) log(1/δ)) for the proposed methods, where α>0 and δ ∈ (0,1) are accuracy and confidence parameters, respectively, that the user can set according to their preference. This bound matches, in the worst case, the bounds from [1, Theorem 4] and [2, Theorem 3], which consider the adversary-free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature in real-world and synthetic experiments. Numerical results verify our theoretical expectations and show the importance of robust algorithm design in the adversarial setting. | en_US |
dc.description.provenance | Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2022-06-14T09:15:34Z No. of bitstreams: 1 B153368.pdf: 955082 bytes, checksum: 10cfe2b925a9df86c7c8de6b50826521 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2022-06-14T09:15:34Z (GMT). No. of bitstreams: 1 B153368.pdf: 955082 bytes, checksum: 10cfe2b925a9df86c7c8de6b50826521 (MD5) Previous issue date: 2022-06 | en |
dc.description.statementofresponsibility | by Kerem Bozgan | en_US |
dc.embargo.release | 2022-12-05 | |
dc.format.extent | xi, 81 leaves : charts ; 30 cm. | en_US |
dc.identifier.itemid | B153368 | |
dc.identifier.uri | http://hdl.handle.net/11693/92476 | |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Multi-armed bandits | en_US |
dc.subject | Multi-objective optimization | en_US |
dc.subject | Robust optimization | en_US |
dc.subject | Adversarial attack | en_US |
dc.subject | Median-based optimization | en_US |
dc.subject | Pareto set identification | en_US |
dc.title | Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback | en_US |
dc.title.alternative | Çoklu kollu çoklu hedefli haydutlarda dayanıklı öğrenme | en_US |
dc.type | Thesis | en_US |
thesis.degree.discipline | Electrical and Electronic Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Master's | |
thesis.degree.name | MS (Master of Science) |
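
The abstract's central claim, that median-based estimates can recover the Pareto set from contaminated samples where mean-based estimates fail, can be illustrated with a small simulation. The sketch below is not the thesis's algorithm: the three arms, their true mean vectors, the fixed corruption values, the Gaussian noise level, the contamination rate of 0.2, and the 500 pulls per arm are all assumptions chosen for illustration, and the helper names (pull, estimate, dominates, pareto_set) are hypothetical.

```python
import random
import statistics


def pull(true_mean, corrupt_value, eps, rng, noise=0.1):
    """Draw one reward vector for an arm.

    With probability eps the (assumed) adversary replaces the whole
    vector with corrupt_value; otherwise each objective is the true
    mean plus Gaussian noise.
    """
    if rng.random() < eps:
        return list(corrupt_value)
    return [m + rng.gauss(0.0, noise) for m in true_mean]


def estimate(samples, reducer):
    """Apply reducer (e.g. median or mean) to each objective separately."""
    return [reducer(column) for column in zip(*samples)]


def dominates(u, v):
    """True if u is at least as good as v everywhere and better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_set(estimates):
    """Indices of arms whose estimate is not dominated by any other arm."""
    return {
        i
        for i, u in enumerate(estimates)
        if not any(dominates(v, u) for j, v in enumerate(estimates) if j != i)
    }


# Illustrative setup (all values are assumptions, not taken from the thesis).
rng = random.Random(0)
eps = 0.2  # contamination probability, inside (0, 1/2)
true_means = [(1.0, 0.3), (0.3, 1.0), (0.0, 0.0)]  # arm 2 is truly dominated
# The adversary drags the good arms down and pushes the dominated arm up.
corruptions = [(-10.0, -10.0), (-10.0, -10.0), (10.0, 10.0)]

samples = [
    [pull(m, c, eps, rng) for _ in range(500)]
    for m, c in zip(true_means, corruptions)
]

median_set = pareto_set([estimate(s, statistics.median) for s in samples])
mean_set = pareto_set([estimate(s, statistics.mean) for s in samples])
print("median-based Pareto set:", sorted(median_set))
print("mean-based Pareto set:", sorted(mean_set))
```

Under these assumed settings, the per-objective medians stay close to the true means because fewer than half of the samples are corrupted, so the median-based estimate should keep arms 0 and 1 and discard arm 2. The sample means are shifted by roughly eps times the corruption value, which is enough here to make the dominated arm look best.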