Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback

buir.advisor: Tekin, Cem
dc.contributor.author: Bozgan, Kerem
dc.date.accessioned: 2022-06-14T09:15:34Z
dc.date.available: 2022-06-14T09:15:34Z
dc.date.copyright: 2022-06
dc.date.issued: 2022-06
dc.date.submitted: 2022-06-13
dc.department: Department of Electrical and Electronics Engineering [en_US]
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (Master's): Bilkent University, Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2022. [en_US]
dc.description: Includes bibliographical references (leaves 70-74). [en_US]
dc.description.abstract: The multi-objective multi-armed bandit (MO-MAB) problem is an important extension of the standard MAB problem, with applications ranging from clinical trials to online recommender systems. We consider the Pareto set identification problem in the adversarial MO-MAB setting. At each arm pull, with probability ϵ ∈ (0, 1/2), an adversary corrupts the reward samples by replacing the true samples with samples drawn from an arbitrary distribution of its choosing. Existing MO-MAB methods cannot handle such attacks unless strict restrictions are placed on the contamination distributions. As a result, they perform poorly in practice, where such restrictions on the adversary generally do not hold. To fill this gap in the literature, we propose two robust, median-based optimization methods that approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O((1/α²) log(1/δ)) for the proposed methods. Here α > 0 and δ ∈ (0, 1) are accuracy and confidence parameters, respectively, and the user can set them according to their preference. In the worst case, this bound matches the bounds in [1, Theorem 4] and [2, Theorem 3], which consider the adversary-free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature in experiments on real-world and synthetic data. The numerical results confirm our theoretical expectations and show the importance of robust algorithm design in the adversarial setting. [en_US]
dc.description.degree: M.S. [en_US]
dc.description.statementofresponsibility: by Kerem Bozgan [en_US]
dc.embargo.release: 2022-12-05
dc.format.extent: xi, 81 leaves : charts ; 30 cm. [en_US]
dc.identifier.itemid: B153368
dc.identifier.uri: http://hdl.handle.net/11693/92476
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Multi-armed bandits [en_US]
dc.subject: Multi-objective optimization [en_US]
dc.subject: Robust optimization [en_US]
dc.subject: Adversarial attack [en_US]
dc.subject: Median based optimization [en_US]
dc.subject: Pareto set identification [en_US]
dc.title: Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback [en_US]
dc.title.alternative: Çoklu kollu çoklu hedefli haydutlarda dayanıklı öğrenme (English: Robust learning in multi-objective multi-armed bandits) [en_US]
dc.type: Thesis [en_US]
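The contamination model in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions and is not either of the thesis's algorithms. It uses a naive uniform-sampling plug-in rule: pull every arm the same number of times, summarize each objective, then take the Pareto set of the summaries. The following are illustrative assumptions and do not come from the thesis:

- two objectives, where larger is better;
- Gaussian clean rewards;
- the arm means, noise level, ϵ = 0.2, and n = 2000;
- a hypothetical adversary that replaces a corrupted sample with a point mass chosen to inflate a dominated arm and deflate the others.

The sketch contrasts a mean-based summary with a median-based one. No α or δ guarantee is claimed.

```python
import random
import statistics

# Hypothetical true mean reward vectors (2 objectives, larger is better).
# Arm 2 is dominated by arm 0, so the true Pareto set is {0, 1}.
TRUE_MEANS = [[1.0, 1.0], [2.0, 0.0], [0.0, 0.0]]


def pull(arm, eps, rng, noise=0.5):
    """One reward vector from `arm` under epsilon-contamination.

    With probability eps the clean Gaussian sample is replaced by a sample
    from the adversary's distribution. Here (an assumption) that is a point
    mass which inflates the dominated arm 2 and deflates every other arm.
    """
    if rng.random() < eps:
        bad = 50.0 if arm == 2 else -50.0
        return [bad] * len(TRUE_MEANS[arm])
    return [rng.gauss(m, noise) for m in TRUE_MEANS[arm]]


def estimate_all(n, eps, estimator, seed=0):
    """Pull every arm n times and summarize each objective with `estimator`."""
    rng = random.Random(seed)
    estimates = []
    for arm in range(len(TRUE_MEANS)):
        samples = [pull(arm, eps, rng) for _ in range(n)]
        # zip(*samples) groups the samples objective by objective.
        estimates.append([estimator(col) for col in zip(*samples)])
    return estimates


def dominates(u, v):
    """u Pareto-dominates v: no worse in every objective, strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_set(estimates):
    """Indices of arms whose estimate is not dominated by any other arm's estimate."""
    return {
        i
        for i, u in enumerate(estimates)
        if not any(dominates(v, u) for j, v in enumerate(estimates) if j != i)
    }


if __name__ == "__main__":
    # The same seed gives both estimators exactly the same contaminated samples.
    for name, est in [("mean", statistics.fmean), ("median", statistics.median)]:
        print(name, sorted(pareto_set(estimate_all(2000, 0.2, est))))
```

With these settings the adversary moves the sample means by roughly 10 in either direction, so the dominated arm 2 appears to dominate the others. The medians shift by only about 0.16, and their Pareto set matches the true set {0, 1}. The median's bias grows with ϵ and stays bounded only while ϵ < 1/2, its breakdown point. This is consistent with the restriction ϵ ∈ (0, 1/2) in the abstract.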

Files

Original bundle
Name: B153368.pdf
Size: 932.7 KB
Format: Adobe Portable Document Format
Description: Full printable version
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission