Author: Bozgan, Kerem
Dates: 2022-06-14; 2022-06-14; 2022-06; 2022-06; 2022-06-13
URI: http://hdl.handle.net/11693/92476
Description: Cataloged from PDF version of article.
Description: Thesis (Master's): Bilkent University, Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2022.
Description: Includes bibliographical references (leaves 70-74).
Abstract: Multi-objective multi-armed bandits (MO-MAB) is an important extension of the standard MAB problem, with applications ranging from clinical trials to online recommender systems. We consider the Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0, 1/2), an adversary corrupts the reward samples by replacing the true samples with samples from an arbitrary distribution of its choosing. Existing MO-MAB methods cannot handle such attacks unless strict restrictions are placed on the contamination distributions. As a result, these methods perform poorly in practice, where such restrictions on the adversary generally do not hold. To fill this gap in the literature, we propose two robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O((1/α^2) log(1/δ)) for the proposed methods, where α > 0 and δ ∈ (0, 1) are accuracy and confidence parameters, respectively, set by the user. In the worst case, this bound matches the bounds from [1, Theorem 4] and [2, Theorem 3], which consider the adversary-free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature in real-world and synthetic experiments. Numerical results verify our theoretical expectations and show the importance of robust algorithm design in the adversarial setting.
Physical description: xi, 81 leaves : charts ; 30 cm.
Language: English
Rights: info:eu-repo/semantics/openAccess
Keywords: Multi-armed bandits; Multi-objective optimization; Robust optimization; Adversarial attack; Median based optimization; Pareto set identification
Title: Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback
Alternative title (Turkish): Çoklu kollu çoklu hedefli haydutlarda dayanıklı öğrenme ("Robust learning in multi-armed multi-objective bandits")
Type: Thesis
Identifier: B153368
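
Illustration (not part of the catalog record): the contamination model in the abstract can be sketched in a few lines of Python. This is not the thesis's algorithm, whose details the record does not give. It is a minimal sketch that assumes Gaussian clean rewards, a fixed adversarial replacement value, and a plain coordinate-wise sample median as the robust estimator. The names `pull` and `pareto_set` and all numeric values are hypothetical choices made for this example.

```python
import numpy as np


def pull(rng, true_mean, eps, corrupt_value, n):
    """Draw n reward vectors for one arm under eps-contamination.

    Assumption: clean rewards are Gaussian around true_mean, and the
    adversary replaces each sample, independently with probability eps,
    by the fixed vector corrupt_value (one arbitrary choice of attack).
    """
    d = len(true_mean)
    samples = rng.normal(loc=true_mean, scale=0.1, size=(n, d))
    corrupted = rng.random(n) < eps
    samples[corrupted] = corrupt_value
    return samples


def pareto_set(estimates):
    """Return indices of arms not dominated by another arm (larger is better)."""
    est = np.asarray(estimates, dtype=float)
    keep = []
    for i, x in enumerate(est):
        dominated = any(
            np.all(y >= x) and np.any(y > x)
            for j, y in enumerate(est)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


rng = np.random.default_rng(0)

# Hypothetical two-objective instance: arms 0 and 1 are Pareto optimal,
# arm 2 is dominated by both.
true_means = np.array([[0.8, 0.3], [0.3, 0.8], [0.1, 0.1]])
eps = 0.2  # contamination probability, inside (0, 1/2)

# The adversary drags the optimal arms down and pushes the dominated arm up.
attacks = [np.array([-5.0, -5.0]), np.array([-5.0, -5.0]), np.array([5.0, 5.0])]
data = [pull(rng, m, eps, a, n=2000) for m, a in zip(true_means, attacks)]

mean_est = np.array([s.mean(axis=0) for s in data])
median_est = np.array([np.median(s, axis=0) for s in data])

mean_pareto = pareto_set(mean_est)      # built from sample means
median_pareto = pareto_set(median_est)  # built from coordinate-wise medians

print("mean-based Pareto set:  ", mean_pareto)
print("median-based Pareto set:", median_pareto)
```

In this setup the sample means are pulled far from the true means by the replaced samples, while the medians shift only slightly because fewer than half of the samples are corrupted. That is the intuition behind preferring median-based estimates when ϵ < 1/2.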