Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback

Bozgan, Kerem

Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback

Limited Access

This item is unavailable until:
2022-12-05

Files

B153368.pdf (932.7 KB)

Date

2022-06

Authors

Bozgan, Kerem

Advisor

Tekin, Cem

Publisher

Bilkent University

Language

English

Type

Thesis

Abstract

Multi-objective multi-armed bandits (MO-MAB) is an important extension of the standard MAB problem that has found a wide variety of applications ranging from clinical trials to online recommender systems. We consider Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0,1/2), an adversary corrupts the reward samples by replacing the true samples with the samples from an arbitrary distribution of its choosing. Existing MO-MAB methods in the literature are incapable of handling such attacks unless there are strict restrictions on the contamination distributions. As a result, these methods perform poorly in practice where such restrictions on the adversary are not valid in general. To fill this gap in the literature, we propose two different robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O(1/α^2 log(1/δ)) for the proposed methods, where α>0 and δ ∈ (0,1) are accuracy and confidence parameters, respectively, that can be set by the user according to his/her preference. This bound matches, in the worst case, the bounds from [1, Theorem 4] and [2, Theorem 3] that consider the adversary free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature on real-world and synthetic experiments. Numerical results verify our theoretical expectations and show the importance of robust algorithm design in the adversarial setting.