Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback

Limited Access
This item is unavailable until:
2022-12-05

Date

2022-06

Editor(s)

Advisor

Tekin, Cem

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Bilkent University

Volume

Issue

Pages

Language

English

Journal Title

Journal ISSN

Volume Title

Series

Abstract

Multi-objective multi-armed bandits (MO-MAB) is an important extension of the standard MAB problem that has found a wide variety of applications ranging from clinical trials to online recommender systems. We consider Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0,1/2), an adversary corrupts the reward samples by replacing the true samples with the samples from an arbitrary distribution of its choosing. Existing MO-MAB methods in the literature are incapable of handling such attacks unless there are strict restrictions on the contamination distributions. As a result, these methods perform poorly in practice where such restrictions on the adversary are not valid in general. To fill this gap in the literature, we propose two different robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O(1/α^2 log(1/δ)) for the proposed methods, where α>0 and δ ∈ (0,1) are accuracy and confidence parameters, respectively, that can be set by the user according to his/her preference. This bound matches, in the worst case, the bounds from [1, Theorem 4] and [2, Theorem 3] that consider the adversary free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature on real-world and synthetic experiments. Numerical results verify our theoretical expectations and show the importance of robust algorithm design in the adversarial setting.

Course

Other identifiers

Book Title

Citation

item.page.isversionof