Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback

Bozgan, Kerem

Files

B153368.pdf (932.7 KB)

Date

2022-06

Authors

Bozgan, Kerem

Advisor

Tekin, Cem

Abstract

The multi-objective multi-armed bandit (MO-MAB) problem is an important extension of the standard MAB problem, with applications ranging from clinical trials to online recommender systems. We consider the Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0, 1/2), an adversary corrupts the reward sample by replacing the true sample with one drawn from an arbitrary distribution of its choosing. Existing MO-MAB methods in the literature cannot handle such attacks unless strict restrictions are placed on the contamination distributions. As a result, these methods perform poorly in practice, where such restrictions on the adversary do not hold in general. To fill this gap, we propose two robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples. We prove a sample complexity bound of the form O((1/α^2) log(1/δ)) for the proposed methods, where α > 0 and δ ∈ (0, 1) are accuracy and confidence parameters, respectively, that the user can set as preferred. In the worst case, this bound matches the bounds from [1, Theorem 4] and [2, Theorem 3], which consider the adversary-free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature in real-world and synthetic experiments. The numerical results agree with our theoretical expectations and show the importance of robust algorithm design in the adversarial setting.
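
The contamination model described in the abstract can be illustrated with a small simulation. The following Python sketch is not the thesis's algorithm: it assumes a coordinate-wise sample median as the robust estimator, a fixed number of pulls per arm, Gaussian rewards, and a simple adversary that returns constant outlier vectors. All names and constants (`EPS`, `N_PULLS`, `TRUE_MEANS`, `ADVERSARY_VALUES`, `pull`, `estimate`, `pareto_set`, `run`) are hypothetical and chosen only for illustration.

```python
import random
import statistics

EPS = 0.2        # contamination probability; the abstract requires a value in (0, 1/2)
N_PULLS = 1000   # pulls per arm (illustrative choice, not derived from the bound)
SIGMA = 0.1      # standard deviation of the uncontaminated Gaussian rewards (assumption)

# Hypothetical 3-arm, 2-objective instance: arms 0 and 1 are Pareto optimal,
# arm 2 is dominated by both.
TRUE_MEANS = [(0.9, 0.4), (0.4, 0.9), (0.1, 0.1)]

# Hypothetical adversary: drags the optimal arms down and the dominated arm up.
ADVERSARY_VALUES = [(-100.0, -100.0), (-100.0, -100.0), (100.0, 100.0)]


def pull(arm, rng):
    """Return one reward vector; with probability EPS the adversary replaces it."""
    if rng.random() < EPS:
        return ADVERSARY_VALUES[arm]
    return tuple(rng.gauss(mu, SIGMA) for mu in TRUE_MEANS[arm])


def estimate(samples, reducer):
    """Apply `reducer` (e.g. mean or median) to each objective separately."""
    return tuple(reducer(coord) for coord in zip(*samples))


def dominates(u, v):
    """True if u is at least as good as v in every objective and better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_set(vectors):
    """Indices of the vectors that no other vector dominates."""
    return {
        i
        for i, u in enumerate(vectors)
        if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)
    }


def run(seed=0):
    """Sample every arm and return (mean-based set, median-based set)."""
    rng = random.Random(seed)
    samples = [[pull(arm, rng) for _ in range(N_PULLS)] for arm in range(len(TRUE_MEANS))]
    mean_est = [estimate(s, statistics.mean) for s in samples]
    median_est = [estimate(s, statistics.median) for s in samples]
    return pareto_set(mean_est), pareto_set(median_est)


if __name__ == "__main__":
    mean_set, median_set = run()
    print("true Pareto set:  ", sorted(pareto_set(TRUE_MEANS)))
    print("mean-based set:   ", sorted(mean_set))
    print("median-based set: ", sorted(median_set))
```

In this toy instance the outliers shift the empirical means by roughly ±20, so the mean-based estimate selects only the dominated arm, while the coordinate-wise medians move by only a few hundredths and still recover arms 0 and 1. This mirrors, under the stated assumptions, the abstract's point that mean-based methods fail when the contamination distribution is unrestricted.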

Keywords

Multi-armed bandits, Multi-objective optimization, Robust optimization, Adversarial attack, Median based optimization, Pareto set identification

Degree Discipline

Electrical and Electronic Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/92476

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
