Multi-objective contextual multi-armed bandit with a dominant objective

Tekin, Cem; Turgay, Eralp

Files

Multi-objective_Contextual_Multi-armed_Bandit_With_a_Dominant_Objective.pdf (2.31 MB)

Date

2018

Abstract

We propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other. In the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. We call this problem contextual multi-armed bandit with a dominant objective (CMAB-DO). In CMAB-DO, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. In this case, the optimal arm given the context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. First, we show that the optimal arm lies in the Pareto front. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also compare the performance of the proposed algorithm with other state-of-the-art methods on synthetic and real-world datasets. The proposed model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
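
The definition of the optimal arm in the abstract can be illustrated with a small sketch. This is not the MOC-MAB algorithm, which must learn from observed rewards; it only applies the stated definition to a single fixed context, assuming the expected rewards are already known. The function names, the tolerance parameter, and the numeric values are hypothetical, and the Pareto front uses the standard dominance definition (no other arm is at least as good in both objectives and strictly better in one), which is an assumption about the paper's exact formulation.

```python
from typing import List, Tuple

# Each arm is described by a pair of expected rewards for one fixed context:
# (expected reward in the dominant objective, expected reward in the non-dominant objective).
Means = List[Tuple[float, float]]


def lexicographic_optimal_arm(means: Means, tol: float = 1e-12) -> int:
    """Return the arm that maximizes the non-dominant expected reward
    among the arms that maximize the dominant expected reward."""
    best_dominant = max(m[0] for m in means)
    # Arms whose dominant expected reward ties with the maximum (within tol).
    candidates = [a for a, m in enumerate(means) if best_dominant - m[0] <= tol]
    return max(candidates, key=lambda a: means[a][1])


def pareto_front(means: Means) -> List[int]:
    """Return the arms not dominated by any other arm (standard Pareto dominance)."""
    front = []
    for a, (dom_a, non_a) in enumerate(means):
        dominated = any(
            dom_b >= dom_a and non_b >= non_a and (dom_b > dom_a or non_b > non_a)
            for b, (dom_b, non_b) in enumerate(means)
            if b != a
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical expected rewards for four arms under one context.
means = [(0.9, 0.2), (0.9, 0.6), (0.5, 0.95), (0.4, 0.3)]
best = lexicographic_optimal_arm(means)
front = pareto_front(means)
print(best, front)
```

In this made-up example, arms 0 and 1 tie in the dominant objective, so the tie is broken by the non-dominant objective and arm 1 is selected. Arm 2 has the highest non-dominant reward but is not optimal because it is worse in the dominant objective; it still belongs to the Pareto front, which is consistent with the abstract's claim that the optimal arm lies in the Pareto front without being the only arm there.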

Source Title

IEEE Transactions on Signal Processing

Publisher

IEEE

Keywords

Online learning, Contextual MAB, Multi-objective MAB, Dominant objective, Multi-dimensional regret, Pareto regret

Permalink

http://hdl.handle.net/11693/75968

Published Version (Please cite this version)

https://doi.org/10.1109/TSP.2018.2841822

Collections

Scholarly Publications - Electrical and Electronics Engineering

Language

English

Type

Article
