Multi-objective contextual bandits with a dominant objective

Tekin, Cem; Turgay, Eralp

Date

2017

Authors

Tekin, Cem

Turgay, Eralp

Abstract

In this paper, we propose a new contextual bandit problem with two objectives, where one of the objectives dominates the other. Unlike single-objective bandit problems, in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
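The optimality criterion stated in the abstract can be illustrated with a minimal sketch. This is not MOC-MAB itself: it assumes the expected rewards for one fixed context are already known, whereas the algorithm has to learn them online. The function name `lexicographic_optimal_arm`, the `(dominant, non_dominant)` tuple layout, and the tie tolerance `tol` are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def lexicographic_optimal_arm(
    expected_rewards: Sequence[Tuple[float, float]],
    tol: float = 1e-12,
) -> int:
    """Return the index of the optimal arm for one fixed context.

    expected_rewards[a] is a hypothetical pair
    (expected dominant reward, expected non-dominant reward) for arm a.
    """
    if not expected_rewards:
        raise ValueError("need at least one arm")

    # Step 1: find the best achievable expected reward in the dominant objective.
    best_dominant = max(dom for dom, _ in expected_rewards)

    # Step 2: keep only the arms that attain it (up to a small tolerance).
    candidates = [
        arm
        for arm, (dom, _) in enumerate(expected_rewards)
        if best_dominant - dom <= tol
    ]

    # Step 3: among those, pick the arm with the highest non-dominant reward.
    return max(candidates, key=lambda arm: expected_rewards[arm][1])


# Arms 0 and 1 tie in the dominant objective; arm 1 wins on the non-dominant one.
# Arm 2 has the highest non-dominant reward but is excluded in step 2.
arms = [(0.9, 0.2), (0.9, 0.7), (0.5, 1.0)]
chosen = lexicographic_optimal_arm(arms)
```

Here `chosen` is 1: the non-dominant objective only breaks ties among arms that are already optimal in the dominant objective, which is why arm 2 is never selected despite its higher non-dominant reward.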

Source Title

Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017

Publisher

IEEE

Keywords

Contextual bandits, Dominant objective, Multi-objective bandits, Online learning, Regret bounds, Artificial intelligence, Diagnosis, Learning systems, Wireless telecommunication systems, Signal processing

Permalink

http://hdl.handle.net/11693/37612

Published Version (Please cite this version)

http://dx.doi.org/10.1109/MLSP.2017.8168123

Collections

Scholarly Publications - Electrical and Electronics Engineering

Language

English

Type

Conference Paper
