Multi-objective contextual multi-armed bandit with a dominant objective

Tekin, Cem; Turgay, Eralp

buir.contributor.author	Tekin, Cem
buir.contributor.author	Turgay, Eralp
dc.citation.epage	3813	en_US
dc.citation.issueNumber	14	en_US
dc.citation.spage	3799	en_US
dc.citation.volumeNumber	66	en_US
dc.contributor.author	Tekin, Cem
dc.contributor.author	Turgay, Eralp
dc.date.accessioned	2021-03-23T11:29:33Z
dc.date.available	2021-03-23T11:29:33Z
dc.date.issued	2018
dc.department	Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	We propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. In the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. We call this problem contextual multi-armed bandit with a dominant objective (CMAB-DO). In CMAB-DO, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. In this case, the optimal arm given the context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. First, we show that the optimal arm lies in the Pareto front. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also compare the performance of the proposed algorithm with other state-of-the-art methods on synthetic and real-world datasets. The proposed model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.	en_US
dc.description.sponsorship	This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grants 116C043 and 116E229.	en_US
dc.identifier.doi	10.1109/TSP.2018.2841822	en_US
dc.identifier.issn	1053-587X
dc.identifier.uri	http://hdl.handle.net/11693/75968
dc.language.iso	English	en_US
dc.publisher	IEEE	en_US
dc.relation.isversionof	https://doi.org/10.1109/TSP.2018.2841822	en_US
dc.source.title	IEEE Transactions on Signal Processing	en_US
dc.subject	Online learning	en_US
dc.subject	Contextual MAB	en_US
dc.subject	Multi-objective MAB	en_US
dc.subject	Dominant objective	en_US
dc.subject	Multi-dimensional regret	en_US
dc.subject	Pareto regret	en_US
dc.title	Multi-objective contextual multi-armed bandit with a dominant objective	en_US
dc.type	Article	en_US
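
The abstract defines the optimal arm as the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective, and states that this arm lies in the Pareto front. The sketch below illustrates that definition for a single fixed context. It is not the MOC-MAB algorithm: it assumes the expected rewards are known, whereas the learner in CMAB-DO must estimate them. The function names, the tolerance `tol`, and the reward values are hypothetical, and the dominance test is the standard Pareto one, which may differ in detail from the definition used in the article.

```python
from typing import List, Sequence, Tuple

# Expected reward of one arm: (dominant objective, non-dominant objective).
Reward = Tuple[float, float]


def optimal_arm(mu: Sequence[Reward], tol: float = 1e-12) -> int:
    """Return the arm that maximizes the non-dominant expected reward
    among all arms that maximize the dominant expected reward."""
    best_dominant = max(m[0] for m in mu)
    # Arms whose dominant reward ties the maximum (up to tol).
    candidates = [a for a, m in enumerate(mu) if best_dominant - m[0] <= tol]
    return max(candidates, key=lambda a: mu[a][1])


def pareto_front(mu: Sequence[Reward]) -> List[int]:
    """Return the arms that no other arm dominates."""

    def dominates(p: Reward, q: Reward) -> bool:
        # p is at least as good in both objectives and strictly better in one.
        return p[0] >= q[0] and p[1] >= q[1] and (p[0] > q[0] or p[1] > q[1])

    return [a for a, m in enumerate(mu) if not any(dominates(o, m) for o in mu)]


# Hypothetical expected rewards for four arms under one context.
mu = [(0.9, 0.2), (0.9, 0.6), (0.5, 0.95), (0.4, 0.4)]
best = optimal_arm(mu)
front = pareto_front(mu)
```

Here arms 0 and 1 tie in the dominant objective, so the non-dominant objective breaks the tie in favor of arm 1. Arm 2 is also in the Pareto front because of its high non-dominant reward, but it is not optimal in the CMAB-DO sense since its dominant reward is lower.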

Files

Original bundle

Name: Multi-objective_Contextual_Multi-armed_Bandit_With_a_Dominant_Objective.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format

Collections

Scholarly Publications - Electrical and Electronics Engineering