Multi-objective contextual multi-armed bandit with a dominant objective

buir.contributor.author: Tekin, Cem
buir.contributor.author: Turgay, Eralp
dc.citation.epage: 3813 [en_US]
dc.citation.issueNumber: 14 [en_US]
dc.citation.spage: 3799 [en_US]
dc.citation.volumeNumber: 66 [en_US]
dc.contributor.author: Tekin, Cem
dc.contributor.author: Turgay, Eralp
dc.date.accessioned: 2021-03-23T11:29:33Z
dc.date.available: 2021-03-23T11:29:33Z
dc.date.issued: 2018
dc.department: Department of Electrical and Electronics Engineering [en_US]
dc.description.abstract: We propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. In the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. We call this problem contextual multi-armed bandit with a dominant objective (CMAB-DO). In CMAB-DO, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. In this case, the optimal arm given the context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. First, we show that the optimal arm lies in the Pareto front. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also compare the performance of the proposed algorithm with other state-of-the-art methods on synthetic and real-world datasets. The proposed model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems. [en_US]
dc.description.sponsorship: This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grants 116C043 and 116E229. [en_US]
dc.identifier.doi: 10.1109/TSP.2018.2841822 [en_US]
dc.identifier.issn: 1053-587X
dc.identifier.uri: http://hdl.handle.net/11693/75968
dc.language.iso: English [en_US]
dc.publisher: IEEE [en_US]
dc.relation.isversionof: https://doi.org/10.1109/TSP.2018.2841822 [en_US]
dc.source.title: IEEE Transactions on Signal Processing [en_US]
dc.subject: Online learning [en_US]
dc.subject: Contextual MAB [en_US]
dc.subject: Multi-objective MAB [en_US]
dc.subject: Dominant objective [en_US]
dc.subject: Multi-dimensional regret [en_US]
dc.subject: Pareto regret [en_US]
dc.title: Multi-objective contextual multi-armed bandit with a dominant objective [en_US]
dc.type: Article [en_US]
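
Illustrative note on the abstract: the definition of the optimal arm in CMAB-DO is a lexicographic choice, and the small Python sketch below may help make it concrete. It is not the MOC-MAB algorithm from the paper. It assumes the expected rewards for one fixed context are already known (in the actual problem the learner has to estimate them), and it uses the usual notion of Pareto dominance, which is an assumption here. The function names, the tolerance parameter, and the numbers are all hypothetical.

```python
from typing import List, Tuple

# Each arm is described by (expected dominant reward, expected non-dominant
# reward) for one fixed context. These values are assumed known here, which
# is NOT the case for the learner in the bandit problem itself.
Means = List[Tuple[float, float]]


def optimal_arm(expected: Means, tol: float = 1e-12) -> int:
    """Lexicographic choice described in the abstract (hypothetical helper)."""
    # Step 1: keep only arms that maximize the dominant objective.
    best_dominant = max(mu_d for mu_d, _ in expected)
    candidates = [a for a, (mu_d, _) in enumerate(expected)
                  if best_dominant - mu_d <= tol]
    # Step 2: among those, maximize the non-dominant objective.
    return max(candidates, key=lambda a: expected[a][1])


def pareto_front(expected: Means) -> List[int]:
    """Arms not dominated by any other arm (standard definition, assumed)."""
    def dominates(u: Tuple[float, float], v: Tuple[float, float]) -> bool:
        # u is at least as good in both objectives and strictly better in one.
        return u[0] >= v[0] and u[1] >= v[1] and (u[0] > v[0] or u[1] > v[1])

    return [a for a, v in enumerate(expected)
            if not any(dominates(u, v) for u in expected)]


# Made-up expected rewards for four arms in a single context.
means = [(0.9, 0.2), (0.9, 0.6), (0.5, 0.95), (0.4, 0.3)]
best = optimal_arm(means)
front = pareto_front(means)
```

With these made-up numbers, arms 0 and 1 tie in the dominant objective, arm 1 wins the tie on the non-dominant objective, and arm 1 is also a member of the computed Pareto front, in line with the abstract's claim that the optimal arm lies in the Pareto front.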

Files

Original bundle
Name: Multi-objective_Contextual_Multi-armed_Bandit_With_a_Dominant_Objective.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format