Multi-objective contextual bandits with a dominant objective

buir.contributor.author: Tekin, Cem
buir.contributor.author: Turgay, Eralp
dc.citation.epage: 3813
dc.citation.spage: 3799
dc.contributor.author: Tekin, Cem
dc.contributor.author: Turgay, Eralp
dc.coverage.spatial: Tokyo, Japan
dc.date.accessioned: 2018-04-12T11:45:36Z
dc.date.available: 2018-04-12T11:45:36Z
dc.date.issued: 2017
dc.department: Department of Electrical and Electronics Engineering
dc.description: Date of Conference: 25-28 September 2017
dc.description: Conference Name: IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017
dc.description.abstract: In this paper, we propose a new contextual bandit problem with two objectives, where one objective dominates the other. Unlike single-objective bandit problems, in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective (this selection rule is sketched in code after the record below). For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB) and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and algorithm have a wide range of real-world applications involving multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
dc.identifier.doi: 10.1109/MLSP.2017.8168123
dc.identifier.issn: 2161-0363
dc.identifier.uri: http://hdl.handle.net/11693/37612
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: http://dx.doi.org/10.1109/MLSP.2017.8168123
dc.source.title: Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017
dc.subject: Contextual bandits
dc.subject: Dominant objective
dc.subject: Multi-objective bandits
dc.subject: Online learning
dc.subject: Regret bounds
dc.subject: Artificial intelligence
dc.subject: Diagnosis
dc.subject: Learning systems
dc.subject: Wireless telecommunication systems
dc.subject: Signal processing
dc.title: Multi-objective contextual bandits with a dominant objective
dc.type: Conference Paper
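
The lexicographic notion of optimality described in the abstract, maximizing the non-dominant objective among the arms that maximize the dominant one, can be made concrete with a short sketch. This is a minimal illustration assuming the expected reward vectors are known; the function name and the numbers are hypothetical, and it is not the MOC-MAB algorithm itself, which must estimate these expectations online from observed reward vectors.

import numpy as np

def lexicographic_optimal_arm(means, tol=1e-9):
    """Return the index of the lexicographically optimal arm.

    `means` is an (n_arms, 2) array: column 0 holds expected rewards
    in the dominant objective, column 1 in the non-dominant objective.
    The optimal arm maximizes column 1 among the arms that maximize
    column 0.
    """
    means = np.asarray(means, dtype=float)
    best_dominant = means[:, 0].max()
    # Arms that (approximately) maximize the dominant objective.
    candidates = np.flatnonzero(means[:, 0] >= best_dominant - tol)
    # Among those, pick the arm with the best non-dominant reward.
    return candidates[np.argmax(means[candidates, 1])]

# Illustrative expected reward vectors (dominant, non-dominant) for
# one fixed context. Arms 1 and 2 tie in the dominant objective, so
# the non-dominant objective breaks the tie in favor of arm 2.
means = [
    [0.5, 0.9],  # arm 0: suboptimal in the dominant objective
    [0.8, 0.3],  # arm 1: optimal dominant, weak non-dominant
    [0.8, 0.6],  # arm 2: optimal dominant, stronger non-dominant
]
print(lexicographic_optimal_arm(means))  # -> 2

In the learning setting of the paper, these means are unknown and context-dependent, and regret is measured against the policy that applies this rule with the true expectations for every context.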

Files

Original bundle

Name: Multi-Objective contextual bandits with a dominant objective.pdf
Size: 382.59 KB
Format: Adobe Portable Document Format
Description: Full Printable Version