Multi-objective contextual bandits with a dominant objective
buir.contributor.author | Tekin, Cem | |
buir.contributor.author | Turgay, Eralp | |
dc.citation.epage | 3813 | en_US |
dc.citation.spage | 3799 | en_US |
dc.contributor.author | Tekin, Cem | en_US |
dc.contributor.author | Turgay, Eralp | en_US |
dc.coverage.spatial | Tokyo, Japan | en_US |
dc.date.accessioned | 2018-04-12T11:45:36Z | |
dc.date.available | 2018-04-12T11:45:36Z | |
dc.date.issued | 2017 | en_US |
dc.department | Department of Electrical and Electronics Engineering | en_US |
dc.description | Date of Conference: 25-28 September 2017 | en_US |
dc.description | Conference Name: IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017 | en_US |
dc.description.abstract | In this paper, we propose a new contextual bandit problem with two objectives, where one objective dominates the other. Unlike in single-objective bandit problems, where the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications involving multiple, possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems. | en_US |
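The notion of optimality in the abstract is lexicographic: the optimal arm maximizes the non-dominant objective among the arms that maximize the dominant objective. The following minimal Python sketch illustrates only that selection rule, assuming the expected rewards are already known; it is not the MOC-MAB algorithm itself, which must learn these rewards online, and the array layout and tie tolerance are illustrative assumptions.

import numpy as np

def lexicographic_optimal_arm(expected_rewards: np.ndarray, tol: float = 1e-9) -> int:
    """expected_rewards: shape (2, num_arms) for a fixed context; row 0 is the
    dominant objective, row 1 the non-dominant one. Returns the optimal arm."""
    dominant, non_dominant = expected_rewards
    # Arms that (up to a small tolerance) maximize the dominant objective.
    candidates = np.flatnonzero(dominant >= dominant.max() - tol)
    # Among those, pick the arm with the highest non-dominant expected reward.
    return int(candidates[np.argmax(non_dominant[candidates])])

# Example: arms 0 and 2 tie in the dominant objective, but arm 2 is better
# in the non-dominant objective, so arm 2 is optimal.
rewards = np.array([[0.9, 0.5, 0.9],
                    [0.2, 0.8, 0.6]])
print(lexicographic_optimal_arm(rewards))  # -> 2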
dc.identifier.doi | 10.1109/MLSP.2017.8168123 | en_US |
dc.identifier.issn | 2161-0363 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/37612 | |
dc.language.iso | English | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/MLSP.2017.8168123 | en_US |
dc.source.title | Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017 | en_US |
dc.subject | Contextual bandits | en_US |
dc.subject | Dominant objective | en_US |
dc.subject | Multi-objective bandits | en_US |
dc.subject | Online learning | en_US |
dc.subject | Regret bounds | en_US |
dc.subject | Artificial intelligence | en_US |
dc.subject | Diagnosis | en_US |
dc.subject | Learning systems | en_US |
dc.subject | Wireless telecommunication systems | en_US |
dc.subject | Signal processing | en_US |
dc.title | Multi-objective contextual bandits with a dominant objective | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle (1 of 1)
- Name: Multi-Objective contextual bandits with a dominant objective.pdf
- Size: 382.59 KB
- Format: Adobe Portable Document Format
- Description: Full Printable Version