
      Multi-objective contextual bandits with a dominant objective

      Author
      Tekin, Cem
      Turgay, Eralp
      Date
      2017
      Source Title
      Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing, MLSP 2017
      Print ISSN
      2161-0363
      Publisher
      IEEE
      Language
      English
      Type
      Conference Paper
      Abstract
      In this paper, we propose a new contextual bandit problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective bandit problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with that of other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
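
      The abstract defines the optimal arm lexicographically: among the arms that maximize the expected reward in the dominant objective, choose the one with the highest expected reward in the non-dominant objective. The sketch below illustrates only that selection rule for a fixed context, assuming the expected rewards of each arm are known; in the actual bandit setting these expectations must be estimated online, which is what MOC-MAB addresses. The function and variable names (lexicographic_optimal_arm, mu_dominant, mu_nondominant) are hypothetical illustrations, not taken from the paper.

import numpy as np

def lexicographic_optimal_arm(mu_dominant, mu_nondominant, tol=1e-9):
    """Return the index of the optimal arm under the dominant-objective criterion.

    mu_dominant[i]    : (assumed known) expected reward of arm i in the dominant objective
    mu_nondominant[i] : (assumed known) expected reward of arm i in the non-dominant objective
    """
    mu_dominant = np.asarray(mu_dominant, dtype=float)
    mu_nondominant = np.asarray(mu_nondominant, dtype=float)

    # Arms that (approximately) maximize the dominant objective.
    best_dom = mu_dominant.max()
    candidates = np.flatnonzero(mu_dominant >= best_dom - tol)

    # Among those, pick the arm with the highest non-dominant expected reward.
    return candidates[np.argmax(mu_nondominant[candidates])]

# Toy usage: arms 0 and 2 tie in the dominant objective, but arm 2 has the
# higher non-dominant reward, so it is the optimal arm.
if __name__ == "__main__":
    mu_dom = [0.9, 0.5, 0.9]
    mu_non = [0.2, 0.8, 0.6]
    print(lexicographic_optimal_arm(mu_dom, mu_non))  # -> 2
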
      Keywords
      Contextual bandits
      Dominant objective
      Multi-objective bandits
      Online learning
      Regret bounds
      Artificial intelligence
      Diagnosis
      Learning systems
      Wireless telecommunication systems
      Signal processing
      Permalink
      http://hdl.handle.net/11693/37612
      Published Version (Please cite this version)
      http://dx.doi.org/10.1109/MLSP.2017.8168123
      Collections
      • Department of Electrical and Electronics Engineering