      Combinatorial multi-armed bandits: applications and analyses

      Embargo Lift Date: 2019-03-13
      Author: Sarıtaç, Anıl Ömer
      Advisor: Dayanık, Savaş
      Date: 2018-09
      Publisher: Bilkent University
      Language: English
      Type: Thesis
      Abstract
      We focus on two related problems: the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs), and the Online Contextual Influence Maximization Problem with Costly Observations (OCIMP-CO), for which we take a CMAB approach. Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, called Combinatorial UCB with exploration rate κ (CUCB-κ), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded gap-dependent and O(√T) gap-independent regret, improving on previous work that studies CMAB with PTAs under more general ATPs. We then numerically evaluate the performance of CUCB-κ and CTS on a real-world movie recommendation problem. For OCIMP-CO, we study the case where the learner can observe the spread of influence by paying an observation cost, and aims to maximize the total number of influenced nodes over all epochs minus the total observation cost. Since the offline influence maximization problem is NP-hard, we develop a CMAB approach that uses an approximation algorithm as a subroutine to select the set of seed nodes in each epoch. When the influence probabilities are Hölder-continuous functions of the context, we prove that these algorithms achieve sublinear regret (for any sequence of contexts) with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, we prove a lower bound that matches the upper bound in its time and cost order, suggesting that the upper bound is the best possible. Our numerical results on several networks show that the proposed algorithms perform on par with state-of-the-art methods even when observations are cost-free.
      Keywords
      Combinatorial Bandits
      Multi-armed Bandit
      Approximation Algorithms
      Probabilistically Triggered Arms
      Influence Maximization
      Costly Observations
      Regret Bounds
      Lower Bound
      Permalink
      http://hdl.handle.net/11693/47890
      Collections
      • Dept. of Industrial Engineering - Master's degree