Combinatorial multi-armed bandits: applications and analyses
Author
Sarıtaç, Anıl Ömer
Advisor
Dayanık, Savaş
Date
2018-09
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
We focus on two related problems: the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs), and the Online Contextual Influence Maximization Problem with Costly Observations (OCIMP-CO), where we utilize a CMAB approach. Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate κ (CUCB-κ), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded gap-dependent regret and O(√T) gap-independent regret, improving on previous works that study CMAB with PTAs under more general ATPs. We then numerically evaluate the performance of CUCB-κ and CTS on a real-world movie recommendation problem.
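As a rough illustration of the two learning rules named above, the sketch below shows one common parameterization of a UCB index with exploration rate κ and a Beta-posterior Thompson sampling step. It is a minimal sketch assuming Bernoulli arm states, not the thesis implementation; the names cucb_kappa_indices and cts_samples are hypothetical, and a combinatorial oracle would still be needed to turn these indices or samples into a super-arm.

```python
# Minimal sketch (not the thesis code) of the two estimation rules named in
# the abstract, assuming Bernoulli arm states. A combinatorial oracle would
# consume these indices/samples to pick a super-arm each round.
import numpy as np

rng = np.random.default_rng(0)


def cucb_kappa_indices(counts, mean_rewards, t, kappa):
    """One common UCB index with exploration rate kappa:
    mean + sqrt(kappa * ln(t) / N_i); unplayed arms get an infinite index."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        bonus = np.sqrt(kappa * np.log(t) / counts)
    return np.where(counts > 0, np.asarray(mean_rewards) + bonus, np.inf)


def cts_samples(successes, failures):
    """Thompson sampling: draw each arm's expected state from its
    Beta(successes + 1, failures + 1) posterior."""
    return rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
```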
For OCIMP-CO, we study a setting in which the learner can observe the spread of influence by paying an observation cost, with the aim of maximizing the total number of influenced nodes over all epochs minus the observation costs. Since the offline influence maximization problem is NP-hard, we develop a CMAB approach that uses an approximation algorithm as a subroutine to obtain the set of seed nodes in each epoch. When the influence probabilities are Hölder continuous functions of the context, we prove that these algorithms achieve sublinear regret (for any sequence of contexts) with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, we prove a lower bound that matches the upper bound in its time and cost order, suggesting that the upper bound is the best possible. Our numerical results on several networks show that the proposed algorithms perform on par with state-of-the-art methods even when observations are cost-free.
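The approximation subroutine referred to above can be any offline influence maximization algorithm with a guarantee; a standard choice is greedy seed selection under the independent cascade model with Monte Carlo spread estimates, which achieves roughly a (1 − 1/e) approximation. The sketch below is a generic, self-contained illustration of that kind of subroutine, not the thesis implementation; simulate_cascade and greedy_seeds are hypothetical names.

```python
# Generic greedy influence-maximization subroutine under the independent
# cascade model, with Monte Carlo spread estimation. Illustrative only.
import random


def simulate_cascade(graph, seeds, probs, rng):
    """One independent-cascade run; returns the set of influenced nodes.
    graph: dict node -> list of out-neighbors; probs: dict (u, v) -> probability."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, ()):
            if nbr not in active and rng.random() < probs[(node, nbr)]:
                active.add(nbr)
                frontier.append(nbr)
    return active


def greedy_seeds(graph, probs, k, n_sims=200, seed=0):
    """Greedily add the node whose inclusion yields the largest
    estimated expected spread, k times."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for v in graph:
            if v in chosen:
                continue
            gain = sum(len(simulate_cascade(graph, chosen + [v], probs, rng))
                       for _ in range(n_sims)) / n_sims
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen
```

In an online CMAB loop, the learner would call such a subroutine once per epoch, passing its current influence-probability estimates in place of the true probs.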
Keywords
Combinatorial Bandits
Multi-armed Bandit
Approximation Algorithms
Probabilistically Triggered Arms
Influence Maximization
Costly Observations
Regret Bounds
Lower Bound