Combinatorial multi-armed bandits: applications and analyses

buir.advisor: Dayanık, Savaş
dc.contributor.author: Sarıtaç, Anıl Ömer
dc.date.accessioned: 2018-09-19T06:51:04Z
dc.date.available: 2018-09-19T06:51:04Z
dc.date.copyright: 2018-09
dc.date.issued: 2018-09
dc.date.submitted: 2018-09-18
dc.department: Department of Industrial Engineering
dc.description: Cataloged from PDF version of article.
dc.description: Thesis (M.S.): İhsan Doğramacı Bilkent University, Department of Industrial Engineering, 2018.
dc.description: Includes bibliographical references (leaves 60-66).
dc.description.abstract: We focus on two related problems: the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs), and the Online Contextual Influence Maximization Problem with Costly Observations (OCIMP-CO), for which we take a CMAB approach. Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate κ (CUCB-κ), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded gap-dependent and O(√T) gap-independent regret, improving on previous works that study CMAB with PTAs under more general ATPs. Then, we numerically evaluate the performance of CUCB-κ and CTS on a real-world movie recommendation problem. For the Online Contextual Influence Maximization Problem with Costly Observations, we study a setting where the learner can observe the spread of influence by paying an observation cost, and aims to maximize the total number of influenced nodes over all epochs minus the observation costs. Since the offline influence maximization problem is NP-hard, we develop a CMAB approach that uses an approximation algorithm as a subroutine to obtain the set of seed nodes in each epoch. When the influence probabilities are Hölder continuous functions of the context, we prove that these algorithms achieve sublinear regret (for any sequence of contexts) with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, we prove a lower bound that matches the upper bound with respect to time and cost order, suggesting that the upper bound is the best possible. Our numerical results on several networks illustrate that the proposed algorithms perform on par with the state-of-the-art methods even when the observations are cost-free.
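The abstract describes CTS as estimating the expected states of the arms via Thompson sampling. As background only, the sketch below shows the classical Beta-Bernoulli Thompson sampling loop for a plain (non-combinatorial) bandit; it is a minimal illustration under assumed Bernoulli arm rewards, not the thesis's CTS algorithm, and the function name and parameters are hypothetical.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (single arm played per round).

    arm_probs: true Bernoulli success probabilities (unknown to the learner,
    used here only to simulate rewards). Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n = len(arm_probs)
    successes = [0] * n  # posterior Beta(successes+1, failures+1) per arm
    failures = [0] * n
    pulls = [0] * n
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over enough rounds, the posterior sampling concentrates plays on the best arm; the combinatorial setting in the thesis instead selects a super-arm (a feasible subset of arms) each round, with PTAs determining which base arms produce feedback.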
dc.description.degree: M.S.
dc.description.statementofresponsibility: by Anıl Ömer Sarıtaç.
dc.embargo.release: 2019-03-13
dc.format.extent: xiii, 102 leaves : charts ; 30 cm.
dc.identifier.itemid: B159016
dc.identifier.uri: http://hdl.handle.net/11693/47890
dc.language.iso: English
dc.publisher: Bilkent University
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Combinatorial Bandits
dc.subject: Multi-armed Bandit
dc.subject: Approximation Algorithms
dc.subject: Probabilistically Triggered Arms
dc.subject: Influence Maximization
dc.subject: Costly Observations
dc.subject: Regret Bounds
dc.subject: Lower Bound
dc.title: Combinatorial multi-armed bandits: applications and analyses
dc.title.alternative: Kombinatorik çok kollu haydutlar: uygulamalar ve analizler [Combinatorial multi-armed bandits: applications and analyses]
dc.type: Thesis

Files

Original bundle
Name: Thesis_AnilOmerSaritac_16092018.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission