Combinatorial multi-armed bandits: applications and analyses

buir.advisor: Dayanık, Savaş
dc.contributor.author: Sarıtaç, Anıl Ömer
dc.date.accessioned: 2018-09-19T06:51:04Z
dc.date.available: 2018-09-19T06:51:04Z
dc.date.copyright: 2018-09
dc.date.issued: 2018-09
dc.date.submitted: 2018-09-18
dc.department: Department of Industrial Engineering
dc.description: Cataloged from PDF version of article.
dc.description: Thesis (M.S.): İhsan Doğramacı Bilkent University, Department of Industrial Engineering, 2018.
dc.description: Includes bibliographical references (leaves 60-66).
dc.description.abstract: We focus on two related problems: the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs), and the Online Contextual Influence Maximization Problem with Costly Observations (OCIMP-CO), for which we take a CMAB approach. Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate κ (CUCB-κ), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded gap-dependent and O(√T) gap-independent regret, improving on previous works that study CMAB with PTAs under more general ATPs. Then, we numerically evaluate the performance of CUCB-κ and CTS on a real-world movie recommendation problem. For the Online Contextual Influence Maximization Problem with Costly Observations, we study a setting where the learner can observe the spread of influence by paying an observation cost, and aims to maximize the total number of influenced nodes over all epochs minus the observation costs. Since the offline influence maximization problem is NP-hard, we develop a CMAB approach that uses an approximation algorithm as a subroutine to obtain the set of seed nodes in each epoch. When the influence probabilities are Hölder continuous functions of the context, we prove that these algorithms achieve sublinear regret (for any sequence of contexts) with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, we prove a lower bound that matches the upper bound with respect to time and cost order, suggesting that the upper bound is the best possible. Our numerical results on several networks illustrate that the proposed algorithms perform on par with the state-of-the-art methods even when the observations are cost-free.
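The abstract describes CTS as estimating the expected states of the arms via Thompson sampling. As background only, the sketch below shows the classical Beta-Bernoulli Thompson sampling loop for a plain (non-combinatorial) bandit; it is a minimal illustration under assumed Bernoulli arm rewards, not the thesis's CTS algorithm, and the function name and parameters are hypothetical.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (single arm played per round).

    arm_probs: true Bernoulli success probabilities (unknown to the learner,
    used here only to simulate rewards). Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n = len(arm_probs)
    successes = [0] * n  # posterior Beta(successes+1, failures+1) per arm
    failures = [0] * n
    pulls = [0] * n
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over enough rounds, the posterior sampling concentrates plays on the best arm; the combinatorial setting in the thesis instead selects a super-arm (a feasible subset of arms) each round, with PTAs determining which base arms produce feedback.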
dc.description.degree: M.S.
dc.description.statementofresponsibility: by Anıl Ömer Sarıtaç.
dc.embargo.release: 2019-03-13
dc.format.extent: xiii, 102 leaves : charts ; 30 cm.
dc.identifier.itemid: B159016
dc.identifier.uri: http://hdl.handle.net/11693/47890
dc.language.iso: English
dc.publisher: Bilkent University
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Combinatorial Bandits
dc.subject: Multi-armed Bandit
dc.subject: Approximation Algorithms
dc.subject: Probabilistically Triggered Arms
dc.subject: Influence Maximization
dc.subject: Costly Observations
dc.subject: Regret Bounds
dc.subject: Lower Bound
dc.title: Combinatorial multi-armed bandits: applications and analyses
dc.title.alternative: Kombinatorik çok kollu haydutlar: uygulamalar ve analizler [Combinatorial multi-armed bandits: applications and analyses]
dc.type: Thesis

Files

Original bundle
Name: Thesis_AnilOmerSaritac_16092018.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission