Combinatorial multi-armed bandits: applications and analyses

Sarıtaç, Anıl Ömer

Combinatorial multi-armed bandits: applications and analyses

Available

The embargo period has ended, and this item is now available.

Files

Thesis_AnilOmerSaritac_16092018.pdf (1.59 MB)

Date

2018-09

Authors

Sarıtaç, Anıl Ömer

Advisor

Dayanık, Savaş

BUIR Usage Stats

10
views

24
downloads

Abstract

We focus on two related problems: Combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs) and Online Contextual Influence Maximization Problem with Costly Observations (OCIMP-CO) where we utilize a CMAB approach. Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate к(CUCB-к), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded gapdependent and O(√T) gap-independent regret improving on previous works which study CMAB with PTAs under more general ATPs. Then, we numerically evaluate the performance of CUCB-к and CTS in a real-world movie recommendation problem. For the Online Contextual Influence Maximization Problem with Costly Observations, we study a case where the learner can observe the spread of influence by paying an observation cost, by which it aims to maximize the total number of influenced nodes over all epochs minus the observation costs. Since the offline influence maximization problem is NP-hard, we develop a CMAB approach that use an approximation algorithm as a subroutine to obtain the set of seed nodes in each epoch. When the influence probabilities are Hölder continuous functions of the context, we prove that these algorithms achieve sublinear regret (for any sequence of contexts) with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, we prove a lower bound that matches the upper bound with respect to time and cost order, suggesting that the upper bound is the best possible. Our numerical results on several networks illustrate that the proposed algorithms perform on par with the state-of-the-art methods even when the observations are cost-free.

Keywords

Combinatorial Bandits, Multi-armed Bandit, Approximation Algorithms, Probabilistically Triggered Arms, Influence Maximization, Costly Observations, Regret Bounds, Lower Bound

Degree Discipline

Industrial Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/47890

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis

Full item page

Combinatorial multi-armed bandits: applications and analyses

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type

Combinatorial multi-armed bandits: applications and analyses

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Share

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type