Contextual combinatorial volatile multi-armed bandits in compact context spaces

Nika, Andi

buir.advisor	Tekin, Cem
dc.contributor.author	Nika, Andi
dc.date.accessioned	2021-08-17T06:36:25Z
dc.date.available	2021-08-17T06:36:25Z
dc.date.copyright	2021-07
dc.date.issued	2021-07
dc.date.submitted	2021-08-06
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 78-83).	en_US
dc.description.abstract	We consider the contextual combinatorial volatile multi-armed bandit (CCV-MAB) problem in compact context spaces, taking all of its individual features into account simultaneously and thus providing a general framework for solving a wide range of practical problems. We solve CCV-MAB using two approaches. First, we use the so-called adaptive discretization technique, which sequentially partitions the context space $\mathcal{X}$ into 'regions of similarity' and stores similar statistics corresponding to such regions. Under monotonicity of the expected reward and mild continuity assumptions for both the expected reward and the expected base arm outcomes, we propose Adaptive Contextual Combinatorial Upper Confidence Bound (ACC-UCB), an online learning algorithm that uses adaptive discretization and incurs $\tilde{O}(T^{(\bar{D}+1)/(\bar{D}+2)+\epsilon})$ regret for any $\epsilon > 0$, where $\bar{D}$ represents the approximate optimality dimension related to $\mathcal{X}$. This dimension captures both the benignness of the base arm arrivals and the structure of the expected reward. Second, we impose a Gaussian process (GP) structure on the expected base arm outcomes and thus, using the smoothness of the GP posterior, eliminate the need for adaptive discretization. We propose Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O’CLOK-UCB), which incurs $\tilde{O}(K\sqrt{T\bar{\gamma}_T})$ regret, where $\bar{\gamma}_T$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum cardinality of any feasible super arm over all rounds. For both methods, we provide experimental results that demonstrate the superiority of ACC-UCB over the previous state of the art and of O’CLOK-UCB over ACC-UCB.	en_US
dc.description.statementofresponsibility	by Andi Nika	en_US
dc.embargo.release	2021-12-01
dc.format.extent	viii, 83 leaves : illustrations (some color), charts (some color) ; 30 cm.	en_US
dc.identifier.itemid	B130105
dc.identifier.uri	http://hdl.handle.net/11693/76440
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Multi-armed bandit	en_US
dc.subject	Contextual combinatorial bandit	en_US
dc.subject	Volatile bandit	en_US
dc.subject	Adaptive discretization	en_US
dc.subject	Gaussian processes	en_US
dc.title	Contextual combinatorial volatile multi-armed bandits in compact context spaces	en_US
dc.title.alternative	Tıkız bağlam uzaylarında bağlamsal birleşimsel değişken çok-kollu haydut	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Electrical and Electronic Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)

Files

Original bundle

Name: 10411062.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science