Asymptotically optimal contextual bandit algorithm using hierarchical structures

Neyshabouri, Mohammadreza Mohaghegh; Gökçesu, Kaan; Gökçesu, Hakan; Özkan, Hüseyin; Kozat, Süleyman Serdar

buir.contributor.author	Neyshabouri, Mohammadreza Mohaghegh
buir.contributor.author	Gökçesu, Kaan
buir.contributor.author	Gökçesu, Hakan
buir.contributor.author	Özkan, Hüseyin
buir.contributor.author	Kozat, Süleyman Serdar
dc.citation.epage	937	en_US
dc.citation.issueNumber	3
dc.citation.spage	923	en_US
dc.citation.volumeNumber	30
dc.contributor.author	Neyshabouri, Mohammadreza Mohaghegh	en_US
dc.contributor.author	Gökçesu, Kaan	en_US
dc.contributor.author	Gökçesu, Hakan	en_US
dc.contributor.author	Özkan, Hüseyin	en_US
dc.contributor.author	Kozat, Süleyman Serdar	en_US
dc.date.accessioned	2019-02-21T16:05:51Z	en_US
dc.date.available	2019-02-21T16:05:51Z	en_US
dc.date.issued	2018	en_US
dc.department	Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	We propose an online algorithm for sequential learning in the contextual multiarmed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the set of bandit arms in a data-driven manner. We show that, in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithm based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as the best arm selection policy. This optimality is guaranteed to hold even in adversarial environments, since we do not rely on any statistical assumptions regarding the contexts or the losses of the bandit arms. Moreover, we design an efficient implementation of our algorithm using various hierarchical partitioning structures, such as lexicographical or arbitrary position splitting and binary trees (BTs), among other partitioning examples. For instance, in the case of BT partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (with respect to the best arm selection policy) that are mathematically proven to vanish, in the average loss per round sense, at a faster rate than the state of the art. Our experimental work extensively covers various scenarios ranging from bandit settings to multiclass classification with real and synthetic data. In these experiments, we show that our algorithm is highly superior to the state-of-the-art techniques while maintaining the introduced mathematical guarantees and scaling well computationally.	en_US
dc.identifier.doi	10.1109/TNNLS.2018.2854796	en_US
dc.identifier.eissn	2162-2388	en_US
dc.identifier.issn	2162-237X	en_US
dc.identifier.uri	http://hdl.handle.net/11693/50277	en_US
dc.language.iso	English	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.isversionof	https://doi.org/10.1109/TNNLS.2018.2854796	en_US
dc.source.title	IEEE Transactions on Neural Networks and Learning Systems	en_US
dc.subject	Adversarial	en_US
dc.subject	Big data	en_US
dc.subject	Computational complexity	en_US
dc.subject	Contextual bandits	en_US
dc.subject	Universal	en_US
dc.subject	Online learning	en_US
dc.title	Asymptotically optimal contextual bandit algorithm using hierarchical structures	en_US
dc.type	Article	en_US
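
The abstract's core idea (partition the context space, then adaptively combine every mapping from partition regions to arms) can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a brute-force, EXP4-style exponential-weights mixture that enumerates all mappings explicitly, which is only feasible for tiny problems, whereas the paper relies on hierarchical structures to do the combination efficiently. The one-dimensional context in [0, 1), the equal-width partition, the learning rate, the toy loss function, and all names (`region_of`, `MappingMixture`, `run_demo`) are assumptions made for illustration.

```python
import itertools
import math
import random


def region_of(context, num_regions):
    """Map a context in [0, 1) to the index of its equal-width region (assumed partition)."""
    return min(int(context * num_regions), num_regions - 1)


class MappingMixture:
    """Exponential weights over every mapping from regions to arms (illustrative only)."""

    def __init__(self, num_regions, num_arms, learning_rate, seed=0):
        self.num_regions = num_regions
        self.num_arms = num_arms
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)
        # One "expert" per mapping: a tuple giving the arm played in each region.
        self.mappings = list(itertools.product(range(num_arms), repeat=num_regions))
        self.log_weights = [0.0] * len(self.mappings)

    def arm_probabilities(self, region):
        """Probability of an arm = normalized total weight of the mappings that pick it."""
        top = max(self.log_weights)
        totals = [0.0] * self.num_arms
        for mapping, log_weight in zip(self.mappings, self.log_weights):
            totals[mapping[region]] += math.exp(log_weight - top)
        norm = sum(totals)
        return [t / norm for t in totals]

    def select(self, context):
        region = region_of(context, self.num_regions)
        probs = self.arm_probabilities(region)
        arm = self.rng.choices(range(self.num_arms), weights=probs)[0]
        return region, arm, probs

    def update(self, region, arm, probs, loss):
        """Bandit feedback: importance-weighted loss charged to mappings that chose `arm`."""
        estimate = loss / probs[arm]
        for i, mapping in enumerate(self.mappings):
            if mapping[region] == arm:
                self.log_weights[i] -= self.learning_rate * estimate


def run_demo(rounds=2000, seed=0):
    """Toy environment (assumed): arm 0 has zero loss for contexts below 0.5, arm 1 otherwise."""
    mixture = MappingMixture(num_regions=4, num_arms=2, learning_rate=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        context = env.random()
        region, arm, probs = mixture.select(context)
        best_arm = 0 if context < 0.5 else 1
        loss = 0.0 if arm == best_arm else 1.0
        mixture.update(region, arm, probs, loss)
    return mixture
```

With 4 regions and 2 arms there are 2^4 = 16 mappings, so enumeration is cheap here; the count grows exponentially with the number of regions, which is the cost that an efficient hierarchical implementation is meant to avoid.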

Files

Original bundle

Name: Asymptotically_optimal_contextual_bandit_algorithm_using_hierarchical_structures.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Scholarly Publications - Electrical and Electronics Engineering