Asymptotically optimal contextual bandit algorithm using hierarchical structures

buir.contributor.author: Neyshabouri, Mohammadreza Mohaghegh
buir.contributor.author: Gökçesu, Kaan
buir.contributor.author: Gökçesu, Hakan
buir.contributor.author: Özkan, Hüseyin
buir.contributor.author: Kozat, Süleyman Serdar
dc.citation.epage: 937 (en_US)
dc.citation.issueNumber: 3
dc.citation.spage: 923 (en_US)
dc.citation.volumeNumber: 30
dc.contributor.author: Neyshabouri, Mohammadreza Mohaghegh (en_US)
dc.contributor.author: Gökçesu, Kaan (en_US)
dc.contributor.author: Gökçesu, Hakan (en_US)
dc.contributor.author: Özkan, Hüseyin (en_US)
dc.contributor.author: Kozat, Süleyman Serdar (en_US)
dc.date.accessioned: 2019-02-21T16:05:51Z (en_US)
dc.date.available: 2019-02-21T16:05:51Z (en_US)
dc.date.issued: 2018 (en_US)
dc.department: Department of Electrical and Electronics Engineering (en_US)
dc.description.abstract: We propose an online algorithm for sequential learning in the contextual multiarmed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the set of bandit arms in a data-driven manner. We show that in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithm based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as the best arm selection policy. This optimality is also guaranteed to hold even in adversarial environments since we do not rely on any statistical assumptions regarding the contexts or the loss of the bandit arms. Moreover, we design an efficient implementation for our algorithm using various hierarchical partitioning structures, such as lexicographical or arbitrary position splitting and binary trees (BTs) (and several other partitioning examples). For instance, in the case of BT partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (with respect to the best arm selection policy) that are mathematically proven to vanish in the average loss per round sense at a faster rate compared to the state of the art. Our experimental work extensively covers various scenarios ranging from bandit settings to multiclass classification with real and synthetic data. In these experiments, we show that our algorithm is highly superior to the state-of-the-art techniques while maintaining the introduced mathematical guarantees and a computationally decent scalability. (en_US)
dc.description.provenance: Made available in DSpace on 2019-02-21T16:05:51Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018 (en)
dc.identifier.doi: 10.1109/TNNLS.2018.2854796 (en_US)
dc.identifier.eissn: 2162-2388 (en_US)
dc.identifier.issn: 2162-237X (en_US)
dc.identifier.uri: http://hdl.handle.net/11693/50277 (en_US)
dc.language.iso: English (en_US)
dc.publisher: Institute of Electrical and Electronics Engineers (en_US)
dc.relation.isversionof: https://doi.org/10.1109/TNNLS.2018.2854796 (en_US)
dc.source.title: IEEE Transactions on Neural Networks and Learning Systems (en_US)
dc.subject: Adversarial (en_US)
dc.subject: Big data (en_US)
dc.subject: Computational complexity (en_US)
dc.subject: Contextual bandits (en_US)
dc.subject: Universal (en_US)
dc.subject: Online learning (en_US)
dc.title: Asymptotically optimal contextual bandit algorithm using hierarchical structures (en_US)
dc.type: Article (en_US)

Files

Original bundle
Name: Asymptotically_optimal_contextual_bandit_algorithm_using_hierarchical_structures.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format
Description: Full printable version