Show simple item record

dc.contributor.author: Turgay, Eralp
dc.contributor.author: Bulucu, Cem
dc.contributor.author: Tekin, Cem
dc.date.accessioned: 2021-03-09T06:47:34Z
dc.date.available: 2021-03-09T06:47:34Z
dc.date.issued: 2020
dc.identifier.issn: 1053-587X
dc.identifier.uri: http://hdl.handle.net/11693/75899
dc.description.abstract: Many sequential decision-making tasks require choosing, at each decision step, the right action out of a vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the time, the high dimensionality of actions and data makes learning the optimal actions with traditional learning methods impracticable. In this work, we investigate how to discover and leverage sparsity in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way. We depart from prior work by assuming a high-dimensional, continuum set of arms, and by allowing the relevant context dimensions to vary for each arm. We propose a new online learning algorithm called CMAB with Relevance Learning (CMAB-RL). CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms, whose regrets depend on the numbers d_x and d_a of context and arm dimensions. Importantly, we show that when the learner has prior knowledge of sparsity, given in terms of upper bounds d̄_x and d̄_a on the number of relevant context and arm dimensions, CMAB-RL achieves Õ(T^{1 − 1/(2 + 2d̄_x + d̄_a)}) regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.
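The abstract describes a contextual multi-armed bandit in which rewards depend on only a few relevant dimensions of a high-dimensional context. As a rough illustration of that setting only (not the paper's CMAB-RL algorithm), here is a minimal epsilon-greedy contextual bandit sketch in which, by construction, just one context dimension is relevant; the reward structure and all parameter values below are hypothetical:

```python
import random

def run_contextual_bandit(n_rounds=5000, n_arms=3, d_x=5, eps=0.1, seed=0):
    """Toy epsilon-greedy contextual bandit (illustration only).

    The context has d_x dimensions, but the reward depends only on
    dimension 0 -- a toy version of the sparsity idea: only a few
    dimensions of the high-dimensional context actually matter.
    """
    rng = random.Random(seed)
    counts = {}  # pulls per (context bucket, arm)
    values = {}  # running mean reward per (context bucket, arm)

    def bucket(ctx):
        # Only dimension 0 is relevant; it splits contexts into 2 buckets.
        return 0 if ctx[0] < 0.5 else 1

    def true_mean(b, arm):
        # Hypothetical Bernoulli means: the best arm differs per bucket.
        return 0.9 if arm == b else 0.3

    total_reward = 0.0
    for _ in range(n_rounds):
        ctx = [rng.random() for _ in range(d_x)]
        b = bucket(ctx)
        if rng.random() < eps:
            arm = rng.randrange(n_arms)  # explore uniformly
        else:
            # exploit: arm with highest estimated value in this bucket
            arm = max(range(n_arms), key=lambda a: values.get((b, a), 0.0))
        reward = 1.0 if rng.random() < true_mean(b, arm) else 0.0
        total_reward += reward
        c = counts.get((b, arm), 0) + 1
        counts[(b, arm)] = c
        old = values.get((b, arm), 0.0)
        values[(b, arm)] = old + (reward - old) / c
    return total_reward / n_rounds
```

In this sketch the relevant dimension is hard-coded, so the learner pays no price for the other d_x − 1 dimensions; the point of CMAB-RL, by contrast, is that the relevant dimensions are unknown and must be learned online, per arm, which is what yields its improved regret bound.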
dc.description.sponsorship: This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grants 116E229 and 215E342.
dc.language.iso: English
dc.source.title: IEEE Transactions on Signal Processing
dc.relation.isversionof: https://dx.doi.org/10.1109/TSP.2020.3048223
dc.subject: Online learning
dc.subject: Contextual multi-armed bandit
dc.subject: Regret bounds
dc.subject: Dimensionality reduction
dc.subject: Personalized medicine
dc.title: Exploiting relevance for online decision-making in high-dimensions
dc.type: Article
dc.citation.spage: 1438
dc.citation.epage: 1451
dc.citation.volumeNumber: 69
dc.identifier.doi: 10.1109/TSP.2020.3048223
dc.publisher: IEEE
dc.contributor.bilkentauthor: Turğay, Eralp
dc.contributor.bilkentauthor: Bulucu, Cem
dc.contributor.bilkentauthor: Tekin, Cem

