Exploiting relevance for online decision-making in high-dimensions

Turgay, Eralp; Bulucu, Cem; Tekin, Cem

Files

Exploiting_Relevance_for_Online_Decision-Making_in_High-Dimensions.pdf (819.99 KB)

Date

2020

Authors

Turgay, Eralp

Bulucu, Cem

Tekin, Cem

Abstract

Many sequential decision-making tasks require choosing, at each decision step, the right action out of a vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the time, the high dimensionality of actions and data makes learning the optimal actions by traditional learning methods impracticable. In this work, we investigate how to discover and leverage sparsity in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way. We depart from prior work by assuming a high-dimensional, continuum set of arms and by allowing the relevant context dimensions to vary for each arm. We propose a new online learning algorithm called CMAB with Relevance Learning (CMAB-RL). CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms, whose regrets depend on the numbers of dimensions $d_x$ and $d_a$ of the context and arm sets. Importantly, we show that when the learner has prior knowledge of sparsity, given in terms of upper bounds $\bar{d}_x$ and $\bar{d}_a$ on the number of relevant context and arm dimensions, CMAB-RL achieves $\tilde{O}(T^{1-1/(2+2\bar{d}_x+\bar{d}_a)})$ regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.
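
To get a feel for the regret bound stated in the abstract, the short sketch below evaluates only the exponent of T, namely 1 - 1/(2 + 2*d_x_bar + d_a_bar), for a few sparsity levels. The function name `regret_exponent` and the example values are illustrative assumptions, not taken from the paper, and the sketch does not implement CMAB-RL itself.

```python
def regret_exponent(d_x_bar: int, d_a_bar: int) -> float:
    """Exponent of T in the bound T^(1 - 1/(2 + 2*d_x_bar + d_a_bar)).

    d_x_bar and d_a_bar are the assumed upper bounds on the number of
    relevant context and arm dimensions (hypothetical parameter names).
    """
    if d_x_bar < 0 or d_a_bar < 0:
        raise ValueError("dimension bounds must be non-negative")
    return 1.0 - 1.0 / (2 + 2 * d_x_bar + d_a_bar)


# Hypothetical sparsity levels, chosen only for illustration.
for d_x_bar, d_a_bar in [(1, 1), (2, 1), (5, 5)]:
    exponent = regret_exponent(d_x_bar, d_a_bar)
    print(f"d_x_bar={d_x_bar}, d_a_bar={d_a_bar}: exponent = {exponent:.4f}")
```

The exponent is always below 1, so the bound is sublinear in T, and it moves closer to 1 as the bounds on the relevant dimensions grow. That is why tight sparsity bounds matter for fast learning.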

Source Title

IEEE Transactions on Signal Processing

Publisher

IEEE

Keywords

Online learning, Contextual multi-armed bandit, Regret bounds, Dimensionality reduction, Personalized medicine

Permalink

http://hdl.handle.net/11693/75899

Published Version (Please cite this version)

https://dx.doi.org/10.1109/TSP.2020.3048223

Collections

Scholarly Publications - Electrical and Electronics Engineering

Language

English

Type

Article
