Browsing by Author "Bulucu, Cem"

Now showing 1 - 2 of 2

Open Access
Exploiting relevance for online decision-making in high-dimensions
(IEEE, 2020) Turgay, Eralp; Bulucu, Cem; Tekin, Cem
Many sequential decision-making tasks require choosing at each decision step the right action out of the vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the times, the high-dimensionality of actions and data makes learning of the optimal actions by traditional learning methods impracticable. In this work, we investigate how to discover and leverage sparsity in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way. We depart from the prior work by assuming a high-dimensional, continuum set of arms, and allow relevant context dimensions to vary for each arm. We propose a new online learning algorithm called CMAB with Relevance Learning (CMAB-RL). CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms whose regrets depend on the number of dimensions dx and da of the context and arm sets. Importantly, we show that when the learner has prior knowledge on sparsity, given in terms of upper bounds d¯¯¯x and d¯¯¯a on the number of relevant context and arm dimensions, then CMAB-RL achieves O~(T1−1/(2+2d¯¯¯x+d¯¯¯a)) regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.
Open Access
Personalizing treatments via contextual multi-armed bandits by identifying relevance
(2019-08) Bulucu, Cem
Personalized medicine offers specialized treatment options for individuals which is vital as every patient is different. One-size-fits-all approaches are often not effective and most patients require personalized care when dealing with various diseases like cancer, heart diseases or diabetes. As vast amounts of data became available in medicine (and otherfields including web-based recommender systems and intelligent radio networks), online learning approaches are gaining popularity due to their ability to learn fast in uncertain environments. Contextual multi-armed bandit algorithms provide reliable sequential decision-making options in such applications. In medical settings (also in other aforementioned settings), data (contexts) and actions (arms) are often high-dimensional and performances of traditional contextual multi-armed bandit approaches are almost as bad as random selection, due to the curse of dimensionality. Fortunately, in many cases the information relevant to the decision-making task does not depend on all dimensions but rather depends on a small subset of dimensions, called the relevant dimensions. In this thesis, we aim to provide personalized treatments for patients sequentially arriving over time by using contextual multi-armed bandit approaches when the expected rewards related to patient outcomes only vary on a small subset of context and arm dimensions. For this purpose,first we make use of the contextual multi-armed bandit with relevance learning (CMAB-RL) algorithm which learns the relevance by employing a novel partitioning strategy on the context-arm space and forming a set of candidate relevant dimension tuples. In this model, the set of relevant patient traits are allowed to be different for different bolus insulin dosages. Next, we consider an environment where the expected reward function defined over the context-arm space is sampled from a Gaussian process. For this setting, we propose an extension to the contextual Gaussian process upper confidence bound (CGP-UCB) algorithm, called CGP-UCB with relevance learning (CGP-UCB-RL), that learns the relevance by integrating kernels that allow weights to be associated with each dimension and optimizing the negative log marginal likelihood. Then, we investigate the suitability of this approach in the blood glucose regulation problem. Aside from applying both algorithms to the bolus insulin administration problem, we also evaluate their performance in synthetically generated environments as benchmarks.