Personalizing treatments via contextual multi-armed bandits by identifying relevance
Embargo Lift Date: 2020-02-09
Personalized medicine offers specialized treatment options for individuals, which is vital because every patient is different. One-size-fits-all approaches are often ineffective, and most patients require personalized care when dealing with diseases such as cancer, heart disease, or diabetes. As vast amounts of data become available in medicine (and in other fields, including web-based recommender systems and intelligent radio networks), online learning approaches are gaining popularity due to their ability to learn quickly in uncertain environments. Contextual multi-armed bandit algorithms provide reliable sequential decision-making options in such applications. In medical settings (as well as in the other settings mentioned above), data (contexts) and actions (arms) are often high-dimensional, and the performance of traditional contextual multi-armed bandit approaches can be almost as poor as random selection, due to the curse of dimensionality. Fortunately, in many cases the information relevant to the decision-making task does not depend on all dimensions but rather on a small subset of dimensions, called the relevant dimensions. In this thesis, we aim to provide personalized treatments for patients arriving sequentially over time by using contextual multi-armed bandit approaches when the expected rewards related to patient outcomes vary only over a small subset of the context and arm dimensions. For this purpose, we first make use of the contextual multi-armed bandit with relevance learning (CMAB-RL) algorithm, which learns the relevance by employing a novel partitioning strategy on the context-arm space and forming a set of candidate relevant dimension tuples. In this model, the set of relevant patient traits is allowed to differ across bolus insulin dosages. Next, we consider an environment where the expected reward function defined over the context-arm space is sampled from a Gaussian process.
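The idea of learning relevance by partitioning can be illustrated with a minimal sketch. This is our own simplified illustration, not the thesis's exact CMAB-RL index rules: candidate relevant-dimension tuples are enumerated, each candidate subspace is discretized into cells, and arms are scored with a standard UCB index per cell (the class and all names are hypothetical):

```python
import itertools
import math
import random
from collections import defaultdict

class RelevancePartitionUCB:
    """Toy sketch: UCB over cells of candidate relevant-dimension subspaces."""

    def __init__(self, n_dims, max_relevant, n_arms, n_bins=4):
        # Candidate relevant dimension tuples: all subsets of up to max_relevant dims.
        self.candidates = [c for r in range(1, max_relevant + 1)
                           for c in itertools.combinations(range(n_dims), r)]
        self.n_arms, self.n_bins = n_arms, n_bins
        self.counts = defaultdict(int)    # (candidate, cell, arm) -> number of pulls
        self.sums = defaultdict(float)    # (candidate, cell, arm) -> cumulative reward
        self.t = 0

    def _cell(self, cand, context):
        # Partition the subspace spanned by the candidate dimensions into uniform bins.
        return tuple(min(int(context[d] * self.n_bins), self.n_bins - 1) for d in cand)

    def select(self, context):
        self.t += 1
        best_arm, best_ucb = 0, -math.inf
        for arm in range(self.n_arms):
            # Design choice in this sketch: score an arm by its most optimistic candidate.
            for cand in self.candidates:
                key = (cand, self._cell(cand, context), arm)
                n = self.counts[key]
                ucb = math.inf if n == 0 else (
                    self.sums[key] / n + math.sqrt(2 * math.log(self.t) / n))
                if ucb > best_ucb:
                    best_arm, best_ucb = arm, ucb
        return best_arm

    def update(self, context, arm, reward):
        for cand in self.candidates:
            key = (cand, self._cell(cand, context), arm)
            self.counts[key] += 1
            self.sums[key] += reward

random.seed(0)
bandit = RelevancePartitionUCB(n_dims=5, max_relevant=2, n_arms=3)
for _ in range(500):
    ctx = [random.random() for _ in range(5)]
    arm = bandit.select(ctx)
    # Synthetic reward that depends only on context dimension 0 and the arm.
    reward = float(arm == (0 if ctx[0] < 0.5 else 2)) + 0.1 * random.random()
    bandit.update(ctx, arm, reward)
```

Statistics gathered in cells of candidate tuples that contain the truly relevant dimensions concentrate quickly, while fully discretizing all five dimensions would leave most cells nearly empty.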
For this setting, we propose an extension of the contextual Gaussian process upper confidence bound (CGP-UCB) algorithm, called CGP-UCB with relevance learning (CGP-UCB-RL), which learns relevance by using kernels that associate a weight with each dimension and by minimizing the negative log marginal likelihood. Then, we investigate the suitability of this approach for the blood glucose regulation problem. In addition to applying both algorithms to the bolus insulin administration problem, we evaluate their performance on synthetically generated benchmark environments.
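The kernel-based relevance idea can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's CGP-UCB-RL procedure: an ARD squared-exponential kernel assigns one lengthscale per dimension, and a simple grid search stands in for gradient-based minimization of the negative log marginal likelihood; a relevant dimension ends up with a short lengthscale, an irrelevant one is effectively switched off by a long lengthscale:

```python
import numpy as np

def ard_rbf(X1, X2, lengthscales, sigma_f=1.0):
    # ARD squared-exponential kernel: one lengthscale per input dimension.
    d = (X1[:, None, :] - X2[None, :, :]) / lengthscales
    return sigma_f**2 * np.exp(-0.5 * np.sum(d**2, axis=-1))

def neg_log_marginal_likelihood(X, y, lengthscales, noise=0.1):
    # Standard GP regression NLML via a Cholesky factorization.
    K = ard_rbf(X, X, lengthscales) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 3))                    # 3-dimensional contexts
y = np.sin(6 * X[:, 0]) + 0.05 * rng.standard_normal(60)   # reward depends only on dim 0

# Grid search over per-dimension lengthscales in place of gradient descent:
# the NLML-minimizing configuration reveals which dimensions are relevant.
best = min(
    ((l0, l1, l2) for l0 in (0.2, 10.0) for l1 in (0.2, 10.0) for l2 in (0.2, 10.0)),
    key=lambda ls: neg_log_marginal_likelihood(X, y, np.array(ls)),
)
print(best)  # dimension 0 should receive the short lengthscale
```

The inverse lengthscales play the role of the per-dimension relevance weights: a dimension with a very long learned lengthscale contributes almost nothing to the kernel distance and is therefore treated as irrelevant.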
Contextual multi-armed bandits
Contextual Gaussian process bandits