Personalizing treatments via contextual multi-armed bandits by identifying relevance
Author(s)
Advisor
Tekin, Cem
Date
2019-08
Publisher
Bilkent University
Language
English
Type
Thesis
Item Usage Stats
339 views, 207 downloads
Abstract
Personalized medicine offers specialized treatment options for individuals, which is vital because every patient is different. One-size-fits-all approaches are often not effective, and most patients require personalized care when dealing with diseases such as cancer, heart disease, or diabetes. As vast amounts of data have become available in medicine (and in other fields including web-based recommender systems and intelligent radio networks), online learning approaches are gaining popularity due to their ability to learn fast in uncertain environments. Contextual multi-armed bandit algorithms provide reliable sequential decision-making in such applications. In medical settings (as well as in the other settings mentioned above), data (contexts) and actions (arms) are often high-dimensional, and the performance of traditional contextual multi-armed bandit approaches is almost as bad as random selection due to the curse of dimensionality. Fortunately, in many cases the information relevant to the decision-making task does not depend on all dimensions but rather on a small subset of them, called the relevant dimensions. In this thesis, we aim to provide personalized treatments for patients arriving sequentially over time by using contextual multi-armed bandit approaches when the expected rewards related to patient outcomes vary only over a small subset of context and arm dimensions. For this purpose, we first make use of the contextual multi-armed bandit with relevance learning (CMAB-RL) algorithm, which learns the relevance by employing a novel partitioning strategy on the context-arm space and forming a set of candidate relevant dimension tuples. In this model, the set of relevant patient traits is allowed to differ across bolus insulin dosages. Next, we consider an environment where the expected reward function defined over the context-arm space is sampled from a Gaussian process. For this setting, we propose an extension of the contextual Gaussian process upper confidence bound (CGP-UCB) algorithm, called CGP-UCB with relevance learning (CGP-UCB-RL), which learns the relevance by employing kernels that associate a weight with each dimension and minimizing the negative log marginal likelihood. Then, we investigate the suitability of this approach for the blood glucose regulation problem. Aside from applying both algorithms to the bolus insulin administration problem, we also evaluate their performance in synthetically generated benchmark environments.
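The thesis itself is not reproduced in this record, but the general idea behind CGP-UCB-style relevance learning described in the abstract (an anisotropic kernel with one weight per dimension, fit by marginal-likelihood optimization, combined with upper-confidence-bound arm selection) can be sketched as below. This is a minimal illustration using scikit-learn's Gaussian process regressor, not the thesis's CGP-UCB-RL implementation; the synthetic reward function, the exploration weight `beta`, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Synthetic context-arm space: 5 context dims + 1 arm dim (6 in total), where the
# expected reward depends only on context dimension 0 and the arm dimension.
def expected_reward(x):
    return np.sin(3 * x[..., 0]) + 0.5 * np.cos(4 * x[..., 5])

n_arms = 10
arms = np.linspace(0, 1, n_arms)   # candidate arm levels (e.g., dosage grid) -- assumed
beta = 2.0                         # UCB exploration weight -- assumed

X_hist, y_hist = [], []            # observed context-arm pairs and noisy rewards

for t in range(60):
    context = rng.uniform(0, 1, size=5)    # context of the patient arriving at round t

    if len(X_hist) < 5:
        arm_idx = rng.integers(n_arms)     # pure exploration until enough data is gathered
    else:
        # Anisotropic RBF kernel: one length scale per dimension. Fitting the GP
        # maximizes the log marginal likelihood, so dimensions the reward does not
        # depend on drift toward large length scales (small relevance weights).
        kernel = RBF(length_scale=np.ones(6), length_scale_bounds=(1e-2, 1e3))
        gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
        gp.fit(np.array(X_hist), np.array(y_hist))

        # Score every arm for the current context with a UCB acquisition rule.
        candidates = np.hstack([np.tile(context, (n_arms, 1)), arms[:, None]])
        mean, std = gp.predict(candidates, return_std=True)
        arm_idx = int(np.argmax(mean + beta * std))

    x = np.append(context, arms[arm_idx])
    y = expected_reward(x) + 0.1 * rng.standard_normal()   # noisy reward feedback
    X_hist.append(x)
    y_hist.append(y)

# Inverse length scales serve as learned per-dimension relevance weights in this sketch.
print("relevance (1 / length scale):", np.round(1.0 / gp.kernel_.length_scale, 3))
```

In this simplified setup, the fitted length scales of the irrelevant context dimensions grow large, so only the truly relevant dimensions retain noticeable weight, which is the behaviour the abstract attributes to relevance learning via marginal-likelihood optimization.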
Keywords
Online learning
Contextual multi-armed bandits
Contextual Gaussian process bandits
Relevance learning
Personalized medicine