Online learning in structured Markov decision processes
(Bilkent University, 2017-07)
This thesis proposes three new multi-armed bandit problems in which the learner proceeds in a sequence of rounds, where each round is a Markov decision process (MDP). The learner's goal is to maximize its cumulative ...