Online learning in structured Markov decision processes
(Bilkent University, 2017-07)
This thesis proposes three new multi-armed bandit problems in which the learner
proceeds through a sequence of rounds, each of which is a Markov decision process
(MDP). The learner's goal is to maximize its cumulative ...