    • Online learning in structured Markov decision processes 

      Akbarzadeh, Nima (Bilkent University, 2017-07)
      This thesis proposes three new multi-armed bandit problems, in which the learner proceeds in a sequence of rounds where each round is a Markov Decision Process (MDP). The learner's goal is to maximize its cumulative ...