Q-Learning for MDPs with general spaces: convergence and near optimality via quantization under weak continuity

Kara, A. D.; Saldı, Naci; Yüksel, S.

Files

Q_learning_for_MDPs_with_general_spaces_convergence_and_near_optimality_via_quantization_under_weak_continuity (365.19 KB)

Date

2023-07-12

Authors

Kara, A. D.

Saldı, Naci

Yüksel, S.

Abstract

Reinforcement learning algorithms often require finiteness of the state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains), and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality, either with explicit performance bounds or with guaranteed asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near optimality of finite state model approximations for MDPs with weakly continuous kernels, which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
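The quantized Q-learning scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the continuous state space is taken to be [0, 1] split into uniform bins, the action set is already finite, exploration picks actions uniformly at random, and the step size at each visit to a (bin, action) pair is one over the number of visits to that pair. The learner only indexes its table by the bin of the current state, which is how the quantizer acts as a measurement channel, while the cost is evaluated at the true continuous state. The helper names (`quantize`, `quantized_q_learning`, `example_step`) and the toy MDP (cost |x - 0.5|, next state x + a plus Uniform(-0.05, 0.05) noise clipped to [0, 1], a kernel that is weakly continuous in (x, a)) are hypothetical.

```python
import random


def quantize(x, n_bins, low=0.0, high=1.0):
    """Map a point of [low, high] to a bin index in {0, ..., n_bins - 1} (uniform bins)."""
    x = min(max(x, low), high)                      # clamp into the quantized interval
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)                     # x == high belongs to the last bin


def quantized_q_learning(step, x0, actions, n_bins, beta=0.5, n_iters=100_000, seed=0):
    """Tabular Q-learning on quantized states, driven by the true continuous state.

    step(x, a, rng) -> (cost, next_state) simulates the continuous MDP (hypothetical interface).
    Returns Q as a list of rows: one row per state bin, one column per finite action.
    """
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_bins)]
    visits = [[0] * len(actions) for _ in range(n_bins)]
    x = x0
    for _ in range(n_iters):
        y = quantize(x, n_bins)                     # learner sees only the bin, not x itself
        i = rng.randrange(len(actions))             # uniform exploration over the finite action set
        cost, x_next = step(x, actions[i], rng)     # cost is evaluated at the true state x
        y_next = quantize(x_next, n_bins)
        visits[y][i] += 1
        alpha = 1.0 / visits[y][i]                  # step size 1 / (number of visits to (y, i))
        target = cost + beta * min(q[y_next])       # discounted-cost Bellman target
        q[y][i] += alpha * (target - q[y][i])
        x = x_next
    return q


def example_step(x, a, rng):
    """Hypothetical MDP on [0, 1]: cost |x - 0.5|, next state x + a + Uniform(-0.05, 0.05), clipped."""
    cost = abs(x - 0.5)
    x_next = min(max(x + a + rng.uniform(-0.05, 0.05), 0.0), 1.0)
    return cost, x_next


actions = [-0.1, 0.0, 0.1]                          # finite (already quantized) action set
q = quantized_q_learning(example_step, x0=0.5, actions=actions, n_bins=10)
# Greedy (cost-minimizing) action index for each state bin.
policy = [min(range(len(actions)), key=lambda i: q[y][i]) for y in range(len(q))]
print("greedy action index per bin:", policy)
```

With this toy model, the greedy policy read off the learned table is expected to push the state up from the lowest bin and down from the highest one. The discount factor defaults to 0.5 here because visit-count step sizes are known to converge slowly when the discount factor is close to 1; a finer quantizer would reduce the approximation error at the price of more table entries to learn.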

Source Title

Journal of Machine Learning Research

Publisher

Journal of Machine Learning Research

Keywords

Reinforcement learning, Stochastic control, Finite approximation

Permalink

https://hdl.handle.net/11693/114727

Rights

https://creativecommons.org/licenses/by/4.0/

Collections

Scholarly Publications - Mathematics

Language

en

Type

Article