Q-Learning for MDPs with general spaces: convergence and near optimality via quantization under weak continuity

buir.contributor.author: Saldı, Naci
buir.contributor.orcid: Saldı, Naci | 0000-0002-2677-7366
dc.citation.epage: 199-34
dc.citation.spage: 199-1
dc.citation.volumeNumber: 24
dc.contributor.author: Kara, A. D.
dc.contributor.author: Saldı, Naci
dc.contributor.author: Yüksel, S.
dc.date.accessioned: 2024-03-14T08:14:35Z
dc.date.available: 2024-03-14T08:14:35Z
dc.date.issued: 2023-07-12
dc.department: Department of Mathematics
dc.description.abstract: Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains), and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality, either with explicit performance bounds or with guaranteed asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near optimality of finite state model approximations for MDPs with weakly continuous kernels, which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
dc.identifier.eissn: 1533-7928
dc.identifier.issn: 1532-4435
dc.identifier.uri: https://hdl.handle.net/11693/114727
dc.language.iso: en
dc.publisher: Journal of Machine Learning Research
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Journal of Machine Learning Research
dc.subject: Reinforcement learning
dc.subject: Stochastic control
dc.subject: Finite approximation
dc.title: Q-Learning for MDPs with general spaces: convergence and near optimality via quantization under weak continuity
dc.type: Article
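
For illustration only: the abstract describes Quantized Q-Learning, i.e., running standard Q-learning on quantized (finitely discretized) states and actions of a continuous-space MDP. The following minimal Python sketch is not the authors' code; it assumes a one-dimensional state space [0, 1] and action space [-1, 1], uniform quantization grids, and a user-supplied simulator env_step(x, u) returning (cost, next_state) for a discounted-cost problem. The names quantize, quantized_q_learning, and env_step are hypothetical.

import numpy as np

# Illustrative sketch of quantized Q-learning (not the paper's implementation).
# State space [0, 1], action space [-1, 1], uniform quantization grids.
n_state_bins, n_action_bins = 20, 10
state_grid = np.linspace(0.0, 1.0, n_state_bins)
action_grid = np.linspace(-1.0, 1.0, n_action_bins)

def quantize(x, grid):
    # Map a continuous value to the index of its nearest grid point.
    return int(np.argmin(np.abs(grid - x)))

def quantized_q_learning(env_step, n_steps=200_000, beta=0.95, x0=0.5):
    # env_step(x, u) -> (cost, next_state) simulates one transition of the MDP;
    # beta is the discount factor; costs are minimized.
    Q = np.zeros((n_state_bins, n_action_bins))   # Q-table on the finite quantized model
    visits = np.zeros_like(Q)                     # visit counts give decaying step sizes
    rng = np.random.default_rng(0)
    x = x0
    for _ in range(n_steps):
        i = quantize(x, state_grid)               # quantized state index
        j = rng.integers(n_action_bins)           # exploring policy: uniform random actions
        cost, x_next = env_step(x, action_grid[j])
        i_next = quantize(x_next, state_grid)
        visits[i, j] += 1
        alpha = 1.0 / visits[i, j]                # decaying step size for the visited pair
        target = cost + beta * Q[i_next].min()    # Bellman target on the quantized next state
        Q[i, j] += alpha * (target - Q[i, j])
        x = x_next
    policy = action_grid[Q.argmin(axis=1)]        # greedy policy induced by the learned Q-table
    return Q, policy

As a hypothetical usage, with env_step defined as a noisy scalar system such as x_next = clip(0.8*x + 0.3*u + 0.05*noise, 0, 1) with stage cost (x - 0.5)**2 + 0.01*u**2, calling quantized_q_learning(env_step) returns the learned Q-table and a greedy policy on the action grid; finer quantization grids give policies closer to optimal, in line with the asymptotic-optimality results described in the abstract.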

Files

Original bundle

Name: Q_learning_for_MDPs_with_general_spaces_convergence_and_near_optimality_via_quantization_under_weak_continuity
Size: 365.19 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.01 KB
Format: Item-specific license agreed to upon submission