Improving the performance of Batch-Constrained reinforcement learning in continuous action domains via generative adversarial networks

Sağlam, Baturay; Dalmaz, Onat; Gönç, Kaan; Kozat, Süleyman S.

Files

Improving_the_Performance_of_Batch-Constrained_Reinforcement_Learning_in_Continuous_Action_Domains_via_Generative_Adversarial_Networks.pdf (2.11 MB)

Date

2022-08-29

BUIR Usage Stats

3 views
36 downloads

Abstract

The Batch-Constrained Q-learning (BCQ) algorithm has been shown to overcome extrapolation error and to enable deep reinforcement learning agents to learn from a fixed, previously collected batch of transitions. However, because its data generation module uses a conditional Variational Autoencoder (VAE), BCQ optimizes a variational lower bound and hence does not generalize to environments with large state and action spaces. In this paper, we show that the performance of BCQ can be further improved by employing Generative Adversarial Networks (GANs), one of the recent advances in deep learning. Our extensive experiments show that the introduced approach significantly improves on BCQ in all of the control tasks tested. Moreover, the introduced approach demonstrates robust generalizability to environments with large state and action spaces in the OpenAI Gym control suite.

Turkish abstract, translated: The Batch-Constrained Q-learning (BCQ) algorithm has been shown to overcome extrapolation error, and deep reinforcement learning agents have been shown to be able to learn from a fixed, previously collected set of experience. However, because of the conditional Variational Autoencoders (VAE) used in the data generation module, the BCQ algorithm optimizes a variational lower bound and therefore cannot be generalized to environments with large state and action spaces. This paper shows that the performance of the BCQ algorithm can be further improved by using Generative Adversarial Networks (GAN), one of the recent advances in deep learning. Extensive experiments show that the introduced approach significantly improves on BCQ in every control task tested. In addition, the introduced approach exhibits rapid generalizability to environments with large state and action spaces in the OpenAI Gym control suite.

Source Title

Signal Processing and Communications Applications Conference (SIU)

Publisher

IEEE

Keywords

Deep reinforcement learning, Batch-Constrained reinforcement learning, Offline reinforcement learning, Derin pekiştirmeli öğrenme, Toplu-Kısıtlı pekiştirmeli öğrenme, Çevrimdışı pekiştirmeli öğrenme

Permalink

http://hdl.handle.net/11693/111278

Published Version (Please cite this version)

https://www.doi.org/10.1109/SIU55565.2022.9864786

Collections

Scholarly Publications - Electrical and Electronics Engineering
Scholarly Publications - Computer Engineering

Language

Turkish

Type

Conference Paper
