Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
buir.contributor.author | Sağlam, Baturay | |
buir.contributor.author | Mutlu, Furkan Burak | |
buir.contributor.author | Çiçek, Doğan Can | |
buir.contributor.author | Kozat, Süleyman Serdar | |
buir.contributor.orcid | Sağlam, Baturay|0000-0002-8324-5980 | |
buir.contributor.orcid | Kozat, Süleyman Serdar|0000-0002-6488-3848 | |
buir.contributor.orcid | Mutlu, Furkan Burak|0000-0002-4165-4145 | |
dc.citation.epage | 80-25 | |
dc.citation.issueNumber | 2 | |
dc.citation.spage | 80-1 | |
dc.citation.volumeNumber | 56 | |
dc.contributor.author | Sağlam, Baturay | |
dc.contributor.author | Mutlu, Furkan Burak | |
dc.contributor.author | Çiçek, Doğan Can | |
dc.contributor.author | Kozat, Süleyman Serdar | |
dc.date.accessioned | 2025-02-22T17:38:38Z | |
dc.date.available | 2025-02-22T17:38:38Z | |
dc.date.issued | 2024-03-02 | |
dc.department | Department of Electrical and Electronics Engineering | |
dc.description.abstract | Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested. | |
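As a rough illustration of the idea sketched in the abstract (forming the Q-learning target from a linear combination of two approximate critics whose combination weight is sampled from a narrow interval), the following minimal Python sketch may help. It is an assumption-laden placeholder, not the authors' actual update rule: the helper name combined_target_q, the interval bounds beta_low/beta_high, and the uniform sampling scheme are all hypothetical choices made here for illustration.

import numpy as np

def combined_target_q(q1, q2, beta_low=0.0, beta_high=0.5, rng=None):
    # Hypothetical sketch (not the paper's construction): blend two critic
    # estimates with a weight sampled from an interval, so the target leans
    # toward the smaller estimate without fixing a single pessimism level.
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.uniform(beta_low, beta_high)      # sampled combination weight
    q_min = np.minimum(q1, q2)                   # more pessimistic critic estimate
    q_max = np.maximum(q1, q2)                   # more optimistic critic estimate
    return (1.0 - beta) * q_min + beta * q_max   # convex combination of the two critics

# Illustrative TD target, assuming reward, done, gamma and next-state critic
# outputs q1_next, q2_next are available from the rest of the training loop:
# target = reward + (1.0 - done) * gamma * combined_target_q(q1_next, q2_next)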
dc.description.provenance | Submitted by Muhammed Murat Uçar (murat.ucar@bilkent.edu.tr) on 2025-02-22T17:38:38Z No. of bitstreams: 1 Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf: 4116456 bytes, checksum: ea6d214a0814b1c4e05c40912a5a5fe6 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2025-02-22T17:38:38Z (GMT). No. of bitstreams: 1 Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf: 4116456 bytes, checksum: ea6d214a0814b1c4e05c40912a5a5fe6 (MD5) Previous issue date: 2024-03-02 | en |
dc.identifier.doi | 10.1007/s11063-024-11461-y | |
dc.identifier.eissn | 1573-773X | |
dc.identifier.issn | 1370-4621 | |
dc.identifier.uri | https://hdl.handle.net/11693/116656 | |
dc.language.iso | en | |
dc.publisher | Springer | |
dc.relation.isversionof | https://dx.doi.org/10.1007/s11063-024-11461-y | |
dc.rights | CC BY 4.0 Deed (Attribution 4.0 International) | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.source.title | Neural Processing Letters | |
dc.subject | Deep reinforcement learning | |
dc.subject | Actor-critic methods | |
dc.subject | Estimation bias | |
dc.subject | Deterministic policy gradients | |
dc.subject | Continuous control | |
dc.title | Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients | |
dc.type | Article |
Files
Original bundle (1 - 1 of 1)
- Name: Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf
- Size: 3.93 MB
- Format: Adobe Portable Document Format

License bundle (1 - 1 of 1)
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission