Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

buir.contributor.author: Sağlam, Baturay
buir.contributor.author: Mutlu, Furkan Burak
buir.contributor.author: Çiçek, Doğan Can
buir.contributor.author: Kozat, Süleyman Serdar
buir.contributor.orcid: Sağlam, Baturay|0000-0002-8324-5980
buir.contributor.orcid: Kozat, Süleyman Serdar|0000-0002-6488-3848
buir.contributor.orcid: Mutlu, Furkan Burak|0000-0002-4165-4145
dc.citation.epage: 80-25
dc.citation.issueNumber: 2
dc.citation.spage: 80-1
dc.citation.volumeNumber: 56
dc.contributor.author: Sağlam, Baturay
dc.contributor.author: Mutlu, Furkan Burak
dc.contributor.author: Çiçek, Doğan Can
dc.contributor.author: Kozat, Süleyman Serdar
dc.date.accessioned: 2025-02-22T17:38:38Z
dc.date.available: 2025-02-22T17:38:38Z
dc.date.issued: 2024-03-02
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
dc.description.provenance: Submitted by Muhammed Murat Uçar (murat.ucar@bilkent.edu.tr) on 2025-02-22T17:38:38Z. No. of bitstreams: 1. Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf: 4116456 bytes, checksum: ea6d214a0814b1c4e05c40912a5a5fe6 (MD5)
dc.description.provenance: Made available in DSpace on 2025-02-22T17:38:38Z (GMT). No. of bitstreams: 1. Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf: 4116456 bytes, checksum: ea6d214a0814b1c4e05c40912a5a5fe6 (MD5). Previous issue date: 2024-03-02
dc.identifier.doi: 10.1007/s11063-024-11461-y
dc.identifier.eissn: 1573-773X
dc.identifier.issn: 1370-4621
dc.identifier.uri: https://hdl.handle.net/11693/116656
dc.language.iso: English
dc.publisher: Springer
dc.relation.isversionof: https://dx.doi.org/10.1007/s11063-024-11461-y
dc.rights: CC BY 4.0 Deed (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Neural Processing Letters
dc.subject: Deep reinforcement learning
dc.subject: Actor-critic methods
dc.subject: Estimation bias
dc.subject: Deterministic policy gradients
dc.subject: Continuous control
dc.title: Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
dc.type: Article
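
The abstract above describes a Q-value target that replaces the usual clipped double-Q minimum with a linear combination of two approximate critics, where the combination weight is sampled from a narrowed estimation-bias interval. The record does not reproduce the paper's derivation or its actual interval, so the snippet below is only a minimal illustrative sketch of such a target computation: the interval bounds `beta_low`/`beta_high`, the critic interfaces, and all other names are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a target-Q computation that mixes two target critics
# with a randomly sampled weight instead of taking their minimum (clipped double-Q).
# The interval bounds (beta_low, beta_high) are placeholders; the paper derives its
# own, parameter-free interval through statistical analysis.
import torch

def mixed_target_q(critic1_target, critic2_target, next_state, next_action,
                   reward, not_done, gamma=0.99, beta_low=0.4, beta_high=0.6):
    with torch.no_grad():
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        q_min = torch.min(q1, q2)   # pessimistic estimate, prone to underestimation
        q_max = torch.max(q1, q2)   # optimistic estimate, prone to overestimation
        # Sample the combination weight from the assumed interval [beta_low, beta_high].
        beta = beta_low + (beta_high - beta_low) * torch.rand_like(q_min)
        q_next = beta * q_min + (1.0 - beta) * q_max
        # Standard bootstrapped target for the critic regression step.
        return reward + not_done * gamma * q_next
```

Under these assumptions, sampling the weight rather than always taking the minimum keeps the target between the pessimistic and optimistic estimates; how the interval is shrunk so that no hyperparameter tuning is needed is the contribution detailed in the linked article.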

Files

Original bundle

Name: Parameter-Free_Reduction_of_the_Estimation_Bias_in_Deep_Reinforcement_Learning_for_Deterministic_Policy_Gradients.pdf
Size: 3.93 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: