Experience replay strategies for improving performance of deep off-policy actor-critic reinforcement learning algorithms

Date

2025-07

Advisor

Kozat, Süleyman Serdar

Abstract

We investigate an important conflict in deep deterministic policy gradient algorithms: experience replay strategies designed to accelerate critic learning can destabilize the actor. Conventional methods, including Prioritized Experience Replay, sample a single batch of transitions to update both networks. This shared-data approach ignores the fact that transitions with high temporal-difference error, while beneficial for the critic's value-function estimation, may correspond to off-policy actions that introduce misleading gradients and degrade the actor's policy. To resolve this, we introduce Decoupled Prioritized Experience Replay, a novel framework that explicitly separates transition sampling for the actor and the critic to serve their distinct learning objectives. For the critic, it employs a conventional prioritization scheme, sampling transitions with high temporal-difference error to promote efficient learning of the value function. For the actor, Decoupled Prioritized Experience Replay introduces a new sampling strategy: it selects batches that are more on-policy by minimizing the Kullback-Leibler divergence between the actions stored in the buffer and those proposed by the current policy. We integrate Decoupled Prioritized Experience Replay with the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and evaluate it on six standard continuous control benchmarks from OpenAI Gym and MuJoCo. The results show that Decoupled Prioritized Experience Replay consistently accelerates learning and achieves superior final performance compared to both vanilla and prioritized replay. More critically, Decoupled Prioritized Experience Replay maintains learning stability and converges to strong policies in tasks where standard prioritized replay fails to learn. Further ablation studies indicate that the decoupling mechanism is an important factor in this robustness and that the benefits of Decoupled Prioritized Experience Replay are achievable with a computationally inexpensive search, making it a practically effective solution for improving off-policy learning.
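
The abstract describes the decoupled sampling only at a high level. The Python sketch below illustrates one way such a buffer could be organized, assuming a deterministic policy callable that maps states to actions; the class and parameter names (DecoupledReplayBuffer, alpha, n_candidates), the candidate-batch search, and the squared-distance proxy used in place of the KL criterion are illustrative assumptions, not the implementation from the thesis.

import numpy as np

class DecoupledReplayBuffer:
    # Sketch of a replay buffer that serves the critic and the actor
    # with separately chosen batches, as described in the abstract.
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent for critic-side sampling (assumed value)
        self.storage = []           # transitions: (state, action, reward, next_state, done)
        self.priorities = []        # |TD error|-based priorities
        self.pos = 0

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(priority)
        else:
            self.storage[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample_critic(self, batch_size):
        # Conventional prioritized sampling: transitions with large
        # |TD error| are drawn more often to speed up value learning.
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        return [self.storage[i] for i in idx], idx

    def sample_actor(self, batch_size, policy, n_candidates=4):
        # "More on-policy" sampling for the actor: among a few uniformly
        # drawn candidate batches, keep the one whose stored actions are
        # closest to what the current policy would take in those states.
        # Mean squared distance stands in here for the KL-based criterion.
        best_batch, best_score = None, np.inf
        for _ in range(n_candidates):
            idx = np.random.randint(0, len(self.storage), batch_size)
            batch = [self.storage[i] for i in idx]
            states = np.array([t[0] for t in batch])
            actions = np.array([t[1] for t in batch])
            score = np.mean((policy(states) - actions) ** 2)
            if score < best_score:
                best_batch, best_score = batch, score
        return best_batch

In a TD3-style training loop, sample_critic would feed the twin-critic update (with priorities refreshed from the new TD errors) and sample_actor would feed the delayed policy update; searching over only a handful of candidate batches keeps the overhead small, consistent with the computationally inexpensive search mentioned in the abstract.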

Degree Discipline

Electrical and Electronic Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Language

English

Type