Experience replay strategies for improving performance of deep off-policy actor-critic reinforcement learning algorithms

buir.advisor: Kozat, Süleyman Serdar
dc.contributor.author: Lorasdağı, Mehmet Efe
dc.date.accessioned: 2025-07-28T11:25:01Z
dc.date.available: 2025-07-28T11:25:01Z
dc.date.copyright: 2025-07
dc.date.issued: 2025-07
dc.date.submitted: 2025-07-25
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 45-49).
dc.description.abstract: We investigate an important conflict in deep deterministic policy gradient algorithms: experience replay strategies designed to accelerate critic learning can destabilize the actor. Conventional methods, including Prioritized Experience Replay, sample a single batch of transitions to update both networks. This shared-data approach ignores the fact that transitions with high temporal difference error, while beneficial for the critic's value function estimation, may correspond to off-policy actions that introduce misleading gradients and degrade the actor's policy. To resolve this, we introduce Decoupled Prioritized Experience Replay, a novel framework that explicitly separates transition sampling for the actor and the critic to serve their distinct learning objectives. For the critic, it employs a conventional prioritization scheme, sampling transitions with high temporal difference error to promote efficient learning of the value function. For the actor, Decoupled Prioritized Experience Replay introduces a new sampling strategy: it selects batches that are more on-policy by minimizing the Kullback-Leibler divergence between the actions stored in the buffer and those proposed by the current policy. We integrate Decoupled Prioritized Experience Replay with the state-of-the-art Twin Delayed Deep Deterministic policy gradient algorithm and evaluate it on six standard continuous control benchmarks from OpenAI Gym and MuJoCo. The results show that Decoupled Prioritized Experience Replay consistently accelerates learning and achieves superior final performance compared to both vanilla and prioritized replay. More critically, it maintains learning stability and converges to strong policies in tasks where standard prioritized replay fails to learn. Ablation studies indicate that the decoupling mechanism is a key factor in this robustness and that the benefits of Decoupled Prioritized Experience Replay are achievable with a computationally inexpensive search, making it a practically effective solution for improving off-policy learning. (An illustrative sketch of this decoupled sampling scheme is given after the metadata record below.)
dc.description.statementofresponsibility: by Mehmet Efe Lorasdağı
dc.format.extent: xv, 49 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B163127
dc.identifier.uri: https://hdl.handle.net/11693/117395
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Deep reinforcement learning
dc.subject: Experience replay
dc.subject: Actor-critic methods
dc.subject: Off-policy learning
dc.subject: Continuous control
dc.title: Experience replay strategies for improving performance of deep off-policy actor-critic reinforcement learning algorithms
dc.title.alternative: Derin politika dışı aktör-kritik pekiştirmeli öğrenme algoritmalarının performansını artırmak için deneyim tekrarı stratejileri
dc.type: Thesis
thesis.degree.discipline: Electrical and Electronic Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
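
The abstract above describes the two sampling rules of Decoupled Prioritized Experience Replay only at a high level. Below is a minimal, illustrative Python sketch of how such decoupled sampling could look, assuming a NumPy ring buffer, proportional prioritization for the critic batch, and a small random search that scores candidate actor batches with a diagonal-Gaussian Kullback-Leibler proxy between stored actions and current-policy actions. All names (DecoupledReplayBuffer, sample_actor_batch, policy_fn, num_candidates) and the specific divergence estimate are assumptions made for illustration, not the thesis's actual implementation.

import numpy as np


class DecoupledReplayBuffer:
    """Illustrative buffer that draws separate batches for the critic and the actor."""

    def __init__(self, capacity, state_dim, action_dim, eps=1e-6):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.priorities = np.zeros(capacity, dtype=np.float32)  # proportional to |TD error|
        self.size = 0
        self.ptr = 0
        self.eps = eps

    def add(self, state, action, td_error=1.0):
        # Store only the transition pieces needed for this sketch (rewards,
        # next states, and done flags are omitted for brevity).
        self.states[self.ptr] = state
        self.actions[self.ptr] = action
        self.priorities[self.ptr] = abs(td_error) + self.eps
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample_critic_batch(self, batch_size, alpha=0.6):
        # Conventional proportional prioritization: transitions with large
        # |TD error| are sampled more often for the value-function update.
        p = self.priorities[: self.size] ** alpha
        p = p / p.sum()
        return np.random.choice(self.size, size=batch_size, p=p)

    def sample_actor_batch(self, batch_size, policy_fn, num_candidates=8):
        # Cheap search: draw a few uniformly random candidate batches and keep
        # the one whose stored actions look most "on-policy" under the KL proxy.
        best_idx, best_kl = None, np.inf
        for _ in range(num_candidates):
            idx = np.random.randint(0, self.size, size=batch_size)
            kl = self._gaussian_kl(self.actions[idx], policy_fn(self.states[idx]))
            if kl < best_kl:
                best_idx, best_kl = idx, kl
        return best_idx

    def _gaussian_kl(self, buffer_actions, policy_actions):
        # KL(N_buffer || N_policy) between diagonal Gaussians fitted to the two
        # action sets; a stand-in for whatever divergence estimate the thesis uses.
        mu_a, var_a = buffer_actions.mean(0), buffer_actions.var(0) + self.eps
        mu_b, var_b = policy_actions.mean(0), policy_actions.var(0) + self.eps
        return 0.5 * float(np.sum(np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0))

In this hypothetical setup, a TD3-style agent would call sample_critic_batch for the value-function update and sample_actor_batch for the policy update, so the two networks never have to share a single batch.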

Files

Original bundle

Name: B163127.pdf
Size: 4.2 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.1 KB
Description: Item-specific license agreed upon to submission