Experience replay strategies for improving performance of deep off-policy actor-critic reinforcement learning algorithms

buir.advisor: Kozat, Süleyman Serdar
dc.contributor.author: Lorasdağı, Mehmet Efe
dc.date.accessioned: 2025-07-28T11:25:01Z
dc.date.available: 2025-07-28T11:25:01Z
dc.date.copyright: 2025-07
dc.date.issued: 2025-07
dc.date.submitted: 2025-07-25
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 45-49).
dc.description.abstract: We investigate an important conflict in deep deterministic policy gradient algorithms: experience replay strategies designed to accelerate critic learning can destabilize the actor. Conventional methods, including Prioritized Experience Replay, sample a single batch of transitions to update both networks. This shared-data approach ignores the fact that transitions with high temporal difference error, while beneficial for the critic's value function estimation, may correspond to off-policy actions that introduce misleading gradients and degrade the actor's policy. To resolve this, we introduce Decoupled Prioritized Experience Replay, a novel framework that explicitly separates transition sampling for the actor and the critic to serve their distinct learning objectives. For the critic, it employs a conventional prioritization scheme, sampling transitions with high temporal difference error to promote efficient learning of the value function. For the actor, Decoupled Prioritized Experience Replay introduces a new sampling strategy: it selects batches that are more on-policy by minimizing the Kullback-Leibler divergence between the actions stored in the buffer and those proposed by the current policy. We integrate Decoupled Prioritized Experience Replay with the state-of-the-art Twin Delayed Deep Deterministic policy gradient algorithm and evaluate it on six standard continuous control benchmarks from OpenAI Gym and MuJoCo. The results show that Decoupled Prioritized Experience Replay consistently accelerates learning and achieves superior final performance compared to both vanilla and prioritized replay. More critically, it maintains learning stability and converges to strong policies in tasks where standard prioritized replay fails to learn. Ablation studies indicate that the decoupling mechanism is a key factor in this robustness and that the benefits of Decoupled Prioritized Experience Replay are achievable with a computationally inexpensive search, making it a practically effective solution for improving off-policy learning. (An illustrative sketch of this decoupled sampling scheme is given after the metadata record below.)
dc.description.statementofresponsibility: by Mehmet Efe Lorasdağı
dc.format.extent: xv, 49 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B163127
dc.identifier.uri: https://hdl.handle.net/11693/117395
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Deep reinforcement learning
dc.subject: Experience replay
dc.subject: Actor-critic methods
dc.subject: Off-policy learning
dc.subject: Continuous control
dc.title: Experience replay strategies for improving performance of deep off-policy actor-critic reinforcement learning algorithms
dc.title.alternative: Derin politika dışı aktör-kritik pekiştirmeli öğrenme algoritmalarının performansını artırmak için deneyim tekrarı stratejileri
dc.type: Thesis
thesis.degree.discipline: Electrical and Electronic Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
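
The abstract above describes the two sampling rules of Decoupled Prioritized Experience Replay only at a high level. Below is a minimal, illustrative Python sketch of how such decoupled sampling could look, assuming a NumPy ring buffer, proportional prioritization for the critic batch, and a small random search that scores candidate actor batches with a diagonal-Gaussian Kullback-Leibler proxy between stored actions and current-policy actions. All names (DecoupledReplayBuffer, sample_actor_batch, policy_fn, num_candidates) and the specific divergence estimate are assumptions made for illustration, not the thesis's actual implementation.

import numpy as np


class DecoupledReplayBuffer:
    """Illustrative buffer that draws separate batches for the critic and the actor."""

    def __init__(self, capacity, state_dim, action_dim, eps=1e-6):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.priorities = np.zeros(capacity, dtype=np.float32)  # proportional to |TD error|
        self.size = 0
        self.ptr = 0
        self.eps = eps

    def add(self, state, action, td_error=1.0):
        # Store only the transition pieces needed for this sketch (rewards,
        # next states, and done flags are omitted for brevity).
        self.states[self.ptr] = state
        self.actions[self.ptr] = action
        self.priorities[self.ptr] = abs(td_error) + self.eps
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample_critic_batch(self, batch_size, alpha=0.6):
        # Conventional proportional prioritization: transitions with large
        # |TD error| are sampled more often for the value-function update.
        p = self.priorities[: self.size] ** alpha
        p = p / p.sum()
        return np.random.choice(self.size, size=batch_size, p=p)

    def sample_actor_batch(self, batch_size, policy_fn, num_candidates=8):
        # Cheap search: draw a few uniformly random candidate batches and keep
        # the one whose stored actions look most "on-policy" under the KL proxy.
        best_idx, best_kl = None, np.inf
        for _ in range(num_candidates):
            idx = np.random.randint(0, self.size, size=batch_size)
            kl = self._gaussian_kl(self.actions[idx], policy_fn(self.states[idx]))
            if kl < best_kl:
                best_idx, best_kl = idx, kl
        return best_idx

    def _gaussian_kl(self, buffer_actions, policy_actions):
        # KL(N_buffer || N_policy) between diagonal Gaussians fitted to the two
        # action sets; a stand-in for whatever divergence estimate the thesis uses.
        mu_a, var_a = buffer_actions.mean(0), buffer_actions.var(0) + self.eps
        mu_b, var_b = policy_actions.mean(0), policy_actions.var(0) + self.eps
        return 0.5 * float(np.sum(np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0))

In this hypothetical setup, a TD3-style agent would call sample_critic_batch for the value-function update and sample_actor_batch for the policy update, so the two networks never have to share a single batch.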

Files

Original bundle

Name: B163127.pdf
Size: 4.2 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.1 KB
Description: Item-specific license agreed upon to submission