Novel sampling strategies for experience replay mechanisms in off-policy deep reinforcement learning algorithms

buir.advisor: Kozat, Süleyman Serdar
dc.contributor.author: Mutlu, Furkan Burak
dc.date.accessioned: 2024-09-19T08:51:17Z
dc.date.available: 2024-09-19T08:51:17Z
dc.date.copyright: 2024-09
dc.date.issued: 2024-09
dc.date.submitted: 2024-09-17
dc.description: Cataloged from PDF version of article.
dc.description: Thesis (Master's): İhsan Doğramacı Bilkent University, Department of Electrical and Electronics Engineering, 2024.
dc.description: Includes bibliographical references (leaves 52-55).
dc.description.abstract: Experience replay enables agents to reuse their past experiences repeatedly to improve learning performance. Traditional strategies, such as vanilla experience replay, sample uniformly from the replay buffer, which can be inefficient because they do not account for the varying importance of different transitions. More advanced methods, like Prioritized Experience Replay (PER), attempt to address this by adjusting the sampling probability of each transition according to its perceived importance. However, constantly recalculating these probabilities for every transition in the buffer after each iteration is computationally expensive and impractical for large-scale applications. Moreover, these methods do not necessarily enhance the performance of actor-critic-based reinforcement learning algorithms, as they typically rely on predefined metrics, such as the Temporal Difference (TD) error, which do not directly represent the relevance of a transition to the agent’s policy. The importance of a transition can change dynamically throughout training, but existing approaches struggle to adapt to this due to computational constraints. Both vanilla sampling strategies and advanced methods like PER introduce biases toward certain transitions. Vanilla experience replay tends to favor older transitions, which may no longer be useful since they were often generated by a random policy during initialization. Meanwhile, PER is biased toward transitions with high TD errors, which primarily reflect errors in the critic network and may not correspond to improvements in the policy network, as there is no direct correlation between TD error and policy enhancement. Given these challenges, we propose a new sampling strategy designed to mitigate bias and ensure that every transition is used in updates an equal number of times. Our method, Corrected Uniform Experience Replay (CUER), leverages an efficient sum-tree structure to achieve fair sampling counts for all transitions. We evaluate CUER on various continuous control tasks and demonstrate that it outperforms both traditional and advanced replay mechanisms when applied to state-of-the-art off-policy deep reinforcement learning algorithms like TD3 and SAC. Empirical results indicate that CUER consistently improves sample efficiency without imposing a significant computational burden, leading to faster convergence and more stable learning performance.
dc.description.provenance: Submitted by İlknur Sarıkaya (ilknur.sarikaya@bilkent.edu.tr) on 2024-09-19T08:51:17Z. No. of bitstreams: 1. B162661.pdf: 1555309 bytes, checksum: 1cd252ee16c26bd29969ab35c372cfaa (MD5)
dc.description.provenance: Made available in DSpace on 2024-09-19T08:51:17Z (GMT). No. of bitstreams: 1. B162661.pdf: 1555309 bytes, checksum: 1cd252ee16c26bd29969ab35c372cfaa (MD5). Previous issue date: 2024-09
dc.description.statementofresponsibility: by Furkan Burak Mutlu
dc.format.extent: xii, 55 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B162661
dc.identifier.uri: https://hdl.handle.net/11693/115831
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Experience replay
dc.subject: Reinforcement learning
dc.subject: Actor-critic algorithms
dc.subject: Off-policy
dc.subject: Deep learning
dc.subject: Continuous control tasks
dc.title: Novel sampling strategies for experience replay mechanisms in off-policy deep reinforcement learning algorithms
dc.title.alternative: Derin deterministik politika gradyanı algoritmaları için yeni tecrübe tekrarı stratejileri [New experience replay strategies for deep deterministic policy gradient algorithms]
dc.type: Thesis
thesis.degree.discipline: Electrical and Electronics Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
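
Illustrative note: the abstract above describes CUER only at a high level (a sum-tree structure used to keep the number of times each transition is replayed roughly equal). The following is a minimal Python sketch of that general idea, not the thesis's actual implementation; the names SumTree and CUERBuffer and the 1/(1 + count) priority rule are illustrative assumptions.

# Minimal sketch (assumptions noted above, not the thesis implementation):
# sample transitions through a sum-tree whose per-transition priority shrinks
# as that transition's replay count grows, pushing sampling counts toward equality.
import random


class SumTree:
    """Binary sum-tree: O(log N) priority updates and proportional sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = [0.0] * (2 * capacity)  # leaves live at positions [capacity, 2*capacity)

    def update(self, idx, priority):
        pos = idx + self.capacity
        change = priority - self.nodes[pos]
        while pos >= 1:                       # propagate the change up to the root
            self.nodes[pos] += change
            pos //= 2

    def sample(self):
        """Return a leaf index drawn with probability proportional to its priority."""
        mass = random.random() * self.nodes[1]   # the root holds the total priority
        pos = 1
        while pos < self.capacity:               # descend until a leaf is reached
            left = 2 * pos
            if mass <= self.nodes[left]:
                pos = left
            else:
                mass -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity


class CUERBuffer:
    """Replay buffer that favors transitions that have been replayed fewer times."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.counts = [0] * capacity   # how many times each stored transition was sampled
        self.tree = SumTree(capacity)
        self.next_slot = 0
        self.size = 0

    def add(self, transition):
        idx = self.next_slot
        self.data[idx] = transition
        self.counts[idx] = 0
        self.tree.update(idx, 1.0)     # fresh transitions start with full priority
        self.next_slot = (self.next_slot + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        assert self.size > 0, "sample() requires a non-empty buffer"
        batch = []
        for _ in range(batch_size):
            idx = self.tree.sample()
            self.counts[idx] += 1
            # Decay priority with replay count so under-used transitions are preferred
            # next time; any decreasing function of the count would serve here.
            self.tree.update(idx, 1.0 / (1.0 + self.counts[idx]))
            batch.append(self.data[idx])
        return batch

With this layout, each draw and each priority update costs O(log N), so the count-balancing bookkeeping stays cheap even for large buffers; the exact priority schedule and correction scheme CUER uses are specified in the thesis itself.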

Files

Original bundle (1 of 1)
Name: B162661.pdf
Size: 1.48 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.1 KB
Description: Item-specific license agreed upon submission