Browsing by Author "Mutlu, Furkan Burak"
Now showing 1 - 4 of 4
Item Open Access
Actor prioritized experience replay (AI Access Foundation, 2023-11-16)
Sağlam, B.; Mutlu, Furkan Burak; Cicek, Dogan C.; Kozat, Süleyman S.
A widely studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although PER has been shown to be one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms when combined with off-policy actor-critic algorithms. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors: the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods that also addresses stability issues and the recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both the actor and critic networks. An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results with standard off-policy actor-critic algorithms.

Item Restricted
Dev-Yol ve Nuri Özdemir (Bilkent University, 2018)
Mutlu, Furkan Burak; Uludağ, Emre; Akkaya, Gökhan; Özcan, Tuna; Angın, Oğuzhan
The organization known as Dev-Yol was formed in the 1970s by people who embraced left-wing views. Advocating full national independence, workers' and human rights, and international law, the organization carried out various activities to spread these ideas; publishing newspapers and magazines, putting up posters, and holding protests and seminars are examples of its work. Influential for roughly a decade, the organization was dismantled by the martial-law regime after the 1980 coup, and many of its members were arrested and imprisoned. Nuri Özdemir, a left-leaning man involved in politics, was among those arrested after the coup. This study recounts what Nuri Özdemir did and the difficulties he faced during his trial.

Item Open Access
Novel sampling strategies for experience replay mechanisms in off-policy deep reinforcement learning algorithms (2024-09)
Mutlu, Furkan Burak
Experience replay enables agents to reuse their past experiences repeatedly to improve learning performance. Traditional strategies, such as vanilla experience replay, sample uniformly from the replay buffer, which can be inefficient because it ignores the varying importance of different transitions. More advanced methods, like Prioritized Experience Replay (PER), attempt to address this by adjusting the sampling probability of each transition according to its perceived importance. However, constantly recalculating these probabilities for every transition in the buffer after each iteration is computationally expensive and impractical for large-scale applications. Moreover, these methods do not necessarily enhance the performance of actor-critic-based reinforcement learning algorithms, as they typically rely on predefined metrics, such as the Temporal Difference (TD) error, which do not directly represent the relevance of a transition to the agent's policy. The importance of a transition can change dynamically throughout training, but existing approaches struggle to adapt to this due to computational constraints. Both vanilla sampling strategies and advanced methods like PER introduce biases toward certain transitions. Vanilla experience replay tends to favor older transitions, which may no longer be useful since they were often generated by a random policy during initialization. Meanwhile, PER is biased toward transitions with high TD errors, which primarily reflect errors in the critic network and may not correspond to improvements in the policy network, as there is no direct correlation between TD error and policy enhancement. Given these challenges, we propose a new sampling strategy designed to mitigate bias and ensure that every transition is used in updates an equal number of times. Our method, Corrected Uniform Experience Replay (CUER), leverages an efficient sum-tree structure to achieve fair sampling counts for all transitions. We evaluate CUER on various continuous control tasks and demonstrate that it outperforms both traditional and advanced replay mechanisms when applied to state-of-the-art off-policy deep reinforcement learning algorithms like TD3 and SAC. Empirical results indicate that CUER consistently improves sample efficiency without imposing a significant computational burden, leading to faster convergence and more stable learning performance.
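Both RL entries above build on PER's TD-error-proportional sampling. For reference, the sketch below illustrates the standard PER sampling rule from the original Prioritized Experience Replay formulation (priorities proportional to |TD error| raised to a power alpha, with importance-sampling weights). It is a generic illustration only, not code from either work, and the parameter values are common defaults rather than the settings used in these studies.

```python
import numpy as np

def per_sampling(td_errors, alpha=0.6, beta=0.4, eps=1e-6):
    """Standard PER-style sampling probabilities and importance weights.

    Priorities are p_i = (|delta_i| + eps) ** alpha, probabilities are
    P(i) = p_i / sum_k p_k, and weights w_i = (N * P(i)) ** -beta correct
    the bias that non-uniform sampling introduces into the update.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    weights = (len(td_errors) * probs) ** (-beta)
    weights /= weights.max()  # normalize to at most 1, as is common practice
    return probs, weights

# Example: transitions with large TD error are sampled far more often.
probs, weights = per_sampling(np.array([0.01, 0.5, 2.0, 0.1]))
print(probs.round(3), weights.round(3))
```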
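The CUER thesis above aims to equalize how many times each stored transition participates in updates, using a sum-tree for efficient bookkeeping. The snippet below is only a rough sketch of that general idea under my own simplifying assumptions: a flat array of usage counts and priorities that decay with usage, rather than the thesis's actual sum-tree implementation. The class name and all details are illustrative, not taken from the thesis.

```python
import numpy as np

class UsageBalancedBuffer:
    """Illustrative replay buffer that favors less-used transitions.

    This is not the CUER algorithm itself; it only sketches the idea of
    correcting uniform sampling so every stored transition is used in
    roughly the same number of updates. The thesis describes an efficient
    sum-tree for this; a flat array is used here for clarity, at the cost
    of O(n) sampling.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []                              # (s, a, r, s', done) tuples
        self.use_counts = np.zeros(capacity, dtype=np.int64)
        self.pos = 0                                   # next slot to (over)write

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.use_counts[self.pos] = 0                  # fresh transition, unused so far
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.storage)
        counts = self.use_counts[:n]
        priorities = 1.0 / (1.0 + counts)              # less-used means higher priority
        probs = priorities / priorities.sum()
        idxs = np.random.choice(n, size=batch_size, replace=False, p=probs)
        self.use_counts[idxs] += 1
        return [self.storage[i] for i in idxs]
```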
Item Open Access
TMD-NER: Turkish multi-domain named entity recognition for informal texts (Springer Nature, 2023-12-19)
Yılmaz, S. F.; Mutlu, Furkan Burak; Balaban, İ.; Kozat, Süleyman Serdar
We examine named entity recognition (NER), an essential and commonly used first step in many natural language processing tasks, including chatbots and language translation. We focus on applying NER to noisy texts such as tweets, which is difficult due to the casual and unstructured language common on these platforms. In this study, we use the largest available labeled datasets for Turkish NER, specifically targeting three informal platforms: Twitter, Facebook, and Donanimhaber. We choose Turkish as a representative agglutinative language, whose structure differs significantly from well-known languages such as English, French, and German. We emphasize that the methodologies and insights gained from this study can be extended to other agglutinative languages, such as Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets using 16 different named entity tags through a framework that employs bidirectional long short-term memory (BiLSTM) networks followed by a conditional random field (CRF) layer, known together as the BiLSTM-CRF model. Our experiments show an F1 score of 84% on a combined dataset, which indicates that deep learning models can also be used effectively for business applications in informal settings in agglutinative languages such as Turkish.
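The TMD-NER entry above is built around a BiLSTM-CRF sequence tagger. The sketch below shows only the BiLSTM emission component in PyTorch, without the CRF layer the paper places on top (in practice a CRF module, for example from the third-party pytorch-crf package, would score tag sequences over these emissions). The vocabulary size, embedding dimension, and hidden dimension are placeholders, and nothing here is taken from the paper's actual implementation; only the 16-tag output matches the tag set mentioned above.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM token classifier for NER-style tagging.

    Emits per-token scores over the tag set; a CRF layer would normally be
    applied on top of these emission scores to model tag-transition
    constraints, giving the BiLSTM-CRF architecture described in the paper.
    """

    def __init__(self, vocab_size=30000, num_tags=16, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                              bidirectional=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)           # (batch, seq_len, embed_dim)
        encoded, _ = self.bilstm(embedded)             # (batch, seq_len, hidden_dim)
        return self.emissions(encoded)                 # (batch, seq_len, num_tags)

# Example: score a dummy batch of two 10-token sentences over 16 NER tags.
model = BiLSTMTagger()
scores = model(torch.randint(1, 30000, (2, 10)))
print(scores.shape)  # torch.Size([2, 10, 16])
```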