Browsing by Subject "Exploration"
Now showing 1 - 5 of 5
Item Open Access
Deep intrinsically motivated exploration in continuous control (Springer, 2023-10-26)
Sağlam, Baturay; Kozat, Süleyman Serdar
In continuous control, exploration is often performed through undirected strategies, in which the parameters of the networks or the selected actions are perturbed by random noise. Although deep undirected exploration has been shown to improve the performance of on-policy methods, it introduces excessive computational complexity and is known to fail in the off-policy setting. Intrinsically motivated exploration is an effective alternative to undirected strategies, but it has usually been studied in discrete action domains. In this paper, we investigate how intrinsic motivation can be effectively combined with deep reinforcement learning in the control of continuous systems to obtain directed exploratory behavior. We adapt existing theories on animal motivational systems to the reinforcement learning paradigm and introduce a novel and scalable directed exploration strategy. The introduced approach, motivated by maximization of the value function's error, extracts useful information from a collected set of experiences and unifies the intrinsic exploration motivations in the literature under a single exploration objective. An extensive set of empirical studies demonstrates that our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and significantly outperforms the undirected strategies.

Item Open Access
An intrinsic motivation based artificial goal generation in on-policy continuous control (IEEE, 2022-08-29)
Sağlam, Baturay; Mutlu, Furkan B.; Gönç, Kaan; Dalmaz, Onat; Kozat, Süleyman S.
This work adapts existing theories on animal motivational systems to the reinforcement learning (RL) paradigm to constitute a directed exploration strategy in on-policy continuous control. We introduce a novel and scalable artificial bonus reward rule that encourages agents to visit useful state spaces. By unifying the intrinsic incentives in the reinforcement learning paradigm under the introduced deterministic reward rule, our method forces the value function to learn the values of unseen or less-known states and prevents premature behavior before the environment is sufficiently learned. The simulation results show that the proposed algorithm considerably improves the state-of-the-art on-policy methods and enhances their inherent entropy-based exploration.
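The bonus-reward idea summarized in the two items above can be illustrated with a minimal Python sketch: an intrinsic bonus, taken here to be the magnitude of the value function's prediction error, is added to the environment reward so that unseen or less-known states pay extra reward. The function name, the |TD-error| bonus, and the beta coefficient are illustrative assumptions, not the papers' exact rule.

    def augmented_reward(r_ext, td_error, beta=0.1):
        # Hypothetical intrinsic bonus: the size of the value
        # function's prediction error, so poorly-learned states
        # yield extra reward and attract further visits.
        r_int = abs(td_error)
        # Deterministic shaping rule: extrinsic reward plus a
        # scaled intrinsic bonus.
        return r_ext + beta * r_int

    # A transition whose value estimate was badly off earns a
    # larger training reward than a well-predicted one.
    print(augmented_reward(r_ext=1.0, td_error=2.5))  # -> 1.25
    print(augmented_reward(r_ext=1.0, td_error=0.1))  # -> 1.01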
Item Open Access
"Nedim Gürsel'in 'Bir Avuç Dünya'sıyla dünya cennetlerine yolculuk" (Ürün Yayınları, 2004)
İnal, Tanju
The travel narratives of Nedim Gürsel, a Turkish writer who divides his life between Paris and Istanbul, transport us to different cities of the world, mostly known as "terrestrial paradises" or "lost paradises". Although Gürsel is preoccupied with a burning desire to flee to other spaces, from his travel writing emerges a strong feeling of solitude and strangeness that makes him experience the woes of escaping from his condition. Nevertheless, a furtive glance over the city and its monuments furnishes him with the key to an emotional metamorphosis that does not cease to awaken in him various reminiscences and literary memories linked to authors who had written about these cities. Thus the various geographical journeys of Nedim Gürsel come off as a literary exploration, an intellectual voyage through thought rather than a displacement in space.

Item Open Access
Unified intrinsically motivated exploration for off-policy learning in continuous action spaces (IEEE, 2022-08-29)
Sağlam, Baturay; Mutlu, Furkan B.; Dalmaz, Onat; Kozat, Süleyman S.
Exploration in continuous control is usually maintained through undirected methods, in which random noise perturbs the network parameters or the selected actions. Intrinsically driven exploration is a good alternative to undirected techniques, but it has only been studied for discrete action domains. In this study, the intrinsic incentives in the existing reinforcement learning literature are unified under a deterministic artificial goal generation rule for off-policy learning. Under this rule, the agent gains additional reward if it chooses actions that lead it to useful state spaces. An extensive set of experiments indicates that the introduced artificial reward rule significantly improves the performance of the off-policy baseline algorithms.

Item Open Access
What to choose next? A paradigm for testing human sequential decision making (Frontiers Research Foundation, 2017)
Tartaglia, E. M.; Clarke, Aaron; Herzog, M. H.
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with sheer time, but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulate behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
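The step-based eligibility trace reported in the last item can be sketched as a generic tabular trace update, assuming an exponential decay applied once per discrete decision rather than per unit of elapsed time; the names and the decay constant lam are illustrative, not taken from the paper.

    def update_traces(traces, current_state, lam=0.9):
        # Decay every stored trace once per *decision step*, not
        # per second of elapsed time, matching the finding above.
        for s in traces:
            traces[s] *= lam
        # The just-visited state becomes fully eligible for
        # credit from future rewards.
        traces[current_state] = 1.0
        return traces

    # Three decisions in a row: earlier choices retain
    # exponentially less credit per intervening decision.
    traces = {}
    for state in ["A", "B", "C"]:
        traces = update_traces(traces, state)
    print(traces)  # approximately {'A': 0.81, 'B': 0.9, 'C': 1.0}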