Browsing by Subject "Q-learning"
Now showing 1 - 3 of 3
Item Open Access
Fictitious play in zero-sum stochastic games (Society for Industrial and Applied Mathematics, 2022) Sayin, Muhammed O.; Parise, Francesca; Ozdaglar, Asuman
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent's strategy and their own continuation payoff (Q-function), and playing a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the update of beliefs on Q-functions occurs at a slower timescale than the update of beliefs on strategies. We show that in both the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.

Item Open Access
Q-learning in regularized mean-field games (Birkhaeuser Science, 2022-05-23) Anahtarci, B.; Kariksiz, C.D.; Saldi, Naci
In this paper, we introduce a regularized mean-field game and study learning of this game under an infinite-horizon discounted reward function. Regularization is introduced by adding a strongly concave regularization function to the one-stage reward function of the classical mean-field game model. We establish a value-iteration-based learning algorithm for this regularized mean-field game using fitted Q-learning. The regularization term in general makes the reinforcement learning algorithm more robust to the system components. Moreover, it enables us to establish an error analysis of the learning algorithm without imposing the restrictive convexity assumptions on the system components that are needed in the absence of a regularization term.

Item Open Access
What to choose next? A paradigm for testing human sequential decision making (Frontiers Research Foundation, 2017) Tartaglia, E. M.; Clarke, Aaron; Herzog, M. H.
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with sheer time but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulate behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
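The step-based (rather than time-based) eligibility-trace decay described in the last abstract can be sketched in a standard tabular TD setting. This is a minimal illustrative sketch, not the authors' experimental code: the decay rate, learning rate, and the toy three-step environment below are all assumptions.

```python
# Minimal sketch (not the authors' implementation): a tabular eligibility
# trace that decays once per discrete decision step, independent of time.
# `decay`, `alpha`, `gamma`, and the toy environment are illustrative assumptions.

def update_traces(traces, visited_state, decay=0.9):
    """Decay every trace by one decision step, then bump the visited state."""
    for s in traces:
        traces[s] *= decay  # decay is tied to the step count, not elapsed time
    traces[visited_state] = traces.get(visited_state, 0.0) + 1.0
    return traces

def td_update(values, traces, reward, state, next_state, alpha=0.1, gamma=0.95):
    """Spread the TD error over recently visited states via their traces."""
    td_error = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    for s, e in traces.items():
        values[s] = values.get(s, 0.0) + alpha * td_error * e
    return values

# Usage: three decision steps along states 0 -> 1 -> 2, with a sparse
# reward arriving only at the end, as in the sparse-reward tasks above.
values, traces = {}, {}
path = [(0, 1, 0.0), (1, 2, 0.0), (2, 3, 1.0)]  # (state, next_state, reward)
for state, next_state, reward in path:
    traces = update_traces(traces, state)
    values = td_update(values, traces, reward, state, next_state)
```

Because the trace decays per decision step, the final reward is credited backward through exactly the states visited in the preceding steps, with earlier steps receiving geometrically less credit.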