Fictitious play in zero-sum stochastic games
Abstract
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games, and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent's strategy and on their own continuation payoff (Q-function), and playing a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated on a slower timescale than the beliefs on strategies. We show that in both the model-based and model-free cases (the latter without knowledge of the payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
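As a rough illustration of the two-timescale structure described in the abstract, the sketch below runs such a dynamics on a small randomly generated zero-sum stochastic game. It is not the paper's exact algorithm: the game instance, the step-size schedules, the single Q-estimate kept from player 1's perspective, and the form of the continuation value are all illustrative assumptions.

```python
import numpy as np

# Hypothetical two-player zero-sum stochastic game: a small random instance.
rng = np.random.default_rng(0)
nS, nA, nB = 3, 2, 2                      # states, actions of players 1 and 2
gamma = 0.9                               # discount factor (assumed)
R = rng.uniform(-1, 1, (nS, nA, nB))     # player 1's stage payoff; player 2 gets -R
P = rng.dirichlet(np.ones(nS), (nS, nA, nB))  # P[s, a, b] = distribution over next states

# Beliefs: pi2_hat[s] is player 1's empirical belief on player 2's strategy
# (and symmetrically pi1_hat); Q[s, a, b] is the estimated stage payoff plus
# discounted continuation payoff, kept here only from player 1's perspective.
pi2_hat = np.full((nS, nB), 1.0 / nB)
pi1_hat = np.full((nS, nA), 1.0 / nA)
Q = np.zeros((nS, nA, nB))

s = 0
for k in range(1, 200_000):
    alpha = 1.0 / k                           # fast timescale: strategy beliefs
    beta = 1.0 / (1 + k * np.log(k + 1))      # slow timescale: Q-function beliefs

    # Greedy best responses against current beliefs and the estimated Q.
    a = int(np.argmax(Q[s] @ pi2_hat[s]))     # player 1 maximizes
    b = int(np.argmin(pi1_hat[s] @ Q[s]))     # player 2 minimizes player 1's payoff

    # Fast update: move each empirical belief toward the observed opponent action.
    e_a = np.zeros(nA); e_a[a] = 1.0
    e_b = np.zeros(nB); e_b[b] = 1.0
    pi1_hat[s] += alpha * (e_a - pi1_hat[s])
    pi2_hat[s] += alpha * (e_b - pi2_hat[s])

    # Transition, then slow model-free-style update of the visited Q-entry
    # using the greedy value of the next state under the current belief.
    s_next = rng.choice(nS, p=P[s, a, b])
    v_next = np.max(Q[s_next] @ pi2_hat[s_next])
    Q[s, a, b] += beta * (R[s, a, b] + gamma * v_next - Q[s, a, b])
    s = s_next
```

The step sizes satisfy the separation the abstract alludes to: beta / alpha tends to zero, so the Q-estimate evolves slowly and the strategy beliefs track it on the faster timescale.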