Fictitious play in zero-sum stochastic games

Date

2022

Source Title

SIAM Journal on Control and Optimization

Print ISSN

0363-0129

Electronic ISSN

1095-7138

Publisher

Society for Industrial and Applied Mathematics

Volume

60

Issue

4

Pages

2095–2114

Language

English

Abstract

We present a novel variant of fictitious play dynamics that combines classical fictitious play with Q-learning for stochastic games, and we analyze its convergence properties in two-player zero-sum stochastic games. In our dynamics, players form beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and play a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated at a slower timescale than the beliefs on strategies. We show that in both the model-based and model-free cases (the latter without knowledge of player payoff functions or state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
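The abstract describes the dynamics only in words. As a rough illustration, here is a minimal sketch of a model-based, two-timescale variant in Python with NumPy, assuming the stage payoff array `r` and transition array `P` are known to both players. The function name `two_timescale_fictitious_play`, the step sizes alpha_k = 1/k and beta_k = 1/(1 + k ln k) (k = number of visits to the current state), and the default discount factor are illustrative assumptions, not the paper's exact algorithm; the only property taken from the abstract is that beta_k / alpha_k -> 0, so beliefs on Q-functions move more slowly than beliefs on strategies.

```python
import numpy as np


def two_timescale_fictitious_play(r, P, gamma=0.9, n_steps=20_000, s0=0, seed=0):
    """Illustrative sketch (hypothetical helper) of model-based two-timescale fictitious play.

    r : (S, A1, A2) stage payoff to player 1; player 2 receives -r (zero-sum).
    P : (S, A1, A2, S) transition probabilities P[s, a1, a2, s'].
    Returns (pi1, pi2): pi1[s] is player 2's belief about player 1's strategy at s,
    pi2[s] is player 1's belief about player 2's strategy at s.
    """
    rng = np.random.default_rng(seed)
    S, A1, A2 = r.shape
    pi1 = np.full((S, A1), 1.0 / A1)  # belief on player 1's strategy, per state
    pi2 = np.full((S, A2), 1.0 / A2)  # belief on player 2's strategy, per state
    Q1 = np.zeros((S, A1, A2))        # player 1's estimate of its own Q-function
    Q2 = np.zeros((S, A1, A2))        # player 2's estimate of its own Q-function
    visits = np.zeros(S, dtype=int)
    s = s0
    for _ in range(n_steps):
        visits[s] += 1
        k = visits[s]
        # Assumed step sizes: fast for strategy beliefs, slower for Q beliefs.
        alpha = 1.0 / k
        beta = 1.0 / (1.0 + k * np.log(k))

        # Greedy best responses to current beliefs, using estimated continuation payoffs.
        a1 = int(np.argmax(Q1[s] @ pi2[s]))
        a2 = int(np.argmax(pi1[s] @ Q2[s]))

        # Fictitious-play step: move each belief toward the observed opponent action.
        pi1[s] += alpha * (np.eye(A1)[a1] - pi1[s])
        pi2[s] += alpha * (np.eye(A2)[a2] - pi2[s])

        # Each player's value estimate at every state: best-response payoff vs. beliefs.
        v1 = np.max(np.einsum("sij,sj->si", Q1, pi2), axis=1)
        v2 = np.max(np.einsum("si,sij->sj", pi1, Q2), axis=1)

        # Model-based Q update at the current state for all joint actions (uses known r, P).
        Q1[s] += beta * (r[s] + gamma * P[s] @ v1 - Q1[s])
        Q2[s] += beta * (-r[s] + gamma * P[s] @ v2 - Q2[s])

        s = rng.choice(S, p=P[s, a1, a2])  # sample the next state
    return pi1, pi2
```

For example, on a one-state matching-pennies game (`r = np.array([[[1.0, -1.0], [-1.0, 1.0]]])`, `P = np.ones((1, 2, 2, 1))`), both beliefs should approach the unique equilibrium (0.5, 0.5). In a model-free version, the expectation `P[s] @ v` would presumably be replaced by the value at a sampled next state, with only the observed entry of Q updated.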