Distributed multi-agent online learning based on global feedback
Author
Tekin, C.
Zhang, S.
Schaar, Mihaela van der
Date
2015-05-01
Source Title
IEEE Transactions on Signal Processing
Print ISSN
1053-587X
Electronic ISSN
1941-0476
Publisher
Institute of Electrical and Electronics Engineers
Volume
63
Issue
9
Pages
2225 - 2238
Language
English
Type
Article
Abstract
In this paper, we develop online learning algorithms that enable
agents to cooperatively learn how to maximize the overall reward
without exchanging any information among themselves, in scenarios
where only noisy global feedback is available.
We prove that the learning regret of our algorithms (the loss
incurred due to uncertainty) grows logarithmically in time, so
the time-averaged reward converges to the optimal average reward.
We also illustrate how the regret depends on the size of the
action space, and we show
that this relationship is influenced by the informativeness of the
reward structure with regard to each agent's individual action.
When the overall reward is fully informative, regret is shown to
be linear in the total number of actions of all the agents. When
the reward function is not informative, regret is linear in the
number of joint actions. Our analytic and numerical results show
that the proposed learning algorithms significantly outperform
existing online learning solutions in terms of regret and learning
speed. We illustrate how our theoretical framework can be used in
practice by applying it to online Big Data mining using distributed
classifiers.
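To make the two scaling regimes in the abstract concrete, the following minimal sketch compares the total number of individual actions (the sum over agents) with the number of joint actions (the product over agents). The function names and the example action counts are illustrative assumptions and are not taken from the paper; the sketch only counts actions and does not implement the proposed learning algorithms.

```python
from math import prod


def total_individual_actions(action_counts):
    """Sum of per-agent action counts (hypothetical helper).

    Per the abstract, regret is linear in this quantity when the
    overall reward is fully informative.
    """
    return sum(action_counts)


def joint_action_count(action_counts):
    """Number of joint actions, i.e. the product of per-agent counts
    (hypothetical helper).

    Per the abstract, regret is linear in this quantity when the
    reward function is not informative.
    """
    return prod(action_counts)


# Assumed example: 3 agents with 4 actions each.
counts = [4, 4, 4]
print(total_individual_actions(counts))  # 12
print(joint_action_count(counts))        # 64
```

The gap between the two quantities widens quickly as agents are added, which is why the informativeness of the reward matters for how regret scales.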
Keywords
Big data mining
Distributed cooperative learning
Multiagent learning
Multiarmed bandits
Online learning
Reward informativeness