Authors: Adrian Redder 1; Arunselvan Ramaswamy 1 and Holger Karl 2
Affiliations: 1 Department of Computer Science, Paderborn University, Germany; 2 Hasso-Plattner-Institute, Potsdam University, Germany
Keyword(s):
Policy Gradient Algorithms, Multi-agent Learning, Communication Networks, Distributed Optimisation, Age of Information, Continuous Control.
Abstract:
Distributed online learning over delaying communication networks is a fundamental problem in multi-agent learning, since the convergence behaviour of interacting agents is distorted by their delayed communication. It is a priori unclear how much communication delay can be tolerated while the joint policies of multiple agents still converge to a solution of a multi-agent learning problem. In this work, we present a decentralisation of the well-known deep deterministic policy gradient algorithm using a communication network. We illustrate the convergence of the algorithm and the effect of lossy communication on the rate of convergence for a two-agent flow-control problem, in which the agents exchange their local information over a delaying wireless network. Finally, we discuss theoretical implications for this algorithm using recent advances in the theory of age of information and deep reinforcement learning.