Agreement in Distributed Reinforcement Learning

Paulina Varshavskaya
MIT CSAIL
paulina@csail.mit.edu

In a cooperative multi-agent system, be it an insect colony, a school of fish, or a team of robots, individuals make decisions and act based only on locally perceived information. This information seems inadequate for the kinds of complex behaviors observed in colonies of natural organisms, or desired of teams of artificial robots. However, neighbor-to-neighbor exchange of such local information can allow individuals to approximate global state variables and decisions well enough. One class of such algorithms is agreement (consensus) algorithms, detailed in a general framework of distributed computation by Bertsekas and Tsitsiklis (1997). Consensus-based algorithms have been used, for example, in the biological modeling of the motion of schools of fish and other flocking systems (Vicsek et al. 1995). In robotics, they have been applied in a similar manner in control theory and sensor networks.

We combine the basic agreement algorithm, in a synchronous, discrete-time distributed system, with a reinforcement learning algorithm that learns by Gradient Ascent in Policy Space (GAPS) (Peshkin 2001) to improve the speed and reliability of learning. Individual robotic agents communicate their current local estimates of 1) rewards and 2) experience to near neighbors. This enables learning of good global behaviors in a fully distributed manner in cases where not communicating this information is detrimental to learning. We demonstrate this with experiments in a 2D simulator of a lattice-based self-reconfiguring modular robot, which learns locomotion by self-reconfiguration.

References:

D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen and O. Schochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters 75(6), 1995.
L. Peshkin. Reinforcement Learning by Policy Search. PhD Dissertation. Brown University, November 2001.
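The synchronous agreement iteration the abstract builds on can be illustrated with a minimal sketch: at each discrete time step, every agent replaces its local estimate (e.g., of reward) with an average over its own value and those of its near neighbors, and repeated rounds drive all agents toward a common value. The equal-weight averaging rule, the line-graph topology, and all names below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of one synchronous agreement (consensus) round, in the spirit of
# Bertsekas and Tsitsiklis (1997): each agent averages its estimate with
# those of its neighbors. Equal weights and the line-graph topology are
# assumptions for illustration only.

def agreement_step(estimates, neighbors):
    """One synchronous round: each agent i averages over itself and neighbors[i]."""
    updated = []
    for i, x in enumerate(estimates):
        group = [x] + [estimates[j] for j in neighbors[i]]
        updated.append(sum(group) / len(group))
    return updated

# Example: four agents on a line graph, each starting from a different
# locally perceived reward estimate.
rewards = [0.0, 1.0, 2.0, 3.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for _ in range(200):
    rewards = agreement_step(rewards, neighbors)
# After enough rounds, all local estimates converge toward a common value
# between the initial minimum and maximum.
```

Because each round's update matrix is row-stochastic with self-loops on a connected graph, repeated iteration contracts the spread of estimates to zero, which is what lets each agent act on an approximation of a global quantity using only local exchanges.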