Agreement in Distributed Reinforcement Learning

Paulina Varshavskaya
MIT CSAIL
paulina@csail.mit.edu

In a cooperative multi-agent system, be it an insect colony, a school of fish, or a team of robots, individuals make decisions and act based only on locally perceived information. This information seems inadequate for the kinds of complex behaviors observed in colonies of natural organisms, or desired of teams of artificial robots. However, neighbor-to-neighbor exchange of such local information can allow individuals to approximate global state variables and decisions well enough. One class of such algorithms is agreement (consensus) algorithms, detailed in a general framework of distributed computation by Bertsekas and Tsitsiklis (1997). Consensus-based algorithms have been used, for example, in the biological modeling of the motion of schools of fish and other flocking systems (Vicsek et al. 1995). In robotics, they have been applied in a similar manner in control theory and sensor networks.

We combine the basic agreement algorithm, in a synchronous, discrete-time distributed system, with a reinforcement learning algorithm that learns by Gradient Ascent in Policy Space (GAPS) (Peshkin 2001) to improve the speed and reliability of learning. Individual robotic agents communicate their current local estimates of 1) rewards and 2) experience to near neighbors. This enables learning of good global behaviors in a fully distributed manner in cases where not communicating this information is detrimental to learning. We demonstrate this with experiments in a 2D simulator of a lattice-based self-reconfiguring modular robot, which learns locomotion by self-reconfiguration.

References:

D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen and O. Schochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters 75(6), 1995.
L. Peshkin. Reinforcement Learning by Policy Search. PhD Dissertation. Brown University, November 2001.
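The synchronous agreement iteration the abstract builds on can be illustrated with a minimal sketch: at each discrete time step, every agent replaces its local estimate (e.g., of reward) with an average over its own value and those of its near neighbors, and repeated rounds drive all agents toward a common value. The equal-weight averaging rule, the line-graph topology, and all names below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of one synchronous agreement (consensus) round, in the spirit of
# Bertsekas and Tsitsiklis (1997): each agent averages its estimate with
# those of its neighbors. Equal weights and the line-graph topology are
# assumptions for illustration only.

def agreement_step(estimates, neighbors):
    """One synchronous round: each agent i averages over itself and neighbors[i]."""
    updated = []
    for i, x in enumerate(estimates):
        group = [x] + [estimates[j] for j in neighbors[i]]
        updated.append(sum(group) / len(group))
    return updated

# Example: four agents on a line graph, each starting from a different
# locally perceived reward estimate.
rewards = [0.0, 1.0, 2.0, 3.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for _ in range(200):
    rewards = agreement_step(rewards, neighbors)
# After enough rounds, all local estimates converge toward a common value
# between the initial minimum and maximum.
```

Because each round's update matrix is row-stochastic with self-loops on a connected graph, repeated iteration contracts the spread of estimates to zero, which is what lets each agent act on an approximation of a global quantity using only local exchanges.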