All animals are social -- from bacteria that perform quorum sensing to humans with their rich social structures. This raises a natural question: what is the benefit of interaction? I argue that there is a deep reason behind this pattern: interaction allows greater computation.

The idea of emergent computation through interaction has been explored mostly through cellular automata (Crutchfield & Mitchell, 1995; Lizier, Prokopenko, & Zomaya, 2007), e.g., by evolving the update rules of a cellular automaton so that it solves a task like deciding whether the initial state contains a majority of 1s (the density classification task).
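To make the density classification task concrete, the sketch below (my illustration, not code from the cited work) applies a hand-coded local-majority rule to a one-dimensional binary lattice with periodic boundaries. Tellingly, this naive rule freezes into blocks of 0s and 1s rather than propagating the global majority, which is exactly why Crutchfield & Mitchell resort to evolving the update rules.

```python
import numpy as np

def ca_step(state, rule):
    """One synchronous update of a 1-D binary CA with radius-1
    neighborhoods and periodic boundaries. `rule` maps each 3-bit
    neighborhood (encoded as an integer 0-7) to the next cell value."""
    left = np.roll(state, 1)    # left neighbor of each cell
    right = np.roll(state, -1)  # right neighbor of each cell
    idx = 4 * left + 2 * state + right
    return rule[idx]

# Illustrative rule: each cell adopts the local majority of its
# 3-cell neighborhood. This simple rule does NOT solve the density
# task reliably; evolved rules perform much better.
majority_rule = np.array([int(bin(n).count("1") >= 2) for n in range(8)])

state = np.array([1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1])  # 8 of 11 cells are 1
for _ in range(20):
    state = ca_step(state, majority_rule)
print(state)  # freezes at [1 1 1 1 0 0 0 1 1 1 1]: a 0-block persists
```

The run converges to a mixed fixed point instead of all 1s, illustrating that local rules capable of classifying global density are non-trivial to design by hand.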

This idea can also be extended to social cognition in populations of interacting agents (De Jaegher & Froese, 2009). The claim here is that interaction allows participatory sense-making, i.e., meaning emerges through the interaction process. However, there are as yet few examples of populations of agents that perform non-trivial computations through interaction.

I present an example of just such a system: a population of neural networks that performs complexity regularization implicitly through interaction during learning. This learning method is called the classification game, and the system can be viewed as a model of the emergence of structured language (Swarup & Gasser, 2009, 2010).

Complexity regularization is the problem of finding low complexity solutions to learning problems. Such solutions are expected to generalize better. This is known as Occam's Razor (Blumer, Ehrenfeucht, Haussler, & Warmuth, 1987). The standard approach to complexity regularization in neural networks is to augment the objective function with a term for the (appropriately defined) complexity (e.g., Wolpert, 1994; Hochreiter & Schmidhuber, 1997), because merely minimizing the squared error via backpropagation results in overly complex solutions.
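As a generic illustration of such an augmented objective, the sketch below adds a sum-of-squared-weights (weight decay) penalty to the squared error. Weight decay is only one common stand-in for the "appropriately defined" complexity term; the measures used by Wolpert (1994) and Hochreiter & Schmidhuber (1997) are more elaborate.

```python
import numpy as np

def regularized_loss(y_pred, y_true, weights, lam=1e-3):
    """Squared error augmented with a complexity penalty.
    Here the penalty is weight decay: lam times the sum of
    squared weights across all weight matrices."""
    mse = np.mean((y_pred - y_true) ** 2)
    complexity = sum(np.sum(w ** 2) for w in weights)
    return mse + lam * complexity

# Toy numbers: mse = 0.5, penalty = 0.1 * 2^2 = 0.4, total = 0.9
pred, true = np.array([1.0, 0.0]), np.array([0.0, 0.0])
print(regularized_loss(pred, true, [np.array([[2.0]])], lam=0.1))  # 0.9
```

Minimizing this combined objective trades training error against network complexity, with `lam` controlling the strength of the trade-off.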

In the classification game, in contrast, the objective function is just the squared error, and the complexity regularization is performed implicitly through interaction during learning. In other words, the interaction process alters the dynamics of learning to make accessible regions of the solution space that are otherwise inaccessible to the backpropagation learning algorithm. Here, I analyze the dynamics of the hidden-layer hyperplanes to show how the altered dynamics lead to low-complexity solutions. I also show how the complexity of the solution can be tuned by adjusting the learning rates. Details of the analysis are omitted from the abstract for lack of space.

The interaction protocol for the classification game is very simple: at each step we randomly select a pair of neural networks from the population and designate them as speaker and hearer. Both are presented with the same training example. The speaker communicates its hidden layer activations to the hearer as a string of symbols, where the presence of a symbol means that the corresponding hidden layer node is active. The hearer ignores the training example itself and tries to predict the label based on the symbols received from the speaker. The speaker also produces its own prediction of the label. Both are then given the expected label, whereupon they update their weights by error backpropagation, and are subsequently returned to the population. This process is continued until some termination criterion is met (e.g., the error drops to an acceptable level).
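The protocol above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: it assumes sigmoid units and plain stochastic backpropagation, and it assumes the hearer, whose own hidden layer is bypassed by the received symbols, updates only its output weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Net:
    """One-hidden-layer binary classifier with sigmoid units."""
    def __init__(self, n_in, n_hid):
        self.W1 = rng.normal(0, 0.5, (n_hid, n_in + 1))  # +1 column for bias
        self.W2 = rng.normal(0, 0.5, n_hid + 1)

    def hidden(self, x):
        return sigmoid(self.W1 @ np.append(x, 1.0))

    def out_from_hidden(self, h):
        return sigmoid(self.W2 @ np.append(h, 1.0))

    def update(self, x, h, y_pred, y_true, lr_out, lr_hid):
        """One backprop step on the squared error. When x is None
        (the hearer's case in this sketch) only W2 is updated."""
        delta_out = (y_pred - y_true) * y_pred * (1 - y_pred)
        grad_W2 = delta_out * np.append(h, 1.0)
        if x is not None:
            delta_hid = delta_out * self.W2[:-1] * h * (1 - h)
            self.W1 -= lr_hid * np.outer(delta_hid, np.append(x, 1.0))
        self.W2 -= lr_out * grad_W2

def game_step(population, x, y, lr_out=0.5, lr_hid=0.5):
    """One round of the classification game as described above."""
    i, j = rng.permutation(len(population))[:2]
    speaker, hearer = population[i], population[j]
    h = speaker.hidden(x)
    symbols = (h > 0.5).astype(float)           # which hidden nodes are active
    y_speaker = speaker.out_from_hidden(h)
    y_hearer = hearer.out_from_hidden(symbols)  # hearer ignores x itself
    speaker.update(x, h, y_speaker, y, lr_out, lr_hid)
    hearer.update(None, symbols, y_hearer, y, lr_out, lr_hid)

# Toy run on a linearly separable task (my choice of task, for illustration).
population = [Net(n_in=2, n_hid=4) for _ in range(5)]
for _ in range(1000):
    x = rng.random(2)
    y = float(x[0] + x[1] > 1.0)
    game_step(population, x, y)
```

The separate `lr_out` and `lr_hid` parameters correspond to the learning rates mentioned above as the knobs for tuning solution complexity.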

Figure 1 shows an example of the difference between standard backpropagation training and classification game training. We see that with classification game training, the neural networks find solutions of much lower complexity, even though the number of hidden layer nodes available is the same in each case.

The hidden layer hyperplanes provide a segmentation of the sensory space (the input space). In the classification game, the population must come to consensus on this segmentation by negotiating a common internal representation, which is guided by the error signal from the environment. This process can thus be interpreted as the self-organization of a shared communication system which is just complex enough to solve the given task. This low-complexity solution not only yields better expected generalization, but also provides a scaffold on which higher level cognition (and language) can be built.

Figure 1: In each of these cases, the true decision surface is the half-ellipse, and all points lying inside it are positive. Points outside the half-ellipse are negative. All points are chosen to lie inside the unit square. Figure (a) shows the hidden layer hyperplanes from a single neural network trained on this task. Figure (b) shows the result with the classification game. Figure (c) also shows the result with the classification game, but in this case the parameters have been tuned to allow a solution of higher complexity.