Curiosity and Information-Seeking in Cognitive Development

Curiosity, intrinsic motivation and information seeking in the self-organization of cognitive development

Humans, and some other animals, devote much time and energy to exploring and obtaining information. Sometimes this search for information is independent of any foreseeable profit, as if learning were reinforcing in and of itself. This is associated with our high degree of curiosity, our intrinsic desire to know and understand.

Such intrinsic motivation mechanisms are observed throughout life, from infants' spontaneous exploration of their bodies and of external objects to adults reading novels or conducting research.

Together with my colleagues, I have been studying these mechanisms of curiosity-driven learning and information seeking within a systemic and multidisciplinary approach, modeling them with algorithmic and robotic tools in constant dialog with developmental psychology, neuroscience, and statistical theories of learning.

Beyond identifying the richness and variety of intrinsic motivation mechanisms, this work led us to establish fundamental links between curiosity-driven learning and cognitive development. In particular, we have shown that such mechanisms can self-organize complex developmental structures, in which stages of increasing behavioral and cognitive complexity form spontaneously. For example, we showed that an intrinsic drive pushing a robot to seek situations where it experiences learning progress can spontaneously lead it to first explore and discover its own body, then external object affordances, and finally vocal and proto-linguistic interaction with others.

In this view, a limited set of meta-cognitive structures allows a learner to autonomously and actively select and order its own learning experiences, creating its own curriculum in which skills, including the manipulation of external objects, are naturally sequenced towards increasing complexity.

In another related strand of research, detailed on this page, we have also used and extended these models for engineering purposes, and showed that such curiosity-driven active learning mechanisms have strong properties of robustness and efficiency for robot learning of sensorimotor skills, especially in the context of strategic and life-long learning.

Overview (Curiosity-driven Developmental Process and the Evolution of Language)

IAC and the Playground Experiment. The Intelligent Adaptive Curiosity (IAC) architecture and its implications were studied in a series of experiments called the Playground Experiments (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). Figure 1 illustrates the cognitive architecture employed by IAC. Prediction learning plays a central role in this architecture, and two modules in the model predict future states. First, the “Classic Machine learner” M learns a forward model. The forward model receives as input the current sensory state, context, and action, and generates a prediction of the sensory consequences of the planned action. The difference between predicted and observed consequences provides an error feedback signal that is used to update the forward model. Second, the “Meta Machine learner” metaM receives the same input as M, but instead of predicting the sensory consequences, it learns a meta-model that predicts how much the errors of the lower-level forward model will decrease in local regions of the sensorimotor space, i.e. it models learning progress locally.

To deal with the difficulties of generalization in high-dimensional continuous spaces, an associated categorization mechanism progressively splits the sensorimotor space into sub-regions, for example by maximizing their differences in predictability (Baranes and Oudeyer, 2009), and focuses its refinement of the categorization on regions where learning progress is maximal. Then, in each observed context/state, an action selection system stochastically chooses which actions to try so as to maximize expected learning progress. Such a system allows the robot to automatically avoid actions whose outcome is either trivial or too difficult to predict or learn at a given moment of development, so that it first focuses on simple actions and progressively shifts to more complex ones.
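The learning-progress principle described above can be illustrated with a minimal Python sketch. It is not the IAC implementation from the cited papers: the `Region` class, the fixed error window, and the epsilon-greedy choice in `choose_region` are all assumptions made for illustration. Learning progress is estimated here as the drop in mean prediction error between the older and the newer half of a sliding window.

```python
import random
from collections import deque


class Region:
    """One sub-region of the sensorimotor space with its recent prediction errors."""

    def __init__(self, name, window=10):
        self.name = name
        self.window = window
        # Keep the last 2 * window errors: an older half and a newer half.
        self.errors = deque(maxlen=2 * window)

    def record_error(self, error):
        """Store the forward model's prediction error for one experiment."""
        self.errors.append(error)

    def learning_progress(self):
        """Decrease in mean error from the older half to the newer half."""
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough data yet to estimate progress
        errs = list(self.errors)
        older = sum(errs[: self.window]) / self.window
        newer = sum(errs[self.window:]) / self.window
        return older - newer


def choose_region(regions, epsilon=0.2, rng=random):
    """Stochastic selection (assumed epsilon-greedy): mostly pick the region
    with the highest learning progress, sometimes pick one at random."""
    if rng.random() < epsilon:
        return rng.choice(regions)
    return max(regions, key=lambda r: r.learning_progress())
```

In this sketch, a region whose error stays at zero (trivial) and a region whose error stays high (too difficult) both yield zero learning progress, so a region whose error is steadily decreasing is preferred. The random exploration step is what lets regions with little data be sampled at all.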

In order to evaluate the IAC architecture in a physical implementation, the Playground Experiments were developed (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). During the experiment, a quadruped robot is placed on an infant play mat and presented with a set of nearby objects, as well as an “adult” robot caretaker (see Figure 2). The robot is equipped with four kinds of motor primitives, each parameterized by several continuous numbers. These primitives can be combined, thus forming an infinite set of possible actions: (a) turning the head in various directions; (b) opening and closing the mouth while crouching with various strengths and timing; (c) rocking the leg with various angles and speeds; (d) vocalizing with various pitches and lengths. Similarly, several kinds of sensory primitives allow the robot to detect visual movement, salient visual properties, proprioceptive touch in the mouth, and the pitch and length of perceived sounds. For the robot, these motor and sensory primitives are initially black boxes: it has no knowledge of their semantics, effects or relations. The IAC architecture is then used to drive the robot’s exploration and learning purely by curiosity, i.e. by the search for learning progress. The nearby objects include an elephant (which can be bitten or “grasped” by the mouth), a hanging toy (which can be “bashed” or pushed with the leg) and an adult robot “caretaker” pre-programmed to imitate the learning robot when the latter looks at the adult while vocalizing at the same time.
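To make the idea of combinable, continuously parameterized motor primitives concrete, here is a minimal Python sketch. The primitive names, parameter names, and value ranges in `PRIMITIVES` are hypothetical and chosen only for illustration; the robot's actual parameterization is not specified here. The function draws one random combined action, as a purely random explorer might.

```python
import random

# Hypothetical primitives and parameter ranges (illustrative only).
PRIMITIVES = {
    "turn_head": {"pan": (-1.0, 1.0), "tilt": (-1.0, 1.0)},
    "bite": {"strength": (0.0, 1.0), "timing": (0.0, 1.0)},
    "rock_leg": {"angle": (-1.0, 1.0), "speed": (0.0, 1.0)},
    "vocalize": {"pitch": (0.0, 1.0), "length": (0.0, 1.0)},
}


def sample_action(rng, max_primitives=4):
    """Draw a random combination of primitives, each with random
    continuous parameters drawn uniformly from its range."""
    count = rng.randint(1, max_primitives)
    chosen = rng.sample(sorted(PRIMITIVES), count)
    return {
        name: {
            param: rng.uniform(low, high)
            for param, (low, high) in PRIMITIVES[name].items()
        }
        for name in chosen
    }
```

Calling `sample_action(random.Random(0))` returns a dictionary mapping each chosen primitive to its sampled parameters. Because the parameters are continuous, the set of possible actions is infinite even with only four primitives.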

Figure 2: The Playground Experiment: a quadruped robot explores and learns physical and social affordances through curiosity-driven learning.

Open-ended and embodied acquisition of skills. A key finding from the Playground Experiments is the self-organization of structured developmental trajectories, in which the robot explores objects and actions in a progressively more complex, stage-like manner, while autonomously acquiring diverse affordances and skills that can be reused later on. Across a series of runs of such experiments, the following developmental sequence is typically observed:

In a first phase, the robot performs unorganized body babbling;

In a second phase, after learning a first rough model and meta-model, the robot stops combining motor primitives and explores them one by one, but each primitive is itself explored in a random manner;

In a third phase, the robot begins to try actions towards zones of its environment where the external observer knows there are objects (the robot is not provided with a representation of the concept of “object”), but in a non-affordant manner (e.g. it vocalizes at the non-responding elephant or bashes the adult robot, which is too far away to be touched);

In a fourth phase, the robot explores affordant experiments: it first focuses on grasping movements with the elephant, then shifts to bashing movements with the hanging toy, and finally shifts to exploring vocalizing towards the imitating adult robot.

In the end, the robot has learnt sensorimotor affordances with several objects, as well as social affordances with a peer, and masters multiple skills, yet none of these specific objectives were pre-programmed at the beginning. They self-organize through the dynamic interaction between intrinsic motivation, statistical inference, the properties of the body, and the properties of the environment.

New hypothesis for infant development. Two aspects of this outcome can be noted. First, it shows how an intrinsic motivation (IM) system can drive a robot to autonomously learn a variety of affordances and skills for which no engineer provided specific reward functions beforehand. Second, the observed process spontaneously generates three properties of infant development so far mostly unexplained:

Self-Organized and Active Staged development: Qualitatively different and more complex behaviours and capabilities appear over time, in a non-linear manner. Such unfolding is extensively described in developmental psychology, but few principled explanations currently exist. The Playground Experiment provides the intriguing hypothesis that IM-driven exploration, in dynamic interaction with the body and environment, could explain important aspects of how this unfolding can occur spontaneously (for example, without an internal pre-programmed schedule that tells the organism what to do and when to do it). In particular, it suggests that developmental stages could be attractors of the dynamical system formed by the interaction between learning, curiosity, the body and the environment. Since this dynamical system continuously seeks learning progress, and thus changes itself, the attractors themselves change, leading to novel developmental states.

The regularities/diversity duality in developmental structures: The typical developmental trajectory described above is only the most frequent emerging trajectory. No two trajectories are exactly the same (e.g. the order of action exploration in the last phase might change). In some experiments, with the same robot, same mechanism and same environment, widely different trajectories can occur. The whole IM/body/environment system can be seen as a dynamical system with various attractors, and stochasticity can sometimes drive it into local minima far from the main attractor(s) (Thelen and Smith, 1993). This suggests a novel, principled IM-based mechanism to explain the regularities/diversity duality widely observed in infant development.

The origins of the self/object/other distinction. The categorization system associated with such an IM architecture also generates a progressive internal development of cognitive categories, which complements the behavioral and skill development described above. As explained in (Kaplan and Oudeyer, 2007b; Oudeyer et al., 2007), such a mechanism can indeed allow the learning agent to progressively form fundamental categorical distinctions between “self”, “physical objects” and “others”, which are central in infant development.

The origins of imitation:

Early Development of Communication and Language: Through the same general mechanism, the robot both explores and learns how to manipulate objects and how to vocalize to trigger specific responses from a conspecific. While vocal babbling (Oller, 2000), and more generally language play and games, have been shown to be key in infant language development, an associated ad hoc motivation is typically assumed in both developmental psychology and computational models. The Playground Experiment suggests that the exploration and learning of communicative behavior might be at least partially explained by general intrinsically motivated exploration of the body's affordances (Oudeyer and Kaplan, 2006). A more detailed study showed that curiosity-driven exploration of vocalizations can reproduce aspects of the developmental change in vocal babbling observed in human infants (Moulin-Frier and Oudeyer, 2012). Further analysis of the links between IM, sensorimotor, social and language development can be found in (Kaplan et al., 2008).

Intrinsic motivation systems can be conceptualized as one among many interacting mechanisms that help organisms (natural or artificial) to explore and learn efficiently in very large sensorimotor spaces. These other mechanisms include social guidance (e.g. imitation learning), cognitive abstraction (e.g. unsupervised perceptual learning that creates internal concepts or goals out of raw sensorimotor values), embodiment, and maturation (i.e. evolution of the morphological properties of the body). The following article discusses the importance of integrating these mechanisms within an entire cognitive system:

Selected video talks

Selected experiment videos

The Playground Experiment. We have built an experimental setup, called the Playground Experiment, which allowed us to show how the curiosity algorithm we developed leads to the self-organization of developmental trajectories with sequences of behavioural stages of increasing complexity (Oudeyer et al., 2007; Oudeyer and Kaplan, 2006).

Learning omnidirectional quadruped locomotion. In this experiment, we showed how the successive architectures we developed allow a quadruped robot, initially equipped with parameterized motor primitives in the form of a 24-dimensional oscillator (sinusoids with various parameters in most of the joints), to learn to use these motor primitives to locomote precisely in all directions and in varied manners. In the article (Baranes and Oudeyer, 2013), we extensively study a physical simulation of this experimental setup with active learning algorithms.
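As a rough illustration of what a sinusoidal motor primitive of this kind might look like, the Python sketch below maps a flat parameter vector to joint angles over time. The decomposition of the 24 dimensions into 12 joints with an (amplitude, phase) pair each, and the single shared frequency, are assumptions made for the example; the text above only states that the oscillator is 24-dimensional and built from sinusoids. The function name `joint_angles` is likewise hypothetical.

```python
import math


def joint_angles(params, t, frequency_hz=1.0):
    """Return one target angle per joint at time t (in seconds).

    params is a flat list [amp_0, phase_0, amp_1, phase_1, ...]; joint j
    follows amp_j * sin(2 * pi * f * t + phase_j). This layout is an
    assumption made for illustration.
    """
    if len(params) % 2 != 0:
        raise ValueError("params must hold (amplitude, phase) pairs")
    omega = 2.0 * math.pi * frequency_hz
    return [
        amp * math.sin(omega * t + phase)
        for amp, phase in zip(params[0::2], params[1::2])
    ]
```

Under these assumptions, a 24-element parameter vector yields 12 joint angles at each time step. A learner would then explore this parameter space to discover which vectors move the robot in which direction.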