Autonomous Mental Development

Traditional approaches to machine intelligence require human designers to explicitly program task-specific representations, perception, and behaviors, according to the tasks that the machine is supposed to execute. However, AI tasks require capabilities such as vision, speech, language, and motivation, which have proved too muddy to program effectively by hand.

What is Autonomous Mental Development

Autonomous development is nature's approach to human intelligence. The goal of this line of research is twofold: to understand human intelligence and to advance artificial intelligence. For the former, we need to understand how the brain acquires its rich array of capabilities through autonomous development. For the latter, we aim at human-level machine performance through autonomous development.

Probably the two most striking hallmarks of autonomous mental development (AMD) are:

task nonspecificity,

the skull stays closed throughout the brain's lifelong learning.

By "task nonspecificity", we mean that the genome (i.e., the natural developmental program) is responsible for an open variety of tasks and skills that a newborn will learn through life. Those tasks (including the task environments and task goals) are unknown to prior generations before birth. Thus, a developmental program must be able to regulate the development of a variety of skills for many tasks.

By "skull-closed", we mean that the teacher is not allowed to pull subparts out of the brain, define their roles and the meanings of their input and output ports, train them individually outside, put them back into the brain, and then manually link them. During the autonomous development of a natural or artificial brain, the teacher has access only to the two ends of the brain, its sensors and its effectors, not directly to its internal representation. Therefore, a developmental program must be able to regulate the development of internal representations using information from these two ends only. Of course, internal sensation and internal action (i.e., thinking) are also important for development.

In contrast, all traditional machine learning methods, including many neural network methods, require an "open skull" approach: their machine development is not fully autonomous inside the "brain". The holistically aware central controller, at least at the linking ports of the separately trained subparts, is the human teacher. This controller implants static representations into the artificial "brain", which makes the "brain" brittle, because no static representation has proved sufficient for dynamic real-world environments.

This new AMD approach is motivated by human cognitive and behavioral development from infancy to adulthood. It requires a fundamentally different way of addressing the issue of machine intelligence. We have introduced a new kind of program: a developmental program. A robot that develops its mind through a developmental program is called a developmental robot.

For humans, the developmental program is in the genome inside the nucleus of every cell. According to the genomic equivalence principle, dramatically demonstrated by animal cloning, the genome is identical across all cells of a human individual. It starts to run at the time of conception of each life. This program is responsible for whatever can happen through the entire life. For machines, the developmental program starts to run at the "birth" time of the robot or machine, enabling the robot to develop its mental skills (including perception, cognition, behavior, and motivation) through interactions with its environment using its sensors and effectors. For machines to truly understand the world, the environment must be the physical world, which includes the human teachers and the robot itself.

The concept of a developmental program does not mean merely making machines grow from small to big and from simple to complex. It must enable the machine to learn new tasks that the human programmer did not know about at the time of programming. This implies that the internal representation of any task the robot learns must emerge internally through interactions, a well-known holy grail in AI and a great mystery about the brain.

Development does not mean learning from a tabula rasa either. The developmental program, the body with its sensors and effectors, and the intelligent environment (with humans who are smart educators!) are all important for the success of development. Innate behaviors, such as those present at birth, can greatly facilitate early mental development.

The basic nature of developmental learning plays a central role in enabling a human being to incrementally scale his level of intelligence from the ground up. To scale up a machine's capability to understand what happens around it, the learning mechanism embedded in a developmental program must perform systematic self-organization according to what it sensed, what it did, the actions imposed by humans when necessary, the punishments and rewards it received from humans or the environment, the novelty it predicted, and the context. As a fundamental requirement of scaling up, the robot must develop its value system (also called the motivational system) and must gradually develop its skills of autonomous thinking.

Why autonomous mental development

A traditional common wisdom is that artificial intelligence should be studied within a narrow scope; otherwise, the complexity gets out of hand. The developmental approach instead aims to provide a broad and unified developmental framework, with detailed developmental algorithms, that is applicable to a wide variety of perceptual capabilities (e.g., vision, audition, and touch), cognitive capabilities (e.g., situation awareness, language understanding, reasoning, planning, communication), behavioral capabilities (e.g., speaking, dancing, walking, playing music, decision making, task execution), motivational capabilities (e.g., pain avoidance, pleasure seeking, judging what is right and what is wrong), and the fusion of these capabilities. By the very nature of autonomous development, a developmental program does not require humans to manually manipulate the internal content of the brain, which drastically reduces the man-hour cost of system development. Fully autonomous development inside the brain is also a great advantage for scaling up to human-level performance. From the vast amount of evidence from natural intelligence, I predict that the singularity (machines more intelligent than humans) will never happen in a general sense without autonomous mental development.

Some recent evidence in neuroscience has suggested that the developmental mechanisms in our brain are probably very similar across different sensing modalities and across different cortical areas. This is good news, since it means that the task of designing a developmental program is probably more tractable than traditional task-specific programming.

Although a developmental program is by no means simple, the developmental approach does not require human programmers to understand the domain of tasks or to predict them. Therefore, this approach not only reduces the programming burden and increases adaptability to real-world environments, but also enables machines to develop capabilities or skills that the programmer does not have, or that are too muddy to be adequately understood by the programmer. In principle, then, a developmental machine is capable of being creative on subjects of high value.

Eight requirements for practical AMD

A developmental robot that is capable of practical autonomous mental development (AMD) must deal with the following eight requirements:

Environmental openness: Due to task nonspecificity, AMD must deal with unknown and uncontrolled environments, including various human environments.

High-dimensional sensors: The dimension of a sensor is the number of scalar values it produces per unit time. AMD must directly deal with continuous raw signals from high-dimensional sensors (e.g., vision, audition, and taction).
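For example, with hypothetical but typical camera parameters (not specified in this document), the dimension of a vision sensor works out as follows:

```python
# Hypothetical camera parameters, chosen only for illustration.
width, height, channels = 640, 480, 3    # one RGB frame
frames_per_second = 15                   # refresh rate suggested for vision

values_per_frame = width * height * channels
values_per_second = values_per_frame * frames_per_second

print(values_per_frame)    # scalar values per frame
print(values_per_second)   # scalar values per second: the sensor's dimension
```

Even this modest camera delivers on the order of ten million scalar values per second, which is what "high-dimensional" means here.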

Completeness in using sensory information: Due to environmental openness and task nonspecificity, it is not desirable for a developmental program to discard, at the program design stage, sensory information that may be useful for some future, unknown tasks. Of course, the task-specific representations autonomously derived after birth do discard information that is not useful for a particular task.

Online processing: At each time instant, what the machine will sense next depends on what the machine does now.

Real-time speed: The sensory/memory refreshing rate must be high enough that each physical event (e.g., motion and speech) can be temporally sampled and processed in real time (e.g., about 15 Hz for vision). This speed must be maintained even when a full (very large but finite) physical "machine brain size" is used. The machine must also handle one-instance learning: learning from a single instance of experience.

Incremental processing: Acquired skills must be used to assist in the acquisition of new skills, as a form of "scaffolding." This requires incremental processing; batch processing is not practical for AMD. Each new observation must be used to update the current representation, and the raw sensory data must be discarded after it is used for updating.
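A minimal sketch of this update-then-discard discipline (illustrative only; a running mean stands in for the far richer representations a real developmental program maintains):

```python
import numpy as np

def incremental_update(mean, count, x):
    """One-instance update: fold observation x into the running estimate,
    after which the raw observation can be discarded."""
    count += 1
    mean = mean + (x - mean) / count   # incremental mean; no batch storage
    return mean, count

# Toy stream of "sensory frames", processed strictly one at a time.
rng = np.random.default_rng(0)
mean, count = np.zeros(4), 0
for _ in range(1000):
    frame = rng.normal(loc=2.0, size=4)   # raw sensory input
    mean, count = incremental_update(mean, count, frame)
    # `frame` is discarded here; only the updated representation persists.
```

The estimate converges to the stream's statistics even though no raw frame is ever stored, which is the essential property incremental processing demands.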

Perform while learning: Conventional machines perform after they are built. An AMD machine must perform while it "builds" itself "mentally."

Scale up to large memory: For large perceptual and cognitive tasks, an AMD machine must handle multimodal contexts, large long-term memory with generalization, and capabilities of increasing maturity, all at real-time speed.

History of Our Developmental Models

2005 - present: the Cortex-like MILN and the brain-like Developmental Networks with embodiments WWN-1 through WWN-8:

- Brain-like development: the brain is not just a signal processor but also, concurrently, the developer of that signal processor; the brain wires itself through its own activities, modulated by the genome (a developmental program).

- How the brain deals with modulation: the serotonin system (e.g., punishment and stress), the dopamine system (e.g., pleasure), the acetylcholine system (e.g., uncertainty), and the norepinephrine system (e.g., unexpected uncertainty).

- An account of WWN-1 to WWN-8:

WWN-1 (2008):
Learning a single general object directly from cluttered backgrounds without presegmentation: from location to type (a general recognition task) and from type to location (a general detection task), using the same network for both tasks.

WWN-2 (2010):
In addition to WWN-1, also performs a free-viewing mode: objects pop up from cluttered scenes.

WWN-3 (2010):
In addition to WWN-2, also deals with multiple objects in cluttered scenes: autonomous attention to one object among multiple objects and clutter.

WWN-4 (2010):
In addition to WWN-3, found that a rigid cascade (deep learning) architecture is worse than emergence from shallow learning (brain areas emerge from a unified brain architecture). The static MM and PP assignments in the ventral and dorsal pathways, respectively, were merged into a single unified brain area (Y) in later WWNs.

WWN-6 (2012): Removed the pre-developed pulvinar, which had been used to avoid learning the background; instead, it uses synapse maintenance (the ACh and NE systems), and the skull is fully closed during development while learning location and type concepts.

WWN-7 (2013): In addition to WWN-6, learns multiple scales for each object (e.g., nose; eyes and nose; face) while the skull is fully closed during development, and also deals with short temporal contexts (video).

- SASE appears to be the first model that raised internal sensation and internal action.

- SHM uses unsupervised incremental feature development. It appears to be the first model that incrementally develops a complete set of hierarchical receptive fields.

- IHDR: incrementally growing, incremental mixture of Gaussians. It appears to be the first model that used continuous motor signals to perform mixture-of-Gaussian modeling, going beyond the restriction of using discrete class labels in LDA.

It appeared to be the first deep learning network that adapted its connection structure.

It appeared to be the first visual learning program for both detecting and recognizing general objects from cluttered complex natural background.

It also did segmentation, but in a separate top-down segmentation phase, during which the network did not do recognition.

The number of neural planes dynamically and incrementally grew from interactive experience, but the number of layers (15 in the experiments) was determined by the image size.

All the internal network learning was fully automatic --- there was no need for manual intervention once the learning (development) had started.

It required pre-segmentation for teaching: A human outlined the object contours for supervised learning. This avoided learning background.

Its internal features were automatically grouped through last-layer motor supervision (class labels), but the learning of internal features was entirely unsupervised.

It used a local match-and-maximization paired-layer architecture, which corresponds to logic-AND and logic-OR in multivalued logic (Tomaso Poggio later used the term HMAX).

The intrinsic convolution mechanism of the network provided both shift invariance and distortion tolerance. (Later WWNs are better in learning location as one of concepts.)

It is a cascade network: features in a layer are learned from features of the previous layer, but not from earlier layers. (This cascade restriction was overcome by the later WWNs.)

It was inspired by the Neocognitron (K. Fukushima, 1980), which was for recognition of individual characters against a uniform background.
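The match-and-maximization pairing described above can be sketched in a few lines (an illustrative toy, not the original implementation; the function names, normalization, and pooling size are my own choices):

```python
import numpy as np

# A "match" layer computes a normalized template match at each location
# (acting like logic-AND over the template's components); a "max" layer
# pools responses over nearby positions (logic-OR), giving shift tolerance.

def match_layer(image, template):
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    t_norm = np.linalg.norm(template)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + h, j:j + w]
            denom = np.linalg.norm(patch) * t_norm
            out[i, j] = float((patch * template).sum() / denom) if denom else 0.0
    return out  # equals 1.0 where the patch matches the template exactly

def max_layer(response, pool=2):
    H, W = response.shape
    return np.array([[response[i:i + pool, j:j + pool].max()
                      for j in range(0, W, pool)]
                     for i in range(0, H, pool)])

# Toy demo: embed a template in an otherwise empty image.
rng = np.random.default_rng(1)
template = rng.random((3, 3)) + 0.1      # strictly positive template
image = np.zeros((8, 8))
image[2:5, 3:6] = template
response = match_layer(image, template)  # peaks at the embedding location
pooled = max_layer(response)             # the peak survives coarse pooling
```

The pooling step is what buys shift tolerance: small displacements of the object move the match peak within a pooling window without changing the pooled output.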

Miscellaneous

How long can a developmental robot live? When the hardware of a developmental robot is worn out or broken, the developmental program with its learned "brain" can be downloaded from the robot and uploaded into a new robot body. Therefore, unlike a biological brain, a developmental robot can live "mentally" for as long as we humans like. It can have a very old "mental age" but a very young "body age."

Theory about brain-mind:

J. Weng, "Brains as Naturally Emerging Turing Machines," in Proc. International Joint Conference on Neural Networks, Killarney, Ireland, July 12-17, 2015, pp. 1-8. (This paper explains that the control of a Turing Machine, regular or universal, is a Finite Automaton, and that a DN can therefore learn any universal Turing Machine one transition at a time, immediately and error-free if there is sufficient neuronal resource, and optimally in the sense of maximum likelihood if the neuronal resource is insufficient.)

J. Weng, "Establish the Three Theorems: Brain-Like DNs Logically Reason and Optimally Generalize," in Proc. International Conference on Brain-Mind, East Lansing, Michigan, July 27-28, 2013, pp. 1-8. (This paper explains (1) how the DP self-programs the FA-equivalent logic into a DN through observations of the FA's input-output streams; (2) why the learning is immediate and error-free; (3) why the DN is optimal in the sense of maximum likelihood if it is exposed to infinitely many observations from the real physical world using the same input-output ends while the DP is not allowed to continue to self-program; and (4) why the DN is likewise optimal when the DP is allowed to continue to self-program.)

J. Weng, "A Theory on the Completeness of the DN Logic Capability," in Proc. International Conference on Brain-Mind, East Lansing, Michigan, July 14-15, 2012, pp. 35-42. (This paper presents a theory that the apparent capability for abstraction in general, and for logic in particular, is in the eyes of the human observers, and a theory for a DN to learn any practical logic, including propositional logic, classical conditioning, instrumental conditioning, language acquisition, and autonomous planning.)

J. Weng, "Why Have We Passed 'Neural Networks Do Not Abstract Well'?," Natural Intelligence: the INNS Magazine, vol. 1, no. 1, pp. 13-22, 2011. (This paper presents why a new class of neural networks, the DN, performs abstraction as well as the common basis of state-based symbolic models, the Finite Automaton, while being spatiotemporally optimal.)

J. Weng, "Three Theorems: Brain-Like Networks Logically Reason and Optimally Generalize," in Proc. International Joint Conference on Neural Networks, San Jose, CA, July 31 - August 5, 2011, pp. 1-8. (Coined the terms "symbolic network", "emergent network", "Developmental Network (DN)", "Generative DN (GDN)", and "Agent Finite Automaton (AFA)". Claimed, with the proofs then under review by a journal, that (1) a GDN can learn any AFA incrementally, immediately, and error-free; (2) when such a DN is frozen to new experience, it is optimal in the sense of maximum likelihood; and (3) when such a DN is allowed to continue learning from new real-world experience, it thinks optimally in the sense of maximum likelihood.)

J. Weng, "A 5-Chunk Developmental Brain-Mind Network Model for Multiple Events in Complex Backgrounds," in Proc. International Joint Conference on Neural Networks, Barcelona, Spain, July 18-23, 2010, pp. 1-8. DOI: 10.1109/IJCNN.2010.5596740. (The first brain-mind model at the 5-chunk scale: development, architecture, area, space, and time.)

J. Weng and W. S. Hwang, "From Neural Networks to the Brain: Autonomous Mental Development," IEEE Computational Intelligence Magazine, vol. 1, no. 3, pp. 15-31, August 2006.
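The central claim above, that a DN can learn a finite automaton's transitions one at a time, immediately and error-free when neuronal resource suffices, can be illustrated with a deliberately simplified sketch (this is not the DN algorithm, which uses competing neural layers rather than a lookup table; class and variable names are my own):

```python
# Simplified sketch: learning an FA's transition function one observation
# at a time. Each dict entry stands in for one allocated "neuron" that
# memorizes a (state, input) context and its next state.
class TransitionLearner:
    def __init__(self):
        self.memory = {}                            # (state, input) -> next state

    def observe(self, state, symbol, next_state):
        self.memory[(state, symbol)] = next_state   # immediate, one-instance

    def predict(self, state, symbol):
        return self.memory.get((state, symbol))     # None if never observed

# A small deterministic FA over states {0, 1, 2} and inputs {'a', 'b'}.
fa = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 2,
      (1, 'b'): 0, (2, 'a'): 2, (2, 'b'): 1}

learner = TransitionLearner()
for (s, x), s_next in fa.items():
    learner.observe(s, x, s_next)                   # one transition per observation

# After a single pass, every observed transition is reproduced error-free.
errors = sum(learner.predict(s, x) != s_next for (s, x), s_next in fa.items())
```

With one memory unit per distinct context, no observation interferes with any other, which is why learning is immediate and error-free; the maximum-likelihood results quoted above concern the harder case where units must be shared.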

Knowledge Hierarchy in Developmental Networks:

Q. Guo, X. Wu, and J. Weng, "Cross-Domain and Within-Domain Synaptic Maintenance for Autonomous Development of Visual Areas," in Proc. Fifth Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, Providence, RI, August 13-16, 2015, pp. 1-6. (Abstract concepts and objects are used to build relations.)

K. Miyan and J. Weng, "WWN-Text: Cortex-Like Language Acquisition with 'What' and 'Where'," in Proc. IEEE 9th International Conference on Development and Learning, Ann Arbor, MI, August 18-21, 2010, pp. 280-285. (Text as perception; early language learning and early language generalization.)