Learning Through Active Exploration

Yuwei Cui • Research Intern

When I first arrived in Silicon Valley for my internship, the entire environment
looked new. It took me several weeks to get familiar with my new neighborhood.
Interestingly, even with advanced GPS apps on my phone, the most effective way
to learn a new environment was to walk the streets, memorize landmarks, and
make different turns at intersections. The GPS app could give me smart
directions, but I could not really learn the world just by staring at the map.
Knowledge comes from practice, and we always learn something through active
exploration. Nevertheless, most artificial intelligence techniques adopt
data-intensive machine learning approaches. Algorithms are trained to find patterns
by observing massive amounts of data passively, usually without generating any
actions.

At Numenta, we are working on a next-generation machine intelligence algorithm
that learns complex patterns through active exploration. The stream of sensory
inputs is actively generated by execution of a series of motor commands. We call
this new learning paradigm sensorimotor learning and prediction, or SLAP. To
understand how the algorithm works, let us first think of how our brain solves
the same problem.

We know that all our remarkable cognitive abilities (object recognition, scene
interpretation, reasoning, and prediction) start from data streams collected by
our “sensors”, such as the retina at the back of the eye, tactile sensors under
the skin, and auditory sensors in the cochlea. Believe it or not, most of the inputs
to the sensors are actually generated by ourselves, rather than by changes in
the external world. Our eyes are constantly moving; our touch senses mostly
arise from our own body movements, and the speech we generate is also picked up
by our auditory nerves. After we learn a new environment, we are rarely
surprised by the consequences of our own actions. I can predict exactly what I
will see after each turn on my way to work now. This prediction is based on my
current sensory input and the motor command I am going to execute. Moreover,
despite dramatic changes of input to my sensors, my internal perception is
stable. These two aspects reflect two components of the algorithm. We call the
prediction step “sensorimotor inference” and the process of building stable
representations “temporal pooling”.

Jeff Hawkins described the basic ideas on the NuPIC mailing list. During my
internship I implemented and worked on several SLAP experiments using synthetic
datasets. In one experiment, we trained the SLAP algorithm to recognize a large
number of synthetic images composed of “squares”. Each square is painted with
a different color, and different images share the same set of colors. Two example
images are shown below. A white diamond represents the portion of the image that
lies on the fovea, and a black arrow represents the proposed motor command.
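
To make the setup concrete, here is a minimal sketch of how such a dataset and
its exploration could be simulated. It is not the code used in the experiments;
the grid size, the four-color palette, the four motor commands, and all function
names are assumptions chosen for illustration.

```python
import random

# Assumed palette shared by all images, and assumed motor commands (row, col offsets).
COLORS = ["red", "green", "blue", "yellow"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def make_image(rows, cols, rng):
    """Build a grid of squares, each painted with a color from the shared set."""
    return [[rng.choice(COLORS) for _ in range(cols)] for _ in range(rows)]

def legal_moves(image, pos):
    """Return the motor commands that keep the fovea inside the image."""
    rows, cols = len(image), len(image[0])
    r, c = pos
    return [name for name, (dr, dc) in MOVES.items()
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

def explore(image, steps, rng):
    """Random walk of the fovea over the image.

    Returns a list of (sensed color, motor command, next sensed color) triples.
    """
    pos = (rng.randrange(len(image)), rng.randrange(len(image[0])))
    trace = []
    for _ in range(steps):
        move = rng.choice(legal_moves(image, pos))
        dr, dc = MOVES[move]
        nxt = (pos[0] + dr, pos[1] + dc)
        trace.append((image[pos[0]][pos[1]], move, image[nxt[0]][nxt[1]]))
        pos = nxt
    return trace

rng = random.Random(0)          # fixed seed so the walk is repeatable
image = make_image(3, 3, rng)
trace = explore(image, 10, rng)
```

Each triple in `trace` pairs what the fovea currently senses with the proposed
motor command and the sensation that follows, which is the kind of stream a
sensorimotor learner would consume.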

The algorithm is allowed to explore each image through simulated eye-movements.
At each step the image under the fovea and the motor command are fed to the
algorithm. The first layer of the network learns to make predictions of the next
sensory input. The algorithm also utilizes a reasonable assumption that if two
things are close to each other in time (temporal proximity), they tend to
originate from the same underlying cause, and thus should be grouped together.
Neurons of the second layer “pool” over many neurons in the first layer, and
form a stable representation that is unique to each pattern. During learning,
stable representations will emerge despite changes to the input in the first
layer. These stable representations indicate recognition of the larger image.
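
The two-layer idea can be illustrated with a deliberately simplified sketch. It
is not the SLAP implementation: it replaces neurons with lookup tables, uses two
hypothetical 2x2 images, and treats each exploration episode as one block of
temporally adjacent inputs (the episode name stands in for temporal proximity).
All names below are assumptions made for this example.

```python
from collections import defaultdict

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Two hypothetical images built from the same four colors.
IMAGES = {
    "image_a": [["R", "G"], ["B", "Y"]],
    "image_b": [["G", "R"], ["Y", "B"]],
}

def step(image, pos, move):
    """Apply a motor command; return the new fovea position, or None if off-image."""
    r, c = pos[0] + MOVES[move][0], pos[1] + MOVES[move][1]
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return (r, c)
    return None

def train(images):
    """Learn both layers by sweeping every position and motor command once."""
    predictions = defaultdict(set)   # layer 1: (sensation, motor) -> next sensations
    pooled = defaultdict(set)        # layer 2: transition -> episodes it occurred in
    for name, image in images.items():          # one exploration episode per image
        for r in range(len(image)):
            for c in range(len(image[0])):
                for move in MOVES:
                    nxt = step(image, (r, c), move)
                    if nxt is None:
                        continue
                    transition = (image[r][c], move, image[nxt[0]][nxt[1]])
                    predictions[transition[:2]].add(transition[2])
                    pooled[transition].add(name)
    return predictions, pooled

def recognize(image, start, moves, pooled):
    """Accumulate layer-2 votes along a path; return the leading label per step.

    Assumes every move in the path keeps the fovea on the image.
    """
    votes = defaultdict(int)
    pos, winners = start, []
    for move in moves:
        nxt = step(image, pos, move)
        transition = (image[pos[0]][pos[1]], move, image[nxt[0]][nxt[1]])
        for name in pooled.get(transition, ()):
            votes[name] += 1
        winners.append(max(votes, key=votes.get))
        pos = nxt
    return winners

predictions, pooled = train(IMAGES)
path = ["right", "down", "left", "up"]
winners_a = recognize(IMAGES["image_a"], (0, 0), path, pooled)
winners_b = recognize(IMAGES["image_b"], (0, 0), path, pooled)
```

In this toy setting the sensed color changes at every step, and half of the
transitions occur in both images, yet the pooled label stays the same along the
whole path. A real system would use distributed neural representations rather
than exact table lookups, so this only conveys the intuition.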

The figure below shows example output from a trained system while the eyes are
moving around two different images (10 iterations for each image). At each step,
the sensory input is changing drastically. However, the overall output is a
stable and unique representation for each image.

This simple example illustrates a fundamental mechanism our brain uses to create
stable representations from a changing world. The same mechanism can also be
used for a large variety of problems where the sensor data is actively generated
by the system, such as robot learning, vehicle control, and complex pattern
recognition/detection problems.