Thursday, October 24, 2013

The Analogical Instinct and the One Cortical Algorithm

The nature vs. nurture debate remains a contentious topic in cognitive science. From an artificial intelligence (AI) researcher's point of view, the possibility that the brain learns largely from experience and that its innate structure is minimal offers hope of a tractable and quick route to AI.

When I was a kid, I dreamed of building smart robots that could think like people, but when I got to college (CMU 1993-97) and came face-to-face with the AI research of the day, I gave up. Back then, the prevailing wisdom was that human intelligence derived from many simple agents working in concert, what MIT's Marvin Minsky called a "Society of Mind". To achieve AI, it was believed, engineers would have to build and combine thousands of individual computing units or agents. One group of agents, or module, would handle vision, another language, and so on. It seemed too formidable a task. Later, as a professor, I discouraged my students from pursuing the AI dream and getting similarly frustrated.

AI researchers suffer from a common cultural-philosophical disposition: They would like to explain intelligence in the image of what was successful in physics — by minimizing the amount and variety of its assumptions. But this seems to be a wrong ideal. We should take our cue from biology rather than physics because what we call thinking does not directly emerge from a few fundamental principles of wave-function symmetry and exclusion rules. Mental activities are not the sort of unitary or elementary phenomenon that can be described by a few mathematical operations on logical axioms. Instead, the functions performed by the brain are the products of the work of thousands of different, specialized subsystems, the intricate product of hundreds of millions of years of biological evolution.

Ng continues:

But then, in 2004, I ran into the "one algorithm" hypothesis, popularized by Jeff Hawkins, and it changed the course of my career, reigniting a passion for general AI. For the first time in my life, it made me feel like it might be possible to make some real progress on the AI dream within our lifetime.

When I first read Mountcastle's paper I nearly fell out of my chair. Here was the Rosetta stone of neuroscience — a single paper and a single idea that united all the diverse and wondrous capabilities of the human mind. It united them under a single algorithm. In one step it exposed the fallacy of all previous attempts to understand and engineer human behavior as diverse capabilities. I hope you can appreciate how radical and wonderfully elegant Mountcastle's proposal is. The best ideas in science are always simple, elegant, and unexpected, and this is one of the best. In my opinion it was, is, and will likely remain the most important discovery in neuroscience.

Hawkins laments that:

Scientists and engineers have for the most part been ignorant of, or have chosen to ignore, Mountcastle's proposal. When they try to understand vision or make a computer that can "see," they devise vocabulary and techniques specific to vision. They talk about edges, textures, and three-dimensional representations. If they want to understand spoken language, they build algorithms based on rules of grammar, syntax, and semantics. But if Mountcastle is correct, these approaches are not how the cortex solves these problems, and are therefore likely to fail. If Mountcastle is correct, the algorithm of the cortex must be expressed independently of any particular function or sense. The cortex uses the same process to see as to hear. The cortex does something universal that can be applied to any type of sensory or motor system.

Neuroscientists like to point out that all parts of the cerebral cortex look pretty much alike — not only the different parts of the human cortex, but also the cortices of different animals. One could draw the conclusion that all mental activity in all animals is the same. But a better conclusion is that we cannot simply look at a patch of cortex and read out the logic in the intricate pattern of connectivity that makes each part do its separate thing. In the same way that all books are physically just different combinations of the same seventy five or so characters, and all movies are physically just different patterns of charges along the tracks of a videotape, the mammoth tangle of spaghetti of the cortex may all look alike when examined strand by strand. The content of a book or a movie lies in the pattern of ink marks or magnetic charges, and is apparent only when the piece is read or seen. Similarly, the content of cortical activity lies in the patterns of connections and patterns of activity among the neurons. Minute differences in the details of the connections may cause similar looking cortical patches to implement very different programs. Only when the program is run does the difference become evident.

A good metaphor for understanding the brain is Noam Chomsky's "mental organ". An organ of the body is a specialized structure tailored to carry out a particular function. The heart circulates the blood because it is built like a pump; the lungs oxygenate the blood because they are built like gas exchangers. The lungs cannot pump blood and the heart cannot oxygenate it. This specialization goes all the way down. Heart tissue differs from lung tissue, heart cells differ from lung cells, and many of the molecules making up heart cells differ from those making up lung cells. If that were not true, our organs would not work.

In short, it is clear that the body is not made of spam but has a heterogeneous structure of many specialized parts. All this is likely to be true of the cortex. Whether or not we establish exact boundaries for the cortical areas, it is clear that it cannot be made of cortical spam but must have a heterogeneous structure of many specialized parts.

Therefore, the cortex, I claim, is not a single organ but a system of organs, which we can think of as psychological faculties or mental modules. A jack-of-all-trades is master of none, and that is just as true for the cortex as for our physical organs. Our physical organs owe their complex design to the information in the human genome, and so, I believe, do our mental organs. We do not learn to have a pancreas, and we do not learn to have a visual system, language acquisition, common sense, or feelings of love, friendship, and fairness.

No single discovery proves the claim (just as no single discovery proves that the pancreas is innately structured), but many lines of evidence converge on it. The one that most impresses me is research in artificial intelligence and robotics. All of the programs designed by AI researchers have been specially engineered for a particular function, such as language, vision, movement, or one of many different kinds of common sense. Each of these major engineering problems is unsolvable without making assumptions about the laws that hold in that arena of interaction with the world. I predict that no one will ever build a humanlike robot, and I mean a really humanlike robot, unless they pack it with computational systems tailored to different problems.

The entities now commonly evoked to explain the mind such as general intelligence, a capacity to form culture, or a universal learning algorithm will surely go the way of protoplasm in biology and of earth, air, fire, and water in physics. These entities are so formless, compared to the exacting phenomena they are meant to explain, that they must be granted near magical powers.

Pinker makes some very good points, but resolving the issue one way or the other requires deeper knowledge of neuroscience, to which Pinker seems to have an aversion. Early in his book, Pinker says:

This book is about the brain, but I will not say much about neurons, hormones, and neurotransmitters. That is because the mind is not the brain but what the brain does, and not even everything it does, such as metabolizing fat and giving off heat. Information and computation reside in patterns of data and in relations of logic that are independent of the physical medium that carries them.

This aversion to neuroscience is also seen in the writings of Minsky and Chomsky, and may be traced to taking the "brain is a computer" analogy too far. For example, Minsky has said things to this effect:

As Turing showed, computers can do anything a brain can do. The brain is very likely full of inefficiencies and evolutionary "legacy code", a several hundred million year old kludge. Studying brains is more likely to limit your thinking and confuse you. So why constrain your thinking by the biological messiness of nature's computer? Instead, it is better to study the ultimate limits of computation as best expressed in digital computers and write programs that first match and then surpass human abilities. Consider how we succeeded in building flying machines. All early attempts at imitating the flapping action of winged animals failed. The Wright brothers finally succeeded by using fixed wings and propellers. Today we use jet engines instead of propellers. It works and does so, far better than flapping wings.

I believe that more attention to neuroscience might have made them realize that their view of the cortex as many separate organs is outdated.

Neuroscience has come full circle

With Mountcastle's paper, neuroscience came full circle.

Following Darwin, early neuroanatomists noted the remarkable similarity between the brains of man and other primates as more evidence for their common origin. The major difference seemed to be size.

Others, such as Santiago Ramón y Cajal, regarded as the father of modern neuroscience, thought this was unlikely and even a little offensive to human dignity. He felt it was impossible that the unique mental faculties of humans (language, the capability of abstraction, the ability to create concepts and reason about them and, finally, the art of inventing ingenious instruments) could have come about by a mere expansion in the size of the cortex. Lending support to this view was the fact that the size or weight of a person's brain was a poor predictor of intelligence. Besides, animals like elephants and whales have far larger brains than humans. All this pointed in the direction of there being something qualitatively different about human brains in general and those of geniuses in particular. Thus was founded the new field of cortical cytoarchitectonics, at the turn of the twentieth century, whose goal was to find these differences and whose main tools were the microscope and newly developed methods for selectively staining neurons.

Cytoarchitectonists categorized cortical regions based on overall cortical thickness, thickness of the six cortical layers, neuron density, relative proportion of different neuron types, length of horizontal connections, synapse density, etc. Based on such differences, Brodmann in 1909 published his map of 47 cytoarchitectonic regions, which is still the standard cortical map used by neuroscientists. For example, many neuroscientists still refer to the primary visual cortex as area 17.

The neurons of the cerebral cortex are arranged in six distinctive layers. The appearance of the cortex depends on what is used to stain it. The Golgi stain reveals neuronal cell bodies and dendritic trees. The Nissl method shows cell bodies and proximal dendrites. A Weigert stain for myelinated fibers reveals the pattern of axonal distribution. (Kandel 2000)

Later studies based on the effects of electrical stimulation of the cortex and behavioral changes following lesions showed strong correspondence between function and the Brodmann areas, and led to the idea that the cortex consisted of separate "mental organs" each specialized for a particular function.

In the late 1940s the connectivity between the sense organs and the cortex was mapped out, and an interesting coincidence was discovered. Regions of projection of nerve fibers from the sense organs (via the nuclei of the thalamus) coincided exactly with cytoarchitecturally distinct regions. Further studies led to the general conclusion that a cortical area may be defined both by its cytoarchitecture and as the zone of (axonal) projection of a specific thalamic nucleus.

At first this might not seem like a big deal. After all, one can interpret this as further evidence for specialized mental organs. There is nothing surprising in a region specialized for vision receiving visual input from the retina, and so on. In fact, this is to be expected. However, if we look more closely at the layered organization of the cortex, the different neuron types in each layer and their roles, it becomes clear that the cytoarchitectural differences identified by neuroanatomists are more likely caused by differences in input and output connections than by functional specialization.

Cytoarchitecture of different cortical regions: Sensory cortices, such as the primary visual cortex, tend to have very prominent internal granule cell layers or layer IV, where incoming fibers make their connections, while the primary motor cortex has a very meager layer IV but a prominent layer V, from which large pyramid shaped cells called Betz cells send axons to the various muscles of the body. (Kandel 2000)

In the paper, Mountcastle says:

That cytoarchitectural differences between areas of neocortex reflect differences in their patterns of extrinsic connections suggests that the neocortex is everywhere functionally much more uniform than hitherto supposed and that its avalanching enlargement in mammals and particularly in primates has been accomplished by replication of a basic neural module, without the appearance of wholly new neuron types or of qualitatively different modes of intrinsic organization.

Cytoarchitectural differences may therefore reflect the selection or grouping together of sets of modules in particular areas by certain sets of input output connections. In the primary motor and sensory cortices this selection is made by a single strongly dominant connection, and cytoarchitectural identification of heterotypical areas is clear and striking. Areas of the homotypical eulaminate cortex (95% of man's neocortex) are defined by more evenly balanced sets of extrinsic connections, and here cytoarchitectural differences, while clear, are less striking. Thus a major problem for understanding the function of the neocortex and therefore of the cerebrum is to unravel the intrinsic structural and functional organization of the neocortical module. That module is, I propose, what has come to be called the cortical column.

I have presented this particular argument at length, because I feel that the lack of a picture in the original paper makes the argument hard to understand for a neuroscience novice like myself. Also, Hawkins does no better. In his book, he makes it seem that the entirety of Mountcastle's argument is that cortical areas look similar, which is definitely not strong enough to support such an extraordinary hypothesis.

The rest of Mountcastle's paper strengthens his hypothesis of cortical functional uniformity by presenting evidence of common processing patterns in different areas (vision, hearing, touch, and motor), based on a common module, the cortical column. The evidence for this is well presented, and I refer you to the paper to judge the correctness of the hypothesis for yourself.

What does the cortical "one algorithm" do?

If we accept Mountcastle's hypothesis, and furthermore accept that cortical areas are connected in a recursive or hierarchical manner, so that the sensory areas apply the "one algorithm" to sensory input, higher association areas apply it to the output of the sensory areas, and so on, we are led to the conclusion that what each of us perceives as the operation of our own conscious mind also reflects the structure of the "one algorithm".

Most people would agree that a large part of thinking is recalling past experiences similar (or analogous) to our current experience or current thought. Most of the time we hardly pay any attention to this nonstop reminding and recalling, but once in a while something (say, the taste of a madeleine cake dipped in lemon tea) reminds you of a decades-old fond memory of a similar situation, and makes you marvel at the speed with which such an old memory was brought up and at its appropriateness. Also, on rare occasions, an unexpected analogical similarity between totally different concepts is detected: what you might call a flash of genius.

If you reflect a little more, you realize that this reminding is not only nonstop but also operates at several levels. For example, simply being aware of where you are requires matching your current surroundings to previous experience. Every part of experience, be it sight, sound, touch, taste or smell, reminds you of something similar you have experienced in the past, along with many associated things, including where and when it was experienced. Of course, common or boring experiences barely register, and it is the novel or interesting ones that make you more aware of this background process.

I believe it is reasonable to say that this ubiquitous analogizing or similarity finding is what Mountcastle's "one algorithm" does.
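To make "similarity finding" a little more concrete, here is a minimal sketch of recall by similarity. Everything in it is an assumption made for illustration: the feature vectors are invented, the names `cosine_similarity` and `recall` are hypothetical, and cosine similarity over a flat list of memories is only a stand-in. It is not a claim about how cortical columns actually work, nor is it taken from Mountcastle or Hawkins.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector resembles nothing
    return dot / (norm_a * norm_b)


def recall(memory, cue, top_k=1):
    """Return the labels of the top_k stored experiences most similar to the cue."""
    ranked = sorted(
        memory,
        key=lambda item: cosine_similarity(item[1], cue),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]


# Hypothetical memories: (label, made-up sensory features).
memory = [
    ("grandmother's kitchen", [0.9, 0.8, 0.1]),
    ("office cafeteria", [0.2, 0.1, 0.9]),
    ("seaside picnic", [0.1, 0.9, 0.3]),
]

# A new experience whose features resemble the first memory most closely.
cue = [0.8, 0.7, 0.2]
print(recall(memory, cue))
```

The interesting engineering questions start where this sketch stops: how the features are learned, and how the same matching step could be stacked hierarchically so that higher levels match on the outputs of lower ones.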

The cortex-as-many-modules camp, on the other hand, thinks that the facility for recalling past similar experiences is provided by yet another specialized module, which they call episodic memory.

The ubiquity of analogy in thinking is nothing new and is maybe as old as thinking about thinking itself, but Hawkins deserves credit for probably being the first to connect this to Mountcastle's hypothesis. The following quote is paraphrased from "On Intelligence":

Making predictions by analogy to the past is something you do continually while awake. This occurs along a continuum, ranging from simple everyday acts of perception occurring in sensory regions of the cortex to difficult, rare acts of genius occurring at the highest levels in the cortex. At a fundamental level, everyday acts of perception are similar to the rare flights of brilliance. It is just that the everyday acts are so common we don't notice them.

ALL our reasonings concerning matter of fact are founded on a species of Analogy, which leads us to expect from any cause the same events, which we have observed to result from similar causes. Where the causes are entirely similar, the analogy is perfect, and the inference, drawn from it, is regarded as certain and conclusive. But where the objects have not so exact a similarity, the analogy is less perfect, and the inference is less conclusive; though still it has some force, in proportion to the degree of similarity and resemblance.
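The idea that an inference has force "in proportion to the degree of similarity" can be sketched numerically. The following is only an illustration under stated assumptions: causes are represented as made-up numeric tuples, similarity is taken to be an exponential decay of Euclidean distance, and the names `similarity` and `predict_by_analogy` are hypothetical. None of this comes from the quoted text beyond the proportionality idea itself.

```python
import math


def similarity(a, b, scale=1.0):
    """1.0 for identical causes, decaying toward 0 as they grow apart."""
    return math.exp(-math.dist(a, b) / scale)


def predict_by_analogy(past_cases, new_cause, scale=1.0):
    """Similarity-weighted average of past effects, plus a confidence score.

    past_cases is a list of (cause, effect) pairs. The confidence is the
    similarity of the closest precedent: a perfect analogy gives 1.0, and
    a weak analogy gives a value near 0.
    """
    weights = [similarity(cause, new_cause, scale) for cause, _ in past_cases]
    total = sum(weights)
    if total == 0:
        return None, 0.0  # no usable precedent at all
    prediction = sum(w * effect for w, (_, effect) in zip(weights, past_cases)) / total
    return prediction, max(weights)


# Hypothetical past observations: (cause features, observed effect).
past_cases = [((1.0, 0.0), 10.0), ((0.0, 1.0), 20.0)]

# A cause identical to a past one: the inference leans on that precedent.
near_pred, near_conf = predict_by_analogy(past_cases, (1.0, 0.0))

# A cause unlike anything seen before: a prediction is still produced,
# but with very little force behind it.
far_pred, far_conf = predict_by_analogy(past_cases, (5.0, 5.0))

print(near_pred, near_conf)
print(far_pred, far_conf)
```

Readers who know some machine learning will recognize this as a kernel-weighted average; the point here is only that the prediction and its conclusiveness are two separate outputs, and the second tracks resemblance.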

Hofstadter, from Surfaces & Essences: Analogy as the Fuel and Fire of Thinking (2013), gives his definition of intelligence:

Intelligence is the art of rapid and reliable gist-finding, crux-spotting, bull’s-eye-hitting, nub-striking, essence-pinpointing. It is the art of, when one is facing a new situation, swiftly and surely homing in on an insightful precedent (or family of precedents) stored in the recesses of one’s memory. That, no more and no less, is what it means to isolate the crux of a new situation. And this is nothing but the ability to find close analogues, which is to say, the ability to come up with strong and useful analogies.

In the early portions of the same book, Hofstadter notes that everyday language is full of analogies small and large, and in the last chapter shows that the flashes of genius behind Einstein's theories of light quanta and of special and general relativity are brilliant analogies. Also, on his homepage, he calls the ability to fluidly create and use analogies the "holy grail" of AI.

The deeper question of why using analogy is so effective in the "pursuit of Happiness" runs into the problem of induction, with which Hume is most famously associated. Though our collective knowledge has increased exponentially since Hume, still nobody knows the ultimate reason why our universe tends to be regular or uniform. Evolution, nevertheless, has exploited this regularity by equipping animals with a cortex that is very effective at recalling similar situations from past experiences, a facility I call the analogical instinct, taking a hint from Hume. This has proven to be a very successful strategy, in turn favoring the avalanching enlargement of the cortex.

If quantum decoherence, or whatever it is that keeps the universe from being totally random, were "switched off", the analogical instinct and the cortex would be of no use. As Hume put it, in such a universe:

Anything may appear able to produce anything. The falling of a pebble may, for all we know, extinguish the sun!

Cognitive Psychology is also coming full circle

The debate in neuroscience regarding whether the cortex has multiple specialized mental organs or is more uniform has its higher-level counterpart in philosophy of mind and its modern version, cognitive psychology, as the dispute between innatism vs. empiricism.

Of course, no one disputes that some things, such as instincts, are innate and that many things can only be learned from experience. For the past few decades, most of the controversy has been about language acquisition by children. Early philosophers regarded it as another manifestation of man's general ability to acquire knowledge and learn skills.

The controversy was kicked off by Chomsky with his critical review (1959) of the behaviorist psychologist B. F. Skinner's 1957 book Verbal Behavior. Chomsky claims his critique applies to empiricism in general, but the bulk of his arguments attack Skinner's behaviorist approach to language, which is a lobotomized, straw-man version of empiricism. It's no wonder that attacking it is like cutting soft butter with a hot knife.

Chomsky has a number of vocal supporters, such as Pinker and Gary Marcus, who continue to argue that language is mostly innate. Pinker's book The Language Instinct presents this view for a general audience. It is little surprise that these are the same people who advocate the cortex-as-many-separate-organs view.

But things are changing in cognitive linguistics with a new generation of linguists (such as Michael Ramscar's group) who think the generic learning mechanisms that the brain possesses are more than enough to explain language acquisition. Over the last few years they have devised several experiments to demonstrate that this is the case. For a satirical presentation of their views, see this essay by one of Ramscar's grad students: Are toddlers dumber than mice? In other words, they say that the analogical instinct is sufficient to explain the language instinct.

From an engineer's point of view, the most important question is how to reverse engineer the analogical instinct to build true artificial intelligence. For this we need to understand the mechanisms that underlie the analogical instinct in the cortex, and decide how to effectively "translate" these to existing CPU/GPU hardware architectures, using the algorithms and data structures that computer scientists and engineers have fine-tuned for these architectures over the last several decades.

Several proposals have been made. Hawkins, along with Dileep George, initially proposed a technique called Hierarchical Temporal Memory, taking inspiration from what is known about neural processing in the visual areas. They predictably applied it to vision problems such as object recognition and had decent results, but it was not an unqualified success. They have now gone their separate ways: Hawkins advocates a new technique called cortical memory, developed by his startup, Numenta, while Dileep George has founded a "stealth startup", Vicarious, which has so far let out only the name of its technique: Recursive Cortical Networks.

In academia, many researchers have converged on Deep Belief Networks, based on ideas developed by Geoff Hinton, Yann LeCun, and many other artificial neural network pioneers. The impressive (best in domain) results provided by these methods in several narrow but unrelated domains have led to their adoption by the big tech companies. Hinton has written an excellent review of the different techniques that the machine learning community has proposed as the "one algorithm": To Recognize Shapes, First Learn to Generate Images.

If there is sufficient interest, I'll cover my own views on this in a future post.

It's constantly surprising to me how often I get related articles coming through my feed reader. In addition to your post in favor of cortical uniformity, there was an article published in PNAS this week challenging cortical uniformity. The article is at http://www.pnas.org/content/early/2013/10/02/1312691110, and a summary of the findings is available on the Kurzweil AI blog at http://www.kurzweilai.net/neuroscientists-find-cortical-columns-in-brain-not-uniform-challenging-large-scale-simulation-models. It seems to me that these findings accord pretty well with the *limited* plasticity found by rewiring cat auditory and visual cortices in the work by Stephen Lomber.

If we think that an "analogical instinct" is the core of human intelligence, then what model can we build of that "one algorithm" that will, modulo some parameters, *predict* human responses in psychological learning experiments?

Joshua Tenenbaum's work in Bayesian conceptual reasoning shows fairly close prediction of human responses with only one free parameter, b, which helps fix the probability that any given sample is an outlier.

It would also be helpful if a theory of general intelligence could tell us, well, what is intelligence? Even cognitive psychologists working directly with human beings who very definitely *have* intelligence have a hard time defining it.