Why Deep Learning Is a Hindrance to Progress Toward True AI

However, [Quoc] Le said that the biggest obstacle to developing more truly intelligent computers is finding a way for them to learn without requiring labeled training data—an approach called “unsupervised learning.”

This is interesting because we hear so much buzz lately about how revolutionary and powerful deep learning is and about how truly intelligent machines are just around the corner because of it. And yet, if one digs deeper, one quickly realizes that all this success is happening thanks to a machine learning model that will soon have to be abandoned. Why? Because, as Google Brain research scientist Quoc Le says, it is based on supervised learning.

No True AI Is Coming from the Mainstream AI Community Anytime Soon

I have reasons to believe that true AI is right around the corner, but I don't see it coming from the mainstream AI community. Right now, they are all having a feeding frenzy over a soon-to-be-obsolete technology. There is no question that deep learning is a powerful and useful machine learning technique, but it works in a narrow domain: the classification of labeled data. The state of the art in unsupervised learning (no labels) has so far been a joke. The accuracy of current unsupervised deep neural networks, such as Google's cat-recognition program, is truly abysmal (15% or less), and there is no clear path to success.

Time: The Universal Bottom-up Critic

One of the reasons that the performance of unsupervised machine learning is so pathetic, in my opinion, is that researchers continue to use what I call static data, such as pictures, to train their networks. Temporal information is simply ignored, which is a bummer since time is the key to the AI kingdom. And even when time is taken into consideration, as in recurrent neural networks, it is not part of a fundamental mechanism that builds a causal understanding of the sensory space. It is merely used to classify labeled sequences.

Designing an effective unsupervised learning machine requires that we look for a natural replacement for the top-down labels. As we all know, supervised or not, every learning system must have a critic. Thus the way forward is to abandon the top-down critic (i.e., label-based backpropagation) and adopt a universal bottom-up critic. It turns out that the brain can only work with temporal correlations, of which there are two kinds: sensory signals are either concurrent or sequential. In other words, time should be the learning supervisor, the bottom-up critic. This way, memory is constructed from the bottom up, not top-down, which is as it should be.
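As a toy illustration of what a time-based, bottom-up critic might look like (my own sketch, not an implementation of any published system; the function name and the 10 ms window are invented for illustration), consider classifying the relation between two spike trains purely from their timestamps:

```python
# Toy sketch: time itself as the "critic". Given spike timestamps (seconds)
# from two sensors, decide whether they tend to fire concurrently (within a
# small window) or sequentially (one consistently leads the other).
# The window value is an illustrative assumption, not a derived quantity.

def temporal_relation(spikes_a, spikes_b, window=0.01):
    """Classify the temporal relation between two spike trains."""
    lags = []
    for ta in spikes_a:
        # Lag from each spike of a to the nearest spike of b.
        nearest = min(spikes_b, key=lambda tb: abs(tb - ta))
        lags.append(nearest - ta)
    mean_lag = sum(lags) / len(lags)
    if abs(mean_lag) < window:
        return "concurrent"
    return "sequential (a leads b)" if mean_lag > 0 else "sequential (b leads a)"

# Spikes within ~2 ms of each other -> concurrent
print(temporal_relation([0.00, 1.00, 2.00], [0.002, 1.001, 2.002]))
# b consistently trails a by ~50 ms -> sequential
print(temporal_relation([0.00, 1.00, 2.00], [0.05, 1.05, 2.05]))
```

The point of the sketch is only that no labels appear anywhere: concurrency and sequence order are properties of the signal itself, available to any learner for free.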

The Deep Learning Killer Nobody Is Talking About

Other than being supervised, the biggest problem with deep neural networks is that, unlike the neocortex, they are completely blind to patterns they have never seen before. The brain, by contrast, can instantly model a new pattern. It is obvious that the brain uses a knowledge representation architecture that is instantly malleable and shaped by the environment. As far as I know, nobody in mainstream AI is working on this amazing capability of the brain. I am not even sure they are aware of it.

Conclusion: Be Careful

Sensory learning is all about patterns and sequences of patterns, something that mavericks like Jeff Hawkins have been saying for years now. The trick is to know how to use patterns and sequences to design the correct (there is only one, in my opinion) knowledge representation architecture. Hawkins is a smart guy, probably the smartest guy in AI right now, but I believe a few of his fundamental assumptions are wrong, not the least of which is his continued commitment to a probabilistic approach. As Judea Pearl put it recently, we are not probability thinkers but cause-effect thinkers. And this is coming from someone who has championed the probabilistic approach to AI throughout his career.

In conclusion, I will reiterate that the future of AI is both temporal and non-probabilistic. It may be alright to invest in deep learning technologies for now but be careful. Deep learning will become an obsolete technology much sooner than most people in the business believe.

6 comments:

For chess, deep learning has been used in an unsupervised way to get a computer up to a 40% to 50% win rate. The algorithm taught itself the rules of the game based on observation of all recorded matches. That's pure intuition by the computer, without reliance on brute-force computation to figure out the right move. That's the same kind of intuition human beings use, although this is just one very narrow example.

I think you might be basing your criticism of Hawkins' HTM on the version they had before Dileep George left Numenta in 2010, which was indeed based on Hidden Markov Models and Bayesian networks. The current (post-2009) HTM is deterministic and vastly more faithful to the neuroscience. It is also much closer to your own ideas as expressed in your three-parter on the Bayesian approach. For the horse's mouth, I suggest looking at the OSS community based at numenta.org; for an independent take, my blog at inbits.com extends HTM's theory.

Thanks for correcting my false assumption. I'm now even more impressed with Hawkins than before. But I still have a few problems with his approach. I read his (and Subutai Ahmad's) recently published paper, the one that purports to explain why some cortical neurons have thousands upon thousands of inputs. I disagree with it for several reasons as listed below:

1. It does not provide a hierarchical structure for patterns, only for sequences. This is a fatal flaw, IMO, because, among other equally valid reasons, it encourages/requires rampant duplication of patterns. Patterns and sequences must be learned separately.

2. Slight differences in neuron polarizations at the synaptic sites cannot account for the high temporal precision observed in humans and animals.

3. Hawkins and Subutai do not seem to understand that sequence order is not enough. The timing intervals between nodes in a sequence are very precise. This is evident in fast predatory animals, musicians, dancers and athletes.

Finally, I disagree with Hawkins's concept of sparse, distributed memory. Yes, it is true that the cortex can recognize patterns (e.g., faces in the clouds) with very little data, but the reason, IMO, has to do with the highly deterministic interplay between patterns and sequences of patterns. Patterns are unique in that they have very few successors and predecessors. In my own model of sequence memory, a sequence is assumed to be recognized if only two of its nodes have fired. This is due to the deterministic nature of the sensory space.
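The two-node recognition rule described above can be sketched in a few lines (this is my reading of the comment; the class name and structure are invented for illustration, not the author's actual model):

```python
# Toy sketch of the recognition rule: in a deterministic sensory space,
# a stored sequence is taken as recognized as soon as any two of its
# nodes have fired. All names here are illustrative assumptions.

class SequenceMemory:
    def __init__(self, nodes):
        self.nodes = set(nodes)   # pattern ids that make up the sequence
        self.fired = set()        # matching nodes observed so far

    def observe(self, node):
        """Feed one observed pattern id; return True once recognized."""
        if node in self.nodes:
            self.fired.add(node)
        return len(self.fired) >= 2   # two hits suffice

seq = SequenceMemory(["A", "B", "C", "D"])
assert not seq.observe("A")   # one matching node is not enough
assert not seq.observe("X")   # unrelated pattern is ignored
assert seq.observe("C")       # second matching node -> recognized
```

The design choice the rule depends on is the claim in the comment itself: because each pattern has very few possible successors and predecessors, two hits are already nearly unambiguous evidence for one particular sequence.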

In my opinion, the high-input neurons observed in the cortex are just part of the mechanism of temporal pooling used by the brain. I have other objections but these will do for now.

Hi Louis, Thanks for getting back on this. Jeff and Subutai are friends, but we've disagreed on many important details for the last couple of years. That paper is just one of a set outlining their version of the theory (others are on the way). I've two papers from late last year which "correct" what I see as a flaw in their core algorithm and extend its context.

The first - Encoding Reality: Prediction-Assisted Cortical Learning Algorithm in Hierarchical Temporal Memory - provides a more cohesive description and details the model mathematically. The second - Symphony from Synapses: Neocortex as a Universal Dynamical Systems Modeller using Hierarchical Temporal Memory - extends the single-layer model to a complete region of cortex and provides an information-theoretic and dynamical-systems argument for the evolution of cortex as a system for coupling with and modelling dynamical systems and processes in the world and in the brain.

This seems to fit with your idea of the external world as "perfect" and the brain's attempt to model that perfection by absorbing information-rich signals from it (and interacting with it). This new theory unifies Hawkins' HTM with ideas Freeman has had for decades as well as results from the mathematics of complex systems. We have vigorous discussions about these concepts on our HTM theory mailing list - you'd be very welcome to join that discussion.

Thanks for the links and the invitation to join the HTM mailing list. I'm sorry that I must decline the invitation, because I am more of a lone wolf, so to speak, and I feel that I am more productive when working alone. I am intrigued by your second paper, Symphony from Synapses. The idea that the cortex is a dynamical systems modeller strikes a chord with me because it reminds me of my own approach to modelling representations in memory. IMO, the brain makes several fundamental assumptions about the way the world works. Other than assuming a fully deterministic world (the most important part), it also assumes that the temporal relationships between events in the world are purely "geometric" in nature. By this, I mean that the brain assumes that the temporal intervals between nodes in any sequence are proportional to those of other related sequences. The brain captures the proportions with the use of a hierarchy of sequences. IOW, higher-level sequences should not be seen so much as being composed of lower-level ones but as dictating or representing the timing of the lower ones.
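The "geometric" timing assumption above can be illustrated with a small sketch (my own illustration of the idea, not the author's model; function names and tolerance are invented): two sequences are temporally related if their inter-node intervals differ only by a constant tempo factor.

```python
# Toy sketch of proportional ("geometric") timing: the same rhythm played
# at a different tempo has inter-node intervals in a constant ratio.
# Names and the tolerance value are illustrative assumptions.

def intervals(times):
    """Inter-node intervals of a sequence of event times."""
    return [b - a for a, b in zip(times, times[1:])]

def proportional(times_a, times_b, tol=1e-6):
    """True if the two interval patterns differ only by a tempo factor."""
    ia, ib = intervals(times_a), intervals(times_b)
    if len(ia) != len(ib):
        return False
    ratio = ib[0] / ia[0]
    return all(abs(y / x - ratio) < tol for x, y in zip(ia, ib))

# The same rhythm at double speed -> proportional
assert proportional([0.0, 1.0, 3.0, 4.0], [0.0, 0.5, 1.5, 2.0])
# A different rhythm -> not proportional
assert not proportional([0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0])
```

Under this reading, a higher-level sequence would only need to store the ratios, leaving the absolute tempo to be set at recall time.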

I can't go much further into my "geometrical" approach at this time. I mention it here only because I believe it is one of the things that are missing in Hawkins' HTM. Also, I don't think that looking into biology will reveal much beyond what we already know about the brain: spike-driven, hierarchical, patterns, sequences, and the like. The actual architecture of memory is much too counterintuitive to detect by looking at a maze of axons and dendrites.

I'll keep an eye out for your continued progress in this exciting field.