Yet, in some ways at least, the field is headed in the opposite direction.

I've often discussed the Chomsky hierarchy, and how most techniques at present fall very low on it. I've often discussed hierarchies "above" the Chomsky hierarchy; hierarchies of logic & truth, problems of uncomputability and undefinability. Reaching for the highest expression of form, the most general notion of pattern.

Machine learning has made artificial intelligence increasingly practical. Yet, the most practical techniques are often the least expressively powerful. Machine learning flourished once it abandoned the symbolic obsession of GOFAI. Fernando Pereira famously said: "The older I get, the further down the Chomsky Hierarchy I go."

There's a good reason for this, too. Highly structured techniques like logic induction and genetic programming (both of which would go high in the hierarchy) don't scale well. Commercial machine learning is large-scale, and increasingly so. I mentioned this in connection with word2vec last time: "Using very shallow learning makes the technique faster, allowing it to be trained on (much!) larger amounts of data. This gives a higher-quality result."

The "structure" I'm referring to provides more prior bias, which means more generalization capability. This is very useful when we want to come to the correct conclusion using small amounts of data. However, with more data, we can cover more and more cases without needing to actually make the generalization. At some point, the generalization becomes irrelevant in practice.

Take XML data. You can't parse XML with regular expressions.[1] Regular expressions are too low on the Chomsky hierarchy to form a proper model of what's going on. However, for the Large Text Compression Benchmark, which requires us to compress XML data, the leading technique is the PAQ compressor. Compression is equivalent to prediction, so the task amounts to making a predictive model of XML data. PAQ works by constructing a probabilistic model of the sequence of bits, similar to a PPM model. This is not even capable of representing regular expressions. Learning regular expressions is like learning hidden Markov models. PPM allows us to learn fully observable Markov models. PAQ learns huge Markov models that get the job done.

The structure of XML requires a recursive generalization, to understand the nested expressions. Yet, PAQ does acceptably well, because the depth of the recursion is usually quite low.

You can always push a problem lower down on the hierarchy if you're willing to provide more data (often exponentially more), and accept that it will learn the common cases and can't generalize the patterns to the uncommon ones. In practice, it's been an acceptable loss.

Part of the reason for this is that the data just keeps flowing. The simpler techniques require exponentially more data... and that's how much we're producing. It's only getting worse.
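A toy sketch of that trade-off, not PAQ or PPM itself: the Python below trains a fixed-order, fully observable Markov model on nested brackets that never go deeper than three levels, then measures the ideal code length (the sum of -log2 p, which is where compression and prediction meet) on shallow and on deep nesting. The bracket alphabet, the context length `ORDER`, the add-one smoothing, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "()."
ORDER = 6  # context length; an arbitrary choice for this sketch


def make_text(depths):
    """Concatenate items like '((())).', one per requested nesting depth."""
    return "".join("(" * d + ")" * d + "." for d in depths)


def train(text, order=ORDER):
    """Count which symbol follows each length-`order` context."""
    counts = defaultdict(Counter)
    padded = "." * order + text  # pad so the first symbols have a context
    for i in range(order, len(padded)):
        counts[padded[i - order:i]][padded[i]] += 1
    return counts


def bits_per_symbol(counts, text, order=ORDER):
    """Ideal code length under the model: mean of -log2 p(symbol | context)."""
    padded = "." * order + text
    total = 0.0
    for i in range(order, len(padded)):
        seen = counts.get(padded[i - order:i], Counter())
        # Add-one smoothing so unseen events still get nonzero probability.
        p = (seen[padded[i]] + 1) / (sum(seen.values()) + len(ALPHABET))
        total -= math.log2(p)
    return total / len(text)


rng = random.Random(0)
# Training data only ever nests one to three levels deep.
model = train(make_text(rng.choices([1, 2, 3], k=2000)))
shallow = bits_per_symbol(model, make_text(rng.choices([1, 2, 3], k=500)))
deep = bits_per_symbol(model, make_text([8] * 50))
print(f"shallow nesting: {shallow:.2f} bits/symbol")
print(f"deep nesting:    {deep:.2f} bits/symbol")
```

On shallow test data the model should come out well under one bit per symbol, since the only real uncertainty is how deep each item goes. On depth-8 items it should do worse than a uniform code over the three symbols, because it confidently predicts that nesting stops at three. A model with a stack would handle both cases from a handful of examples; the Markov model only covers what it has seen.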

At The New Yorker, Gary Marcus complains: "Why Can't My Computer Understand Me?" Reviewing the work of Hector Levesque, the article conveys a desire to "google-proof" AI, designing intelligence tests which are immune to the big-data approach. Using big data rather than common-sense logic to answer factual questions is seen as cheating. Levesque presents a series of problems which cannot (presently) be solved by such techniques, and calls on others to "stop bluffing".

I can't help but agree. Yet, it seems the tide of history is against us. As the amount of data continues to increase, dumb techniques will achieve better and better results.

Will this trend turn around at some point?

Gary Marcus points out that some information just isn't available on the web. Yet, this is a diminishing reality. As more and more of our lives are online (and as the population rises), more and more will be available in the global brain.

Artificial intelligence is evolving into a specific role in that global brain: a role which requires only simple association-like intelligence, fueled by huge amounts of data. Humans provide the logically structured thoughts, the prior bias, the recursive generalizations; that's a niche which machines are not currently required to fill. At present, this trend only seems to be increasing.

Should we give up structured AI?

I don't think so. We can forge a niche. We can climb the hierarchy. But it's not where the money is right now... and it may not be for some time.

[1]: Cthulhu will eat your face.

1 comment:

Should we give up structured AI? I don't think so - developing a true AGI would be the ultimate human achievement, but I doubt there will be much funding to support it. BigData is the focus at the moment, and there is a lot that can be gained from this from a business point of view, but really - it is not "AI", more so aggregation and association algorithms.

I think the development of AGI will be left to the mega corps like IBM/Google and independent researchers.