5/8/2009

Normally I do not blog, but John kindly invited me to do so. Since computability issues play a major role in Artificial Intelligence and Machine Learning, I would like to take the opportunity to comment on that and raise some questions.

The general attitude is that AI is about finding efficient smart algorithms. For large parts of machine learning, the same attitude is not too dangerous. If you want to concentrate on conceptual problems, simply become a statistician. There is no analogous escape for modern research on AI (as opposed to GOFAI rooted in logic).

Let me show by analogy why limiting research to computational questions is bad for any field.

Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with the axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to be less computable than previous ones. Of course, once a subject has been formalized, further research (a) analyzes the structure of the theory and (b) tries to compute efficient approximations. Only in (b) do computational aspects play a role.

So my question is: Why are computational questions so prevalent in AI research? Here are some unconvincing arguments I’ve heard:

A) Because AI is a subfield of computer science, and the task of computer scientists is to find (efficient) algorithms for well-defined problems?

I think it does not do any (real-world) problem any good to confine it to computer science. Of course, philosophers and cognitive scientists also care about AI, but where are the mathematicians?

B) Because formalizing AI and finding efficient smart programs go hand-in-hand? Separating these two issues would lead to no results, or at best to results that are misleading or useless for the construction of intelligent machines?

I am not aware of any convincing argument that separating the issues of “axiomatizing a field” and “finding efficient solutions” will (likely) fail for AI. The examples above of other fields actually indicate the opposite. Of course, interaction is important to avoid both sides running wild. For instance, von Neumann’s minimax solution for games, albeit infeasible for most games, is the cornerstone of most practical approximations.

C) Because there is some deep connection between intelligence and computation which can not be disentangled?

Sure, you could say that intelligence is by definition about computationally efficient decision making. This is as unconvincing as argument (A). Pointing out that the human brain is a computational device is quite useful in many ways, but it doesn't prove (C) either. Of course, ultimately we want a "fast" smart algorithm. How is AI different from wanting a fast algorithm for computing primes, which you derive from a non-algorithmic definition of primes, or for drawing fractals?
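The primes analogy can be made concrete. Here is a minimal sketch (function names are mine, purely for illustration): first transcribe the non-algorithmic definition of primality directly into code, then derive a faster algorithm from the same definition afterwards.

```python
def is_prime(n):
    """Direct transcription of the definition: an integer n > 1
    whose only divisors are 1 and n itself."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

def is_prime_fast(n):
    """A faster algorithm derived from the same definition:
    any composite n must have a divisor d with d*d <= n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# The definition-level spec serves as the correctness criterion
# for the derived algorithm.
assert all(is_prime(n) == is_prime_fast(n) for n in range(200))
```

The point of the analogy: the definition comes first and is non-algorithmic in spirit; efficiency is a separate, later concern, with the definition acting as the specification against which any fast algorithm is judged.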

D) Because AI is trivial if computational issues are ignored? All conceptual problems have already been solved?

Many have expressed the idea that some form of exhaustive search over all possible solutions, picking the "best" one, does the job. This works essentially for exactly those problems that are well-defined. For instance, optimal minimax play of a zero-sum game or solving NP-complete problems is conceptually trivial, i.e. if computation time is ignored. But in general AI and machine learning, there is no universally agreed-upon objective function. The Turing test is informal (it involves a human judge in the loop), maximizing expected reward is underspecified (the true distribution is not known, so expectation w.r.t. what?), etc. The AIXI model, briefly discussed on this blog, is the first complete and formal such criterion, for which, let me phrase it that way, no flaw has yet been identified. Shane Legg's award-winning thesis gives an informal introduction and contains lots of discussion.
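To illustrate how "conceptually trivial" a well-defined problem becomes once computation time is ignored, here is a hedged sketch (my own example, not from the post): exhaustive search for the NP-complete subset-sum problem. The specification is one line; only efficiency is hard.

```python
from itertools import combinations

def subset_sum(nums, target):
    """Exhaustive search: enumerate every subset and return one that
    sums to target, or None. Conceptually trivial, but exponential
    in len(nums) -- the entire difficulty is computational."""
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None
```

For example, `subset_sum([3, 9, 8, 4, 5, 7], 15)` finds a valid subset, while `subset_sum([2, 4, 6], 5)` returns `None`. The contrast with AI is exactly the one drawn above: here the objective is fully specified, so brute force settles the conceptual question.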

Conceptual and computational problems in AI should be studied jointly as well as separately, but the latter is not (yet) fashionable. When AI was more logic oriented, some good logicians helped develop the foundations of "deductive" AI. Where are the researchers giving modern "inductive" AI its foundation? I am talking about generic learning agents, not classifying i.i.d. data. Reinforcement learners? Well, most of the hard results are from adaptive control theorists, but it's reassuring to see parts of these communities merging. It's a pity that so few mathematicians are interested in AI. A field "mathematical AI" with the prestige of "mathematical physics" would be exciting. As a start: 40% of the COLT & ALT papers could be on generic learning agents, 30% on induction, 20% on time-series forecasting, and 10% on i.i.d. data. Currently the proportions are roughly reversed.

25 Comments to “Computability in Artificial Intelligence”

I’m puzzled why you keep using the value of zero-sum games as an example.

It is easy to compute the value of a zero-sum game, as well as the corresponding min-max strategies for each player: it's just a linear program. Simple learning dynamics also quickly converge to optimal min-max play.
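The second claim can be illustrated without any LP solver. Here is a minimal sketch (function and variable names are mine) of one such simple learning dynamic, fictitious play, in which each player repeatedly best-responds to the opponent's empirical mixture; for zero-sum games the empirical frequencies converge to min-max play (Robinson, 1951).

```python
def fictitious_play(A, rounds=10000):
    """Fictitious play on a zero-sum matrix game A (row maximizes,
    column minimizes). Returns the empirical mixed strategies, which
    converge to the min-max strategies."""
    m, n = len(A), len(A[0])
    row_counts = [0] * m  # how often each row action was played
    col_counts = [0] * n
    i, j = 0, 0           # arbitrary initial pure strategies
    for _ in range(rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        # Row player best-responds to the column player's empirical mixture.
        i = max(range(m),
                key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
        # Column player (the minimizer) best-responds to the row mixture.
        j = min(range(n),
                key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
    return ([c / rounds for c in row_counts],
            [c / rounds for c in col_counts])

# Matching pennies: value 0, both players mix 50/50 at equilibrium.
x, y = fictitious_play([[1, -1], [-1, 1]])
```

After 10,000 rounds both empirical strategies are close to the (0.5, 0.5) equilibrium. The exact min-max strategies could equally be obtained by solving the linear program directly, as the comment notes.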

What makes you think that “in physics, every new fundamental theory seems to be less computable than previous ones”? Can you offer some evidence, for example a result in theoretical physics which is known to be not computable? I am also surprised at the claim that computational aspects play little role in foundations of logic. Logicians invented the notion of general computation (Church’s lambda calculus, Goedel’s general recursive functions), and anybody learning logic simply has to know the fundamentals of computability theory.

It's a pity that so few mathematicians are interested in AI. A field "mathematical AI" with the prestige of "mathematical physics" would be exciting.

I don’t think mathematicians (even outstanding ones) poring over AI problems would make much progress.
They are way, way too hooked on their enjoyable number games.
Focusing on mathematics is just the continuation of the unfortunate GOFAI trend.
I am much more hopeful about some philosophers who are interested in the relationships between language, logic and maths.

I’m in the B and C views above, with respect to myself. I’ll try to be very explicit about why.

It might be helpful to think about what it means to be a "complete" model. There are many ways we could try to formalize AI, such as with (a) a deterministic decision process, (b) an MDP, (c) a POMDP, (d) a computable world, or (e) some uncomputable world. In each of these models, we can try to define what it means to solve the AI problem. Much of the earlier work on AI was in (a). Much of the current work in ML is in (b), with some in (c). AIXI is for (d). I believe what is meant by "complete" is that (e) is not possible anyway for any AI we could implement on a computer, so you might as well stick with (d). This is evidence for point (C) above, where it seems inconsistent to argue that we should particularly care about computable worlds and yet not care about computation.

But more realistically (d) isn’t possible either given computational constraints. Furthermore, I don’t see how setting (d) is amenable to approximation, because computational compromises can degrade your performance with respect to the set of computable worlds by essentially arbitrary and uncontrolled amounts. This lack of tractability suggests (B) is a reasonable concern.

Although I do believe in (B), I don’t believe in the italicized part of (B). The AIXI setting does capture some elements missing from simpler settings. For example, it actually worries about generalization, something which isn’t done in the simple versions of settings (a), (b), or (c) where existing provably sound algorithms (including some I’ve worked on) could perhaps more accurately be described as reinforcement memorization.

The evidence provided that "limiting research to computational questions is bad for any field" isn't convincing to me. In some real sense, physics gets a free pass here, because where there is a physical theory of the world that is not computable, we should be investigating making more powerful computers. String theory also isn't a great example, as many people have doubts about whether or not it provides useful insights. There are certainly substantial branches of mathematics designed without computability as a concern, but perhaps this was an accident of history, due to their invention before the computer. Perhaps if computers had come first, mathematics would have been designed differently, emphasizing a computational basis to a greater degree. I generally regard mathematics as a quest to discover how to think, and it seems a bias towards computability is generally useful.

What about the argument that animals (including humans) seem to be able to solve problems intelligently (good vision, and for humans good language understanding), and they do so rather efficiently? (Choose your favorite measure of efficiency: time, energy, …)

Shouldn’t that suggest that an algorithm running on a moderately powerful computer should be able to do so as well?

Does anyone believe that general AI (or at least increasingly sophisticated artificial agents) is inevitable? The evolutionary development of the brain occurred due to selective pressures favoring increased intelligence. If I take a step back and squint, I see some parallels with socio-economic pressures favoring more intelligent (e.g., self-sufficient, general-purpose) computer programs. Does it suffice for now to just "put our heads down" and work on solving the short-term and medium-term problems at hand?

You seem to argue that there should be more emphasis on AIXI/Solomonoff-type models and less on computation. Did I understand you? It sounds nice. But we understand why this is, don't we? Sociology pushes people to work on familiar problems and problems in which steady progress is possible, but progress on universal induction is hard to come by. I'm a statistician, and I find these topics very interesting for fundamental reasons, but I can't work on them all the time because my peers wouldn't understand. We won't, for example, find mainstream statisticians taking an interest until it is possible to make more tangible progress; the very definitions will be too computational for most of them anyway; moreover, the mathematically inclined generally want analysis-type proofs, not lambda-calculus ones. All that said, I do keep working to find models of computation that can describe progressively complex behavior and still be feasible to work with, I just don't expect to publish any of that any time soon.

Roughly, (1)-(3) can be computed by hand, (4)-(6) by a computer; for (7)-(9) you first need to perform severe analytical simplifications (e.g. find effective theories) before you can practically compute some aspects; (10) may in principle be beyond the computational realm.

These theories have been found by considering symmetry principles, physical and mathematical consistency, elegant axiomatic characterizations, etc., and of course experimental guidance and verification. On the other hand, computational considerations are/were a big nuisance but did not guide the discovery of the fundamental theories of physics. Every year new approximations (effective theories) are derived from or inspired by these fundamental theories (and vice versa, of course).

Logic is borderline. Proofs are indeed (usually) algorithmically checkable, and the set of theorems is recursively enumerable. So proof theory and recursion theory have a strong constructive/computational flavor. But there is more in logic. I had in mind set theory (see the axiom of choice) and model theory (uncountable models). The objects a formal system talks about may or may not be constructible. Intuitionistic logic cares about constructibility on another level. So logic is actually a good example, where non-constructive and computational aspects are both valued in harmony. Not so in modern AI.

Btw. I also once thought that non-constructible concepts in logic are an illusion and should be banned, and that (strong) intuitionistic logic is the way to go. I don't think so anymore. Losing some constructibility in the process of abstraction is sometimes worth the price (John, you should be sympathetic to this). I am not a fan of non-constructive math and try to avoid it wherever reasonably possible. I don't like the axiom of choice, and I am a fan of the Loewenheim-Skolem theorem/paradox, which says that every consistent first-order theory has a countable model, even ZFC and the reals.

“Focusing on mathematics is just the continuation of the unfortunate GOFAI trend.”

Little math was involved in GOFAI. GOFAI was logic-based.

“I am much more hopeful in some philosophers who are interested by the relationships between language, logic and maths.”

You need someone who can convert philosophical verbiage into rigorous models. Physicists are best trained to model things, mathematicians and logicians to deeply analyze such models, and computer scientists to find efficient algorithms for (relatively) well-defined problems. It is likely that all will make valuable and essential contributions to the grand goal of AI.

“There is nothing that can be said by mathematical symbols and relations which cannot also be said by words. The converse, however, is false. Much that can be and is said by words cannot be put into equations, because it is nonsense.” (Clifford Ambrose Truesdell, 1966)

Just to avoid any misunderstandings: I appreciate and read good philosophers (which include e.g. Earman but not Popper).

Physics strongly indicates that the real world is computable, so considering only computable environments is most likely non-restrictive. Your argument is not the reason why I restrict AIXI to computable environments.

I don't see how setting (d) is amenable to approximation, because computational compromises can degrade your performance with respect to the set of computable worlds by essentially arbitrary and uncontrolled amounts.

The same is true for pruning the exact minimax search tree in Chess and adding heuristics. So you think it's not a good idea to start with the minimax ideal and approximate it, because the approximation accuracy is uncontrollable? Whatever you think, in this case history shows that it is a good idea for many games.
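To make the Chess analogy concrete, here is a toy sketch (my own example, not from the discussion): exact minimax on a Nim-style game where players remove 1-3 stones and taking the last stone wins. Full-depth search is the "minimax ideal"; a depth cutoff plus a heuristic evaluation stands in for the kind of approximation Chess engines use.

```python
def minimax(pile, depth, maximizing, heuristic):
    """Depth-limited minimax for a toy take-1-2-or-3 Nim game.
    Returns +1 if the root maximizer wins, -1 if it loses.
    With large depth this is the exact minimax ideal; with small
    depth the heuristic approximates the unexplored subtree."""
    if pile == 0:
        # The previous mover took the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        return heuristic(pile, maximizing)
    vals = [minimax(pile - k, depth - 1, not maximizing, heuristic)
            for k in (1, 2, 3) if k <= pile]
    return max(vals) if maximizing else min(vals)

# Exact ideal: the mover loses iff pile % 4 == 0 (heuristic never reached).
exact = minimax(4, depth=10, maximizing=True, heuristic=lambda p, mx: 0)

# Approximation: shallow search plus the 'mover wins iff pile % 4 != 0'
# heuristic still plays correctly from a larger pile.
h = lambda p, mx: (1 if p % 4 else -1) if mx else (-1 if p % 4 else 1)
approx = minimax(13, depth=2, maximizing=True, heuristic=h)
```

Here the shallow heuristic search happens to agree with the exact value, which is precisely the point being made: starting from the infeasible ideal and approximating it can work very well in practice, even without worst-case accuracy guarantees.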

A "bias towards computability is generally useful". On average, yes, but if everyone constantly had this "computability" bias in mind, some great ideas would never have emerged. It's similar to the perennial arguments about the (non)value of theoretical research, or the (non)usefulness of separating formal specification from implementation, etc.

We won't, for example, find mainstream statisticians taking an interest until it is possible to make more tangible progress

THIS is the main reason why modern research (not only in statistics) is such a painful slog: NO academic can afford to investigate questions that could end up in a total flop; there have to be some "nice" results in view, even minor ones.
Hence few breakthroughs of any kind, because breakthroughs almost always lie on untrodden paths.

I completely agree. Luckily, my investigations of AIXI & Solomonoff often lead to publishable "conventional" results, which can be sold without mentioning AIXI/Solomonoff. Here are two recent offspring: Feature Markov Decision Processes and Predictive Hypothesis Identification.
Can you see the connection? AIXI & Solomonoff give me and some others a lot of inspiration and insight into more down-to-earth research problems.

Many computer science students lack the ability to abstract away from computational issues. Their thinking mode is always "How can I write a program solving X?" If X is not 100% well-defined, they miss great opportunities by not first asking what it is they really want, besides a fast algorithm. Even techniques like creating an algorithm that guesses the right answer, e.g. a non-deterministic finite automaton, and thereafter converting it to a conventional algorithm, are foreign to many.
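The NFA example can be made concrete. A minimal sketch (the automaton and names are mine, for illustration): first specify a machine that nondeterministically "guesses" the right answer, then simulate it deterministically by tracking the set of reachable states, i.e. the on-the-fly version of the subset construction.

```python
def nfa_accepts(transitions, start, accepting, word):
    """Deterministic simulation of an NFA: instead of guessing one
    run, track the set of all states reachable after each symbol
    (the on-the-fly subset construction)."""
    states = {start}
    for symbol in word:
        states = {t for s in states
                  for t in transitions.get((s, symbol), ())}
    return bool(states & accepting)

# NFA for binary strings whose second-to-last symbol is 1:
# on reading a 1, 'guess' that the final two symbols have begun.
nfa = {
    ('q0', '0'): {'q0'},
    ('q0', '1'): {'q0', 'q1'},  # the nondeterministic guess
    ('q1', '0'): {'q2'},
    ('q1', '1'): {'q2'},
}
```

For example, `nfa_accepts(nfa, 'q0', {'q2'}, '0110')` accepts, while `'100'` is rejected. The nondeterministic specification is tiny and obviously correct; the conventional deterministic algorithm is derived from it mechanically, which is exactly the two-step discipline the comment describes.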

"Much that can be and is said by words cannot be put into equations, because it is nonsense." (Clifford Ambrose Truesdell, 1966)

Yes, but this is very misleading, in that many things that can be said in words and are not nonsense cannot be put into mathematical symbols either.
Even a sketch of a proof that makes perfect sense to a mathematician (and truly and fully conveys the proof's meaning) cannot be put (directly) into mathematical symbols; this is where the missing insight into semantics lies.

There may be merit to that idea at some level, but in practical terms I’m tempted to say no.

The architecture of a computing platform has a fairly strong effect on what it can do efficiently, and our modern computer hardware is radically different from a brain in that sense. Our computer chips are high-throughput, low-parallelism hardware, whereas the brain is pretty much the opposite (slower components, massively parallel).

The first is that what we can compute is governed by physics, and (nearly) any physical process can in principle be used to compute. It is because of this near-tautology between physics and computation that physics has traditionally had a free pass on computation (with some signs that this free pass has been ending over the last several decades).

The second is that I'm not disputing whether the "approximate an ideal" approach is used, and has been useful, but rather whether it is the best (most robust and generally useful) approach. A crude model of this comes from competitive analysis, where you try to design an algorithm which competes with some set of strategies. You could choose an incredibly large set of strategies against which no feasible algorithm could compete, or you could choose a realistic set of strategies where competition is possible. Many people choose the second. You might think the second is simply an approximation of the first, but I don't think that's fair, as people reason about the second without reference to the first.

Yes, because an animal is able to recognize only a small group of images out of all possible images; but the set of these "recognizable images" is about equal to the set of naturally occurring images, which is vastly smaller than the set of all possible images.
So it is simple to train an animal to recognize natural images, but nearly impossible to train an animal to recognize a set of artificially constructed images.
All the other problems you mention are different instances of the same classification problem.
Using an algorithm built so it can be trained to classify every possible set is the wrong way, because that problem is far too wide and too general for today's computational resources, and perhaps also for future ones.

Um, I don't see how Newton's laws or electrostatics can be calculated "by hand." In any complicated situation beyond two bodies, Newton's laws have no closed-form solutions. In electrostatics, computing a potential from a potential surface can't be done by hand.

All of these theories lead to uncomputable problems in some real sense (except probably Kepler; maybe electrostatics, depending on what you mean by that; and string theory/QG only because, well, it's not well defined): all of these physical theories allow the construction of computers. Thus asking questions about what happens in a given simulation leads directly to uncomputable questions.

Quantum gravity may, on the other hand, have a strange sort of uncomputability, in that the basic objects out of which the theory is formulated may have noncomputable properties: e.g., the problem of classifying manifolds in more than four dimensions, which some of the above theories might require if they allow general topological change, is not computable.

I'm surprised that Marcus's claim is not that there is too much of a focus on _complexity_ (or perhaps this is what he means). Analyses carried out at the (Marrian) computational level have revolutionized computational cognitive science, AI, ML, etc. Furthermore, it's clear that we haven't carried this program to its full potential in ML. There is, however, a strong bias among computer scientists to accept only efficient algorithms. I believe this is premature, and the problem pervades ML/AI.

The existence of Marcus’s “fastest algorithm for all problems in limited* space and limited* time” suggests to me that our metric for evaluating complexity is inadequate. Searching the space of proofs seems like cheating somehow. And, with complexity and computability disregarded, I’m fairly happy to concede that the universal predictors or agents proposed by Marcus essentially solve the (modern formulation of the) problem of intelligence. (If someone is aware of a good critique that attacks an issue other than the noncomputability or complexity, I would be interested.)

However, given that we are interested in creating an intelligence in this world where we are (seemingly) limited to manipulate computable objects, the study of the boundary of what is representable (in particular, a computable probability theory) seems useful. As an example, the space of all computer programs as a hypothesis space is too unwieldy (in fact, often uncomputable given the proposed task). My particular interest in representing probability distributions as computer programs and specifying structured nonparametric Bayesian models is that I believe we can usefully approach this upper bound from below. Many of these models do not have efficient inference algorithms—but we’re finally back to studying knowledge representations.

That said, complexity will ultimately have to be addressed. However, what appears to be and what is currently understood to be efficient (or not) is perhaps limiting us.

You are treading on dangerous ground here.
How dare you threaten the turf of so many valiant researchers?
It's only a matter of "progress" (and more budget): Cyc has been making "progress" since 1984, for instance, and I have no doubt that machine learning is a very "promising" avenue toward universal intelligence.
It is the strategy of the million monkeys…