4/10/2005

Steve likes theorems with a dependence on unobservable quantities. For example, if $D$ is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate dependent on the variance, $E_{(x,y)\sim D}\left(y - E_{y'\sim D|x}[y']\right)^2$.

I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm.
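To make the distinction concrete, here is a minimal sketch contrasting a bound I can compute from observables with one I cannot. It is not Steve's theorem: as a stand-in I use the standard one-sided Hoeffding and Bernstein inequalities for a loss in [0,1] averaged over n i.i.d. held-out examples, and the variance in the second function is the variance of that loss, not the conditional variance of y above. The function names and parameters are my own, chosen for illustration.

```python
import math


def hoeffding_upper_bound(empirical_error, n, delta):
    """Upper bound on the true error that holds with probability >= 1 - delta.

    Assumes a loss in [0, 1] averaged over n i.i.d. held-out examples.
    Every input is something I can observe or choose.
    """
    return empirical_error + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def bernstein_upper_bound(empirical_error, n, delta, loss_variance):
    """Bernstein-style upper bound for the same setting.

    It can be tighter than Hoeffding, but `loss_variance` is the TRUE
    variance of the loss under D, which I cannot observe.
    """
    log_term = math.log(1.0 / delta)
    return (empirical_error
            + math.sqrt(2.0 * loss_variance * log_term / n)
            + 2.0 * log_term / (3.0 * n))


if __name__ == "__main__":
    # Operative: computable from the held-out sample alone.
    print(hoeffding_upper_bound(0.1, 1000, 0.05))
    # Declarative: the last argument is a guess; nothing tells me the right value.
    print(bernstein_upper_bound(0.1, 1000, 0.05, loss_variance=0.01))
```

The second function runs, but its answer is only as good as the number guessed for `loss_variance`, which is the problem described above.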

Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “A New Kind of Science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc.

“Operative” mathematics helps you make predictions about the world. A strong example of operative mathematics is Newtonian mechanics in physics: it’s a great tool to help you predict what is going to happen in the world.

In addition to the “I want to do things” motivation for operative mathematics, I find it less arbitrary. In particular, two reasonable people can each be convinced they understand a topic in ways so different that they do not understand each other’s viewpoint. If these understandings are operative, the rest of us on the sidelines can better judge which understanding is “best” by seeing which one predicts better.

4 Comments to “Is the Goal Understanding or Prediction?”

Are these two approaches really contradictory?
My feeling is that they are both needed and actually complement each other. When we understand better, we can make better prediction systems, while when we build better prediction systems, we can gain understanding.

Whether the goal of science should be to help people understand the world, or help them predict it (in order to be able to act on it as they desire) is more of a philosophical issue and may not have much to do with learning theory.
It is common (especially in Physics) to judge a scientific theory by its ability to make correct predictions about the world. But this does not mean that scientists should only care about making predictions.

To come back to Smale’s variance theorem, although this may not directly yield a new learning algorithm, it may bring a new understanding of the learning phenomenon and later on lead to new algorithms.

It is thus just a question of time: even if you consider that the only goal is to make predictions, finding a way to explain the observations might be a first step toward being able to make predictions. I would not argue that this step is necessary, but I think that no approach should be discarded and all ideas are worth considering. Some people might prefer to spend their time thinking about certain problems, and people working on learning theory should not try to judge what is worth spending time on.

Alright, the aim must first and foremost be an understanding of the relationship under investigation, absent the temporal dimension, since it is a basic assumption that systems evolve along the same temporal dimension at the same rate. Of course, this is a common assumption, but it is not true.

So the prediction problem is derivative of the understanding problem, but strictly speaking, something different.

We can have meaningful understanding without prediction. We cannot have meaningful prediction without understanding (otherwise, we will be in for rude surprises every now and again that our scientific approach will not tolerate: “and then a miracle occurs and ...”).

It seems very comforting to say “all math is good, we should not judge math”, and it is true to some extent.

However, it’s punting on the issue more than I am comfortable with. In the real world, we have only so many moments that we can devote to thinking about the mathematics of learning. This constraint suggests we optimize what we spend our time thinking about.

If we think back 10 years, what was good mathematical learning? I can only think of Boosting and SVMs. Of the two, the relationship of an SVM to learning theory is more tenuous, but many of the initial motivations were certainly connected to theorems which made predictions about prediction ability.

If we think back 20 years, there is PAC learning which certainly tries to make predictions about performance (even if in an assumption-heavy way).

If we think back earlier, there is VC theory and statistics, both methods that can be used (with more or less success) to make predictions.

Can we think of any mathematical learning result which is:
a) old
b) has a large impact
c) hasn’t been used to make predictions
?

Don’t take my view as more extreme than it is: I’ve certainly worked on elements of learning theory just for my understanding. There is some value in educating yourself and others about these things. But, what lasts and is useful appears to be operative or predictive statements.