14 May 2006

At my graduation ceremony this past week, Dean Yannis Yortsos gave a short speech, in which he quoted someone famous (I've already forgotten who it was) at the Engineering graduation segment, saying something akin to "Science is the study of what the world is; Engineering is making something the world has never seen before." This comment had a fairly significant effect on me. However, when thinking about it in the context of NLP, it seems that NLP is an Engineering discipline, not a scientific one, at least according to the above (implied) definition of the terms.

NLP lives somewhere within the realm of language, statistics, machine learning, and cognitive science (you can include signal processing if you want to account for speech). Let's suppose that NLP is a scientific discipline: then, what aspect of the world does it study? It cannot be language, otherwise we'd be linguists. It cannot be statistics or learning, lest we call ourselves mathematicians. Cognitive science is out or we'd be...well, cognitive scientists. In fact, at least as I've described it in my job talks, NLP is the field that deals with building systems to solve problems that have a language component. This seems to place NLP exclusively within the realm of Engineering.

This observation seems to have several repercussions. It means, among other things, that we should not attempt to seek theoretical results, unless we are wearing another hat (I often wear the hat of a machine learning person; other people play linguists occasionally). It also means that perhaps we shouldn't complain as much that so much of NLP is system building. It seems that almost by definition, NLP must be system building: we are out to create things that never existed before.

If one, like me, is interested in science---that is, the study of natural things in the world---it also means that we had better find some related topic (machine learning, cognitive science or linguistics) to study simultaneously.

18 comments:

I agree and think that it is important to realise that there are multiple hats and to be aware which one(s) you are wearing at any instant. Your goals and how you evaluate your progress should depend on your hat.

Surprisingly many people appear to fail to get this. For example, at the height of the artificial neural networks bandwagon, journals that claimed to be science journals were packed with papers that were second-rate engineering (at best). Making a novel combination of method and problem doesn't constitute science if it does not serve to illuminate some (relative) invariant of the way the world works.

When deciding whether a field of study is a science or not, I typically think of the following criteria:

(1) It is guided by natural law;
(2) It has to be explanatory by reference to natural law;
(3) It is testable against the empirical world;
(4) Its conclusions are tentative, i.e. are not necessarily the final word; and
(5) It is falsifiable.

I took these from the Overton opinion (http://www.talkorigins.org/faqs/mclean-v-arkansas.html). Though Overton is not an expert on the history of science, these 5 simple criteria usually form the basis of most theories on the nature of scientific study.

I think it could be argued that NLP (and ML in general) satisfies 3-5. It would be much harder to argue for 1 or 2. The first question is whether human language falls in the same domain as Biology, Physics, etc. Furthermore, a study of recent NLP conferences will show that most (not all) researchers are interested in systems that are:

Property (a) is probably given more weight by reviewers, but properties (b) and (c) often make a paper more appealing and lead to its use throughout the field. The problem is that most are not concerned with property (d): the system is based on current knowledge of how humans process and produce language. (d) is really constraint 2 from the Overton opinion. I would argue that this places NLP strictly in the realm of Engineering.

Thanks for the list, Ryan. I tend to agree with you that (d) is largely missing and that (d) implies (2). I know that, at least around here, Ed Hovy is much more interested in (d) than many.

I see no problem with saying that language falls under the umbrella of "natural law." There seems to be little significant difference between language and biology, with the exception that language applies primarily to humans (and perhaps some other higher mammals) rather than all of "life." Language seems to essentially be a byproduct of some aspects of life; I believe this is no different from how a liver is a byproduct of some aspects of life. I'm sure I could dig up plenty of articles that disagree with this though, but I don't know the arguments.

Moreover, one can argue that much machine learning is mathematics and therefore also science.

(I suppose that here I am defining "guided by natural law" to be the same as "not guided by conscious choice" which would include mathematics [fairly standard] as well as language [perhaps less so].)

I agree with y'all for the most part, but just to play devil's advocate:

- It can be argued that math is not a science. Science is grounded in the scientific method, where one develops a hypothesis and tests it with controlled observations from the real world. Much of mathematics involves axioms and deductions, and, as such, mathematical knowledge does not come from the scientific method. Yet math and science are tightly intertwined--in fact, many of the hypotheses in science are expressed as mathematical models, formulas, etc.

- Some may not classify linguistics as the same kind of science as biology, physics, or chemistry. This is the distinction people often make between soft vs. hard science, or social vs. physical science. Social scientists may argue that they apply the scientific method with the same rigor as physical scientists; yet, some may counter that social phenomena are way too complex to be studied in controlled settings.

Anyway, personally I like Feynman's quote about science: "There is the value of the worldview created by science...The world looks so different after learning science."

Now that makes me wonder: Does the world look different after learning engineering?

I'm now more convinced that linguistics is science than math :). Math seems to fail at Ryan's #4 for the most part (though famous "open questions" like P vs NP are tentative, there comes a point where many hypotheses---theorems---are no longer tentative).

I guess I don't see how any of the definitions given exclude the study of language from science. I have many bio friends who replicate experiments hundreds of times because they're unreliable. Just because something is a difficult version of X doesn't make it any less an X. I also don't think that language falls under the category of social phenomena in the same way that, say, history does. I guess this goes back to my conscious choice distinction (the study of nature is the study of that over which we have no real control).

I think that whomever Yortsos quoted was saying something akin to the Feynman quote. The world does look different after Engineering because you have changed the world. This is obviously a different sense than Feynman intended.

I would argue that Linguistics could be a science, but in practice is not. Consider, for instance, Newtonian mechanics. It was a powerful and simple theory that explained most things well and predicted events accurately. However, there were cases, such as the orbit of Mercury, for which it was clearly insufficient. One could easily create a theory that explained the orbit of Mercury, but such a theory would often require complex explanations for most other events. Thus, the scientific community tended to prefer the simple theory that explained most things well over the complex theories that explained more things, but in a much more convoluted manner.

Linguistics feels to me like the study of the orbit of Mercury, whereas computational linguistics and NLP seem to provide simple solutions that handle the common cases well. It is the unusual that interests most linguists. This is fair enough, but the theories proposed to explain the unusual are often much more than is needed for the average case. But who knows, maybe the theory of relativity for Linguistics is just around the corner :)

Interesting -- I think I tend to agree with Ryan. It's probably somewhat unfair to pick on linguistics since there aren't many linguists hanging around here, but it's also important to acknowledge that linguistics is not just (English) syntax. Personally, I find historical linguistics and typology to be more "scientific" than your standard studies of syntax/semantics/phonology, but that may be too much of a personal bias. I would liken studying English syntax to studying how Mercury moves, and studying how all languages work (e.g., via typology) to something more like Newtonian mechanics. It's too bad that linguistics is largely dominated by the study of English.

I'm going to throw out a naive guess and agree that NLP as it has been pursued seems to be generally more of an "engineering" practice. Look at the work in general over the years: it's mostly evaluation-oriented. This may very well be a result of various competitions and funding bodies' orientations, but it definitely has left an imprint.

On the other hand, I've seen some statistical physics work that was definitely not evaluation-oriented, and those studies employed NL and IR techniques as methods for exploring and validating their hypotheses.

btw- it's great that there's finally a blog dedicated to issues in NLP.

Given that the work that goes on under the rubric "NLP" is driven not by theoretical questions per se, but by engineering problems having to do with the computational processing of speech or text, I'd be hesitant to place this somewhat nascent field in the Science with a capital S camp. That said, there are at least a few people attempting to leverage NLP/ML/IT computational techniques and algorithms for their big-S questions. John Goldsmith, Fernando Pereira, and Mark Steedman come to mind. (In fact, most anyone who has been seriously inspired by the work of Chomsky's mentor, Zellig Harris, would probably qualify.) See especially Goldsmith's LINGUISTICA project for an example of a concrete NLP project whose implementation has serious theoretical implications.
