15 January 2006

The view of NLP as essentially a hunt-and-return technology has been gathering momentum since the burgeoning of the web. Example-based MT takes this view to machine translation, and phrase-based statistical MT is essentially EBMT done with statistics. In question answering (factoid style), the situation is even more dramatic. Deepak's thesis was essentially devoted to the idea that the answer to any question can be found in huge corpora by relatively simple pattern-matching. To a somewhat lesser degree, information extraction technology is something like smoothed (or backed-off) memorization, and performance is largely driven by one's ability to obtain gazetteers relevant to one's task.

Pushing such memorization technology further will doubtless lead to continued success, and there are many open research questions here. I would love others to answer these questions, but I have little interest in answering them myself. Fortunately, I think there are many interesting real-world problems for which simple memorization techniques will not work, and deeper "analysis" or "understanding" is required.

Any QA/summarization task that focuses on something other than "general world knowledge" fits into this category. I might want to ask my email client questions about past emails I've received. The answer will likely exist only once, and likely not in the form in which I ask the question. I might want to ask questions about scientific research, either from PubMed or REXA. I might want to ask about the issues involved in the election of the Canadian PM (something I know nothing about) or the confirmation hearings of Samuel Alito (something I know comparatively more about). And I would want the answers tailored to me. If I owned a large corporation or were running a campaign, I would want to know what my supporters and detractors were saying about me, and who was listening to whom.

I could be proven wrong: maybe memorization techniques can solve some/all of these problems, but I doubt it. What other problems are people interested in that may not be solvable with memorization?

6 comments:

Agreed! Memorization is not going to solve NLP. It is low-hanging fruit. My argument is that it is low-hanging fruit which has not yet been fully plucked. Without performing such an experiment we cannot claim that the web cannot solve these problems. (I want to make a concrete claim saying the web can solve problem X with Y% performance.)

Secondly, memorization of facts (or knowledge) may also be just one component (feature/signal) of a system which does deeper analysis. Memorization of facts or knowledge can help a system perform the next level of analysis.

Thirdly, a system that performs deeper analysis could in theory be memorized if we expand all rules involved in its process.