10 March 2006

This post has two parts, both addressing the issue of showing that one task (X) is useful for another task (Y); e.g., syntax is useful for MT, or WSD is useful for IR, or ....

The first question is: on whom does the onus of making such an argument fall? (I'm presupposing that it makes sense to make such an argument.) There are three options: (1) a person who does X; (2) a person who does Y; (3) a third party. Arguments in favor of (1): if I work on a task, I am likely to want to justify its importance. For (2): I know the most about Y, so if X is useful, I can probably get it to work; also, other (2)s are more likely to buy my results. For (3): ???

One could argue that (3) is the most unbiased, but I'll make the idealized assumption that (1) and (2) care about furthering science, not some personal research program. Given that, I think the best-case scenario is a jointly authored paper between a (2) and a (1); failing that, I'd probably prefer a paper by a (2). I'd prefer it not from a potential-bias perspective, but because a (2) is more likely to produce a Y-system that's state-of-the-art, so showing that X improves on this is going to be more convincing. Of course, if a (1) can do that, that's fine too.

The second issue I see is that in a lot of cases it's not cut and dried. Take, for example, a recent paper that showed that syntax is useful for EDT. I believe this paper is probably right, given the maturity of the system into which syntax was added. But consider my own EDT system. I worked on it for about 15 months without using syntax, throwing in all sorts of crazy features. If I add syntax (which I've done), it improves things a little. Not a lot. And only for coref, not for mention tagging. Why? Most likely, I've engineered around having syntax. If I had added syntax at the beginning, maybe I could have done away with many of the other features I have. We see the same thing in IR: if some complex NLP technique seems to improve IR systems but is expensive to run, people will often keep tearing it apart until they find some tiny little feature that's doing all the important work. If so, can we still say that the original technique helped?

Given this, I think there is a better way to measure the usefulness of X to Y than "X improved Y's performance." Consider someone setting out to build an EDT system. They want to know whether they should include syntax or not. The real question is: assuming I get comparable performance, is it easier to include syntax or to engineer around it? I don't know how to measure this automatically (lines of code is perhaps a reasonable surrogate, assuming the same coder writes both), but such a measure seems much more useful and telling than whether or not performance on an arbitrary system goes up.