24 September 2010

ACL 2011 ends on June 24, in Portland (that's a Friday). ICML 2011 begins on June 28, near Seattle (the following Tuesday). This is pretty much as close to a co-location as we're probably going to get in a long time. A few folks have been discussing the possibility of having a joint NLP/ML symposium in between. The current thought is to have it on June 27 at the ICML venue (for various logistical reasons). There are buses and trains that run between the two cities, and we might even be able to charter some buses.

One worry is that it might only attract ICML folks due to the weekend between the end of ACL and the beginning of said symposium. As an NLPer/MLer, I believe in data. So please provide data by filling out the form below and, if you wish, adding comments.

If you wouldn't attend any, you don't need to fill out the poll :).

The last option is there if you want to tell me "I'm going to go to ACL, and I'd really like to go to the symposium, but the change in venue and the intervening weekend is too problematic to make it possible."

15 September 2010

I heard earlier this morning that Fred Jelinek passed away last night. Apparently he had been working during the day: a tenacious aspect of Fred that probably has a lot to do with his many successes.

Fred is probably most infamous for the famous "Every time I fire a linguist the performance of the recognizer improves" quote, which Jurafsky+Martin's textbook says is actually supposed to be the more innocuous "Anytime a linguist leaves the group the recognition rate goes up." And in Fred's 2009 ACL Lifetime Achievement Award speech, he basically said that such a thing never happened. I doubt that will have any effect on how much the story is told.

Fred has had a remarkable influence on the field. So much so that I won't attempt to list anything here: you can find all about him all over the internet. Let me just say that the first time I met him, I was intimidated. Not only because he was Fred, but because I knew (and still know) next to nothing about speech, and the conversation inevitably turned to speech. Here's roughly how a segment of our conversation went:

Hal: What new projects are going on these days?

Fred: (Excitedly.) We have a really exciting new speech recognition problem. We're trying to map speech signals directly to fluent text.

Hal: (Really confused.) Isn't that the speech recognition problem?

Fred: (Playing the "teacher role" now.) Normally when you transcribe speech, you end up with a transcript that includes disfluencies like "uh" and "um" and also false starts [Ed note: like "I went... I went to the um store"].

Hal: So now you want to produce the actual fluent sentence, not the one that was spoken?

Fred: Right.

Apparently (who knew) in speech recognition you try to transcribe disfluencies and are penalized for missing them! We then talked for a while about how they were doing this, and other fun topics.

A few weeks later, I got a voicemail on my home message machine from Fred. That was probably one of the coolest things that have ever happened to me in life. I actually saved it (but subsequently lost it, which saddens me greatly). The content is irrelevant: the point is that Fred -- Fred! -- called me -- me! -- at home! Amazing.

I'm sure that there are lots of other folks who knew Fred better than me, and they can add their own stories in comments if they'd like. Fred was a great asset to the field, and I will certainly miss his physical presence in the future, though his work will doubtless continue to affect the field for years and decades to come.

The changes to the reviewing process are interesting. Basically the main change is that the author response is replaced by a journal-esque "revise and resubmit." That is, you get 2 reviews, edit your paper, submit a new version, and get a 3rd review. The hope is that this will reduce author frustration from the low bandwidth of author response. Like with a journal, you'll also submit a "diff" saying what you've changed. I can see this going really well: the third reviewer will presumably see a (much) better paper than the first two did. The disadvantage, which irked me at ICML last year, is that it often seemed like the third reviewer made the deciding call, and I would want to make sure that the first two reviewers also get updated. I can also see it going poorly: authors invest even more time in "responding" and no one listens. That will be increased frustration :).

The other change is that there'll be more awards. I'm very much in favor of this, and I spent two years on the NAACL exec trying to get NAACL to do the same thing, but always got voted down :). Oh well. The reason I think it's a good idea is two-fold. First, I think we're bad at selecting single best papers: a committee decision can often lead to selecting least offensive papers rather than ones that really push the boundary. I also think there are lots of ways for papers to be great: they can introduce new awesome algorithms, have new theory, have a great application, introduce a cool new problem, utilize a new linguistic insight, etc., etc., etc... Second, best papers are most useful at promotion time (hiring, and tenure), where you're being compared with people from other fields. Why should our field put our people at a disadvantage by not awarding great work that they can list on their CVs?

Anyway, it'll be an interesting experiment, and I encourage folks to submit!

There are two assumptions that are often used in statistical learning (both theory and practice, though probably more of the latter), especially in the semi-supervised setting. Unfortunately, they're incompatible.

The margin assumption states that your data are well separated. Usually it's in reference to linear, possibly kernelized, classifiers, but that need not be the case. As most of us know, there are lots of other assumptions that boil down to the same thing, such as the low-weight-norm assumption, or the Gaussian prior assumption. At the end of the day, it means your data looks like what you have on the left, below, not what you have on the right.
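To make "well separated" concrete for the linear case, here is a minimal sketch that computes the geometric margin of a candidate separator. The function name `geometric_margin`, the toy points, and the choice of `w` and `b` are all made up for illustration; nothing here is specific to any particular learner.

```python
import math

def geometric_margin(points, labels, w, b):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    Positive: every point is on its correct side, at least this far from the
    boundary. Zero or negative: (w, b) does not separate the data with a margin.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two clusters on either side of the vertical axis.
points = [(2.0, 1.0), (3.0, -1.0), (-2.0, 0.5), (-3.0, -0.5)]
labels = [+1, +1, -1, -1]

# A separator that agrees with the labels, and one that is flipped.
margin = geometric_margin(points, labels, w=(1.0, 0.0), b=0.0)
flipped = geometric_margin(points, labels, w=(-1.0, 0.0), b=0.0)
```

The margin assumption, in this reading, is just the claim that some `(w, b)` makes this quantity comfortably positive.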

The manifold assumption, which is particularly popular in semi-supervised learning but also shows up in supervised learning, says that your data lie on a low dimensional manifold embedded in a higher dimensional space. One way of thinking about this is saying that your features cannot covary arbitrarily, but the manifold assumption is quite a bit stronger. It usually assumes a Riemannian (i.e., locally Euclidean) structure, with data points "sufficiently" densely sampled. In other words, life looks like the left, not the right, below:

Okay, yes, I know that the "Bad" one is a 2D manifold embedded in 2D, but that's only because I can't draw 3D images :). And anyway, this is a "weird" manifold in the sense that at one point (where the +s and -s meet), it drops down to 1D. This is fine in math-manifold land, but usually not at all accounted for in ML-manifold land.

The problem, of course, is that once you say "margin" and "manifold" in the same sentence, things just can't possibly work out. You'd end up with a picture like:

This is fine from a margin perspective, but it's definitely not a (densely sampled) manifold any more.

In fact, almost by definition, once you stick a margin into a manifold (which is okay, since you'll define margin Euclideanly, and manifolds know how to deal with Euclidean geometry locally), you're hosed.
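Here is a rough sketch of that tension on the simplest possible "manifold": a straight 1D segment, sampled on an even grid, with the class boundary at 0. Those choices, plus the names `max_gap` and `gamma`, are my own simplifying assumptions for illustration. Enforcing a Euclidean margin of `gamma` on each side means throwing away every sample within `gamma` of the boundary, and the hole that leaves does not shrink no matter how finely you sample.

```python
def max_gap(xs):
    """Largest distance between neighboring samples along the segment."""
    xs = sorted(xs)
    return max(b - a for a, b in zip(xs, xs[1:]))

n = 201
# Densely sampled 1D manifold: an even grid on [-1, 1] with spacing 0.01.
grid = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

# Margin assumption: no point may sit within gamma of the boundary at 0.
gamma = 0.25
with_margin = [x for x in grid if abs(x) >= gamma]

dense_gap = max_gap(grid)          # about the grid spacing
margin_gap = max_gap(with_margin)  # at least 2 * gamma, however large n gets
```

Increasing `n` drives `dense_gap` toward zero but leaves `margin_gap` stuck at `2 * gamma` or more, which is the sense in which the sample stops being dense on the manifold.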