Friday, August 13, 2010

In Need of a Few Bad Papers

For my graduate class this semester, there's a lot of paper-reading, and I view learning how to critically and constructively read papers as part of the student goals for the class.

A corollary of this, it seems to me, is that the class should include some bad papers, so students learn to recognize (and, if possible, get something out of) reading those. So I need some really good examples of bad papers. (In one of the areas of the class focus -- web search, compression. coding, streaming data structures...)

Now I should be clear about what I mean by bad papers. I'm looking for something of a higher standard than an automatic journal reject -- I get at least one of those a month in my mailbox, and it's not clear there's much to learn from that. I'm talking about papers that at least superficially look quite reasonable -- indeed, I'm expecting papers that have been published in reasonable places -- but when you think about them more, there are subtle (or not-so-subtle) flaws. In theoretical papers, possibly it might be that the paper starts with a model that sounds really nice but it just clearly wrong for the problem being addressed. For systems papers, it might be a paper where the experiments just don't match up to what ostensibly was being proposed.

[I had a nice example of a bad paper in earlier incarnations of the class, but I don't think it's aged well, and I've removed it.]

Maybe bad is even the wrong term for what I'm looking for. Perhaps I should use a more neutral word, like "controversial" -- indeed, then I can get the students to take sides. (Is the Faloutsos, Faloutsos, Faloutsos paper still considered controversial these days? That could be a nice example, but it's not really on topic for the class.) Or perhaps I just want papers that reached too high for their time -- noble failures. The key is that, in my mind, just showing students examples of great papers doesn't seem didactically sound. Negative examples are important for learning too (especially if they also show that great scientists don't always get it right).

Feel free to mail me rather than post a comment if you're afraid of offending anyone. Naturally, mailing me links to my own papers will be taken with the appropriate humor.

The same goes for the other papers of Balcan about Learning and Clustering via Similarity Functions.

They all try to make a conceptual point, but the conceptual point turns out, each time, to be highly flawed: the theorems are correct, but often nearly-vacuous, and in any case don't mean what you'd have thought they mean.

They all try to make a conceptual point, but the conceptual point turns out, each time, to be highly flawed: the theorems are correct, but often nearly-vacuous, and in any case don't mean what you'd have thought they mean.

Seeing as I was mentioned here, I'd like to say that BBG and the papers on clustering and learning via similarity functions, are some of my favorites of my own papers. Let me focus here on BBG in particular and explain why I like it so much.

First, there are many problems that when you formulate them as an optimization problem, they have the property that the objective function you are using to measure solution quality is not the same as the quantity you really care about. For instance, say you are clustering images of people by who is in them. You might algorithmically map the images into points in R^n and then aim to find a clustering that (approximately) minimizes k-means score on these points. But really you care about whether you clustered the people correctly. Of course, what's happening is your algorithm can't measure what you really care about so instead you optimize something you can measure. Fine - it's not a crazy thing to do. But the effect is this may make the problem seem harder than it really is. In particular, notice that in order for this to work, you *already* need to have used a method for representing your data (e.g, a way of placing images into in R^n) such that the optimal and near-optimal solutions to the objective you are measuring (k-means score) have good quality according to what you really care about. Otherwise, why be excited about getting a near-optimal score? The key point of BBG is that this property (solutions that are near-optimal according to the objective you can measure are also near-optimal according to the one you care about but can't) implies structure you can use algorithmically. Furthermore, this is true *even if* it's NP-hard in general to find near-optimal solutions. E.g., say we believe that it would be enough to get a 1.1 approximation to k-means in order to get close to the clustering we want. You might think this is not useful since k-means is NP-hard to approximate to such a factor. But what BBG shows is you can still use the implied structure to get close according to the objective you care about, even if it's NP-hard in general to approximate the objective you can measure.

To put this another way: I love as much as anyone the challenge of developing better approximation algorithms for important problems. For instance, I'd love to get a factor-2 approximation to k-median. But the BBG algorithms already have the property that if data is such that all factor-2 approximations are close to the desired answer, then BBG will also get a solution close to the desired answer. So, to claim my new factor-2 approximation is better, I need to be able to say "it doesn't just produce any old 2-approximation, it produces a good/natural/special one" for some compelling notion of "good/natural/special". And if we can articulate what we mean by that, we can then ask if we can develop a souped-up version of BBG that needs only the assumption that the "good/natural/special" approximations to k-median are close to the desired answer.

Personally, I think this is great. It says that for problems where we are trying to use solutions to objective A in order to solve problem B, then perhaps hardness results for A can be bypassed by using the assumption we are making *anyway* of the link between quality according to A and quality according to B in our given instances. For me, as someone who has worked in approximation algorithms for a long time, this was really cool.

"The same goes for the other papers of Balcan about Learning and Clustering via Similarity Functions.

They all try to make a conceptual point, but the conceptual point turns out, each time, to be highly flawed: the theorems are correct, but often nearly-vacuous, and in any case don't mean what you'd have thought they mean."

I suppose I should have imagined some some nut with an agenda would use this post to spout some vitriol, but to make clear, I'm disappointed by the anonymous comments, and I thank Avrim for taking time to respond.

Of course, there wasn't anything particularly useful to respond to -- just a statement of "I think this is a bad paper" with some wishy-washy comments with no specifics, no clear points, no actual references to any text. And then a generalization to the "other papers of Balcan".

Just to give my opinion on the opinion, if I saw a review like this when on a PC, my opinion of the reviewer would be downgraded for the rest of the meeting. If I received a review like this, I'd assume the reviewer was either personally biased against me or just an unsuitable individual for reviewing.

It's OK to have a poor opinion of a paper (or, I suppose, a researcher). Indeed, the point of my post was that it should be useful for students to see "bad examples", and learn from them (as they learn from good papers). But particularly if you're going to call a paper bad, you should have the argument ready to back it up.

On the plus side, perhaps these comments will encourage MORE people to read the BBG paper, which should, I think, be good for the authors.

Can you give an example of what you are looking for here? (Maybe the old example, even if it is no longer appropriate, would serve as good illustration.) There are plenty of bad published papers out there, even in top conferences (and certainly as you go to weaker conferences). But I don't see how they would be very interesting to read in class. I think you also risk being insulting to people: I certainly wouldn't want to find my worst paper being highlighted as such in your class, regardless of what you (or I) though about it.

This paper strikes me as a potentially "bad" paper, for reasons which I will outline below. The basic gist of it is that a major conclusion which the authors hold out as significant or surprising appears, upon closer inspection, to be directly built into the authors' definitions, and the definitions seem a little sketchy.

http://www.chidalgo.com/Papers/HidalgoRodriguezPhysicaA2008.pdf

The authors study the persistence of ties in a network of cellular phone calls.

They define the "persistence" of a tie thusly: First, the the cellular phone records are split into ten contiguous time periods (called "panels") of equal length. Persistence is then quantified as the fraction of the panels in which a tie appears.

This means that persistence is a numeric quantity with a value of 0 if the the tie never appears and 1 if it appears in all 10 panels. Note a natural consequence of this: A tie gets a persistence value of 0.5 under each of the following conditions: 1) It appears in each of the first 5 panels and then disappears, 2) It appears in the 5th panel and persists forever, or 3) It appears in every other panel. Intuitively, these do not seem like equivalent notions of persistence.

The first three they consider as node attributes (rather than edge attributes) such that node reciprocity is defined as the fraction of all ego's ties that are reciprocated. They do this in order to study the average persistence of a node's ties as a function of these three characteristics. They find that:

1) Average persistence decreases with degree.2) Average persistence increases with clustering coefficient.3) Average persistence increases with reciprocity.

Their definition of link reciprocity is unclear. The only definition they give is "was there ever a panel in which the caller and callee reciprocally called each other?" I'm assuming the quantity they call "reciprocity" is the fraction of the panels in which the tie was reciprocal.

If this is true, then it seems to me that their main conclusion "reciprocity is the strongest predictor of persistence" is built into the definitions rather than being a brilliant sociological deduction.

Remember that persistence is defined as the fraction of panels in which A calls B. If reciprocity is defined as the fraction of panels in which A calls B and B calls A, then persistence increases whenever reciprocity increases merely by definition. Of course it is possible for persistence to increase without reciprocity increasing, and that is the reason that the correlation is not perfect, but the methodology seems flawed.

It's also possible that my interpretation of their definition of link reciprocity is incorrect, but it probably should have been more clear.

I'm not sure if this is what you mean by "bad." I would be interested in hearing defenses. Any thoughts?

While BBG is a fine paper (if one whose eventual impact seems uncertain to me), clustering in general is a *great* source of bad papers. Drawing randomly from hits on "clustering" in the ACM Digital Library or Google Scholar, will bring up many a stinker in good conferences and journals. You can decide whether it's a success or failure of that randomized algorithm if an old clustering paper of mine comes up. :-)

The common failing is the lack of connection between the criterion being optimized (always chosen for computational convenience) and the properties of any concrete task. There are thousands of clustering algorithms, and precious little insight into what problems which are good for.

To put this another way: I love as much as anyone the challenge of developing better approximation algorithms for important problems. For instance, I'd love to get a factor-2 approximation to k-median. But the BBG algorithms already have the property that if data is such that all factor-2 approximations are close to the desired answer, then BBG will also get a solution close to the desired answer. So, to claim my new factor-2 approximation is better, I need to be able to say "it doesn't just produce any old 2-approximation, it produces a good/natural/special one" for some compelling notion of "good/natural/special". And if we can articulate what we mean by that, we can then ask if we can develop a souped-up version of BBG that needs only the assumption that the "good/natural/special" approximations to k-median are close to the desired answer.