Thursday, May 19, 2011

Why Sam Madden is wrong about peer review

Yesterday my former PhD advisor, Sam Madden, wrote a blog post passionately defending the status quo in the peer review process (though he does concede that review quality needs to improve). In an effort to draw attention to his blog (Sam is a super-smart guy, and you will get a lot out of reading it), I intend to start a flame war with him in this space.

At issue: The quality of reviews of research paper submissions in the database community is deteriorating rapidly. It is clear that something needs to be fixed. Jeff Naughton offered several suggestions for how to fix the problem in his ICDE keynote. A few days ago, I publicly supported his fifth suggestion (eliminating the review process altogether) on Twitter. Sam argued against this suggestion using five main points. Below I list each of Sam's points, and explain why everything he says is wrong:

Sam's point #1: Most of the submissions aren't very good. The review process does the community a favor in making sure that these bad papers do not get published.

My response: I think only a few papers are truly embarrassing, but who cares? Most of the videos uploaded to YouTube aren't very good. They don't in any way detract from the good videos that are uploaded. The cost of publishing a bad paper is basically zero if everybody knows that all papers will be accepted. Rejecting a good paper, which then gets sent to a non-database venue and receives all kinds of publicity there, imposes a tremendous opportunity cost on the database community. Sam Madden should know this very well since (perhaps) his most famous paper fits in that category. The model of "accept everything and let the good submissions carry you" has always proven to be a better model than "let's have a committee of busy people who basically have zero incentive to do a good job (beyond their own ethical standards) decide what to accept" when the marginal cost of accepting an additional submission is near zero. In the Internet age, the good submissions (even from unknown authors) get their appropriate publicity with surprising speed (see YouTube, Hacker News, Quora, etc.).

Sam's point #2: If every paper is accepted, then how do we decide which papers get the opportunity to be presented at the conference? It seems we need a review committee at least for that.

My response: First of all, there might be fewer submissions under the "accept everything" model, since there will not be any resubmissions, and there is an incentive for people to make sure that their paper is actually ready for publication before submitting it (because the onus of making sure their paper is not an embarrassment now falls on the authors and not on the PC --- assuming that once something is published, you can't take it back). So it might be possible to just let everyone give a talk (if you increase the number of tracks). However, if that is not feasible, there are plenty of other options. For example: all papers are accepted immediately; over the course of one calendar year, each paper sits out there in public and can be cited by other papers; the top sixty papers in terms of citations after one year get to present at the conference. This only extends the delay between submission and the actual conference by 4 months --- today there is usually an 8 month delay while papers are being reviewed and camera-ready papers are being prepared.

Sam's point #3: Eliminating the review system will discourage people from working hard on their papers.

My response: I could not disagree more. Instead of having your paper reviewed by three people in private, every problem, every flaw in logic, every typo is immediately out there in the public for people to look at and comment on. As long as submissions cannot be withdrawn, the fear of long term embarrassment yields enough incentive for the authors to ensure that the paper is in good shape at the time of submission.

Sam's point #4: Having papers in top conferences is an important metric for evaluating researchers.

My Response: This is a horrible, horrible metric, and being able to finally eliminate it might be the best outcome of switching to an "accept everything" model. Everybody knows that it is much easier to get a paper accepted that goes into tremendous depth on an extremely narrow (and ultimately inconsequential) problem than to write a broad paper that solves a higher level (and important) problem, but has less depth. The "paper counting" metric incentivizes people to write inconsequential papers. Good riddance.

Sam's point #5: Having papers accepted provides a form of validation, a way to measure progress and success. There is also some kind of psychological benefit.

My response: People who measure themselves in this way are doomed to failure. If you have a paper accepted that nobody ever reads or cites over the long term, you have made zero impact. Just because you managed to get a paper through three poor reviewers is no cause for celebration. We should be celebrating impact, not publication. Furthermore, I strongly disagree with the psychological benefit argument. Getting a paper rejected does FAR more psychological damage than getting a paper accepted does good.

In conclusion, it's time to eliminate the private peer review process and open it up to the public. All papers should be accepted for publication, and people should be encouraged to review papers in public (on their blogs, on Twitter, in class assignments that are published on the Web, etc.). Let the free market bring the good papers to the top and let the bad papers languish in obscurity. This is the way the rest of the Internet works. It's time to bring the database community into the Internet age. Imagine how much more research could be done if we didn't waste so much of the world's top researchers' time on PC duties and on revising good papers that were improperly rejected. Imagine how many good researchers we have lost because of the psychological trauma of working really hard on a good paper, only to see it rejected. The current system is antiquated and broken, and the solution is obvious and easy to implement. It's time for a change.

23 comments:

Science has a great article on why this may be a bad idea: initialization has a huge impact on peer feedback effects. (http://www.sciencemag.org/content/311/5762/854.short) In short, there is very little correlation between popularity and quality. Now, maybe your argument still holds: the authors say that the bad stuff rarely becomes popular. But, things that do get attention might not get it for the right reasons. (Think of how many journalists jump onto papers that aren't academically interesting but have a catchy tagline.) Or worse, as the authors claim, the papers that get attention get it _for no reason at all_: it's effectively random.

2) Bad evaluations. What about a paper that claims a fantastic result but has the method all wrong? Under an arxiv.org-style approach, that paper still gets equal billing, people can cite it and build on it, and most folks may not read closely enough to recognize the problem. How do we handle this? I suppose we as researchers can just take on a higher burden of proof when we choose to cite a paper.

- Pay reviewers for their services. If reviewing is voluntary and anonymous, it is hard to hold reviewers accountable; an honor code only works so well.

- Rate the reviews and hold the reviewers accountable for their job (since they are paid for it). This could mean making the reviews public, or else the authors, other reviewers, or editors/program chairs can rate reviews based on their quality.

Any other solution would just be a stopgap. What is missing today is accountability.

Disclaimer: I am very new to the research environment, having had just one paper accepted and 4 rejected, so I can't claim extensive publishing experience, and I may have gotten something wrong.

But regarding your comment #2, I would say that we can solve the problem recursively. If A presents flawed results and B builds on them, someone may catch the flaw in B's work (invalidating A's work upstream) or in A's (invalidating B's work downstream). I know, it seems that we will spend a lot of time looking for flaws and invalidating works in a big chain of citations. However:

1 - We, as a community, are not so incompetent. Cases like that should be the exception, not the norm.

2 - The chances of the flaws in A's paper not being caught by the whole community are much smaller than the chances of them not being caught by 3 reviewers (given enough eyes, all bugs are shallow).

3 - When things like that happen, both A and B would be embarrassed to have their flawed papers out there (permanently). Peer pressure may work better than peer review.

In the end, I believe that the "accept everything model" has similar benefits, drawbacks and risks as open source software (I know this kind of comparison is pretty dangerous, but I do believe it applies in this case). And open source software has been going pretty well so far.

I also found some interesting thoughts related to this discussion at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2609/2248 and http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/

I found Naughton's proposal #4 (single-blind review) to be intriguing. Although this doesn't eliminate PCs, perhaps it is a good trade-off. The selling point is that it increases the quality of reviews because the reviewers are not anonymous.

I'm unsure how to deal with submission quality, but as a young PhD student I often find PC feedback helpful. If all submissions are accepted, I fear that people won't really bother to read and provide feedback.

Thank you all for the comments. This is a great comment thread so far.

Michael: I wonder if using some sort of PageRank type algorithm to assess paper quality might work better than the popularity measures used in the Science paper.
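To make the idea concrete, here is a minimal sketch of what a PageRank-style score over a citation graph might look like. The graph, paper names, damping factor, and fixed iteration count are all my own toy assumptions, not a proposal for an actual implementation:

```python
# Toy sketch: score papers by a PageRank-style random walk over a
# citation graph, rather than by raw popularity votes.

def paper_rank(citations, damping=0.85, iterations=50):
    """citations maps each paper to the list of papers it cites."""
    papers = list(citations)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for citing, cited in citations.items():
            if not cited:
                # A paper that cites nothing spreads its score evenly.
                for p in papers:
                    new_rank[p] += damping * rank[citing] / n
            else:
                share = damping * rank[citing] / len(cited)
                for c in cited:
                    new_rank[c] += share
        rank = new_rank
    return rank

# Hypothetical citation graph: A and B both cite C; D cites C and A.
graph = {"A": ["C"], "B": ["C"], "C": [], "D": ["C", "A"]}
scores = paper_rank(graph)
top = max(scores, key=scores.get)  # C, the most-cited paper
```

Unlike a simple citation count, this weights a citation by the rank of the citing paper, which might dampen the "friends citing friends" effect mentioned later in this thread.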

rfc9000: If we can't get rid of reviewing altogether, I like your idea. But where will the money come from?

mdediana: Nice points.

Noah: As the review quality deteriorates, some of the PC feedback actually does more to hurt a paper than help it. But I agree that alternative forms of feedback will be important. Researchers will need to be encouraged to post reviews publicly on the Web (which is more of a zero-blind review process). We will need to figure out how to incentivize that appropriately.

Noah: You have a concern: "If all submissions are accepted, I fear that people won't really bother to read and provide feedback."

But, I don't think that is a problem.

If a paper is addressing an important problem, people will certainly read the paper and give high-quality feedback. For example, the claimed proof that P ≠ NP by an HP researcher was thoroughly reviewed in public. Many people, including Terence Tao, gave public comments. I think such a review process works better than the one at STOC/FOCS.

The real trouble is that too many papers are solving problems that exist only in the authors' dreams. This is very common in database conferences. Maybe that is why BigTable was published at OSDI.

I have to admit that I'm more on Sam's side on this case. My reasons, mostly from the perspective of a PhD student, are the following:

There is already an accept-all button for papers. Hundreds of conferences every year are indexed by different search engines and aggregated into the all-mighty Google. Enough of those conferences accept papers regardless of quality, and nobody really reads them. So this would not change anything.

Top conferences should not be seen as a count metric, but as a "have" metric. The assumed quality of a top conference rates a paper without anyone having to follow its technological depth. It's not rare that you write the papers in your thesis on one topic and find yourself working on a totally different topic 5 years later. So how can anybody even partially judge your work? Probably based on where you published. If new students start to work for you, where do you tell them to look for ideas? Just Google it?

Of course, getting a paper rejected is hard, because you know how much effort you put into it. And bad reviews suck, but in my experience 80% of the reviews I got helped me to write a better paper. And I think this kind of validation is needed.

One point that is always neglected (as in the ICDE keynote): for me as a non-native English speaker, it is twice as hard to write a paper that gets accepted. Having a native speaker help you transform your content into accept-speech helps tremendously. And yes, I think reviewers are often too hard on non-native speakers.

Instead of changing the system, I would wish for a higher acceptance - in terms of judgement - of lower-ranked conferences. Many of the problems you, Sam and Jeffrey mention come from the fact that for DB research there are practically only 3 conferences.

In addition, publishing the reviews would solve many problems. I want to know the reviewer feedback for any paper presented at SIGMOD / ICDE / VLDB + the version they submitted.

This comment thread is way too civilized. This isn't the flame war I was trying to start :)

Lestat: thanks for the comment.

grundprinzip: Interesting point about the non-native English problem. But I feel like this supports my point even more!

Also, as long as the community uses the peer review process, any paper submitted to an "accept anything" venue (e.g. the Web) will always be assumed to be not good enough to be published. Hence, for this to work, the whole community needs to switch to "accept everything". Otherwise, none of the benefits will be achieved.

One important point that is still not clearly stated: peer review allows people from external fields to look at what the community values from an analysis by anonymous experts rather than a strict popularity contest. (That's the goal anyway.)

The accept everything model does not provide a way to filter out what the community as a whole values (without reservations). Even if we remove peer reviews altogether, we still need a mechanism to provide that for outsiders.

Mehul: If you insist this is an important goal (I don't feel the same way), you can always do an analysis of which database community members are citing which papers. I agree that this is somewhat of a popularity contest, but if you're citing a piece of work in your paper, it usually means that your paper was at least somewhat influenced by it (thereby being much more robust than a simple online vote).

I think knowing what a scientific community values is important. Although I don't think peer review is the only or ultimate factor in deciding what the scientific community endorses, it is the starting point.

Why is it important? Because oftentimes the results of our work influence other fields and, more importantly, inform policy. Innumerable times, I have seen work based on previous work that was irrelevant or flat out incorrect. I have also seen policy makers cite work to push an agenda, regardless of its merit. Removing peer review only exacerbates the situation.

In both of these cases, peer review is the starting point for separating the ignorant and incorrect from the plausible. Without anonymous peer review, it's difficult to get completely honest expert opinion, especially from young and creative researchers who are vulnerable to retaliation. Certainly, it cannot be the only metric or over-emphasized, which it is today. But without it, we will need some other mechanism that brings the same level of honesty without fear.

"rfc9000: If we can't get rid of reviewing altogether, I like your idea. But where will the money come from?"

Today reviewers are not paid anything, program chairs/editors are not paid anything, authors have to pay exorbitant amounts to present their papers once accepted, and readers have to pay to read the papers. The only profit-making party today is ACM.

In other words, the money should come from ACM (or IEEE, Usenix, etc.).

In summary, one of the major problems with the current peer review system is that papers are expected to contain a full story. This is a huge drawback and obstacle.

A full story may be very hard to come up with in the first place - especially for young PhD students who are unaware of all the hidden rules of paper writing. Another drawback of full-story papers is that the 8-12 page paper will be judged as a whole. So even if some parts of the paper are great and others are substandard, it is likely that the substandard pieces will kill the entire paper.

Paper bricks fixes this. With paper bricks you can concentrate on the actual contributions and not spend your time selling them with a nice story, putting them into another context, inventing yet another niche, etc. You are back to spending more time on the actual technical contribution rather than storytelling. This also makes the life of reviewers much easier. There are many more advantages discussed in my paper.

A major principle of reviewing is _editing_. The _editing_ process has to be improved to remove the variance and randomness in acceptance decisions. How come the quality of accepted works at top conferences varies that much? This has to be fixed.

And don't get me even started on the quality of "benchmarks" or experimental evaluations we are seeing in accepted top conf papers today...

I'm glad you wrote up your bricks idea in the SIGMOD Record. I certainly think it is intriguing, but I wonder how well it will work in practice. I also don't think it solves the poor review quality problem (though it might solve other problems). However, what's nice about it is that unlike the "accept everything" idea, which the whole community has to adopt for it to work, the bricks idea can be adopted for just one conference (e.g. CIDR) and we can experimentally see how well it works in practice. If the community does not adopt the "accept everything" model, I would certainly endorse trying out bricks in one conference and seeing what happens.

All -- here's a half-way proposal. Suppose we required every author of a paper to review one other paper. Assuming there are at least 3 authors per paper on average, this would yield 3 reviews for every paper. We could give authors 2 weeks to complete their review(s). Senior people would do more reviews, since they submit more papers, but they'd be disincentivized from submitting garbage. We could provide some system for routing papers to reviewers who have some expertise in the area via keywords or an expanded list of subject area tags for each paper. To make you "it's not enough like the Internet" folks happy, we could post the reviews publicly (anonymized), and let anyone who wanted to comment. We'd then have a small program committee that would be responsible for assembling the program (who gets to talk, giving awards, etc.). Those people would be required to meet in person and actually discuss the papers they are accepting.

I think the conventional argument against this would be that graduate students may not be experienced enough to perform good reviews, but this shouldn't be a problem in areas where the graduate students have some expertise.

I think this is much more practical than just asking the Internet to do the reviewing, since I firmly believe the vast majority of the papers (many of which are likely good) will not be critically reviewed in that model. Everyone will fawn over (or piss on) Stonebraker's papers, and poor Abadi will be left to rot. In the "everyone does a review" model, every paper gets at least 3 reviews, which I think is really important.
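For what it's worth, the routing system sketched above could look roughly like the following. All names, tags, and the greedy tie-breaking rule are assumptions of mine, not part of the proposal:

```python
# Toy sketch: route each submitted paper to author-reviewers whose
# expertise tags overlap the paper's subject tags, never assigning
# an author to their own paper.

def route_reviews(papers, reviewers, reviews_per_paper=3):
    """papers: {paper_id: {"authors": set, "tags": set}}
       reviewers: {name: set of expertise tags}"""
    load = {name: 0 for name in reviewers}
    assignment = {}
    for pid, info in papers.items():
        # Rank eligible reviewers by tag overlap; break ties by
        # current load so reviews spread out evenly.
        eligible = [r for r in reviewers if r not in info["authors"]]
        eligible.sort(
            key=lambda r: (-len(reviewers[r] & info["tags"]), load[r])
        )
        chosen = eligible[:reviews_per_paper]
        for r in chosen:
            load[r] += 1
        assignment[pid] = chosen
    return assignment

# Hypothetical submissions and author-reviewers.
papers = {
    "p1": {"authors": {"alice"}, "tags": {"db", "storage"}},
    "p2": {"authors": {"bob"}, "tags": {"ml"}},
}
reviewers = {
    "alice": {"ml"},
    "bob": {"db"},
    "carol": {"db", "ml"},
    "dave": {"storage"},
}
assignment = route_reviews(papers, reviewers)
```

A real system would also need conflict-of-interest rules beyond self-review, but the core matching is just set overlap plus load balancing.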

Wait, so you want to solve the "poor review" problem by replacing a system of carefully chosen reviewers (who at least volunteered for the reviewing job) with a system where people have to review papers against their will and absolutely anybody can become a reviewer simply by submitting something? Do you think this will improve review quality? Really?

(BTW: like I mentioned on Twitter, famous people already get more than their fair share of attention now. I don't think the "accept everything" model will make it much worse.)

I am totally with accept everything, and I have a Suggestion and a Potential Problem. Let me start with the shorter first, the problem.

Problem: The rich will get richer and the poor will get poorer. Because Sam knows you, even though he is VERY new to the blog world, he got a boost when you (with hundreds of followers) linked to him (in addition to his established big name in the community). Public review might suffer from many such biases and sub-communities: friends rate each other very highly, discuss each other's work and ignore others, cite each other, etc.

I have no clear solution to prevent such a potential problem.

Suggestion: Why don't we try having a track of "accept everything", or public review to be more precise, in all conferences for one year? This would be in addition to the regular peer-reviewed tracks. Thus, at the submission deadline, the authors choose either the regular tracks, to be traditionally reviewed, or the public review track, to be published immediately online. Public-review papers should be accessible in a blog-like fashion, where researchers can comment, discuss, rate, and possibly nominate them for an in-conference presentation. By the time the regular reviews are ready, or maybe when the camera-ready versions are due, the top, say, 6 papers from the public review track are selected for conference presentations. In addition, the top, say, 10% are included in the official proceedings. After one year, we have data and experience and can evaluate the quality of work this public review process is generating, how many submissions it attracts versus the traditional tracks, etc.

I think the problem of bias and sub-communities unfortunately already exists today. I really don't believe that the "accept everything" model will make it much worse.

I really like your suggestion, though I think the measure for the public reviews track should be actual citations rather than public ratings (to alleviate the exact problem you talk about). People should be able to comment publicly, but they should not be scoring papers.

I agree that the problems of bias and sub-communities won't get worse; in fact, opening the "review" up might reduce them and make them traceable.

As for ratings, actual citation counts can be used at the end of the year to evaluate the quality of this new review process. But we need another metric that can be used (within a couple of months) to determine which 6 papers get presented at the conference...

5 months stale and I still gave this article and its comments more attention than I give most research papers... Surely the CS community can develop an optimized page (paper) rank algorithm following the graph of authors. A particular forum could be defined by its paper/author anchors and go from there. The paper/author anchors could even be chosen from the wealth of past inter-cited papers to focus the conference. Honestly, in 10-20 years I think such a publication medium will be much more palatable to the crowd-sourced palate that grew up conditioned by the Internet :)

Daniel Abadi

About Me

Daniel Abadi is an Associate Professor at Yale University, doing research primarily in database system architecture and implementation. He received a Ph.D. from MIT and an M.Phil. from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, which was commercialized by VoltDB), and Hadoop (the HadoopDB project). Abadi has been a recipient of a Churchill Scholarship, an NSF CAREER Award, a Sloan Research Fellowship, the 2008 SIGMOD Jim Gray Doctoral Dissertation Award, and the 2007 VLDB best paper award. His research on HadoopDB is currently being commercialized by Hadapt, where Abadi also serves as chief scientist. He blogs at http://dbmsmusings.blogspot.com and tweets at http://twitter.com/#!/daniel_abadi.