Wednesday, February 18, 2009

Double-Blind Reviewing

Updated 2/22/09: Added problems 5-8 as I understand them from the comments. These are mostly just extensions of 1-4, but I'm trying to make that more clear.

Michael discussed conflict of interest issues last week (in relation to STOC PC discussions). For me, the discussion raised the question of why we don't use double-blind reviewing, which would handle the bias issue at the heart of conflicts of interest in a way that's been shown to reduce bias. And I do mean shown: I'm not just guessing that it might reduce bias; it's been shown to reduce bias against female authors (and I'm willing to extrapolate from that that it reduces bias in general). I know that many people are against using double-blind reviewing for theory conferences. I don't understand why. Here are my guesses about what the objections might be, and the reasons that I don't agree with them or understand them.

1. "I, the author, am too lazy to anonymize my paper." It doesn't really take any extra work to do basic anonymization of a paper. Most people refer to themselves in the third person when talking about their past work anyway. We don't have large continuing software projects, so we don't need to anonymize the names of those. These seem like the only parts of a paper that really need anonymizing.

2. "I, the PC member, am too lazy to deal with anonymized papers." Having never been a PC member, I don't understand this potential argument.

3. "It wouldn't actually be possible; everyone would know whose paper it was anyway." Again, double-blind reviewing has been shown to reduce bias. As scientists, it seems that we should respect the research and not just assume that reviewers would identify the authors anyway.

4. "I, the author, want to be able to post a copy of my paper on my website as soon as I submit it." This does seem like an issue that would need to be dealt with. In general, I don't think that most reviewers/PC members would actually go trying to find the paper they're reviewing to determine whose it is. Authors could be asked not to post information and/or reviewers could be asked not to look for it. I'm not convinced that the web-based version of this problem is significantly worse than the traditional version where colleagues know what each other are working on.

5. "We're better/smarter than other fields and don't actually have a problem with bias affecting which papers we accept." See #2 and #6. Also, without some sort of research to back this up, if you make this argument I'm likely to assume that you don't have the necessary social science background to accurately support this claim. (I certainly don't.)

6. "The identity of the authors helps to determine if the paper will have an impact / turn out in hindsight to have been good." This is an argument I do understand and find logical, but I strongly disagree with it. I believe the work should stand on its own. Perhaps I should actually have phrased this problem as "I, the reviewer, am too lazy to take the time to fully read and consider the paper." I think this raises some interesting issues about the quality of reviewing, and perhaps revisions to that process should also be considered. But I admit that I may be missing something here, having not reviewed a large number of papers. Still, I imagine that most of us would be wary of a conference that explicitly stated that it took the reputations of the authors into account. If we're not willing to explicitly state the rules by which we do this, we shouldn't do it at all (again, because it's obviously biased).

7. "Our reactions to double-blind reviewing are based on emotions. Some people find it fair and rigorous. Some people find it insulting and limiting." This also makes sense to me. I'm clearly in the first category, and don't understand the second. I believe it's a compliment to authors that their work can stand on its own, without having their name get the work through (see #6). I don't find it at all limiting (see #4 and #8).

8. "If we can't do it perfectly, we shouldn't do it at all." See #3 and #4. Of course there are always going to be reviewers who guess the authors (correctly or incorrectly). There might even be reviewers who are already familiar with the work because it was posted online. In some ways, it becomes up to the authors to determine whether the reviewers will know that it's their paper: those of us who are on the negative end of the bias spectrum, or who just believe in this process, can choose to be really careful so that reviewers won't know it's our paper. But whatever happens, there will be some papers that actually remain anonymous and are judged only according to the work done. I'm fine with a system that's better even if it's not perfect. Similarly, I recognize that this won't suddenly fix all the problems women in computing face. That's still no reason not to fix this one.

Suresh - at their heart, those objections still seem to be based in laziness and the belief that we can ignore the research done in other areas showing that double-blind reviewing reduces bias. Perhaps we need a study showing that it reduces bias in theory, but I doubt that's ever going to happen. Still, evidence in other fields shows that it reduces bias. Without proof, how can we believe that we're that different?

Anonymous 9:52 - what do you mean by "nobody yet figured out a way for the authors to write a paper they do not see?" Do you mean that authors are frequently given their own paper to review? (Does this fall under #2 on my list of objections?)

I agree that we should have double blind reviewing. They have it in crypto and are still allowed to post to eprint. I think that when you receive the paper to review with no names, even if you think you can guess who the authors are, it sends a signal that names are not supposed to matter.

Also, after being on a recent large conference PC, I thought that when it came to choosing borderline papers, names did matter. There is simply no reason not to at least try double blind reviewing in algorithms.

"Again, double-blind reviewing has been shown to reduce bias. As scientists, it seems that we should respect the research and not just assume that reviewers would identify the authors anyway."

The research is generally not very compelling, because there's no way to control the behavior of authors. Perhaps female authors feel that double-blind reviewing is superior and are more likely to send their better papers to journals that practice it (while redistributing their worse papers elsewhere because it looks strange to publish exclusively in one venue). Then of course the percentage of female-authored papers is going to go up, but we won't learn anything about whether the review process has become fairer.

My objection to double-blind reviewing is actually quite different from all those you mention. I believe that the identity of the authors ought to be taken into account when evaluating a paper. It should only be a small influence, but definitely not always zero.

For example, suppose you receive a paper that proposes an off-the-wall model for a curious problem nobody seems to have thought about. The value of such a paper often can't be predicted in advance: it could shape a new field or be a total dud. However, the track record of the authors can be a pretty good guide. If Les Valiant or Noam Nisan is involved, then you should take it very seriously indeed. If the authors have a poor track record, then that certainly doesn't mean the paper must be worthless, but it becomes a lot more of a gamble. If you forbid the reviewers to consider the track record of the authors, then they will make worse decisions on average. This is admittedly an extreme case, but many cases are at least somewhat similar (since much of the value of a paper comes from where it will lead in the future).

9:58pm - I think that you're assuming that conferences should be trying to publish the research with the most impact. I disagree with this assumption and believe that conferences should publish the "best" papers, by which I mean the most interesting and ground-breaking. It's true that often the papers with well-known authors have the most impact. But for the advancement of the field, I think it's important that the "best" papers end up getting published in the best conferences so that they get the most exposure. Also, remember that there was a time when all the well-known researchers weren't well-known.

"I, the author, am too lazy(?) to write a paper that is not an incremental improvement on a previous paper."

You can't have double-blind review without doing away with incremental-improvement papers completely. I don't know how much of an issue this is in theory, but in certain other areas of computing it's major. Often when you see the title you can narrow down the likely authors to a handful.

Wow, I was going to blog about this -- and still might -- but I might as well comment here.

1) I actually find argument #4 that you make the real problematic one. I really have trouble getting behind a process that prohibits authors from making their work available for months (or, if a paper is rejected and has to be re-submitted, possibly YEARS) to satisfy the double-blind reviewing process. If push came to shove, I'd end up voting against it unless I heard a good counter-argument for this problem -- like papers could go up on arxiv, in which case it's not really double-blind in the strict sense some people are after.

2) Suresh -- I laughed out loud at your comment, since I really felt a lot of the counterarguments I got to having people leave the room for conflicts of interest boiled down exactly to, as you say:

"We're smart enough not to be biased: we can factor out author information."

Let me state for the record, I think that's a lousy argument. (I avoid swearing on blogs; but please translate appropriately.)

Anon 5-6-7: I think the issue of whether the author's name should matter is a difficult one, and Anon #6 has presented the argument why it should; past performance can actually be an indicator of future results (and not because of bias). As a related example, I think that the NSF should be able to use a researcher's past record as a criterion in deciding funding. You don't necessarily want to give money to the people who can write the best-sounding proposal; you want to give money to the people who can actually produce the best research, and I think a record of prior research results is at least as good an indicator for that as the written proposal.

The question remains that, if an author's name is allowed to have some weight, how should the community deal with that? In particular, it would be a very bad idea if such a weighting ended up excluding outsiders from other communities from participating in our community, or excessively hurting graduate students/young faculty as they try to build their reputations in our community. I don't think there's a "right answer" here, but it's something the community as a whole needs to think about in terms of what's best for the community. I believe Sorelle's argument would be that research and past experience suggests that once you allow such biases in favor of established individuals, they have a greater and more pernicious effect than people realize. I'm sympathetic to that argument. As you can see, I find it a difficult question.

"I think that you're assuming that conferences should be trying to publish the research with the most impact. I disagree with this assumption and believe that conferences should publish the 'best' papers, by which I mean the most interesting and ground-breaking. It's true that often the papers with well-known authors have the most impact. But for the advancement of the field, I think it's important that the 'best' papers end up getting published in the best conferences so that they get the most exposure. Also, remember that there was a time when all the well-known researchers weren't well-known."

Part of the problem is that it is difficult to judge what the best papers are. For example, the original papers on zero-knowledge proofs were repeatedly rejected from top conferences. In hindsight, they meet absolutely every standard for being "best" (originality, importance, beauty, subtlety, applications, etc.), but at the time some were skeptical. In general, the best we can do is to guess which papers are actually the most interesting and ground-breaking.

It's tricky, since there are two fundamental values here: the best research should be published, and all researchers should be treated equally. These values conflict, and there's no simple way to resolve the conflict.

Ideally, papers that can be judged purely based on what's written in them should be. When a paper fits into a well-defined thread of research, it can generally be evaluated on its own. However, some papers just don't fit into existing categories, and there's no clear basis for judging them. If you prohibit the consideration of track records, then such papers just won't be published in your venue, and that's a real loss for the field.

It's true that the use of track records for evaluation is biased against those who haven't yet had a chance to develop one. That's why they should only be considered when necessary. However, there's no system under which fresh Ph.D.s will ever be allowed to publish unevaluatable papers (unless all quality control goes out the window). By contrast, experienced researchers can be allowed to use their track record as an argument for publishing occasional off-the-wall papers; they had better not abuse this privilege, or they'll blemish those track records.

I think the hardest part of this whole issue is that it is fundamentally an emotional issue rather than a scientific one. Some people love the idea of double-blind review; it feels much more fair and rigorous and it gives them greater confidence in the evaluation process. Other people hate it; it feels insulting and limiting. I actually think these are in some sense the two strongest arguments. If it encourages people, then that's good even if it doesn't lead to fairer reviews. If it upsets people, then that's bad even if it does slightly increase fairness. The problem is that it definitely has both effects.

"In general, the best we can do is to guess which papers are actually the most interesting and ground-breaking."

While it is harder to judge the impact of a speculative paper, it can be done. The only reason the field fails so miserably at it is that it approaches such papers with the same reviewing tools and techniques used for incremental papers (i.e., is the theorem correct? is the proof non-trivial?).

As other people have pointed out, it takes just one bad review to shoot down a paper at STOC/FOCS. So all you need is one reviewer who "didn't get it" to have the paper be rejected over and over again. A speculative paper is pretty much assured to be missed by at least one reviewer.

Michael, I think your analogy with the NSF process is not a very good one. When the research hasn't taken place at all, past record is indeed a good proxy for future performance. However in the case of a speculative paper a portion of the research has already taken place and hence the quality of the result can be evaluated on the spot.

The argument from anonymous 9:58pm is to a certain extent just a variant of #2: "I'm too busy to spend the time estimating the impact of this proposal, so I'll use the author's name instead of doing my job".

"However in the case of a speculative paper a portion of the research has already taken place and hence the quality of the result can be evaluated on the spot."

This assumes that there's some straightforward notion of "quality" that can be evaluated "on the spot", which is precisely what I deny. Evaluation of research is actually incredibly subtle, in its own way every bit as difficult as actually carrying out research (and in fact the two tasks have a lot of overlap - a large part of doing research is evaluating the potential of various methods one could try).

I believe that nobody is very good at evaluating research (and those who think they are can be dangerously overconfident). Of course many cases are perfectly clear, and others can be reliably determined with some work. However, there are a lot of gray areas, and furthermore everybody makes outright mistakes, which become clear only in hindsight.

There are at least three difficulties: intellectual biases (what you know or don't know, like or don't like, etc.), personal biases (for example against women or against people with unprestigious backgrounds), and underdetermination (the available information simply does not determine the answer). I'm convinced that intellectual biases are by far the most serious difficulty, but nobody knows how to address them. Getting multiple opinions is a good start, but it only goes so far and this problem is probably not solvable.

My impression is that personal biases have a small (on average) but pervasive effect while underdetermination has an uncommon but occasionally quite large effect. The problem is that the best way to get rid of personal biases is to strip out information, while the best way to get rid of underdetermination is to collect as much information as possible.

So the real question is how much of an effect personal biases have in the reviewing process. My (admittedly ill-informed) guess is that they are small potatoes compared to the effects of such biases elsewhere. To make up a number, perhaps 5% of the difficulty of being a woman in CS comes from conference reviewers who don't take the work as seriously as they would if a man did it, while 95% comes from other biases and pressures that would not be addressed by double-blind reviewing.

Certainly discrimination and unfairness are huge problems and the community should devote a lot of time and effort to eradicating them. I just think double-blind reviewing has at best a modest pay-off while also having sizable costs.

"Evaluation of research is actually incredibly subtle, in its own way every bit as difficult as actually carrying out research"

For incremental papers I think we do an ok job: not a great one, but an ok one. We all know about the biases (hot topics, technical difficulty, etc.), but we do much better on those papers than on speculative ones. The trick is to ask people familiar with the subject and likely to have worked on similar questions to give their impression of the quality of the work.

The problem arises when we have contributions that seemingly come out of nowhere, such as zero-knowledge proofs. Those too can be evaluated, but they require a different set of tests than a regular incremental paper.

For one we might need more reviewers than the usual 3-4, and a collection of senior and junior people at that. We need senior researchers since by virtue of their experience their vista spans further. We need junior researchers since they are not invested in any specific theory or scheme, so they are more open to innovations--at least that is what Thomas Kuhn claims.

We might also need to ask different questions, such as "are the ideas novel, sound and likely applicable" instead of "are the theorems/proofs difficult and far reaching". Moreover, contrary to incremental papers, the more "obvious" an original fundamental idea sounds, the more likely it is to be a fundamental contribution. There is a certain obviousness to using hard functions as in Diffie-Hellman, or to measuring an external-memory algorithm's performance in terms of I/O operations a la Aggarwal and Vitter.

"I just think double-blind reviewing has at best a modest pay-off while also having sizable costs."

I wasn't necessarily arguing for blind review, but against conscious use of names as a substitute for reading the paper, understanding its claims and pondering its reach.

CRYPTO ALREADY DOES THIS. Whatever objections you have to double-blind reviewing, keep in mind that CRYPTO ALREADY DOES THIS. And in format and style, CRYPTO, FOCS, STOC, etc. are similar enough. Hence the pragmatic objections can be overcome; they were by CRYPTO.

YES, some authors will be known. So what? Having double-blind reviewing sends a powerful message.

I do have sympathy with the argument that if a big name has a new idea then maybe being a big name should count some. But it shouldn't count that much. AND double-blind reviewing will lead to great stories like "when Scott Aaronson proposed photon classes, a concept that is now taught to freshmen, it was initially turned down." We need stories like that :-)

"I, the author, want to be able to post a copy of my paper on my website as soon as I submit it."

"This does seem like an issue that would need to be dealt with. In general, I don't think that most reviewers/PC members would actually go trying to find the paper they're reviewing to determine whose it is. Authors could be asked not to post information and/or reviewers could be asked not to look for it."

This is huge, and asking authors not to post their papers would be a terrible solution. The point of conferences is to spread research, not to stifle results.

In the alternative, you will have some PC members who know who the authors are. These will typically be the experts, or those who most closely reviewed the paper. (You often can't do a serious review without looking up previous literature, so asking the PC members not to search the web seems impractical.)

Generally, I don't think it will make much difference, but I can imagine a difference in the borderline cases.

"Whatever objections you have to double-blind reviewing, keep in mind that CRYPTO ALREADY DOES THIS."

Yes, but they had much more powerful reasons to introduce it in the first place. First, commercial interests play a far more important role in the Crypto community than they ever have or will in FOCS/STOC (and sad experience has shown that the prospect of gaining/losing lots of money can corrupt people you would never expect, and can seriously influence people who aren't consciously corrupted). Second, the Crypto community has had far more serious internal disagreements and rifts; by contrast, the FOCS/STOC community is much more homogeneous and agreeable.

"Hence the pragmatic objections can be overcome; they were by CRYPTO."

So yes, they can be, but by circumstances that do not apply to most theory conferences.

I think a relevant question here is "what is the goal of the conference?" Is it to pick the best program for the conference so as to advance research? Or is it to "rate" a certain number of papers as better than others? I think most arguments for double-blind reviewing implicitly assume the latter. I am not sure if that is a fair assumption.

"This is huge, and asking authors not to post their papers would be a terrible solution. The point of conferences is to spread research, not to stifle results."

Is it really? Because when people's papers representing solid contributions get rejected from STOC/FOCS, they are less likely to post those results. So the whole concept of having competitive conferences stifles research to some extent.

I was recently on a major theory conference committee and I can say that big names DEFINITELY helped on the borderline papers. After that experience, I am 100% for double-blind reviewing. If anything, it would remind the PC members that they cannot use the argument "the authors are well-established" to support papers, since they cannot admit they know the authors (even if they do). At the very least, they will have to come up with a more convincing argument.

If it made no difference, as they claim, why are they so much against it?

One reason is the belief that it does make a difference, but it's a negative difference.

Another is that it adds wasteful complications and overhead to the process. If it doesn't help, then we shouldn't bother, even if it doesn't hurt.

A final reason is the fear that it's setting the stage for much worse policies. With open distribution of preprints, double blind reviewing is mostly meaningless, but it's therefore not too harmful. If anyone tried to prevent dissemination of information to improve the review process, then it would be utterly dreadful for the field. The biggest danger of having double-blind reviewing is that it is just tempting people to say "Wait, the double-blind reviewing isn't having the desired effect. That's because we're doing it wrong, so let's prohibit anyone from saying anything about their research until it is officially published." I don't think there's any real chance that this could be enforced, but the prospect is so terrible that I'd rather fight it one step in advance and not even risk having the situation arise. Double-blind reviewing amounts to a philosophical statement that publication should be the first time anyone ever sees a research paper. Maybe this restriction makes sense in some fields, like medicine, where it could be dangerous to let the public draw their own conclusions from unrefereed papers. However, it's a position I have no intention of endorsing in computer science.

"One reason is the belief that it does make a difference, but it's a negative difference."

I was not talking about this. I was referring to the many people who in one breath claim that it makes no difference, then in the next proceed to strongly oppose it.

But since you brought it up, let's go over your arguments about negative cost.

The biggest danger of having double-blind reviewing is that it is just tempting people to say "Wait, the double-blind reviewing isn't having the desired effect. That's because we're doing it wrong, so let's prohibit anyone from saying anything about their research until it is officially published."

Well, given that the AI community has had blind reviews for quite a while and this hasn't happened, your whole scenario conjures a strawman, nothing else. Ditto for networks, crypto and DB.

"Double-blind reviewing amounts to a philosophical statement that publication should be the first time anyone ever sees a research paper."

This is another strawman.

The double blind process is not meant to create an airtight zero-knowledge protocol.

The goal of the double blind process is that, on the average, one does not know with certainty who wrote the paper. Yes, one can make a guess that with 50% probability it was written by group X, but then again with 50% it was *not* written by group X, making the submission process rather more fair for the "no names" who wrote this last half (why a 50-50 split? because studies have shown that this is about right for the accuracy of guessing in double blind reviewing when people are prompted to guess the identity of the authors).

This is why for several double blind conferences you are still allowed to review the paper even if you accidentally chance upon the identity of the authors.

"Double-blind reviewing amounts to a philosophical statement that publication should be the first time anyone ever sees a research paper."

No, it does not. Under double-blind reviewing, you are free to post your article to arxiv, your webpage, or wherever.

The fact is that people usually don't post papers that they know are borderline to these sites. Therefore, someone who is famous has to make the following choice: reveal my identity because it will help get my paper into STOC AND make my results public

OR

not make my results public and submit anonymously.

These are precisely the papers for which we *REALLY NEED* double blind reviewing.

PCs don't like double-blind because for the borderline papers, they can "play it safe" and just accept big-name papers. This should be avoided.

I also support double-blind reviewing. We should certainly allow people to post pre-prints.

Even if most of the PC suspects who wrote something, pretending not to know might reduce unconscious bias. For example under the current system when a paper authored by a female author is discussed the author is presumably referred to as "she." Under double-blinding the PC would refer to all authors as "they", reducing the chance of unconscious gender bias.

There's only one objection to double-blind reviewing that I consider serious enough to possibly tip the balance against double-blinding being a good idea. David Eppstein wrote in a comment on MM's blog (http://mybiasedcoin.blogspot.com/2009/02/stoc-pc-meeting-part-iii-conflicts-of.html Feb. 4 2:10 PM): "Then we fall into a problem of having a double standard: the set of well-known people who publish preprints gets treated one way, the set of people who choose to maintain their anonymity or who are not well enough known for their preprints to be recognized gets treated differently."

My personal guess is that the biases under double-blind reviewing, including the tendency for work by well-known authors to be recognized more frequently, would be smaller than the biases we have currently. (If you don't believe we have biases currently, go read a couple papers describing actual controlled experiments on the effect of authors names on reviews.)

Theory conferences (and especially SoCG [but much less FOCS/STOC]) try to verify that the submitted results are correct. The reputation of the authors is very important in this. If I see a complicated paper by Sharir, let's say, I will assume that it's correct. If, on the other hand, I see a paper of the same complexity written by Chookee Chookee, I will be more skeptical.

I think the 'we can figure out who the authors are anyway' is a canard. It's an 'in principle', dare I say theoretical objection that has little grounds in reality.

If you look at the actual reviews produced by reviewers for a conference, you have to wonder: who has the time to go around googling papers to figure out who wrote what? I certainly don't, and only even try if there's something that smells really funny.

Again, it's a matter of margins. If 50% of the reviewers don't go looking, that's 50% MORE double-blind reviewing you're getting. If 50% of the reviewers can't guess who wrote a paper, etc., etc. This goes back to #8, I guess: the problem of perfection.

The system IS broken. At least, I see it as broken since there is bias against female authors (see the paper I referenced), and I see a paper getting a different review if it's given two different author names as a clear indication that the reviewing process IS broken.

"Theory conferences (and especially SoCG [but much less FOCS/STOC]) try to verify that the submitted results are correct. The reputation of the authors is very important in this. If I see a complicated paper by Sharir, let's say, I will assume that it's correct."

1. **Assume** does not mean that you do not go and read the paper. It just means that your confidence in its correctness is higher. It means that instead of spending 50% of the time trying to make sure the paper is correct, I would spend 20% of the time doing it.

2. The fact that there is bias in the system does not prove that it's broken. To show it's broken you have to show that consistently wrong decisions are being made because of bias. Good luck proving that, because I think this is generally false.

3. Theory conferences have high acceptance rates. Even if 50% of the papers get accepted because of bias (which is way, way too high even if we think bias corrupts the system), this still leaves half of the slots open to the unbiased submissions.

4. The fact that a single author paper is written by a female in my mind does not cause it to be judged differently and with more bias.

I think the best way to protect against bias is to have a high acceptance rate. I think bias matters way less if you have a >30% acceptance rate.

Naturally, I might be taking this point of view because I have no problem getting my papers published. Maybe the bias works for me, so I don't see it.

I didn't see anyone else mention a possible problem with the double-blind system: It makes it impossible for PC members to declare a conflict of interest.

If my former student (collaborator/best friend/spouse) submits a paper and I get it to referee, I'm sure to recognize the author even though it's been anonymized. Then what? A second round of paper assignments to clean these up?

The way conflicts are handled in a DB setting is that the onus shifts to the authors. As an author, I'm required to go down the list of PC members, marking the different kinds of conflicts. Of course, I could ignore this, but this generally doesn't happen.

re Anon: The point of the research that Sorelle's been pointing to is that it's no longer legitimate to say "In my mind there's no bias": most of the biases being uncovered (especially those against women) are unconscious - not amenable to self-examination and introspection.

Suresh: I checked out CRYPTO since they've been doing this forever, but I couldn't tell from the CRYPTO 2009 webpage if there's any system in place for authors to state conflicts of interest with PC members.

I have submitted papers to double-blind conferences. When I have done so, I have felt constrained not to post a preprint. So the argument I've seen here that double-blind reviewing won't have a chilling effect on the open exchange of information is, by counterexample, bogus. Whether the effect comes from explicit submission rules or from self-censorship, it's there.

Though I think the argument that "lots of people are against it, therefore it's a good idea" is much stupider.

My reference point is all the database conferences. I've been an author and a PC member on these conferences, and that's how it works there.

0xDE is right btw: I have felt the chilling effect of the DB process when submitting, and have chafed at it, frankly. Conferences DO exhort people not to post preprints. But this is author-centric, and has to be balanced against the advantages.

Sorelle: I can't imagine that math conferences are double-blind: math conference vetting is only for sanity checking, and actually knowing the author is a reasonable sanity check; they don't filter based on content as far as I know. For math journals, I don't know how things work, but I'd find it odd, since many journals view an arXiv entry as the submission process.

For me, googling around is a natural part of a review, in particular when I am reviewing a journal submission.

Most papers are not about solving a well-known open problem. The contributions are minor; a reviewer cannot know offhand whether exactly this variant of that problem has been studied previously. The reviewer must do their own homework, find out about relevant prior work, and double-check that the discussion of prior work in the submission is thorough and honest.

Naturally all this takes some time, but it has been useful. For example, I have spotted not only cases where a lot of relevant prior work has been blatantly ignored, but also a serious case of plagiarism.

How is all this supposed to happen in a double-blind system? Googling around seems inappropriate, as it is far too easy to accidentally find the identity of the authors. On the other hand, how could I honestly recommend a paper for publication if I do not even know if it is literal copy-paste from someone else's work?

I'll restrict myself to listing some arguments in favor of non-anonymous submissions that I have not seen mentioned so far. (I also agree with many of the arguments that have already been given, but don't want to rehash those.)

- sub-reviewing: It becomes more difficult to send a paper out to a subreviewer if you don't know who the authors are. Of course you don't want to send it to an author. But you also don't want to send it to someone with a conflict with the author, at least not without being aware of the conflict first. And you may want to try to find a subreviewer who might be less biased about a particular piece of work, or conversely you may want to send it to someone who might have greater familiarity with the work.

- putting biases in the clear: many have already noted that anonymity is easily violated. If a paper is not anonymized, it puts everyone on the PC on a level playing field in this regard. It also means that any active bias (say, if you know a reviewer has been a frequent co-author with the author of a paper) can potentially be detected.

- making it easier to discuss a paper: I think extra information is a good thing. It is easier to discuss talks that people have heard, or follow-up work, if the authors are not anonymous.

I don't claim that any of these arguments clinches the case for non-anonymous submissions. But they need to be considered.

I should add that I am divided about the issue myself. As a grad student I was also totally in favor of anonymous submissions, but now I see the arguments in favor of the other side. Can I suggest, in the most constructive way possible, that one should serve on a PC and get some more experience with the issue before setting one's opinions in stone?

For me, the best solution would be for papers to be anonymous when read/reviewed, but for authors' names to be revealed during the discussion phase.

Regarding the claimed anti-female bias, which seems to be the main argument in favor of anonymity, all I can say is (1) I have never seen such bias in the committees I have been on, and (2) the culture of the physical sciences is vastly different from the culture of (T)CS, in many ways. I don't think you can draw any conclusions about our field from a study done in the field of ecology.

Finally, I want to mention that the crypto community is divided about the issue. Many in the field would prefer non-anonymous submissions, and TCC does it this way.

"Though I think the argument that 'lots of people are against it, therefore it's a good idea' is much stupider."

Tsk. Tsk. You snipped half the quote to make it sound stupider. You left out the part where it talks about people strongly opposing it while claiming it makes no difference.

I welcome people who give reasoned arguments against double-blind reviewing, and we have seen some among the opponents here, though by no means all. There have also been plenty of straw men, nirvana fallacies, and predictions of apocalypse that beggar belief, given that the system is in use in many other fields.

I am still waiting for a proof that the system is broken. Proof of bias does not imply the system is broken.

This may go back to a fundamental disagreement on the purpose of conferences (which is why I have trouble responding; from my point of view the "brokenness" is obvious). I believe the purpose is to accept the "best" papers (I suppose it's up to the PC to define best). If changing the name of a paper's author changes whether the paper is accepted, then the best papers are no longer the ones being accepted; instead, the papers with the "best" authors (according to the biases of the reviewers/PC) are. Since this is not the purpose of the conference, the system is broken.

Ha. Well. The "best" is never well defined. PC discussions are always fluffy, especially towards the bottom of the accepted papers. There is no total ordering on papers, and things are sometimes very random.

So, I guess I feel that the current system is an acceptable approximation to the best of all possible systems.

Sorelle, I'd be very careful before claiming a "proof" of bias against female authors based on one study. Social scientists are very careful about drawing conclusions from a single study. Usually a meta-analysis is done, various designs dealing with causal issues are used, etc.

For example, for years there were many studies that "proved" that when a message is signed with a female name it is evaluated less favorably, etc. It was later found that all of these studies had a serious flaw: all of them consistently used female names that had lower valence. When this was corrected, the "bias" disappeared.

Generally I wouldn't be trying to "prove" any of this at all, since I'm skeptical about any non-math proof. But I was asked for a proof (twice), so I did my best to explain my reasoning, which you're free to disagree with.

By the way, I believe that most (all?) math journals/conferences are double-blind. Historically, does anyone know how it is that we didn't end up adopting that?

I believe it's just the opposite: no math journals are double-blind. (Math conferences play a very different role than in CS, with no publication of extended abstracts or serious reviewing of papers. Giving a talk at a math conference is like speaking in a seminar in CS.) At least, 10+ years into a mathematics research career I've never heard of one that is, so I imagine if there are any they must be pretty obscure. I'm told that some journals experimented with it in the '60s or '70s, but they all decided against it.

The closest thing I can think of is math education journals, about the teaching of math. They often use double-blind refereeing.