March 07, 2005

Raising standards -- by lowering them

Chris at Mixing Memory reports that blogging about (his end of) science is discouraging:

As an academic, I have spent a lot of time hiding away in the ivory tower, oblivious to the larger world around me. As a graduate student, especially, I had almost no time to pay any attention to what non-scientists were saying about cognitive science. However, on a fateful day in early 2004, I chose to crawl out of my hole and actually look at what other people were saying. I started reading blogs. And now I want to crawl back in!

I'm happy to say that he's decided to resist the urge, and to keep fighting the good fight. His stuff is smart, well informed, and well written -- take a look at his posts on Lakoff and framing, for example, or on recovered memories, or his most recent post on corpus-based approximations to meaning. However, I'm going to disagree with his advice to people writing about science.

Actually, his advice is specifically directed at "anyone who wants to talk about cognitive science, but has not spent a lot of time studying it", but the things that bother him are characteristic of science writing in general, and so his prescriptions apply more generally as well. You should read his Five Points for yourself, but here's my summary:

Chris says that "if everyone followed these guidelines when they wrote about fields in which they are not experts, maybe the public wouldn't have such a god-awful understanding of the sciences."

Though I sympathize deeply with Chris' frustration, I disagree completely with his prescriptions, for two main reasons.

First, if everyone followed his guidelines, there'd be an order of magnitude less science writing than there is, and there's already too little. Instead of putting up higher barriers to entry, we should be encouraging more people to do more thinking and writing about mathematics, science and technology (and history and literature and art, too, but that's another story). A rising tide of interest and involvement will lift all intellectual boats, even if some unsavory stuff floats up off the mud flats.

Second, the peer-reviewed literature may be the best thing we've got, but it's not very good. There's an enormous quantity of irrelevant junk in it, and a certain amount of out-and-out crap. Much of it is unreadable or misleading. Worse, a lot is missing -- questions that don't get asked, negative results that don't get published, whole problem areas that don't get addressed for decades at a time. And you don't have to agree with Steve McIntyre's views on global warming to sympathize with his complaints about "disclosure and due diligence" in the refereed literature.

1. Encourage everyone to think about science, and to write about it on the web, whether they know anything about it or not. And encourage them to criticize what others write, and to read others' criticisms, and to tell their friends about the best stuff that they find, whether in the popular media, or in the technical literature, or in weblogs. I claim that open intellectual communities intrinsically tend to generate a virtuous cycle: if there were an order of magnitude more science writing in blogs, there'd be less than an order of magnitude more crap, and more than an order of magnitude more good stuff. (The same is probably true for science writing in newspapers, though the network effects are smaller there.) This follows from a scientific version of Moglen's Metaphorical Corollary to Faraday's Law: add more wires, lower the resistance, and more intellectual current is induced.

2. Open up the process of scientific publication itself:

1. Open access on the web for all scientific publications, with durable doi-style references.
2. Open access on the web for all data and programs involved in scientific publications.
3. Standard APIs for references in all scientific publications, and methods for inducing trackbacks across all archives of such publications.
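
To make the idea concrete, here is a hypothetical sketch (my own illustration, not from any existing standard) of the kind of machine-readable record that doi-style references would make possible. The DOIs and field names are made up for the example; the only real piece is that any registered DOI resolves through the doi.org proxy.

```python
import json

# A made-up publication record: the paper's own DOI, plus the DOIs
# of the works it cites. With records like this, the citation graph
# is explicit data rather than something to be mined out of text.
record = {
    "doi": "10.0000/example.2005.001",
    "title": "An Example Paper",
    "references": [
        "10.0000/example.2003.042",
        "10.0000/example.2004.007",
    ],
}

def doi_url(doi):
    # Any registered DOI resolves via the doi.org proxy server.
    return "https://doi.org/" + doi

print(json.dumps(record, indent=2))
print(doi_url(record["references"][0]))
```

Given archives full of such records, "who cites whom" becomes a dictionary lookup, and trackbacks are just the same links followed in reverse.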

Point 1: Open access lowers barriers and encourages people to read (and evaluate!) primary sources, not just someone's summary. More people reading more papers is good.

Point 2: All the data and programs behind published claims should be published in electronic form, so that readers can check methods and results, try alternative models, and (most important) build on others' work. This shortens the half-life of mistakes, and accelerates the spread of good ideas.

Point 3: Now that nearly all journals, proceedings, etc. are on the web, there's no excuse not to make it trivial to extract the citation graph (i.e. who cites whom for what). Then users can wander around in the graph, use it to calculate value via the analog of PageRank, and do all sorts of other neat things. The way things are currently done, finding the citation graph is a non-trivial exercise in text analysis and reference normalization, even for the documents that are not hidden behind a publisher's barrier. This is one place where "semantic web" ideas really ought to be imposed.
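
Once the citation graph is available, ranking papers by "the analog of PageRank" really is just a few lines of code. Here is a minimal sketch, with a toy four-paper graph and a damping factor chosen for illustration; nothing in it comes from any particular citation database.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each paper to the list of papers it cites."""
    papers = list(graph)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper keeps a small baseline share of rank.
        new = {p: (1.0 - damping) / n for p in papers}
        for citer, cited in graph.items():
            if cited:
                # A paper passes its rank, damped, to the works it cites.
                share = damping * rank[citer] / len(cited)
                for p in cited:
                    new[p] += share
            else:
                # A paper citing nothing spreads its rank over everyone.
                for p in papers:
                    new[p] += damping * rank[citer] / n
        rank = new
    return rank

citations = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C"],
}
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))  # prints "C": the most-cited paper ranks highest
```

The point is not the algorithm, which is standard, but the input: the expensive step today is building the `citations` dictionary from unstructured reference lists, which is exactly what standard APIs would make free.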

There will of course be cases where this much openness is not possible -- e.g. where data can't be published for privacy or intellectual property reasons. But such cases should be treated like the use of anonymous sources in journalism -- permitted only where the results are valuable enough, and there's no alternative.