Free Science, One Paper at a Time

Howard Eisen, 1942-1987

On Father’s Day three years ago, biologist Jonathan Eisen decided he’d like to republish all his father’s papers. His father, Howard Eisen, a biologist and a researcher at the National Institutes of Health, had published 40-some-odd papers by the time that he died by suicide at age 45. That had been in February 1987, while Jonathan, a sophomore at college, was on the verge of discovering his own love of biology. At the time, virtually all scientific papers were just on paper. Now, of course, everything happens online, and Jonathan, who in addition to researching and teaching also serves as academic editor-in-chief of the open-access, online-only journal PLoS Biology, knows this well. So three years ago, Jonathan decided to reclaim his father’s papers from print limbo and make them freely available online. He wanted to make them part of the scientific record. He also wanted, he says, “to leave a more positive presence” — to ensure his father had a public legacy first and foremost as a scientist.

How hard could it be? Howard Eisen had been a federal employee, so his work rightly lay in some sense in the public domain. And Jonathan, as an heir, presumably owned copyright anyway, along with his brother Michael (also a biologist, and one of the founders of the Public Library of Science, the innovative journal group that publishes PLoS Biology). Yet to the brothers’ continuing chagrin, Jonathan has found securing and publishing his father’s papers to be far harder than he expected.

For instance, even though Jonathan has access to the enormous University of California library system, which subscribes to a particularly high number of journals, he often can’t even find some of his father’s papers. And when he finds a paper in a journal the university doesn’t subscribe to, he is asked to pay as much as $50 to read it — even though his father did the work with public funds. He’s not alone; one recent study found that most university researchers have access to only about half the papers they need to cite for a given bit of research. Just yesterday, in fact, Jonathan asked on Twitter if anyone could send him a copy of one of his father’s papers and confronted a paywall asking for his credit card number. “I ain’t payin’,” he replied.

Meanwhile, Jonathan has found and downloaded the PDFs for about half his father’s papers, but he remains uncertain whether he could safely post them on his website. While some publishers allow such “collegial sharing,” others leave their policies unclear, and he worries about getting sued. His brother urged Jonathan to post them.

“Come on,” Michael wrote in a comment at Jonathan’s blog. “I DARE them to sue us.”

Jonathan has posted the whole list at his blog and uploaded what PDFs he could obtain, and so far he has not been sued or asked to take them down. Yet he remains wary and unsatisfied. He knows that few researchers will find his father’s papers if they reside only on his web page. So for now, his father’s work remains buried in an old structure — a calcified matrix. Though Jonathan bangs away at the surrounding rock, he knows he hasn’t really pried the work loose. This frustrates him on two fronts: It stops him from freeing his father’s work. And it confirms to him that science, which should be a fluid medium, still has much of its content trapped in old structures.

“I started this partly to test how hard it would be to try to make science more available in the current system,” he says. “I’m finding that even with my father’s papers, or even with my own, it’s not very easy.”

Jonathan Eisen’s quest has solidified his conviction that science needs to radically rework the way it collects and shares its data, methods, and findings. He has plenty of company. A growing number of prominent scientists want to replace the aging journal system with something faster, cheaper, and richer. The current system, they note, grew out of meeting notes and journals published by societies in Europe over three centuries ago. Back then, quarterly or monthly volumes could accommodate the flow of ideas and data from most disciplines, and the printed journal, though it required a top-heavy, expensive printing and publishing infrastructure, was the most efficient way to share those ideas.

“But now,” says Jonathan Eisen, “there’s this thing called the Internet. It changes not just how things can be done but how they should be done.”

As Stanford biochemist and PLoS co-founder Patrick Brown put it a few years ago, “What seemed an impossible ideal in 1836, when Antonio Panizzi, librarian of the British Museum, wrote, ‘I want a poor student to have the same means of indulging his learned curiosity, … of consulting the same authorities, … as the richest man in the kingdoms,’ is today within reach. With the Internet, we have the means to make humanity’s treasury of knowledge freely available to scientists, teachers, students and the public around the world.”

“The existing system worked well for quite a while,” says Jonathan Eisen. “But it was not designed by theory. It was designed by constraints.” In a world that provides communications conduits far larger and faster, those constraints have now made science’s traditional pipeline a bottleneck.

~

To get a sense of how the current system curbs science, consider a rare case in which researchers attacked a big medical problem with an open-science model. In 2004, in the United States, a network of government and private researchers, including large drug companies, used open-science principles to accelerate research into Alzheimer’s. The project, as Gina Kolata aptly described it in the New York Times last summer, “was an agreement … not just to raise money, not just to do research on a vast scale, but also to share all the data, making every single finding immediately available to anyone with a computer anywhere in the world. Before that, researchers worked separately, siloing off much of their work. Now methods and data formats were standardized. The data would immediately enter the public domain, where anyone could build on it.”

An extraordinary project ensued. The U.S. National Institute on Aging contributed over $40 million, and 20 companies and two nonprofit groups kicked in another $25 million to fund the first six years. The program produced an explosion of papers on early diagnosis and helped generate more than 100 studies to test drugs or other treatments. It greatly sped and opened the flow of findings and data. According to the New York Times, the project’s entire massive database had been downloaded more than 3,200 times by last summer, and the data sets containing images of brain scans were downloaded almost a million times. Everyone was so pleased with the results that they renewed the accord this year. And all because, as a researcher told Kolata, “we parked our egos and intellectual-property noses at the door.”

The language used here — everything entering the public domain, the dismantling of silos, the parking of egos and IP padlocks — might have been lifted from an open-science manifesto. And even Big Science appreciated the outcome. To open-science advocates, this raises a good and somewhat obvious question: Why don’t we do science like this all the time?

Part of the answer, strangely, is the very thing at the center of science: the paper. Once science’s main conduit, the paper has become its choke point.

It’s not just that the paper is slow, though that is a huge problem. A researcher who submits a paper to a traditional journal right now, for instance, won’t see the published piece for about a year. She must wait while the paper gets passed around among editors, then goes through rounds of peer review by experts in her field, who often object not just to her methods or data but to her findings and interpretations. Finally, she must wait while it moves through an editing, layout, and publishing pipeline that itself might run anywhere from 2 to 12 weeks.

Yet the paper is not simply slow; it’s heavy. Even as increasingly data-rich science has outgrown the paper’s ability to deliver and describe all that science has to offer — its deep databases, its often elaborate methods — we’ve loaded it up needlessly with reputational weight and vital functions other than carrying data.

The paper is meant to be a conduit for the real content and currency of the science: the ideas, methods, data, and findings of the people who do science. But the tremendous publishing and commercial infrastructure built around the academic paper over the last half-century has concentrated so many functions and so much value in the journal that the paper itself, rather than the information in it, has become science’s main currency. It is the paper you must buy; the paper you must publish; the paper you must cite; the paper on which not just citations but tenure, reputation, status, and even school rankings are built.

~

To get an idea of the paper’s excess weight, go to Cambridge, England, and find Mark Patterson. Patterson is a scientific-publishing old hand gone rogue. He formerly worked at two of the biggest scientific publishing companies, Elsevier and Nature Publishing Group (NPG), each of which puts out scores of journals. A few years ago he moved to the staff at PLoS.* Patterson is now director of publishing there, and since he joined, PLoS has leveraged open-science principles to become one of the world’s biggest publishers of peer-reviewed science and the biggest single publisher of biomedical literature. Readers like it because they get free access to good science. Researchers like it because their work reaches more readers and colleagues. PLoS’s success is heartening open-science advocates greatly — and unsettling the traditional publishers.

To explain what PLoS is doing differently, Patterson likes to talk about how its most innovative journal, PLoS One, deals with four essential functions of science that are currently wrapped up in the scientific paper: registration, certification, dissemination, and preservation. The conventional publishing regime, he argues, binds these functions too tightly to the standard version of the scientific paper — even though some of them can be met more efficiently by other means.

So what are these functions?

Registration is essentially a scientific claim of discovery — a marker crediting a particular researcher with an idea or finding. The current system registers these contributions via a paper’s submission date. Certification is essentially quality control: ensuring a paper is solid science. It is traditionally done via peer review. Dissemination means getting the stuff out there — publication and distribution, in printed journals or online. And preservation, or archiving, involves the maintenance of the papers and citations to create a breadcrumb trail other researchers can later follow back to an idea or finding.

“The current journal system does all four of those things,” says Patterson. “But it doesn’t necessarily do them all well. The trick is finding a system that gets each of these done most efficiently, sometimes by other means, instead of having them all held by the publisher.” He and others contend that science would gain both speed and rigor by “unbundling” some of these functions from the paper and doing them in new ways.

PLoS loosens things up mainly in distribution and quality control. All of its journals are open-access — that is, free to read. Instead of making every would-be reader either buy a journal subscription or pay a per-article price of $15 to $50, PLoS collects a fee from the researcher to publish — usually about $1,400 — and then publishes the paper online and makes it free. The author fee is substantial, but it’s actually a small addition to the other costs of doing science, and it performs the essential function of getting the work out there. It’s Panizzi’s dream realized: every poor schoolchild — or at least every schoolchild with web access — can read PLoS. Researchers like this, and it works. A recent study showed that, on average, papers and data published open-access receive more citations than those behind paywalls.

PLoS’s rapid growth has shaken things up. Some journal groups, such as Elsevier, have responded by allowing authors to pay to have a paper open-access on publication. Yet commercial publishers that do this tend to retain certain rights that PLoS does not, and they’re less likely to release underlying data, metadata about the publications, or other data and rights. And the practice creates a weird and uncertain market: You can go to, say, Neuron, and find, in the same issue, one paper you can download for free and another that costs $30. The difference? The authors of the latter paper didn’t pay the open-access fee.

Meanwhile, PLoS’s biggest, most cross-disciplinary journal, PLoS One, streamlines quality control in a way that’s more complex and raises more ire. The traditional route, peer review, generally involves having two or three experts evaluate the entire paper — data, methods, findings, conclusions, significance. The publisher relays these peer critiques to the author, usually with requests for either changes or clarifications. If the author answers those to the publisher’s satisfaction, the paper gets approved.

PLoS One uses a similar process but — crucially — asks its reviewers to judge only on technical merits, and not on any assessment of the paper’s novelty, significance, or impact. “The idea,” says Patterson, “is to let the importance be determined later by how much the paper’s ideas and findings and conclusions are taken up by the community. We’re letting the scientific community at large determine a paper’s value and importance, rather than just a couple of reviewers.”

This makes many people at Patterson’s old workplaces uneasy. Gerry Altmann, editor of Cognition, an Elsevier journal, and an open-minded man, doubts this sort of post-publication filter can serve the purpose. “Peer review should be about ensuring that there’s a robust fit between findings and conclusions, and that a paper sits well within the context of a discipline,” Altmann told me. “These are insidious changes.”

Can the hivemind do quality control? Patterson answers by noting that any paper’s true value — its lasting contribution — is generally decided by the scientific community even under the current system. Yet he acknowledged that at present few scientists actually go online and make comments or otherwise review papers published there. We’re a long way from the vision of an active scientific community replacing peer review with a crowd-sourced rigor and fact-checking. The hivemind apparently has better things to do. Altmann thinks it’s starry-eyed to think that will change.

Others say researchers would take on these tasks if it were worth their while. They argue that you can make it worthwhile by giving researchers credit for a wider range of contributions to science, starting with post-publication peer review and evaluations.

This is the idea behind ORCID, a program that would give each researcher a unique, immutable digital identification, somewhat like a permanent URL. That ID would serve like a deposit account: the researcher would accumulate reputational credit not just for papers published, but also for other contributions the scientific community deems valuable. Reviews of others’ work could thus generate deposits, as could public outreach, talks, putting data online, even blogging — anything that helps science but currently goes unrewarded. This would allow hiring, tenure, grant, and awards committees to weigh a broader set of contributions to science. ORCID holds particular promise because it has already lined up buy-in from publishing giants Nature and Thomson Reuters (though it’s unclear what contributions various stakeholders will agree to credit).

What would such a system look like? One idea is being developed by a team led by Luca de Alfaro, of the University of California, Santa Cruz. Working with Google, the team hopes to develop broader-based reputational metrics that are built, writes de Alfaro in a recent essay in The Scientist, “on two pillars”: tenure, grant, and similar rewards for authors of papers and their reviewers alike; and — crucially — a content-driven way of gauging the merit of both papers and reviews. Authors would get credit for work of high value, as measured by citations, re-use of data, and discussion generated. Reviewers, meanwhile, would get credit based not just on output but on how well their reviews predicted a work’s future value.

“Thus two skills would be required of a successful reviewer,” writes de Alfaro: “the ability to produce reviews later deemed by the community to be accurate, and the ability to do so early, anticipating the consensus. This is the main factor that would drive well-respected people to act as talent scouts, and to review freshly published papers, rather than piling up on works by famous authors.” De Alfaro says much of the technology to weigh such variables already exists in algorithms used at Google and (for evaluations of reviewers) Amazon.

Such a system could readily be incorporated into a program like ORCID. It could also give researchers incentives and credits — points, essentially — for public outreach or for openly sharing underlying data and details about method, both after and even before publication, so that other researchers can more easily test or use the data and methods. In short, a more flexible credit system could generate more activity in almost any area of science simply by weighting it more heavily.
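De Alfaro's two-pillar scheme can be made concrete with a small sketch. Everything below is illustrative: the names, the weights, and the linear "earliness" decay are assumptions of mine, not the actual Google or Amazon algorithms he alludes to. The point is simply that reviewer credit can be computed from two measurable things, how closely a review anticipated the community's eventual judgment of a paper, and how early it did so.

```python
# Hypothetical sketch of a content-driven reviewer metric in the spirit
# of de Alfaro's proposal. Names, scales, and weights are invented for
# illustration only.

from dataclasses import dataclass


@dataclass
class Review:
    predicted_value: float        # reviewer's assessment of the paper, 0 to 1
    days_after_publication: int   # how soon after publication the review came


def review_credit(review: Review, consensus_value: float,
                  horizon_days: int = 365) -> float:
    """Credit = accuracy x earliness.

    accuracy: 1 minus the gap between the review's prediction and the
    community's later consensus (as measured by citations, data re-use,
    discussion generated).
    earliness: linear decay over the horizon, so early reviews of fresh
    papers outweigh late pile-ons onto famous authors' work.
    """
    accuracy = 1.0 - abs(review.predicted_value - consensus_value)
    earliness = max(0.0, 1.0 - review.days_after_publication / horizon_days)
    return accuracy * earliness


def reviewer_reputation(history: list[tuple[Review, float]]) -> float:
    """Average credit over a reviewer's past (review, consensus) pairs."""
    if not history:
        return 0.0
    return sum(review_credit(r, c) for r, c in history) / len(history)
```

Under this toy scoring, a reviewer who rated a paper 0.9 a month after publication, when the community later settled on 0.85, earns nearly full credit, while the same judgment delivered a year and a half later earns none, which is exactly the "talent scout" incentive de Alfaro describes.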

~

These many pressures and alternatives seem to be loosening the publisher’s grip. Last June, librarians at the University of California system balked when the Nature Publishing Group sent a contract renewal containing a 400 percent price hike on the scores of NPG journals the huge library system subscribes to. The increase would have raised the cost to over $17,000 per journal. The librarians objected that it was ludicrous for universities to fund research and then pay to read it. They threatened to boycott NPG not just as subscribers but as contributors to the journals. NPG softened and worked out a deal.

Meanwhile, universities and researchers are rebelling in other ways. Some are starting open-access journals or opening up some they already publish. And PLoS continues to create new models, including fast-track journals for time-sensitive disciplines, such as those that cover the flu and other infectious diseases, to cut the traditional one-year publication cycle down to a day. Another outfit, LiquidPub, is launching what it calls “liquid journals,” in which “social computing and liquid knowledge will shape and navigate information waters.” Phillip Lord and Robert Stevens, of Newcastle University and the University of Manchester, have created KnowledgeBlog, a publishing framework based on blog technology. Even Shakespeare scholars are entering the open-science world: Last summer, the Shakespeare Quarterly ran an experiment in which it not only put its journal online but opened the job of peer review to the public, so that anyone who cared to register could comment, say, on the racial implications of playing Titus Andronicus as an “American Gangsta.”

And then there are those such as Newcastle University computer scientist Phillip Lord, mentioned earlier, who just publishes on WordPress. A blog may seem a sketchy way to publish science. Yet in a way it makes sense. Science, however rigorous, implicitly recognizes that every explanation is provisional; there’s no finished version. So what could be more fitting than to revamp science through a platform explicitly built to be revised, commented on, and updated?

~

Yet if a more open scientific publishing landscape may seem inevitable, it’s hardly clear how to get there. Talk of inevitability hasn’t much helped Jonathan Eisen get his father’s papers out in the open. He has struggled to find the right leverage point, or perhaps the right tool, to lift them onto a platform any more prominent than his own web page.

A few months ago, however, Jonathan ran into a tool that added some leverage — and just might chip away as well at the calcified matrix in which his father’s and others’ scientific work has been stuck.

It is the simplest of academic tools: a desktop reference manager called Mendeley. Yet it comes with an extra dimension: a website at which you can share papers you like, creating a metadata-rich index that can lead other users to your user profile and papers, and vice-versa. You load your bibliography up — all those papers on cognitive neuroscience, say, or dark energy, or, if you’re Jonathan Eisen, evolutionary biology and extremophile bacteria — and Mendeley’s algorithms link you up with papers you might have overlooked and the researchers who wrote, read, or collected them. Maybe, Jonathan wondered aloud on Twitter, he could create a posthumous profile for his dad and post his papers up there. Mendeley promptly told him he could. He did. Howard Eisen now has his own Mendeley page, with all 41 of his papers listed and 24 of them uploaded as PDFs. Now you, as well as anyone in the research community who takes a minute to sign up at Mendeley, can find and read them, and add them to your library. Since Mendeley now has over 800,000 members and is growing at an accelerating rate, this puts Howard Eisen and his work if not in science’s mainstream, then in a sizeable and fast-growing tributary.

Jonathan also likes Mendeley because it seems to advance the larger open-science agenda. It’s a sort of friendly Trojan horse. You download a reference manager — a good one, and free — and suddenly have a tool that can help open science.

“Smart,” says Leslie Carr, director of the Web Science Training Center at the University of Southampton. “Most people who’ve tried to create software to drive open science have started off on the web and tried to encourage people to share. Mendeley starts where the researchers already are, with a tool researchers need, which is a desktop reference manager to manage their bibliography and organize their thoughts. Then the very act of looking for more papers leads them into an open science model based on sharing.”

Mendeley exists because its CEO and co-founder, Victor Henning, needed a tool to understand better the cross-disciplinary mountain of literature he’d compiled for his thesis at Bauhaus-University of Weimar. “I had this huge trove of papers from different disciplines,” Henning told me, “and I wanted to see where the connections and overlaps and gaps were.” But when he looked for software to do this, he found nothing he liked.

“This was 2004, 2005. Last.fm was happening. By then I was collaborating a lot, and we were talking about doing a couple different people’s data. Then we realized you could do a social version of this. Why not map a bunch of people’s data? That would give an even better picture of the ideas in play.” By this time, being a business student, he was thinking: startup. He also realized he didn’t like most of the reference managers on the market. So he thought: let’s roll that in too.

Thus emerged Mendeley. The name rose from its dual mission: Mendeleev was the Russian chemist who created the periodic table, which organized the known elements into a structure that suggests the properties of other elements still to be found. Mendel was the 19th-century monk and botanist who saw that crossing two packets of information could yield a third packet derived from, but different from, the first two. Two nice models of how science works.

The focus on the paper came of pure necessity. The American bank robber Willie Sutton said he robbed banks because “that’s where the money is.” At least for now, the paper is where the data are. But the people at Mendeley know quite well that a) the paper will get unbundled and in many functions displaced and b) they’re now grasping a bundle with a bunch more stuff in it. But they’re most interested in the threads that run from paper to paper. They mean to charge not for the bundled information but for helping people find the connections between the bundles. The company offers a free version that accommodates smallish libraries. If your library runs bigger than 500 MB, you can pay $10 a month to run the company’s algorithms and store a copy of your data and papers in the cloud. If you’re a company or a department or simply someone who wants to run some highly sophisticated or customized analyses on aggregated scientific data — and on the all-important hivemind indications of what’s newly hot — you can pay more, providing Mendeley another income stream. Mendeley also talks of striking a deal, maybe, with publishers to make papers available on a rough iTunes model: a buck a paper, perhaps, with algorithms running in the background to help you find papers you’d like but don’t know about.

Many feel this model holds a lot of potential not only to make papers available more freely (or cheaply) but to help unbundle and redistribute the functions now unnecessarily bundled with the paper. But can Mendeley do this? It seems to possess the vision and flexibility of mind. While Mendeley is necessarily focused on the PDF right now, for instance, there’s no reason it can’t adjust its databases and algorithms to index, share, and analyze the importance of contributions other than traditionally published papers; it can develop new metrics. And the company’s application programming interface, or API, recently published, should allow outside developers to create modules and add-ons to track new metrics, including author identifiers such as ORCID.

The chassis, then, can accommodate changes under the hood. The trickier part may be getting the steering right — that is, creating a UI that offers a powerful and full-featured but easy and intuitive way to use both the traditional reference manager and the broader social, sharing, and analytical tools.

They’re still working on that. “Most of the people I talk to who’ve used this,” says Leslie Carr, “think that the desktop and the web sharing aren’t as well integrated as they could be, from a software perspective, and that the analytic tools aren’t as accessible or transparent as they should be.” I find the same thing myself: Many of the metrics and connections between papers aren’t accessible on the desktop, presumably because they require the server’s data and processing power, and finding them on the web interface feels vaguely opaque. Even when you find some relationships, you worry you’re missing something.

Yet the company seems both open and responsive. When users pressed last summer for more hivemind information and more fluid sharing, the company substantially upgraded the website’s social-sharing module. It was quick to produce iPhone and iPad versions of its software. In general it seems fairly nimble and eager to meet user needs.

On the other hand, a lot could stop them. They could run out of money; with over 40 employees, their burn rate is high, but then again, their funding angels seem both confident and deep-pocketed. They could get sued. They could fail to add features fast enough to satisfy demanding users. They could fall short of the magic that software needs to be transformative. In short, they’ll need what any gamechanger needs: a good concept, some serious programming and promotional chops, and luck.

Mendeley chief scientist Jason Hoyt thinks the real killer app in open science will not be software but … the researcher. He recently made the call in a blog post titled “Dear researcher, which side of history will you be on?”

For the past three centuries, he noted, technology has prevented us from fulfilling Panizzi’s dream of fast, free science. But the technology is there now, and so are the business models, as PLoS has shown. So what is the revolution waiting for?

It is waiting, wrote Hoyt, “for us, the researchers.”

We could choose to publish in only Open Access. We could choose to reward tenure for Open Data. We could choose to only reward publications or data that are proven to be reused and make either a marked economic or research impact. Instead, we choose to follow a model that promotes prestige as the primary objective. …

The future, I suspect, will look upon our society and practice with regards to scientific knowledge-share as we similarly do now with the Dark Ages. Each time we hold back data or publish research that isn’t immediately open to all, we have chosen to be on the wrong side of history.

He has a point. It’s interesting, for instance, to imagine what would happen if researchers and university librarians got together and created a global version of the sort of revolt that the University of California librarians threatened. “You get all the librarians together on this,” says Cameron Neylon, a director at the UK’s Science and Technology Facilities Council and an academic editor at PLoS, “and this is pretty much over.” And Librarians at the Ramparts sure makes a nice image.

Jonathan Eisen, too, thinks that opening science will require the researchers to step up. But he suspects they won’t step up in number until reward systems offer some incentive more tangible than being on history’s good side. Only then will the upslope ease. In the meantime, Jonathan continues to push his father’s papers up that hill, and he waits to see how well Mendeley, among his other efforts, can help pull them up into the open. Jonathan tends to push hard in strong spurts around Father’s Day, make some progress, then set the load down a while before resuming.

“It’s one of those things that’s just going to take some time,” he says. “I didn’t think it would be quite so hard. But we’ll get there.”

~ ~ ~

Copyright 2011 David Dobbs. All rights reserved. You may excerpt short sections, as per fair use, as long as you link back to this article. For permission to reprint in whole, please drop me a line.

*Disclosure: I sometimes write for Nature Publishing Group and have friends both there and at PLoS.

NOTE: In the week or so after this published, Jonathan Eisen was inspired to substantially complete the job of assembling his father’s publications at Mendeley. See my short follow-up post here.

Corrections:

May 12, 2011: • Fixed some typos. • Changed pounds to dollars. • The original version made it sound as if all PLoS journals evaluated submitted papers based only on method, rather than method plus significance. The current version is corrected to state that only PLoS’s flagship journal, PLoS One, uses that streamlined method of peer review.

May 13, 2011: • Corrected i.d. of Jonathan Eisen’s position at PLoS, where he is an academic editor-in-chief of PLoS Biology. • Prior version called PLoS One the “flagship journal” of PLoS. A couple people differed. Changed to note that it is PLoS’s most innovative and cross-disciplinary journal. • Clarified criteria by which PLoS One referees papers. • Corrected description of KnowledgeBlog, which was created by Phillip Lord and Robert Stevens, not Peter Murray-Rust (who told me about it).