The Internet Made Information Free: Now It Has Come For Academic Research

A new study out earlier this month suggests that the world’s largest “pirate” archive of academic literature, Sci-Hub, may hold as much as 68.9% of the 81.6 million scholarly publications captured in Crossref’s DOI database. Put another way, more than two-thirds of the world’s major contemporary research output is now available completely for free, despite a vast fraction of it being ordinarily paywalled away from public (and even scholarly) access. What does this latest glimpse into the world of copyright tell us about the future of academic publishing and open access?

Perhaps the most remarkable aspect of this latest study is the picture it paints of just how far Sci-Hub has come in its mission to make the world’s academic literature freely available to all. That more than two-thirds of modern scholarly output can be accessed for free in a single place through a centralized, uniform interface is truly transformative. Even at the best-resourced academic libraries, whose online subscriptions span every imaginable journal, there will always be countless obscure and specialty journals that aren’t available. Obtaining an article from one of them requires an email to a librarian requesting a one-time article or issue purchase, and sometimes a multi-day turnaround before the PDF is in hand. Even for journals to which an institution subscribes, obtaining the actual PDF of an article of interest can all too easily become an epic saga of searching across multiple databases and wading through complex, cumbersome user interfaces that haven’t been modernized since the terminal era, only to find that the article is embargoed until next month or that only a text-only version is available, without figures.

Sci-Hub’s novelty is thus twofold: it makes the world’s copyrighted, paywalled academic literature available for free, and it puts all of that literature behind a single centralized search box.

To those outside the academic world, it might seem strange that there is so much interest in a website that just publishes pirated copies of academic papers. After all, while it might make sense that there is huge demand for pirated movies and music, how much interest is there really in arcane technical write-ups? And couldn’t libraries just buy access to all of the journals they need?

It may surprise many to learn that paying six-figure subscription fees for a single collection wasn’t always the norm. The Guardian published a fantastic look back last month at just how academic publishing evolved from scientists and scholarly societies sharing their knowledge into the for-profit commercial enterprise of today.

Take the example of a faculty member at a public university whose salary is paid by the taxpayers and whose research grant comes from a taxpayer-supported federal funding agency. In many disciplines, most of the scholarly knowledge that comes from that taxpayer-funded research will be given for free to journals operated by commercial for-profit enterprises, which will publish it behind paywalls and charge both the scientific community and the very taxpayers who funded it in the first place for access. The result is that taxpayers pay on both sides of the equation. It is also a highly uneven landscape in which researchers at smaller, poorly funded institutions have far less access to scholarly work than do those at institutions with well-funded libraries, while the general public has no access at all to the world’s scholarly knowledge.

The Internet has fundamentally rewritten consumer expectations around access to information, from news articles to music, encyclopedias to movies. Today we expect everything to be free and instantly accessible anywhere from our phones. Academic literature has to date been one of the few bastions that has escaped this transformation, due in no small part to the fact that institutional subscriptions have largely hidden the costs from researchers who remain blissfully unaware that their library pays $100,000 a year for a set of journals they rarely use.

Many journals permit authors to post preprints of their work online, but even here things can become contentious. Last month the APA was forced to issue a statement that it was “refocusing” its counter-piracy efforts after it began sending out formal takedown notices to academic websites hosting preprints of accepted manuscripts from APA journals. In this case, APA contracted with a third-party vendor to identify copies of papers from APA journals that it deemed out of compliance with its preprint rules and to send the hosting sites formal demands to remove the content for copyright violation. What raised concern, however, is that beyond adding a brief informational page to its website, APA did not contact its authors to remind them of what they could and could not share as preprints or to notify them that it would be enforcing its preprint regulations and issuing legal takedown notices. The result was that authors in a number of cases were forwarded the notices. An APA spokesperson noted that its journals had published papers by several thousand authors over the pilot time span, that only some of them had posted preprints out of compliance, and that it therefore did not send notices to all of its authors, since its efforts were focused on pirate sites.

Beyond such takedown notices and full-blown lawsuits, it is unclear what options the publishing industry has to try to slow the rise of sites like Sci-Hub. One possibility would be electronic watermarking of PDF files and steganographic marking of figures and other visual elements to record which university and user account were used to download a given paper, followed by criminal or civil action against individual downloaders who provide the copies to Sci-Hub. However, it is likely such approaches would be met with technical countermeasures from the community. Indeed, the study’s authors note that Google Trends data shows that each major challenge mounted by publishers against Sci-Hub only increases its visibility and popularity.

What are the alternatives to the paywalled world of commercial academic publishing? Preprint servers, where authors submit copies of their papers for free download prior to publication, have been attracting growing interest outside of their traditional disciplines. One of the benefits of preprint servers is the potential for refocusing the publication review process on scientific accuracy rather than “impact,” allowing research to be published that is scientifically sound but may run against current orthodoxy in the field.

I myself ran into this issue fairly early in my academic career, when a paper was rejected from a top journal with a one-line review that offered only the reviewer’s belief that data mining was gaining too much traction in the field and that, on principle, they would not permit any paper to be published that utilized computerized or quantitative techniques. Although I noted to the editor that the review had nothing to do with the paper itself and that the other two reviewers had recommended publication on the merits, the editor rejected the paper, saying the dissenting reviewer was an extremely senior luminary in the field. Shortly thereafter another journal rejected a different paper, with one of the reviewers arguing that data mining was merely a passing fad and that all fields of study would shortly abandon computers and return to human analysis. Yet another paper was rejected by a reviewer who argued that no computer algorithm in existence could extract person names from a collection of textual documents, and that the paper therefore relied on a non-existent methodology.

Granted, each of these cases involved journals in disciplines without a long history of computational approaches, but they reflect the momentum against change that all too often faces the introduction of new methods or datasets to established fields, where peer review can act less as a scientific review and more as a philosophical gatekeeper.

In some of the preprint models being discussed, rather than immediately submitting their studies for blind peer review at a journal, authors would instead upload their draft papers for public access to a major preprint server, potentially along with the datasets and tools used. The community at large would then review and discuss the paper in open forums, with all commentary public and associated with the commenters’ real names. Scholars from other fields and even members of the general public would also be able to weigh in, for example by raising ethics issues that may be unfamiliar to the field.

Successful papers might then be submitted to traditional journals, with the preprint copy ensuring permanent open access. Under some models, journal publication would be eschewed altogether and submitting to a preprint server would count as publication. Of course, minimizing the proliferation of scientifically unsound or fabricated works would require additional diligence under such a model, and peer review would likely be uneven, but it would at least bring the review process into the open and ensure that all papers are open access.

Putting this all together, it seems the world wants academic literature to join the long list of things the Internet has made free. Sci-Hub’s meteoric growth, and the fact that in spite of immense legal pressure it has still managed to amass more than two-thirds of the major contemporary scholarly output of the academic enterprise, suggest that commercial publishers have reached a tipping point in their losing battle against the open access movement. In the Internet era information will be free; the only question remaining is who pays for that freedom.

Based in Washington, DC, I founded my first internet startup the year after the Mosaic web browser debuted, while still in eighth grade, and have spent the last 20 years working to reimagine how we use data to understand the world around us at scales and in ways never before...