The inside story of Aaron Swartz’s campaign to liberate court filings

And how his allies are trying to finish the job by tearing down a big paywall.

Tear down this wall

In a back-of-the-envelope calculation a few days before the offsite crawl was shut down, Swartz guessed he had gotten around 25 percent of the documents in PACER. The New York Times similarly reported Swartz had downloaded "an estimated 20 percent of the entire database." Other media outlets have repeated the figure ever since. Unfortunately, neither figure is accurate. PACER has more than 500 million documents, so the 2.7 million documents Swartz downloaded account for less than one percent of the database.
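The gap between the reported figures and the actual share is easy to check. Here is a rough sketch using only the two numbers cited above; the variable and function names are illustrative, and 500 million is treated as a floor, so the result is an upper bound on Swartz's share.

```python
# Figures taken from the article; names are illustrative.
SWARTZ_DOCS = 2_700_000         # documents Swartz downloaded
PACER_DOCS_MIN = 500_000_000    # PACER holds "more than" this many documents


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# Because the true database is larger than 500 million documents,
# the real share can only be smaller than this value.
upper_bound = share_percent(SWARTZ_DOCS, PACER_DOCS_MIN)
print(f"At most {upper_bound:.2f} percent of PACER")  # prints "At most 0.54 percent of PACER"
```

Even on the most generous reading, that is well short of the 20 to 25 percent figures that have circulated.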

Nevertheless, the Swartz corpus proved valuable. Malamud's privacy audit helped to publicize the need for more rigorous privacy protections in the e-filing system. When Schultze, Harlan Yu, and I began work on RECAP, we pre-loaded it with Swartz's documents so at least some cases would be pre-populated with documents. Swartz's documents also served as the basis for some of my own privacy research.

Swartz, Malamud, and Schultze always saw the PACER scraping project primarily as a way to pressure the judiciary to provide free public access to the full PACER database. Ever since 2008, Schultze has made PACER a major focus of his work, writing extensively about the case for tearing down PACER's paywall.

Schultze believes the courts are breaking the law by charging 10 cents a page for public documents. As then-senator Joe Lieberman (I-CT) pointed out in a 2009 letter, the 2002 E-Government Act, which authorizes PACER fees, permits them to be charged only "to the extent necessary" to cover the costs of providing the service. In its legislative report, the Senate committee behind the bill stated it "intends to encourage the Judicial Conference to move from a fee structure in which electronic docketing systems are supported primarily by user fees to a fee structure in which this information is freely available to the greatest extent possible."

Yet PACER fee collections appear to have dramatically outstripped the cost of running the PACER system. PACER users paid about $120 million in 2012, thanks in part to a 25 percent fee hike announced in 2011. But Schultze says the judiciary's own figures show running PACER only costs around $20 million. Schultze believes this massive disparity is inconsistent with the court's mandate to charge PACER fees only "to the extent necessary" to run the PACER system.
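To put the disparity in concrete terms, here is a quick sketch that uses only the two approximate figures above. The constant names are illustrative, and the $20 million cost is Schultze's reading of the judiciary's numbers rather than an audited figure.

```python
# Approximate figures from the article; names are illustrative.
FEES_2012 = 120_000_000   # PACER fee collections in 2012
COST_EST = 20_000_000     # estimated cost of running PACER, per Schultze

surplus = FEES_2012 - COST_EST   # money collected beyond the estimated cost
ratio = FEES_2012 / COST_EST     # fees as a multiple of the estimated cost

print(f"Surplus: ${surplus:,}; fees are {ratio:.0f}x the estimated cost")
# prints "Surplus: $100,000,000; fees are 6x the estimated cost"
```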

And even the $20 million figure may overstate the cost of running PACER. "We don't know what is included in these line items because the courts have never told us," Schultze said. "But the PACER system is run extremely inefficiently. It has individual servers in each district, individual staff for each district, and privately leased network connections."

Schultze believes costs could be slashed if the courts moved to a modern cloud-based hosting platform. Indeed, he notes, the General Services Administration (GSA) has already developed a streamlined process for government agencies to lease cloud computing resources.

The GSA has even granted some hosting providers "FISMA level 2 security certification," Schultze points out, which allows the Department of Homeland Security to use them for its applications. "If it's good enough for DHS, it's good enough for the courts," Schultze argued.


Schultze believes the courts could shift their servers to the cloud with minimal technical changes. "They would just start up a new virtual machine for every court. Each court could continue to administer their own PACER instance. There's no complicated engineering required."

Schultze believes the judiciary's Amazon bill could be as little as $1 million per year, or less than one percent of what the courts are currently charging. Malamud is less optimistic, given the inherent inefficiencies of government bureaucracies. But he believes an efficient PACER system shouldn't cost more than $10 million.
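The two hosting estimates can be compared against 2012 fee collections with a short sketch. The dictionary labels are illustrative; the dollar amounts are the approximate figures quoted in the article.

```python
# Approximate figures from the article; labels are illustrative.
FEES_2012 = 120_000_000  # PACER fee collections in 2012

estimates = {
    "Schultze (cloud hosting)": 1_000_000,
    "Malamud (upper bound)": 10_000_000,
}

# Express each cost estimate as a percentage of 2012 fee collections.
shares = {name: 100 * cost / FEES_2012 for name, cost in estimates.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f} percent of 2012 fees")
# prints:
# Schultze (cloud hosting): 0.8 percent of 2012 fees
# Malamud (upper bound): 8.3 percent of 2012 fees
```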

Interestingly, the executive branch pays the courts millions of dollars every year in PACER fees. The Department of Justice alone pays the courts about $4 million per year for access to public court documents. Schultze believes the money Congress currently allocates for executive branch agencies to pay PACER fees would be sufficient to fund the entire PACER system. That would allow the judiciary to eliminate PACER fees to private users.

Open PACER

When we contacted the Administrative Office of the US Courts for comment, they stressed that "fully 95 percent of all PACER fees come from just five percent of all users. Court opinions are free, and 65 to 75 percent of active PACER users don't exceed $15 of use in a quarter, and therefore are not charged. In addition, academic researchers, pro bono lawyers, and indigent users can apply for exemptions."
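A simplified sketch shows how the quarterly waiver works out at the 10-cents-a-page rate. The function name is hypothetical, and the model rests on two assumptions that the judiciary's statement does not spell out: usage above $15 is billed in full, and there is no per-document cap. Amounts are kept in whole cents to avoid floating-point rounding.

```python
FEE_PER_PAGE_CENTS = 10         # per-page fee cited in the article
WAIVER_THRESHOLD_CENTS = 1_500  # $15 of use per quarter


def quarterly_bill_cents(pages: int) -> int:
    """Return the quarterly charge in cents for `pages` pages viewed.

    Assumption: usage at or below the threshold is waived entirely,
    and usage above it is billed in full.
    """
    usage = pages * FEE_PER_PAGE_CENTS
    return 0 if usage <= WAIVER_THRESHOLD_CENTS else usage


# Largest number of pages a user can view in a quarter without being charged.
free_pages = WAIVER_THRESHOLD_CENTS // FEE_PER_PAGE_CENTS

print(free_pages)                  # prints 150
print(quarterly_bill_cents(150))   # prints 0
print(quarterly_bill_cents(151))   # prints 1510
```

Under these assumptions, a casual user can read about 150 pages a quarter for free, which is only a handful of lengthy filings.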

But Schultze doesn't believe waivers address the problems with PACER's fee system. "Obtaining a waiver requires filing a separate request with each court, which may grant and revoke the waiver at its discretion," Schultze noted in an e-mail. "Many classes of individuals are not even eligible to apply, including the media."

As a practical matter, the major obstacle to opening PACER is likely financial. The judiciary tells Ars that in addition to financing PACER itself, PACER fees go to pay for "electronic case filing and about a half-dozen other information technology categories" in what it calls its "public access program." In other words, PACER has become a cash cow for the judicial branch, generating roughly $100 million in profits that the courts have plowed into non-PACER IT projects.

It's understandable the courts wouldn't want to give up that revenue in an era of austerity. But for Schultze, that revenue stream isn't a good enough reason to restrict public access to public documents. He drafted the Open PACER Act to mandate the paywall's elimination.

"My bill is one page," Schultze told us. "It does two things. First, it repeals the court's ability to charge for access to electronic public records. Second, it mandates that they provide electronic public records to the public for free."

In recent weeks, Schultze made multiple trips to DC to lobby for the proposal. He hasn't found a sponsor yet, but he's optimistic he'll find one soon. "I've been talking to potential sponsors in both the House and the Senate," Schultze said. "There are many members of Congress that see government transparency as a high priority. I expect that those are the members that will sponsor the bill."

Several members of Congress stopped to pay their respects at a memorial service for Swartz held in DC on February 4. Among the speakers was Rep. Darrell Issa (R-CA), an influential Republican who has championed open government. Sen. Ron Wyden (D-OR), a reform-minded Democrat, also spoke. At times, the event took on the tone of a political rally.

So far, most of the legislative attention in the wake of Swartz's death has focused on "Aaron's Law" to reform the Computer Fraud and Abuse Act. But Schultze believes tearing down the PACER paywall should also be a priority. After all, public access to information was a central theme of Swartz's life. Opening PACER would be another fitting tribute to his memory.

Correction: This article originally stated that the Government Accountability Office had developed a streamlined process for cloud computing. In fact, the process was developed by the General Services Administration.

Promoted Comments

I am an attorney involved in civil litigation and I use PACER on a regular basis. For those who don't know, this is how it works. If you are involved in the litigation, the first time you open a pleading, you are not charged for accessing the pleading. And you can download it in PDF form for free whenever you access it, so if you download it the first time, you can re-read it for free whenever you want. Every other time you access it via PACER though, you are charged for accessing it. This isn't like iTunes where you buy a song once and it's yours. If you are not a litigant in a particular case, you are charged for accessing each pleading the first time you access it and every subsequent time as well.

[edit] I see some people have mentioned copyrighted works. Please note that PACER fees do not go to the author of the document you're downloading, even in part. They are access fees, not copyright license fees. [/edit]

For a midsize firm like mine, these charges (which range from a few cents to a few dollars per document, depending on its length) are just part of overhead. But for a pro se litigant or an impoverished private citizen interested in following litigation that may indirectly affect him or her (despite not being a party to that litigation), these charges may, in my opinion, form a barrier to access to justice. I think a better solution would be to switch the revenue generated from *accessing* documents to revenue generated by *filing* documents. Litigants proceeding "in forma pauperis" would still be exempt from these fees, just like they're exempt from other filing fees. Other litigants could pay them. And this way, concerned citizens could follow court proceedings online without having to pay to do so.

I'm curious about this claim of yours: "The documents in PACER—motions, legal briefs, scheduling orders, and the like—are public records. Most of these documents are free of copyright restrictions, yet the courts charge hefty fees for access."

Good question. There's very little caselaw on this question, but it's generally assumed that court filings can be freely redistributed even if they were authored by private parties.

The right of public access to court proceedings is partly derived from the Constitution and partly from court tradition. By conducting their judicial work in public view, judges enhance public confidence in the courts, and they allow citizens to learn first-hand how our judicial system works.

Schultze believes the courts could shift their servers to the cloud with minimal technical changes. "They would just start up a new virtual machine for every court. Each court could continue to administer their own PACER instance. There's no complicated engineering required."

Speaking as someone overhauling infrastructure to run on the AWS cloud, that would be funny if he weren't apparently serious. Shit is *not* easy. Even after a few months of careful effort and controlled migration, my site is still tightly bound to running in one region of the AWS cloud, with extensive manual setup of IAM (because the ARNs themselves are often region-specific).

"No complicated engineering"? Ha.

Actually, in this case, shit *is* easy. Let me break it down for you:

- Each PACER instance is a Perl CGI webapp, with a SQL backend
- Every request gets proxied through a server sitting in DC, which does cookie-based auth
- I don't know if they are doing any significant load balancing (I'm guessing not, given that they complained that Aaron was making "one request every three seconds.")

I don't know what kind of crazy crap you're doing with IAM (although I have used it myself for projects and found it fairly straightforward), but that's not necessary here. Nor are there any complications related to ARNs. Static IP, SSH, done. If they want to do some load balancing, there's an EC2 wizard for that.

Of course, the really easy approach for the courts would be to push all of the data to static S3 buckets. openpacer.org runs off of S3 via static files, with the lovely S3 web hosting functionality. I pointed the DNS to the right place, checked a box on S3, done.

...but feel free to take a look at how PACER actually works and tell me why I'm wrong.

Timothy B. Lee / Timothy covers tech policy for Ars, with a particular focus on patent and copyright law, privacy, free speech, and open government. His writing has appeared in Slate, Reason, Wired, and the New York Times.