Of all the questions we’ve received, probably the most common is whether
it will be possible to access the documents in our archive without using
PACER at all. The answer is yes, but at the moment we don’t offer any
good browsing or searching tools.

The main reason is privacy. One of our top priorities in
developing RECAP was making sure we don’t inadvertently compromise the
privacy of individuals who are the subject of court records. A lot of
sensitive personal information is revealed in the course of federal
court cases. A variety of private parties might be interested in using
the information contained in these records for illicit purposes such as
identity theft, stalking, and witness intimidation. We wanted to make
sure we weren’t inadvertently facilitating those types of activities.

In theory, the courts have redaction rules designed to deal with these
problems. Judges can order particularly sensitive documents to be
sealed, and the rest of the documents are supposed to be redacted to
prevent inadvertent disclosure of private information. Unfortunately,
this process is far from perfect. Private information does sometimes
wind up in the public version of court documents.

When court records were kept entirely on paper, the problem was
mitigated by a kind of “security by obscurity”: documents might have
officially been public, but accessing them was expensive and cumbersome,
so in practice they were rarely accessed by the “bad guys.” PACER
represented a dramatic reduction in the cost of accessing court
documents. This facilitated many beneficial uses of these documents, but
it also made some illegitimate uses easier. As we move toward a free
public access model, both the benefits and the challenges will grow.

Some might take this as an argument against making the documents
free at all, but that is not our view. Remember that private data
brokers are already harvesting PACER documents and building full-text
search engines; if you’re willing to spend some money, you can already
get whatever privacy-compromising information is in PACER. So better
privacy protections are not a luxury; we need them whether or not we
move to an open access regime. And we think open access comes with an
important advantage: it opens the door to experiments in crowdsourced
privacy auditing.

To minimize the risk that we would inadvertently compromise people’s
privacy, we deliberately set modest goals for the initial version of
RECAP. RECAP is built around the existing PACER interface, and is
designed to be used by existing PACER users. We asked the Internet
Archive to disable search engine indexing so that it wouldn’t be too
easy to find whatever private information is available. We recognize
that this leaves a lot of room for improvement, but we think it was
necessary to protect privacy in the short run.

At the moment, there’s no officially supported mechanism for browsing
the RECAP repository, but you can link directly to individual documents and
dockets. To see all the files available for a case, just strip the
filename from the end of any document URL for that case, giving you a URL like
http://www.archive.org/download/gov.uscourts.dcd.118919/ (dcd is
PACER’s code for the U.S. District Court for the District of Columbia, and
118919 is PACER’s number for the case). One of the available files will be a
docket.html file. There will also be a docket.xml file, which might be more
useful for automated parsing; stay tuned for details about our XML format.
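For anyone scripting against the repository, the "strip the filename" step can be sketched in a few lines of Python. This is a minimal illustration, not an official RECAP tool: it assumes document URLs have the form `http://www.archive.org/download/gov.uscourts.<court>.<case>/<filename>`, as in the example above. The helper names `case_directory_url` and `parse_case_identifier`, and the sample filename, are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def case_directory_url(document_url: str) -> str:
    """Strip the filename from a document URL, leaving the case directory.

    Assumes (hypothetically) that the last path component is the filename.
    """
    parts = urlsplit(document_url)
    # Drop everything after the final "/" in the path, keeping the trailing slash.
    directory_path = parts.path.rsplit("/", 1)[0] + "/"
    # Rebuild the URL without any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, directory_path, "", ""))


def parse_case_identifier(directory_url: str) -> tuple[str, str]:
    """Split an identifier like 'gov.uscourts.dcd.118919' into (court, case number).

    The 'gov.uscourts.' prefix is an assumption based on the example URL.
    """
    path = urlsplit(directory_url).path.rstrip("/")
    identifier = path.rsplit("/", 1)[-1]
    prefix = "gov.uscourts."
    if not identifier.startswith(prefix):
        raise ValueError(f"unexpected identifier: {identifier}")
    court, case_number = identifier[len(prefix):].split(".", 1)
    return court, case_number


if __name__ == "__main__":
    # Hypothetical document filename, used only for illustration.
    doc = ("http://www.archive.org/download/gov.uscourts.dcd.118919/"
           "gov.uscourts.dcd.118919.1.0.pdf")
    directory = case_directory_url(doc)
    print(directory)
    print(parse_case_identifier(directory))
```

Working from the URL structure alone keeps the sketch independent of the docket XML format, which hasn't been documented yet.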

Obviously this is clumsy, and improving it is on our to-do list, but
we’re a small team and it may be a while before we have time to get to it.
We’d love to hear from third parties interested in building better
interfaces to our repository. As some of us have written before, one of
the great advantages of open access is the fact that there can be more
than one interface to
the same data. If you’d like to take a crack at building a user-friendly
but privacy-preserving interface to the repository, please get in touch.