Ginsburg has participated in hundreds of oral arguments that we have in our system.

Clicking on the button at the bottom takes you back to our database of oral argument recordings where you can further refine your search. If the judge is active, there is an icon in the upper right that lets you subscribe to a podcast of the cases heard by that judge. At this time, these features are only available for the Supreme Court and for jurisdictions where the judges for specific cases are provided by the court website. We hope to expand this in the future.

To our knowledge, a linkage like this has never previously existed on any system, and we hope that it will make research and exploration faster and easier for our users …

If you are a user of iTunes, you can easily subscribe to our podcasts by opening iTunes and searching for “Free Law Project” or “oral arguments.” Once you subscribe, the podcasts will download to iTunes wherever you use it.

These podcasts contain all of the oral argument audio for a given court or for a search that you create. This means that as of this moment, you can pipe the audio from the Supreme Court and Federal Circuit Courts directly to your pocket with almost no effort.

To learn more about creating custom podcasts or about the podcasts that we already have, we’ve created a page on our site with all the details. It also has information about how to subscribe using Google Music, Stitcher Radio, and other apps.

We hope you’ll enjoy these podcasts. Who doesn’t want the Supreme Court in their pocket?

After months of development, we are thrilled to share a from-scratch re-launch of the RECAP Archive. Our new archive, available immediately at https://www.courtlistener.com/recap/, contains all of the content currently in RECAP and makes it all fully searchable for the first time. At launch, the collection contains information about more than ten million PACER documents, including the extracted text from more than seven million pages of scanned documents.

The search capabilities of this new system empower researchers in new ways. For example:

It is now straightforward to search for certain types of documents within our archive of PACER documents. This makes it easy to find examples of motions to dismiss, summary judgments, or any other type of document.

By now most readers of this blog know that PACER brings in a lot of money by selling public domain documents at a dime per page. What people might not realize is how these costs can add up for individual researchers or journalists. Looking through our database, we realized that we have quite a few really big cases.

All of the cases below have more than ten thousand entries that we know about. There are some names you might recognize:

PACER is the system that the public and various organizations use to access electronic records in the federal district and appeals courts. When PACER is used, it charges for certain activities, like downloading a PDF or making a search query. Raising funds this way was authorized by Congress in the E-Government Act to the extent that the revenue paid for running PACER.

In the beginning the revenue from these charges was fairly modest, but the revenue has risen for many years, culminating in revenue of $145M in 2015 (the last year that’s available).

This chart shows the trends in PACER revenue since 1995:

In total, that’s $1.2B that PACER has brought in over 21 years, with an average revenue of $60.7M per year. The average for the last five years is more than twice that: $135.2M/year. These are remarkable numbers and they point to one of two conclusions. Either PACER is creating a surplus, which is illegal according to the E-Government Act, or PACER is costing $135M/year to run.

Whichever the case, it’s clear that something has gone terribly wrong. If the justice system is turning a profit selling public domain …

As most readers of this blog know, PACER is a system run by the Administrative Office of the Courts (AO) that hosts over a billion documents from the Federal District and Circuit courts. The system was created in the nineties and was set up with a paywall so that you pay for every “page” of data that you receive. The idea of the fees, as established by the E-Government Act, is that the AO could use them to recoup the cost of running PACER, but the pricing of the content has always been a bit odd. In my last post I talked about how these fees result in an outrageous cost for PACER data. In this post, I do a deep dive into the core unit of PACER’s pricing and attempt to answer the question, what is a “page” of PACER data?

The size of PACER’s fees has varied over the years, but they’ve always gone up, and they’ve always been assessed roughly as follows:

If you download a PDF from PACER, you pay by the page.

If you do a search, you pay by the number of search results returned. Because you don’t …

Recently, we started a new project to analyze a few million PACER documents that we acquired through the RECAP Project. As we began working with the data, one thing we did was count how many pages every document had so that we could calculate the average length of a PDF in PACER. Fairly quickly we learned that based on our sample, the average length of a PACER document is 9.1 pages.

Based on a sample of about 2M PDFs, the average length of a PACER document is 9.1 pages. The max (so far) is 4,417.

With these two statistics and the knowledge that downloading a document costs ten cents per page, we can once again see how PACER, the biggest paywall the world has ever known, is a deeply troubling system. At this price, purchasing the …
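To make the arithmetic concrete, here is a minimal Python sketch that applies the ten-cents-per-page price to the two figures quoted above (an average of 9.1 pages and a maximum of 4,417 pages). The function name `download_cost` is ours, and the sketch assumes a flat per-page price with no caps, waivers, or other adjustments, so treat it as an illustration rather than a model of PACER's actual billing.

```python
from decimal import Decimal

# Ten cents per page, as stated in the post. Decimal avoids float rounding
# surprises when working with money.
PRICE_PER_PAGE = Decimal("0.10")


def download_cost(pages: int) -> Decimal:
    """Cost in dollars of one PDF at a flat per-page price.

    Assumption: no fee cap or waiver is modeled here.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE * pages


# Cost of a document of average length (9.1 pages in the sample).
average_cost = PRICE_PER_PAGE * Decimal("9.1")

# Cost of the longest document seen so far in the sample (4,417 pages).
longest_cost = download_cost(4417)

print(f"Average document: ${average_cost:.2f}")
print(f"Longest document: ${longest_cost:.2f}")
```

Under these assumptions an average document comes to about 91 cents, which looks small until it is multiplied across millions of documents.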

Iain Carmichael and Michael Kim recently gave a presentation at the PyData Carolinas conference on the topic of Networks and the Law. For their talk they analyzed data from CourtListener, applying a variety of network algorithms to identify important or influential cases.

Abstract

What does network science have to say about the law? Can we determine which are the most influential cases in our legal system? Can we understand how legal doctrine evolves? Using tools from network statistics and data provided by Court Listener (an open legal data project), we analyze the network of law case citations.

Citation networks have recently been a topic of interest to network scientists. Court Listener, an open data initiative, provides the network of law case citations as well as the text of (almost every) court case in the US. This network data set provides a rich array of questions that are of interest to legal scholars as well as network scientists.

Can we determine which cases are the most influential in our legal system? Can we understand how legal doctrine evolves? We will discuss what we learned about how …
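For readers who have not worked with citation networks, here is a small Python sketch of one of the simplest possible influence measures: counting how many times each case is cited (its in-degree). This is not necessarily what the presenters did, since the abstract only says they applied a variety of network algorithms. The case labels and the `rank_by_in_degree` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy data: each pair is (citing_case, cited_case).
citations = [
    ("C", "A"), ("D", "A"), ("E", "A"),
    ("D", "B"), ("E", "B"),
    ("E", "C"),
]


def rank_by_in_degree(edges):
    """Rank cases by how many other cases cite them.

    Returns (case, citation_count) pairs, most-cited first.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(cited for _citing, cited in edges)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_in_degree(citations)
for case, count in ranking:
    print(f"{case}: cited {count} time(s)")
```

In-degree treats every citation equally. More sophisticated measures weight a citation by the importance of the case that makes it.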

We’re getting ready to launch a brand new search engine for PACER content. When it launches, one of the big features it will have is full-text search for the millions of documents that people have submitted using our RECAP system. To our knowledge, this will be the first free system for searching PACER content in this way, allowing you to look up documents by any word they might contain.

The big problem with this goal? We have about a million PDFs that consist only of images. Some of these are actually quite beautiful:

A beautiful handwritten motion. It goes on like this for 46 pages.

But others are hideous:

An 84 page log from 1957. It’s come a long ways just to appear on this blog today.

But no matter how a document looks, we want to extract the text so that we can make it searchable. This is done using a system called Optical Character Recognition (OCR), which looks at each pixel in each page of each document and tries to figure out what letter it is a part of. As you might expect, this can take a while when you’re processing millions of documents averaging …

We are happy to share that as of today, oral argument recordings from the Second Circuit Court of Appeals are finally available on CourtListener.com. This means that you can search these recordings, create email alerts for them, listen to them on our site, and even include them in custom podcasts. Of course, we also provide enhanced versions of these recordings for download, and for developers or researchers they’re also available as bulk data or via our APIs.

Before today, we were unable to provide these features for the Second Circuit because they didn’t post their oral argument recordings on their website, so we’re thrilled that they’ve begun doing so. At this point, only the Tenth and Eleventh Circuits do not post their oral argument recordings, but we are hopeful that they will follow the lead of the other circuits and begin doing so soon.

We’re excited to share that as of today, we have added the latest data from the Supreme Court Database (SCDB) into CourtListener. This update adds SCDB IDs, parallel citations, vote counts, and decision direction data to about 20,000 Supreme Court cases. Each of these enhancements enables some great functionality.

Here’s a taste, showing Katz v. U.S. plotted to Olmstead. In this graph you can see that over time the vote went from a divided conservative vote in 1928 to a divided liberal vote in 1967:

The other big enhancement that we’re excited about is that we were able to add about 60,000 parallel citations to the cases we have in CourtListener. This enables our citation parser to find these old citations and …

The final enhancement we’re excited about is a layer of polish across the entire site. This cleans up some old issues, adds explanations to areas that were somewhat unclear before, and makes the site more accessible to people with certain physical disabilities.

We’re talking a lot about RECAP lately and we’ve realized that it’s a good time to retire the recapthelaw.org website. Free Law Project took over RECAP back in May of 2014 and since then there have been two places where we wrote blog posts, two places where you could get information about PACER and RECAP, and two places we had to maintain on a day-to-day basis. By winding down this site, we’ll be able to focus more clearly on the task at hand: liberating documents from PACER.

As of now, all the old content has been moved to this site, and the new home for RECAP is https://free.law/recap/. You can go check it out. We spent some time on it, and it should be a great homepage for the project.

If you have any thoughts or notice anything broken please let us know. We’ll have more announcements about RECAP soon.

We’re proud to share that as of today we’ve added campaign finance data to our database of judges. This update links judges in the CourtListener system to their fundraising profiles in the FollowTheMoney.org database, allowing researchers and members of the public a new way to understand judges elected in State Supreme Court jurisdictions. This work was made possible by a prototype grant from the John S. and James L. Knight Foundation.

Using this system, you can easily see the sources of money that a judge received as part of an election, and you can put it side by side with all of the data that we have already gathered about that judge, such as the decisions they’ve written, the positions they’ve held professionally and in the judiciary, and their biographical information.

For example, on the page for Judge Tom Parker, there is a new section that looks like this:

Tom Parker has raised approximately $2.1M.

To our knowledge, it has never previously been possible to research the decisions written by a judge side by side with the money they’ve received. We invite researchers and journalists to use this information to uncover interesting …

When we launched our judicial database, we shared our plan to show the cases written by each judge. As of today, we’re pleased to share that we’ve launched the first iteration of that endeavor. If you pull up any judge, say, Sonia Sotomayor, you’ll see a new section at the bottom that looks like this:

This listing provides the five most important opinions by the judge, and you can click the button at the bottom to see all of the cases they wrote or participated in. Clicking the button takes you to our search results, where you can slice and dice the data, choosing, for example, to see only their opinions from the Second Circuit, or their Supreme Court cases.

In the search results and in the list on the judge profile page, the opinions are ordered by relevance, using our CiteGeist relevance engine. This highlights the cases that have been cited the most frequently by the most important cases.

Finally, you can now get an RSS feed for any active judge in our system, enabling you to keep up with anything they write. To do so, click the RSS icon and configure it with your RSS …

This brings the number of jurisdictions to more than 400, and it is getting hard to tell which courts are important. For example, some of the courts above have been terminated, and the King’s Bench is actually an English jurisdiction, or was, until it was terminated in 1873.

To address the confusion that is caused by so many jurisdictions (a good problem to have), we’ve identified the …

That’s right, years after the other circuits put their oral arguments online, the Second Circuit has decided to join the party. According to the Court’s announcement (presently on the homepage; will eventually be in their archive):

At its quarterly meeting on May 23, 2016, the judges of the United States Court of Appeals for the Second Circuit approved the posting of audio recordings of oral arguments to the Court’s website, commencing August 15, 2016, the first day of the 2016 Term.

A few months ago, we calculated that this content would cost $300,000 to purchase, so this is great news for historians, scholars, legal practitioners, and everybody in between.

To make this change, the Court has proposed a change to its local rules, and there is a 30 day period ending July 15th for the public to make comments on the change. The change the Court has proposed is quite minor, simply stating that the website should now have …

You said you liked listening to oral argument recordings, and we heard you. Back in 2014, we began collecting oral argument recordings, and we’re happy to share that as of today we have more than 365 days of continuous oral argument listening — a full year. You can sit down today, start listening to oral arguments, and 365 days later, you’ll have finished listening to what we currently have. (Of course, by then, we’ll have thousands more minutes to listen to!)
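If you want to check the scale of that number yourself, here is a tiny Python sketch that converts 365 days of continuous audio into hours and minutes. The helper name `listening_time` is ours, and the calculation simply assumes round-the-clock listening with no breaks.

```python
from datetime import timedelta


def listening_time(days: int) -> tuple[int, int]:
    """Return (hours, minutes) of audio in `days` of nonstop listening."""
    total = timedelta(days=days)
    # Dividing one timedelta by another with // yields a whole number.
    hours = total // timedelta(hours=1)
    minutes = total // timedelta(minutes=1)
    return hours, minutes


hours, minutes = listening_time(365)
print(f"365 days of audio is {hours:,} hours, or {minutes:,} minutes.")
```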

Lots of people like binge-watching TV shows. So, for comparison, this much oral argument audio is similar to watching: