www.elsblog.org - Bringing Data and Methods to Our Legal Madness

14 November 2006

Administrative Office data

I recently taught a seminar in which students looked at some aspect of federal civil trial trends, and wrote an empirical paper. In the course of preparing for that seminar, I put together a list of both raw and processed data sources on outcomes in state and federal courts. That list is here. The most important such source is the filing and terminations data collected by district court clerks' offices and then sent to the Administrative Office of the U.S. Courts. The AO publishes an annual report of summary statistics, available (since 1997) here, and issues the raw data, year by year, via the ICPSR. (The study numbers are: 1970-2000: Study 8429; 2001: Study 3415; 2002: Study 4059; 2003: Study 4026; 2004: Study 4348.)

The AO's data cover only very basic facts about a case: its filing date; its docket number and court; the "nature of suit" according to the clerk's office's transcription of the "civil cover sheet" filled out, usually, by the plaintiff's lawyer; at what procedural point the case ended (prior to trial, during trial, after trial, etc.); the nature of the judgment (settlement, dismissal, etc.); who won (if anyone did); and the amount of the judgment.

The observations in the raw data files are for each case "termination" in the relevant statistical year. (In addition, there's a raw data file for any cases still pending in the district courts, with information as of the time of filing.) So for many research questions, the first step has to be to combine all the different years of data, in order to look at time trends, or in order to use date of filing, rather than date of termination. Ted Eisenberg and Kevin Clermont have done us all a public service by doing that combination on-line and making custom summary statistics available here. In cooperation with Eisenberg and Clermont, the new Center for Empirical Research in the Law, run by my colleague Andrew Martin, will soon be posting a fancier version of the same data. In the meantime, however, anyone interested in working with the raw data (in Stata format) can download it all together, from this page of my site (see I.A.2).
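For readers who want to try the combining step themselves, here is a minimal sketch in Python of the idea described above: stack the yearly termination files together with the pending file, then tabulate by year of filing instead of year of termination. The record layout, field names, and sample rows are all made up for illustration; they are not the AO's actual variable names, and the sketch uses calendar years as a simplification.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from itertools import chain
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CaseRecord:
    """One row of case data. Field names are hypothetical, not the AO's."""
    court: str
    docket: str
    filed: date
    terminated: Optional[date]  # None for a case that is still pending
    nature_of_suit: str


def pool_years(*files: Iterable[CaseRecord]) -> List[CaseRecord]:
    """Stack the per-year termination files (and the pending file) into one list."""
    return list(chain.from_iterable(files))


def count_by_filing_year(records: Iterable[CaseRecord]) -> Counter:
    """Tabulate cases by calendar year of filing."""
    return Counter(r.filed.year for r in records)


def count_by_termination_year(records: Iterable[CaseRecord]) -> Counter:
    """Tabulate terminated cases by calendar year of termination."""
    return Counter(r.terminated.year for r in records if r.terminated is not None)


# Made-up records standing in for two yearly files and the pending file.
terminated_2003 = [
    CaseRecord("A", "01-001", date(2001, 5, 1), date(2003, 1, 15), "civil rights: employment"),
    CaseRecord("A", "02-007", date(2002, 11, 3), date(2003, 6, 30), "contract"),
]
terminated_2004 = [
    CaseRecord("B", "02-019", date(2002, 2, 9), date(2004, 3, 2), "civil rights: employment"),
]
pending = [
    CaseRecord("B", "04-112", date(2004, 8, 20), None, "contract"),
]

pooled = pool_years(terminated_2003, terminated_2004, pending)
print(sorted(count_by_filing_year(pooled).items()))
print(sorted(count_by_termination_year(pooled).items()))
```

Including the pending file matters for filing-year counts: without it, recent filing years look artificially thin, because many of those cases have not yet terminated.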

Lots of interesting work can be done with the AO data, but the
limits are quite extreme. First, the data are quite difficult to work
with because the variables are never exactly what you'd want (there's
no coding for summary judgment) or are pretty noisy (what is
"dismissed: other" and why is it used so much more often in some
districts than in others?). In addition, the AO has stripped out the
names of the judges, as well as the case captions, which makes the data
harder to use. More generally, as in so many public datasets collected
by a bureaucracy for its own reasons, each variable is a little odd,
and requires a bit of investigation prior to use. Some variables are
highly reliable (who won seems to be in this category, for example);
others less so. The data can be audited pretty effectively, since
1993, using the federal courts' electronic docketing system, PACER. I did one such topic-specific audit in an article about inmate litigation, in 2003; Ted Eisenberg and I wrote a more systematic paper on the results of auditing that same year; and Gillian Hadfield has written some others.
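One simple way to summarize an audit of this kind is a per-variable agreement rate: for a sample of cases, compare the value recorded in the AO data with the value you code by hand from the docket sheet. The sketch below is only an illustration of that idea, not the procedure used in the papers mentioned above; the function name, the case keys, and the sample values are all hypothetical.

```python
from typing import Dict, Hashable, Tuple

CaseKey = Tuple[str, str]  # (court, docket number); a hypothetical identifier


def agreement_rate(ao_values: Dict[CaseKey, Hashable],
                   audited_values: Dict[CaseKey, Hashable]) -> float:
    """Share of audited cases whose AO value matches the hand-coded docket value."""
    shared = ao_values.keys() & audited_values.keys()
    if not shared:
        raise ValueError("no cases in common between the two sources")
    matches = sum(ao_values[key] == audited_values[key] for key in shared)
    return matches / len(shared)


# Made-up values for a "who won" variable in three audited cases.
ao_winner = {
    ("A", "01-001"): "plaintiff",
    ("A", "02-007"): "defendant",
    ("B", "02-019"): "plaintiff",
}
docket_winner = {
    ("A", "01-001"): "plaintiff",
    ("A", "02-007"): "defendant",
    ("B", "02-019"): "defendant",
}

rate = agreement_rate(ao_winner, docket_winner)
print(f"agreement on who won: {rate:.2f}")
```

Running the same comparison variable by variable, and district by district, is one way to see which fields can be trusted and where the coding practices differ.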

Two things would make the dataset vastly more useful. The first is
if the judges were added back in. This has been extremely difficult
previously, and had to be done one case at a time, using PACER. But
Christy Boyd, a political science student here at Wash. U., has come up
with a vastly better method. The second is if the data collected
lined up a bit better with the kinds of questions researchers actually
cared about; if, for example, there were variables within the nature of
suit "civil rights: employment" for whether the case concerned race,
sex, disability, etc. In order to make commenting easier, I'm going to
start two posts immediately after this one. One will describe
Christy's method, briefly, and link to her fuller description of it.
The other will ask for your thoughts on how the AO data might be
improved; I think it would really be great for a group of scholars who
use these data to try to meet with folks there to talk about ways to
make the data more useful. If you have any ideas on this, I urge you
to post them.