The recent attention given to government information on the Internet, while laudable in itself, has been largely confined to the Executive Branch. While there is a technocratic appeal to cramming the entire federal bureaucracy into one vast spreadsheet with a wave of the president’s Blackberry, one cannot help but feel that this recent push for transparency has ignored government’s central function: to pass and enforce laws.

Whether seen from the legislative or judicial point of view, law is a very prose-centric domain. This is a source of frustration to the mathematicians and computer scientists who hope to analyze it. For example, while the United States Code presents a neat hierarchy at first glance, closer inspection reveals a sprawling narrative, full of quirks and inconsistencies. Even our Constitution, admired worldwide for its brevity and simplicity, has been tortured with centuries of hair-splitting over every word.

Nowhere is this more apparent than in judicial opinions. Unlike most government employees, who must adhere to rigid style manuals; or the general public, who interact with their government almost exclusively through forms; judges are free to write almost anything. They may quote Charles Dickens, or cite Shakespeare. A judicial opinion is one part newspaper report, one part rhetorical argument, and one part short story. Analyzing it mathematically is like trying to understand a painting by measuring how much of each color the artist used. Law students spend three years learning, principally, how to tease meaning out of form, fact out of fiction.

Engineers such as myself cannot tolerate ambiguity, so we feel a natural desire to bring order out of this chaos. The approach du jour may be top-down (taxonomy, classification) or bottom-up (tagging, clustering) but the impulse is the same: we want to tidy up the law. If code is law, as Larry Lessig famously declared, why not transform law into code?

This transformation would certainly have advantages (beyond putting law firms out of business). Imagine the economic value of knowing, with mathematical certainty, exactly what the law is. If organizations could calculate legal risk as efficiently as they can now calculate financial risk (recession notwithstanding), millions of dollars in legal fees could be rerouted toward economic growth. All those bright liberal arts graduates who suffer through law school, only to land in dismal careers, could apply themselves to more useful and rewarding occupations.

The second answer speaks to the goal of information management, and the forms in which law is conveyed. The indexing of the World Wide Web succeeded for two reasons, form and scale. Form, in the case of the Web, means hypertext and universal identifiers. Together, they create a network of relationships among documents, a network which, critically, can be navigated by a computer without human aid. This fact, when realized at the scale of billions of pages containing trillions of hyperlinks, allows a computer to derive useful patterns from a seemingly chaotic mass of information.
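The pattern-finding described above can be made concrete with a toy sketch. The code below is a simplified PageRank-style ranking over a tiny, invented citation graph: given only which documents link to which, a computer can rank them by link structure alone, with no human judgment involved. The case names and graph are hypothetical, and real systems are vastly more elaborate; this only illustrates the principle.

```python
def rank(links, iterations=50, damping=0.85):
    """Rank documents by link structure (simplified PageRank).

    links: dict mapping each document to the documents it cites.
    Returns a dict of document -> score; higher means more 'authoritative'.
    """
    docs = set(links) | {d for targets in links.values() for d in targets}
    scores = {d: 1.0 / len(docs) for d in docs}
    for _ in range(iterations):
        # Every document keeps a small baseline score...
        new = {d: (1 - damping) / len(docs) for d in docs}
        # ...and passes the rest of its score to the documents it cites.
        for src, targets in links.items():
            if targets:
                share = damping * scores[src] / len(targets)
                for t in targets:
                    new[t] += share
        scores = new
    return scores

# Hypothetical citation graph: A and B both cite C; C cites A.
citations = {
    "Case A": ["Case C"],
    "Case B": ["Case C"],
    "Case C": ["Case A"],
}
scores = rank(citations)
```

In this toy graph, "Case C" ends up with the highest score simply because two other cases cite it. At the scale of millions of opinions and citations, this kind of structural signal is exactly the "useful pattern" that emerges from an apparently chaotic mass.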

Law suffers from inadequacies of both form and scale. For example, all federal case law, taken together, would comprise just a few million pages, only a fraction of which are currently available in free, electronic form. In spite of the ubiquity of technology in the nation’s courts and legislatures, the dissemination of law itself, both statutory and common, remains a paper-centric, labor-intensive enterprise. The standard legal citation system is derived from the physical layout of text in bound volumes from a single publisher. Most courts now routinely publish their decisions on the Web, but almost exclusively in PDF form, essentially a photograph of a paper document, with all semantic information (such as paragraph breaks) lost. One almost suspects a conspiracy to keep legal information out of the hands of any entity that lacks the vast human resources needed to reformat, catalog, and cross-index all this paper — in essence, to transform it into hypertext. It’s not such a far-fetched notion; if law were universally available in hypertext form, Google could put Wexis out of business in a week.

But the legal establishment need not be quite so clannish with regard to Silicon Valley. For every intellectual predicting law’s imminent sublimation into the Great Global Computer, there are a hundred more keen to develop useful tools for legal professionals. The application is obvious; lawyers are drowning in information. Not only are dozens of court decisions published every day, but given the speed of modern communications, discovery for a single trial may turn up hundreds of thousands of documents. Computers are superb tools for organizing and visualizing information, and we have barely scratched the surface of what we can do in this area. Law is created as text, but who ever said we have to read it that way? Imagine, for example, animating a section of the U.S. Code to show how it changes over time, or “walking” through a 3-D map of legal doctrines as they split and merge.
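Even the humblest version of "showing how a statute changes over time" is within reach of a few lines of standard-library code, once the text is machine-readable. The sketch below diffs two invented versions of a statutory sentence (the text is illustrative, not an actual section of the Code) to produce the raw material a change-over-time visualization would animate.

```python
# Sketch: the raw data behind "animating" a statute's changes over time.
# The statutory text here is invented for illustration.
import difflib

version_2007 = [
    "The term 'agency' means each authority of the Government,",
    "whether or not it is within or subject to review by another agency.",
]
version_2009 = [
    "The term 'agency' means each authority of the Government"
    " of the United States,",
    "whether or not it is within or subject to review by another agency.",
]

# A unified diff marks deleted lines with '-' and added lines with '+'.
diff = list(difflib.unified_diff(version_2007, version_2009,
                                 fromfile="2007", tofile="2009",
                                 lineterm=""))
for line in diff:
    print(line)
```

A real tool would of course render these edits visually rather than print them, but the point stands: given clean, versioned text, tracking legislative change is a trivial computation; given PDFs, it is an archival research project.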

Of course, all this is dependent on programmers and designers who have the time, energy, and financial support to create these tools. But it is equally dependent on the legal establishment — courts, legislatures, and attorneys — adopting information-management practices that enable this kind of analysis in the first place. Any such system has three essential parts:

1. Machine-readable documents, e.g., hypertext
2. Global identifiers, e.g., URIs
3. Free and universal access
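To make the first two requirements concrete, here is a minimal sketch of what a machine-readable opinion with global identifiers might look like. Everything here is invented for illustration: the `example.org` URI scheme, the case identifiers, and the document structure are assumptions, not any court's or AltLaw's actual format.

```python
# Sketch: a court opinion as hypertext with URI-based identifiers.
# The URI scheme and case IDs are hypothetical.
from xml.etree import ElementTree as ET

def opinion_to_html(case_id, title, paragraphs, cited_ids):
    """Render an opinion as minimal HTML with machine-followable citations."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body", id=case_id)
    ET.SubElement(body, "h1").text = title
    for text in paragraphs:
        ET.SubElement(body, "p").text = text  # real paragraph breaks, not PDF
    citations = ET.SubElement(body, "ul")
    for cid in cited_ids:
        item = ET.SubElement(citations, "li")
        link = ET.SubElement(item, "a",
                             href=f"https://example.org/cases/{cid}")
        link.text = cid
    return ET.tostring(html, encoding="unicode")

doc = opinion_to_html(
    "us-1803-0001",
    "Marbury v. Madison",
    ["It is emphatically the province and duty of the judicial"
     " department to say what the law is."],
    ["us-1796-0042"],  # hypothetical ID of a cited case
)
```

Nothing here is clever, and that is the point: a crawler can follow the `href` from one opinion to the next without human aid, which is precisely what a PDF photograph of a paper page makes impossible.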

These requirements are not technically difficult to understand, nor arduous to implement. Even a child can do it, but the establishment’s (well-meaning) attempts have failed both technically and commercially. In the meantime, clever engineers, who might tackle more interesting problems, are preoccupied with issues of access, identification, and proofreading. (I have participated in long, unfruitful discussions about reverse-engineering page numbers. Page numbers!) With the extremely limited legal corpora available in hypertext form — at present, only the U.S. Code, Supreme Court opinions, and a subset of Circuit Court opinions — we lack sufficient data for truly innovative research and applications.

This is really what we mean when we talk about “tidying” the law. We are not asking judges and lawyers to abandon their jobs to some vast, Orwellian legal calculator, but merely to work with engineers to make their profession more amenable to computerized assistance. Until that day of reconciliation, we will continue our efforts, however modest, to make the law more accessible and more comprehensible. Perhaps, along the way, we can make it just a bit tidier.

Stuart Sierra is the technical guy behind AltLaw. He says of himself, “I live in New York City. I have a degree in theatre from NYU/Tisch, and I’m a master’s student in computer science. I work for the Program on Law & Technology at Columbia Law School, where I spend my day hacking on AltLaw, a free legal research site. I’m interested in the intersection of computers and human experience, particularly artificial intelligence, the web, and user interfaces.”

Excellent blog post. Encoding the law, legal contracts, and patents is something that’s been on my mind for the past 10 years while working on transforming documents into semi-structured databases at Parity Computing (http://www.paritycomputing.com). Looking forward to the day we can create a much more efficient and consistent legal system based on modern tools. I’ll be interested to hear more about progress in this area and places I might be able to contribute!

You are certainly right that the first step is freeing the case law. As you say, creating a comprehensive and free repository of universally identifiable text-based documents is not a difficult technical problem. The challenge and fun would be to see what could be done with that data beyond Google-like indexing.