Ongoing work to “de-Virginia-ize” the codebase, necessary because the software was originally developed just for the Code of Virginia. This includes full abstraction of the structure storage system and moving away from Virginia-specific nomenclature (“title,” “chapter,” “code,” etc.).

Significant optimizations, including the dynamic creation of a SQL view to access structural data and moving to using section IDs instead of section numbers.

A parser for law histories.

The establishment of a general metadata table to allow the storage and display of arbitrary types of information on a per-law basis.

Support for storing multiple versions of the same law (e.g., both the 2011 and 2012 revisions) simultaneously.
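To make the metadata table and the multiple-versions feature a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and metadata keys are all assumptions made up for this example; they are not the schema that The State Decoded actually uses. The point is only that one generic key/value table, keyed by a law's ID rather than its section number, can hold arbitrary information for each stored version of a law.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE laws (
            id INTEGER PRIMARY KEY,
            section_number TEXT NOT NULL,
            edition_year INTEGER NOT NULL,
            UNIQUE (section_number, edition_year)
        );
        CREATE TABLE metadata (
            law_id INTEGER NOT NULL REFERENCES laws(id),
            meta_key TEXT NOT NULL,
            meta_value TEXT NOT NULL
        );
    """)
    return conn

def add_law(conn, section_number, edition_year):
    """Store one version of a law and return its ID."""
    cur = conn.execute(
        "INSERT INTO laws (section_number, edition_year) VALUES (?, ?)",
        (section_number, edition_year),
    )
    return cur.lastrowid

def add_metadata(conn, law_id, key, value):
    """Attach an arbitrary key/value pair to one version of a law."""
    conn.execute(
        "INSERT INTO metadata (law_id, meta_key, meta_value) VALUES (?, ?, ?)",
        (law_id, key, value),
    )

def get_metadata(conn, law_id):
    """Return all (key, value) pairs for a law, in insertion order."""
    rows = conn.execute(
        "SELECT meta_key, meta_value FROM metadata WHERE law_id = ? ORDER BY rowid",
        (law_id,),
    )
    return rows.fetchall()

# Two editions of the same section live side by side, each with its own ID.
conn = build_demo_db()
law_2011 = add_law(conn, "2.2-3700", 2011)
law_2012 = add_law(conn, "2.2-3700", 2012)
add_metadata(conn, law_2011, "example_key", "placeholder value")
add_metadata(conn, law_2012, "example_key", "another placeholder")
add_metadata(conn, law_2012, "editor_note", "placeholder note")
```

Because the metadata rows point at a law ID, the 2011 and 2012 editions of the same section number can carry different metadata without colliding.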

As with v0.1, this version is an alpha release: there’s no installer, documentation, or administrative backend. It’s only for the brave or curious. Virginia Decoded and Sunshine Statutes (Florida) are not running on this release but, instead, a modified version. Each release will get closer to providing all of the functionality of these live sites with the flexibility and abstraction to provide the same functions for other states, and I remain optimistic that v0.3 will be the release that can be installed on these live sites, so that I can eat my own dog food.

The Florida implementation of The State Decoded has launched as a public alpha test. Sunshine Statutes resulted from strong interest in the project from the Florida Society of News Editors and the First Amendment Foundation, especially Rick Hirsch, the Managing Editor of the Miami Herald. Within hours of the John S. and James L. Knight Foundation announcing the $165,000 grant that funds the State Decoded project, Rick was insisting that Florida was the perfect state to start things off, and he was right about that. Open data hacker Michael Tahani did the heavy lifting of creating the parser, which reads the XML of the Florida Statutes and turns it into a format familiar to the State Decoded’s software.

The resulting site is rather beyond a proof of concept, but surely not finished. (Hence the “alpha test” moniker.) Some statutes with particularly complex structures are missing some text, and not all statute histories are being parsed correctly, but we’ll be ticking down the list of fixes and getting everything repaired soon enough. (Every statute has a link back to its listing on the Florida legislature’s website, making it easy for folks to see the official version of the text.) Once all known content-related bugs are fixed, it’ll enter “beta” status, and the dire warnings can be stripped away.

In the few days since we announced Sunshine Statutes, there’s been an outpouring of offers of help from Floridians. Putting together a site like this, and keeping it going, is more like a barn-raising than a monolithic construction project. Any other folks who are moved to get involved are welcome to contact us; the folks at the FSNE and the FAF would surely love the assistance, and there’s certainly a lot of work to be done.

At WebLaws.org, Robb Shecter is puzzling through how to deal with the California Code’s curious lack of titles. Most state codes provide a title for each law (known as a “catch line” in most states), such as “Enforcement of child labor law,” “Fees for filing documents or issuing certificates,” or “Money derived from forest reserve.” Not California’s. Robb provides the example of California’s § 459, the law prohibiting burglary. One must read through the law to know what it does. This, of course, makes it very difficult to navigate through the California Code.

The question that Robb asks is what we are to do about this. The problem is abstract for me—I have no immediate prospects of working on the California Code, but Robb has it online now, so the problem is very real for him.

The reason that California has been able to get by with such an odd arrangement is that private legal vendors, like West and LexisNexis, write their own titles for laws. Most attorneys surely use the terminology provided by those companies, some perhaps unaware that those are not official titles. Those titles are copyrighted by those vendors, though, and cannot be used for projects like WebLaws.org or The State Decoded. This means that we must be able to generate our own titles.

Here is the conceptual solution that I arrived at for California some months ago, which I share here in hopes that it might do others some good. I have not implemented this, so while in theory it makes sense, I cannot say for sure that it’ll work.

Like many states, California maintains an annual index of all legislation that has come before its legislature. (This is the 2011–2012 index, for example.) This allows people to look up all bills pertaining to, for instance, retirement, and see the following listing:

The process is straightforward. First, match up all legislation with the existing law that it proposes to amend. Then, find every entry for all of that legislation in the index of legislation. The description in the index becomes the title of the law. For those laws that have multiple candidate descriptions (either because they’re in the index repeatedly, multiple bills propose to amend them in a given year, or there are many years of attempted amendments), the words that appear most frequently in those descriptions can be used to automatically assemble a title.
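The word-frequency step described above can be sketched in a few lines of Python. This is an untested idea in the same spirit as the proposal itself: the function name, the stopword list, the four-word limit, and the sample descriptions are all assumptions invented for illustration, not real entries from California's index.

```python
from collections import Counter
import re

# A deliberately tiny stopword list; a real one would need to be longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}

def candidate_title(descriptions, max_words=4):
    """Build a title for one law from its index descriptions.

    A single description is used as-is. With several, the most frequent
    words are kept and shown in the order they were first seen.
    """
    unique = list(dict.fromkeys(d.strip() for d in descriptions))
    if len(unique) == 1:
        return unique[0]

    counts = Counter()
    first_seen = {}
    for description in unique:
        for word in re.findall(r"[a-z]+", description.lower()):
            if word in STOPWORDS:
                continue
            counts[word] += 1
            first_seen.setdefault(word, len(first_seen))

    # Rank by frequency, breaking ties by which word appeared first.
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    top = sorted(ranked[:max_words], key=first_seen.get)
    return " ".join(top)

# Invented index descriptions for one hypothetical law.
descriptions = [
    "Public employees: retirement benefits",
    "Retirement benefits for public employees",
    "Public employees: retirement: disability",
]
title = candidate_title(descriptions)  # "public employees retirement benefits"
```

Keeping the words in first-seen order is just one way to make the result read a little less like a bag of words; it does nothing to guarantee a grammatical title.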

This is bound to lead to some goofy titles. And some laws have not had bills introduced that would amend them for decades, and so information about them would not be available in bulk. But in my experience, the laws that most interest people are the ones that legislators attempt to amend, so titles would be provided for those laws that are most liable to be read.

What of the rest of the laws, left untitled by this first method? Statistically improbable phrases (SIPs) are a good backup method. A phrase that occurs in a law that is very rarely found in the rest of California’s laws is liable to be a decent candidate for its title. Again, potentially goofy titles could result, and I have not tested this, but theoretically it could work pretty well. Amazon.com displays SIPs for some of their books, and I think those illustrate the range of results that one could expect from them. For instance, Nora Ephron’s newly re-popular “I Feel Bad About My Neck” has two SIPs: “serial monogamy,” “cabbage strudel.” The former is a not-unreasonable summation of the book. The latter is obviously pretty unreasonable.
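For the curious, here is a rough Python sketch of one way to look for such phrases. It is not Amazon's method, which is not described here; it simply scores each two-word phrase in a law by how often it appears there relative to how often it appears in the other laws. The scoring formula, the function names, and the sample texts are all assumptions made up for the example.

```python
from collections import Counter
import re

def bigrams(text):
    """Split text into lowercase two-word phrases."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]

def improbable_phrases(law_text, other_texts, top_n=3):
    """Return the phrases most distinctive to law_text.

    Score = occurrences in this law / (1 + occurrences in all other laws),
    so a phrase that is common elsewhere scores low.
    """
    in_law = Counter(bigrams(law_text))
    elsewhere = Counter()
    for text in other_texts:
        elsewhere.update(bigrams(text))
    scored = {
        phrase: count / (1 + elsewhere[phrase])
        for phrase, count in in_law.items()
    }
    # Highest score first; alphabetical order breaks ties predictably.
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_n]

# Invented text, not drawn from any real code.
law = "Any person who keeps a ferret shall register the ferret with the county"
others = [
    "Any person who sells fireworks shall register with the county",
    "Any person who keeps a dog shall license the dog",
]
phrases = improbable_phrases(law, others)
```

With this toy input, boilerplate such as "any person" is pushed down and every top phrase involves the ferret, which shows both the promise of the approach and its potential for goofy results.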

Some experimentation is going to be necessary to arrive at a decent system for generating titles for California’s laws. Ideally, whoever creates them would put them up on Google Docs for some collaborative editing, and release the resulting text under an open license, so that, at last, we will all have titles for all of the laws in the California Code.

This problem is, not incidentally, emblematic of a routine problem with state codes. It seems like they’re all missing something, some core piece of data that would make them far more useful. Each of these will require its own patch, its own work-around, to render those laws widely accessible to the general public. We’re all taking it one state at a time.

The State Decoded started off as Virginia Decoded, a project that I spent nights and weekends on for a year or so. The Knight Foundation’s $165,000 grant funds taking that Virginia-specific code and abstracting it sufficiently to apply to other states, cities, or even countries. In the three months that this has been my full-time job, much of my time has been spent on the very specific task of de-Virginia-ing the code, so that it can work anywhere.

This v0.1 release is the result of that streamlining. It required some significant architectural changes. The biggest change was providing support for the widely varying structures of legal codes. The Code of Virginia is broken into titles, which are broken into chapters, which are broken into articles, and those are made up of individual sections.* Accordingly, I had tables in the database for each of these structures. As a result, the software could only work for legal codes that used the same three-tiered structural system. That had to be tossed out and rewritten. There were a series of changes of this nature, all of which simplified and normalized the software’s functionality.

Anybody looking to launch their own implementation of the State Decoded with this v0.1 release would be disappointed by the awkwardness of the process. There’s no installer, instructions, or clever administration system. But it is functional, structurally intact, and extremely informative for anybody interested in putting it to work.

I’m marching towards v0.2, scheduled for release one month from now, and working on getting Virginia Decoded using the live release of the State Decoded software, instead of the State Decoded software being merely derived from Virginia Decoded. The two should converge somewhere around v0.3, and that’s the next major developmental milestone in this project.

* The official SGML file of the Code of Virginia has no representation of articles and, as a result, they are not employed on Virginia Decoded. That is not a permanent problem, but that is the present state of things.

Since the State Decoded project tracks every mention of every section of a code throughout that code, I thought it might be interesting to look at what the most-cited sections of the Code of Virginia are. The numbers turn out to reveal a bit about the nature of the laws that govern us.

The #1 position is held, by no small margin, by the statement of purpose of the Administrative Process Act, which is precisely as boring as it sounds. (With the important caveat that boring does not equate to unimportant!) In fact, the first seven are all government regulating itself, with the only really interesting one being § 2.2-3700, the first section of the Virginia Freedom of Information Act. The #8 position is a list of definitions that are used throughout the Motor Vehicles title of the code, the #9 law makes rape illegal, and at the #10 spot on our list is the law that prohibits drunken driving.

Of the 30,826 laws in the Code of Virginia, 5,490 (or 17.8%) are cited elsewhere in the code. Just 454 (or 1.5%) are cited 10 or more times.
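A tally like this can be sketched in Python as follows. This is a simplified illustration, not the code behind the numbers above: the regular expression only handles plain section numbers such as 2.2-3700, the function names are invented, and the three sample "laws" use made-up section numbers and text.

```python
from collections import Counter
import re

# Matches references like "§ 2.2-3700" or "§ 18.2-361" (simplified).
SECTION_REF = re.compile(r"§\s*(\d+(?:\.\d+)?-\d+(?:\.\d+)?)")

def count_citations(laws):
    """Count how often each section is mentioned by other sections.

    laws maps a section number to the text of that section.
    """
    counts = Counter()
    for section, text in laws.items():
        for cited in SECTION_REF.findall(text):
            # Ignore self-references and sections we do not have.
            if cited != section and cited in laws:
                counts[cited] += 1
    return counts

def share_cited(laws, counts, minimum=1):
    """Fraction of all laws cited at least `minimum` times."""
    cited = sum(1 for n in counts.values() if n >= minimum)
    return cited / len(laws)

# Invented sections and text, for illustration only.
laws = {
    "1-100": "Definitions used in this title.",
    "1-101": "As defined in § 1-100, and subject to § 1-102.",
    "1-102": "Penalties under § 1-100 apply.",
}
counts = count_citations(laws)
share = share_cited(laws, counts)
```

Calling `counts.most_common(10)` on a full code would produce a top-ten list like the one discussed above, and `share_cited(laws, counts, minimum=10)` corresponds to the "cited 10 or more times" figure.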

Why is this interesting? Well, it provides a look at the interconnectedness within the Code of Virginia. Or, really, the lack thereof. One of the goals of this project is to provide a more logical interface for browsing legal codes, instead of the usual, rigid, hierarchical system that divides up most of them. The lack of interconnectedness of Virginia’s code is an indicator that we’ll need metrics other than cross references to establish new groupings of laws, whether internal to the code (such as the shared use of defined terms) or external to the code (shared citations in legal decisions or legislation).

It might be illuminating to compare these data about Virginia to other states as the State Decoded project is implemented elsewhere. Perhaps Virginia is an outlier in its internal cross-pollination, or perhaps it’s perfectly normal.

Throughout the planning process for the State Decoded project, I have made the basic assumption that the primary source of traffic for implementations of the software would be search engines: people typing in things like “‘following too closely’ virginia,” “boundary law in kentucky,” or “grand larceny illinois bad checks,” who would be led directly to the law in question, presented within a context that would make that law understandable to them. This usage pattern is one of the major concepts behind the project.

Nine weeks after launching the Virginia website, it’s been indexed by Google thoroughly, though it has few enough third-party links that it has a PageRank of just 4 (out of 10). In the past week, Virginia Decoded has had 458 keyword-bearing referrers from Google. Not one of those search phrases has been used more than 7 times. 4 of them have been used 3–7 times. 14 of them have been used 2 times. The remaining 440 were used just 1 time. This is a very flat distribution. I’d call it a “long tail” of search results, but it’s all tail—basically a snake.

Many of these search terms are extremely specific (e.g., “what does it mean when an employee returns to his position on an active employment basis for 45 consecutive calendar days or longer any succeeding period of disability shall constitute a new period of short term disability”), and dozens are for specific sections of the Code (e.g., “18.2-460”). Many appear to be people trying to solve problems (e.g. “transfer an inspection sticker to a new windshiled [sic],” “virginia code failure to file tax,” “what is the penalty for a failure to appear in va”).

Some of these searches lead to results that the official Code of Virginia website could not provide. For instance, “why va. code 18.2-361(a) is unconstitutional” returns Virginia Decoded’s § 18.2-361 record because it includes all court decisions that cite § 18.2-361, notably William Scott McDonald, a/k/a William Scott MacDonald v. Commonwealth, which has a court-provided abstract that reads:

This Court finds that Code Section 18.2-361(A) is constitutional as applied to appellant because his violations involved minors and merit no protection under the Due Process Clause of the Fourteenth Amendment; appellants convictions of four counts of sodomy are affirmed

§ 18.2-361 is still on the books, and somebody looking at the law on the state’s official website would have no way of knowing that portions of it have been judged unconstitutional by the Virginia Court of Appeals. By putting court decisions on the same page as the law, Google inferred that they are related, indexed all of the content together, and returned this far-more-useful result to somebody posing a basic question about the law’s constitutionality.

People referred to Virginia Decoded via a Google search stick around for a minute and a half, which isn’t brilliant, but it’s not too shabby, either. They look at an average of 2.68 pages, which is also decent. With just 458 keyword-bearing Google referrers in the past week, though, there’s clearly a lot of room for improvement in overall referrers, something that will be helped with the passage of time, as more sites link to Virginia Decoded and its PageRank climbs.

The plan here was to turn entire state codes into enormous targets for search traffic to help people solve problems and better understand the laws that govern them. Traffic records bear out that at least the former half of that plan is being fulfilled. That accomplished, I can concentrate more on the latter, which was always going to be the real work.

Most people’s eyes glaze right over that. (Really, did you read any of that, or just glance at it and acknowledge “yup, that’s a bunch of stuff that means nothing to me…I’ll just skip that and see what he’s got to say about it”?) What looks like nonsense to most people turns out to be really rich data, which is simply stored in such a way as to render it basically meaningless. Let’s peer inside and see what this means, starting with Virginia.

With Virginia’s history, the first pattern to emerge is that what looks like a long string of numbers is actually broken up into stanzas by semicolons. Here’s the first stanza:

The first four digits, 1999, are the year in which this section of the code passed into law, at least in its present form. (That was accomplished with then-delegate Chip Woodrum’s HB1985, which overhauled Virginia’s FOIA laws.) The last string of numbers, § 2.1-342.01, is the section number that this section had at the time. (Title 2.1 was recodified as Title 2.2 in 2000, which is when this was given its present section number.) In the middle, the series of three-digit numbers (485, 518, 703, etc.) refers to the portions of the Acts of the General Assembly for that year that created or amended this section of the code. The Acts of the General Assembly are sort of like a changelog for the code (but not exactly like a changelog!), in which all of the legislation that passed the General Assembly that year is ordered by the section of the state code that it affects; when multiple bills affect the same portion of the code, they are combined. It’s the intermediate step between a bill and the final, amended code. So here we can see that there were ten portions of the 1999 Acts of the General Assembly that affected this section of the code.

With this as a key, one can step through each stanza in the history of this Virginia law and understand how and when it changed, if not what the substance of those changes was.

Presumably it’s written in this manner to save space in the printed volumes, but obviously it no longer makes sense to codify our laws in a manner optimized for printed volumes. We can do better.

I’m developing a parser for the State Decoded for these history sections, so that rather than displaying this cryptic content, instead the material will be provided in plain English. By storing this data atomically, it’ll be possible to generate a listing of all laws that were amended in a given year, all laws amended by a given portion of the Acts of the General Assembly, or find laws similar to a given law based on their shared history of being amended within the same portion of the Acts. I’m optimistic that it’ll be possible to connect many state codes’ history records back to individual pieces of legislation, rather than just the legislature’s changelog, which opens up a potential wealth of information. (This can already be seen on Virginia Decoded for all changes from 2006 onward, such as in the “Amendment Attempts” listing on § 2.2-3705.1.)
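As a rough idea of what such a parser involves, here is a minimal Python sketch for Virginia-style history strings. It is not the State Decoded parser. It assumes only the layout described earlier: stanzas separated by semicolons, each starting with a four-digit year, followed by chapter numbers from the Acts, and optionally ending with a former section number. The class and function names are invented, and the sample string is a shortened, partly made-up example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HistoryEntry:
    year: int
    chapters: List[int]                    # portions of that year's Acts
    former_section: Optional[str] = None   # old section number, if given

def parse_history(history):
    """Split a history string into one HistoryEntry per stanza."""
    entries = []
    for stanza in history.split(";"):
        stanza = stanza.strip().rstrip(".")
        year_match = re.match(r"(\d{4})\b", stanza)
        if year_match is None:
            continue  # skip anything not in the assumed format
        section_match = re.search(r"§\s*([\d.:\-]+)", stanza)
        if section_match:
            former = section_match.group(1).rstrip(".")
            middle = stanza[year_match.end():section_match.start()]
        else:
            former = None
            middle = stanza[year_match.end():]
        chapters = [int(n) for n in re.findall(r"\d+", middle)]
        entries.append(HistoryEntry(int(year_match.group(1)), chapters, former))
    return entries

def years_amended(entries):
    """List the years in which the law was created or amended."""
    return [entry.year for entry in entries]

# The first stanza is abbreviated from the one discussed above; the second
# and third stanzas are invented purely for illustration.
sample = "1999, cc. 485, 518, 703, § 2.1-342.01; 2000, cc. 66, 237; 2001, c. 844."
entries = parse_history(sample)
```

Once each stanza is stored as separate fields like this, queries such as "all laws amended in a given year" or "all laws touched by a given chapter of the Acts" become simple lookups rather than string searches.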

Incidentally, Florida has the same sort of exemptions to its open records law, in s. 119.071, and its history section looks like this:

There is a common belief that since laws are the result of legislation, then surely one can automatically assemble an amended version of the code based on the bills that have passed the legislature. This is both a really cool idea and wrong.

Your standard narrative of how a bill becomes law doesn’t really cover what it purports to cover. Usually what’s really being explained is how a bill passes, but not how it becomes law. There’s a whole process between the passage of a bill and the encoding of that bill in a state’s codified laws.

Legislatures pass hundreds or thousands of bills every year, some of which are budgetary, some of which are to define their own rules, some of which pertain to state administration, and some of which are resolutions. The remainder are patches to be applied to law, either proposing a new section or amending an existing one (by either adding or removing material). These patches look familiar to anybody who has seen a prettied-up diff, and it’s wholly logical to figure that amending the state’s laws should simply mean collecting all of the bills that pass and applying those patches to the laws.

The trouble is that these patches are not always the last word on the changes that are going to be made to existing law.

The Commission may correct unmistakable printer’s errors, misspellings and other unmistakable errors in the statutes as incorporated into the Code of Virginia, and may make consequential changes in the titles of officers and agencies, and other purely consequential changes made necessary by the use in the statutes of titles, terminology and references, or other language no longer appropriate.

The Commission may renumber, rename, and rearrange any Code of Virginia titles, chapters, articles, and sections in the statutes adopted, and make corresponding changes in lists of chapter, article, and section headings, catchlines, and tables, when, in the judgment of the Commission, it is necessary because of any disturbance or interruption of orderly or consecutive arrangement.

The Commission may correct unmistakable errors in cross-references to Code of Virginia sections and may change cross-references to Code of Virginia sections which have become outdated or incorrect due to subsequent amendment to, revision, or repeal of the sections to which reference is made.

The Commission may omit from the statutes incorporated into the Code of Virginia provisions which, in the judgment of the Commission, are inappropriate in a code, such as emergency clauses, clauses providing for specific nonrecurring appropriations and general repealing clauses.

(TL;DR: The Code Commission can make a lot of changes during that in-between period when a bill has passed, but it’s not quite yet a law.)

It is not unusual for the legislature to amend a bill at the very last minute, without proper review. What emerges from the marked-up format of a bill (words crossed out, others inserted) can be grammatically incorrect or even contain logical errors. It’s surely a judgment call whether such problems can be fixed by the Virginia Code Commission or whether they will require the General Assembly to fix them, which may well mean a delay of nearly a year.

Here, for instance, is a selection of the 35 changes made to the Code of Virginia by Code Commission staff since the 2011 edition was published:

10/25/2011: Second sentence, after “responsibility to” add “(i) establish an address confidentiality program in accordance with § 2.2-515.2, (ii)” and change “programs and shall report” to read “programs, and (iii) report”

§ 2.2-2338, 8/5/2011: In first paragraph, (i) change “13 voting members” to “12 voting members”; (ii) insert “and” after “Commerce and Trade,”; and (iii) delete “and the Assistant to the Governor for Commonwealth Preparedness”

§ 2.2-2699.5, 8/5/2011: Subsection B, replace “Assistant to the Governor for Commonwealth Preparedness” with “Secretary of Veterans Affairs and Homeland Security”

§ 2.2-4509, 10/25/2011: Change “AA by Moody’s” to “Aa by Moody’s”

§ 6.2-314, 10/25/2011: End of catchline, change “institution” to “institutions”

§ 6.2-412, 10/25/2011: End of catchline, change “improvement” to “improvements”

In Virginia—as in other states, although I don’t know how many—an attempt to use legislation as a changelog for the state code would yield results that would be very convincing-looking, but that would deviate substantially from the official code. Bills must frequently pass through a human filter before they become laws. That’s not something that you can simulate with a Ruby gem or a PEAR package. Some things just require some thought, in ways that can’t yet be automated.

Although most states provide a copy of their laws online, some outsource this to LexisNexis. Arkansas, Colorado, Georgia, Mississippi, Tennessee, and Washington D.C. all do so. While this might seem like a decent solution at first blush, it’s actually incredibly problematic, and serves as a major obstacle to innovation within those states.

It is self-evident that state laws ought to be disseminated as widely as possible and be as accessible as possible. To follow the law, people must first be able to know what it says. Projects like the State Decoded (or Legix.info’s California Codes, or OregonLaws.org, or Justia’s US Law directory) rely on access to the text of the law. These services take the raw material of the laws and make substantial improvements to it, making this important information more accessible and understandable than it is on the state-sanctioned websites.

When states punt to LexisNexis, they make their state codes a dead end.

Washington D.C. used to provide their code on dccouncil.washington.dc.us. No longer. Now it’s found only on LexisNexis’s website. Any D.C. resident who wants to read the code—their own laws—must first agree to LexisNexis’s terms of use, which allow visitors merely “the right to download using the commands of the Online Services and store in machine-readable form, primarily for that Authorized User’s exclusive use, a single copy of insubstantial portions of those Authorized Legal Materials.” LexisNexis goes on to explain that, “for the avoidance of doubt, downloading and storing Materials in an archival database is prohibited.” LexisNexis’s terms of use make it impossible to do anything of interest with D.C.’s code. No value can be added to it. The strangely specific prohibition on storing data in a database (e.g., Zotero) ensures that.

The problem is not LexisNexis per se, but rather their strikingly restrictive licensing terms for materials that, were it not for those terms, could be reproduced freely.

Unless Arkansas, Colorado, Georgia, Mississippi, Tennessee, and Washington D.C. provide bulk downloads—which is rare—or can be persuaded to provide an electronic copy of their laws, LexisNexis’s licensing terms are an immovable object that prevents the advance of any private-sector effort to enhance the display of those laws.

With yesterday’s launch of Virginia Decoded, there are suddenly a lot of people who would love to set up The State Decoded for their own state, something that isn’t possible just yet. This calls for an explanation of what the plan is.

Building Virginia Decoded was an evenings-and-weekends hobby for me starting in the summer of 2010. I learned the structure of the Code of Virginia as I went, and built the site explicitly to mirror that structure. Some friends who were alpha testing the site in late 2010 insisted that it could be used in other states. I applied to the John S. and James L. Knight Foundation for the funding to overhaul the Virginia Decoded code base, abstract it enough that it could support the widely varying structure of legal codes throughout the United States, and turn it into a proper open source project. In June the Knight Foundation named the State Decoded project one of the winners of the 2011 News Challenge. The funding came through a few days before the end of 2011, and I was able to get started on the project two weeks ago.

Launching Virginia Decoded was easy, because I’d created the site long before getting started on The State Decoded.

The next task is to scrub the Virginia Decoded source of all material that shouldn’t be released publicly (passwords, API keys, etc.), at which point it can all go up on GitHub. (You can find Virginia Decoded’s parser on GitHub already.)

Then comes the real work, which is eliminating all of the functionality that is fundamentally Virginian, and replacing it with more flexible functionality. For instance, the Code of Virginia is broken into titles, which are broken into chapters, which are broken into sections (each section is a single law). But California’s laws are broken down into codes, which are broken into divisions, which are broken into chapters, which are broken into articles, which are broken into sections. As a result, California’s laws just won’t work in the software’s existing framework, which is premised on the assumption that codes are divided into three levels, and that those levels are called “titles,” “chapters,” and “sections.” This and other, similar problems are wholly solvable; they’ll just require some reflection and some time. Luckily, solving those problems is my full-time job for the foreseeable future, courtesy of the Knight Foundation.
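One way to picture the more flexible approach is a single kind of structural unit that carries its own label and points at its parent, so that depth and naming are data rather than assumptions baked into the code. The Python sketch below is only an illustration of that idea, not the design that the software will actually adopt; the class, the function names, and all of the identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructuralUnit:
    label: str        # what this jurisdiction calls the level, e.g. "title"
    identifier: str   # the unit's number or name within its parent
    parent: Optional["StructuralUnit"] = None

    def ancestry(self):
        """Return the chain of units from the top level down to this one."""
        chain = []
        unit = self
        while unit is not None:
            chain.append(unit)
            unit = unit.parent
        return list(reversed(chain))

def build_path(levels):
    """Build nested units from (label, identifier) pairs, top level first."""
    unit = None
    for label, identifier in levels:
        unit = StructuralUnit(label, identifier, unit)
    return unit

def describe(unit):
    """Render a unit's full path as readable text."""
    return " > ".join(f"{u.label} {u.identifier}" for u in unit.ancestry())

# Three levels, Virginia-style (placeholder identifiers).
virginia = build_path([("title", "1"), ("chapter", "2"), ("section", "1-200")])

# Five levels, California-style (placeholder identifiers).
california = build_path([
    ("code", "X"), ("division", "1"), ("chapter", "2"),
    ("article", "3"), ("section", "100"),
])
```

Here `describe(virginia)` gives "title 1 > chapter 2 > section 1-200", and the same code handles the five-level California path without any changes, which is the whole point of abstracting the structure.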

When the State Decoded code base is sufficiently abstracted to work across states, that’s when things will get fun.
