Friday, April 21, 2017

What is law? - Part 10

Earlier in this series, we imagined an infinitely patient and efficient person who has somehow managed to acquire the entire corpus of law at time T, has read it all for us and can now "replay" it to us on demand. We mentioned previously that the corpus is not a closed world and that meaning cannot really be locked down inside the corpus itself. It is not a corpus of mathematical truths, dependent only on a handful of axioms. This is not a bug to be fixed. It is a feature to be preserved.

We know we need to add a layer of interpretation and we recognize from the outset that different people (or different software algorithms) could take this same corpus and interpret it differently. This is ok because, as we have seen, it is (a) necessary and (b) part of the way law actually works. Interpreters differ in the opinions they arrive at in reading the corpus. Opinions get weighed against each other, and opinions can be overruled by higher courts. Some courts can even overrule their own previous opinions. Strongly established opinions may then end up appearing directly in primary law or regulations, new primary legislation might be created to clarify meaning...and the whole opinion generation/adjudication/synthesis loop goes round and round forever... In law, all interpretation is contemporaneous, tentative and defeasible. There are some mathematical truths in there but not many.

It is tempting - but incorrect in my opinion - to imagine that the interpretation process simply works with the stream of words coming into our brains off the pages, words that then get assembled into sentences and paragraphs and sections and so on in a straightforward way.

The main reason it is not so easy may be surprising. Tables! The legal corpus is awash with complex table layouts. I included some examples in a previous post about the complexities of law[1]. The upshot of the ubiquitous use of tables is that reading law is not just about reading the words. It is about seeing the visual layout of the words and associating meaning with the layout. Tables are such a common tool in legal documents that we tend to forget just how powerful they are at encoding semantics. So powerful, in fact, that we have yet to figure out a good way of using machines to do the "reading" and extract the semantics that our brains can readily see in law.

Compared to, say, detecting the presence of headings or cross-references or definitions, correctly detecting the meaning implicit in the tables is a much bigger problem. Ironically, perhaps, it is a much bigger problem than dealing with highly visual items such as maps in redistricting legislation[2], because the actual redistricting laws are generally expressed purely in words using, for example, eastings and northings to encode the geography.

If I could wave a magic wand just once at the problem of digital representation of the legal corpus I would wave it at the tables. An explicit semantic representation of tables, combined with some controlled natural language forms[4], would be, I believe, as good a serialization format for digital law as we could reasonably hope for. It would still have the Closed World of Knowledge problem of course. It would also still have the Unbounded Opinion Requirement, but at least we would be in a position to remove most of the need for a visual cortex in this first layer of interpreting and reasoning about the legal corpus.
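
To make the idea of an explicit semantic representation of a table a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the Cell and SemanticTable names, the fee schedule and every figure in it are invented for the example and do not come from any real statute or standard. The point is simply that once the meaning of each row and column is stated explicitly, a cell can be looked up, or flattened into a templated sentence (a crude stand-in for a controlled natural language form), without anyone having to "see" the layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """One table cell with the meaning of its position spelled out."""
    row_key: str     # the row heading this cell falls under
    column_key: str  # the column heading this cell falls under
    value: str       # the text printed in the cell


@dataclass
class SemanticTable:
    """A table whose row/column semantics are explicit rather than visual."""
    row_dimension: str     # what the row headings mean
    column_dimension: str  # what the column headings mean
    value_dimension: str   # what the cell contents mean
    cells: List[Cell]

    def lookup(self, row_key: str, column_key: str) -> str:
        """Return the cell value for a row/column pair."""
        for cell in self.cells:
            if cell.row_key == row_key and cell.column_key == column_key:
                return cell.value
        raise KeyError((row_key, column_key))

    def to_sentences(self) -> List[str]:
        """Flatten the table into one templated sentence per cell."""
        return [
            f"Where the {self.row_dimension} is {c.row_key} and the "
            f"{self.column_dimension} is {c.column_key}, "
            f"the {self.value_dimension} is {c.value}."
            for c in self.cells
        ]


# Invented fee schedule, for illustration only.
fee_schedule = SemanticTable(
    row_dimension="vehicle class",
    column_dimension="registration period",
    value_dimension="fee",
    cells=[
        Cell("motorcycle", "one year", "$20"),
        Cell("motorcycle", "two years", "$35"),
        Cell("passenger car", "one year", "$50"),
        Cell("passenger car", "two years", "$90"),
    ],
)

print(fee_schedule.lookup("passenger car", "two years"))
for sentence in fee_schedule.to_sentences():
    print(sentence)
```

Real legal tables are far messier than this (spanning cells, nested headings, footnotes attached to cells), so a serious representation would need much more than a flat list of cells. The sketch only shows the direction of travel.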

The benefits to computational law would be immense. We could imagine a digital representation of the corpus of law as an enormous abstract syntax tree[5] which we could begin to traverse to get to the central question about how humans traverse this tree to reason about it, form opinions about it, and create legal arguments in support of their opinions.
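
As a rough illustration of what traversing such a tree could look like, here is a small Python sketch. The Node type, the node kinds ("act", "section", "table" and so on) and the tiny example corpus are all hypothetical, made up purely for this example. It shows nothing more than a depth-first walk over a nested structure, which is the mechanical starting point before any of the interesting questions about reasoning and opinion begin.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One node in a (hypothetical) syntax tree of the legal corpus."""
    kind: str    # e.g. "act", "section", "paragraph", "table"
    label: str   # a human-readable name for the node
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in depth-first, document order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# A tiny invented corpus, for illustration only.
corpus = Node("corpus", "Corpus at time T", [
    Node("act", "Example Act", [
        Node("section", "Section 1", [
            Node("paragraph", "Definitions"),
            Node("table", "Fee schedule"),
        ]),
        Node("section", "Section 2", [
            Node("paragraph", "Penalties"),
        ]),
    ]),
])

# Print the tree as an indented outline.
for depth, node in walk(corpus):
    print("  " * depth + f"{node.kind}: {node.label}")

# Collect every table in the corpus, wherever it sits in the tree.
tables = [node.label for _, node in walk(corpus) if node.kind == "table"]
print(tables)
```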