Saturday, June 19, 2010

KLISS: The importance of naming things well

Last time, in this series on KLISS, I talked a little about the useful overlaps between paradigms/tools/techniques for managing corpora of source code and corpora of law. I mentioned that when I personally look at law from my engineering perspective, I see the same sorts of things I see when I look at source code, namely highly structured, densely inter-linked, temporally bound units of text.

There is a temptation – one I intend to avoid – to jump at this point into concerns about the units of text themselves and in particular, to worry about what format the units should be stored in. In other words, to worry about syntax. Should the law be HTML? Should it be Docbook? Should it be <insert name of word-processor or DTP package here>? I don't want to go there today. Not because the question is not important. It is *very* important. But there are bigger, more important questions that need to be addressed first. First amongst them is the question of naming. Yes, as trivial as it sounds, I want to talk about naming things.

Phil Karlton once said that there are two hard problems in computer science: cache invalidation and naming things. Anybody who has written any software knows the effort that goes into naming things. Files have names. Files live in folders that have names. Files contribute to modules that have names. Modules are made up of classes, methods, functions, variables which...yes...all have names. Functions/methods consist of statements that either create new names or reference existing names or other functions/methods, modules... Names everywhere.

Law is similar. Bills have names. Statute titles have names. Committees have names. Agencies have names. Parliamentary procedures have names. Voting Members have names. Bills refer to statute by name. Statute refers to statute by name. Committees refer to Bills by name. Journals refer to Committees by name. Calendars refer to Committees by name... Names everywhere.

I labor this point because names are the vehicle through which the dense inter-linkages are expressed, both in source code and in law. It is not possible, in my opinion, to have an information model in either domain without a detailed conceptualization of naming. You could call it a "naming convention" and that would be fine but I prefer to call it a "referencing model" because so much of the value in a naming convention comes from its use to reference – to pick out – information objects.

So how, historically, has law gone about "picking out" information objects like bills and statute? How (in its much shorter history) has software gone about "picking out" information objects like functions and modules?

Three examples from law, each with a short explanation about what I find interesting about it from a referencing model perspective:

United States v. Lane, 474 U.S. 438 (1986): Picks out a particular unit of text (in this example case law) by providing a set of attributes that includes a timestamp. No other context required.

HB 2130 approved on final action: Extra context required in order to pick out a unit of text (in this case a house bill), because numbers like "2130" are re-used every legislative biennium.

K.S.A. 74-8905 and amendments thereto: Picks out a unit of text (in this example, a statute) but implicitly adds "as it looks today" by adding "and amendments thereto".

Notice how time is critical in all three examples in order to pick out a definitive unit of text. The first locks down time explicitly with a timestamp. The second cannot be used to yield a unit of text without further context, i.e. which biennium (and indeed, which legislature) is being referred to here? The third one picks out a unit of text but allows for the unit of text to change, depending on when you de-reference this reference. If you "look" tomorrow, 74-8905 might say something different from what it says today.
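The three behaviours can be sketched in a few lines of Python. This is only an illustration: the corpus contents, the dates and the helper names `statute_as_of` and `bill` are all invented for the example and are not taken from any real statute book or system.

```python
from datetime import date
from typing import Optional

# Hypothetical corpus: each statute name maps to dated versions of its text,
# listed in ascending date order. Dates and texts are invented.
STATUTE_VERSIONS = {
    "74-8905": [
        (date(2008, 7, 1), "text as enacted"),
        (date(2010, 7, 1), "text as amended"),
    ],
}

# Hypothetical bill table: the number alone is not a key, the biennium is part of it.
BILLS = {
    ("2009-2010", "HB 2130"): "bill text from the 2009-2010 biennium",
    ("2011-2012", "HB 2130"): "a different bill that re-uses the number",
}


def statute_as_of(name: str, when: date) -> str:
    """Return the latest version of the statute in force on `when`."""
    versions = [text for start, text in STATUTE_VERSIONS[name] if start <= when]
    if not versions:
        raise LookupError(f"{name} has no version in force on {when}")
    return versions[-1]


def bill(number: str, biennium: Optional[str] = None) -> str:
    """A bill number needs a biennium before it picks out any text."""
    if biennium is None:
        raise LookupError(f"{number} is ambiguous without a biennium")
    return BILLS[(biennium, number)]


# Time locked down explicitly: the same answer whenever you ask.
pinned = statute_as_of("74-8905", date(2009, 1, 1))

# "and amendments thereto": the answer depends on the day you de-reference.
floating = statute_as_of("74-8905", date.today())
```

Calling `bill("HB 2130")` with no biennium raises a `LookupError`, which is the code equivalent of "extra context required".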

Each of these referencing approaches can be found in software too:

from string import regex: Yields a unit of text but without knowing what version of Python is installed, we cannot be sure what is in it.

java -jar poi-3.6-20091214.jar: Yields a unit of text unambiguously by virtue of the version and timestamp information included in the name of the jar file. (Extra surety is provided by an MD5 hash value so that we can know that our poi-3.6-20091214.jar is the same as that published by the developers.)

google docs edit --title "Shopping list": Picks out a unit of text (the Google docs application) but allows the application to vary. In other words, what you get when you de-reference is the application as it exists right now. It might be different tomorrow.
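The hash check mentioned for the jar file above can be sketched with Python's standard `hashlib` module. The bytes below are a stand-in, not a real jar, and the function names are made up for the example. MD5 is used only because it is the hash mentioned above; a stronger hash such as SHA-256 would be the more usual choice today.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """True when the local bytes hash to the digest the publisher announced."""
    return md5_hex(data) == published_hex.strip().lower()


# Stand-in bytes; a real check would read the downloaded jar in binary mode.
artifact = b"stand-in for the bytes of a downloaded jar"
published = md5_hex(artifact)  # in practice, copied from the publisher's site

print(matches_published_digest(artifact, published))         # True
print(matches_published_digest(artifact + b"!", published))  # False
```

The name plus the digest together pin down the bytes: if a single byte differs, the reference no longer matches.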

Naming things is just plain hard. If you did not believe that before now, I hope I have helped convince you. Picking a unit of text unambiguously out of the ether and keeping its semantics in exact accordance with the intent of the original creator is a deep, deep problem in many walks of life, two of which are law and computer science. Doing the problem justice would require a very long detour into semiotics, semantics, pragmatics, linguistics, epistemology and situation theory to name a few. Although fascinating stuff if it floats your boat (it floats mine although, as with law, I am strictly a layman in this field), we will limit the discussion to the smallest amount of language theory necessary for me to explain how KLISS works with respect to naming things. Namely (!), what is known as the descriptivist theory of names and, in contrast to it, Kripke's concept of rigid designation.

The problem of naming things is as old as human communication and remains "unsolved" to this day. When I say it is unsolved, I mean that we spend most of our time as humans referring to things ambiguously and we use context and probabilities to disambiguate. If I say "Python" (there, I just said it!) you will probably think "Python the programming language" because of the context in which you read this text. You will not (I suspect) immediately think "Python the snake" but you won't completely rule it out either. It is just more likely that I'm referring to – picking out – the programming language. Similarly "HB2145" is ambiguous without more context but if you read about HB2145 in the Journal of the House in the great state of Tumbolia in 2010, you will likely conclude that it refers to – picks out – HB2145 in Tumbolia in the 2010 legislative session. In fact, if there is no other surrounding context you may conclude that the unit of text being referred to is HB2145 as introduced – as distinct from as amended by committee or floor action for example.

Bertrand Russell and Gottlob Frege are two of the philosophers who thought about the problem of naming things and were (very broadly speaking) of the opinion that names were really query expressions in disguise, i.e. a name like "HB2145" is really a short code – an alias – for the full name, which is something like "HB2145 as introduced in the Tumbolia state legislature, 2010".
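To make the "alias for a fuller description" idea concrete, here is a minimal sketch. The `ReadingContext` class, its fields and the `expand` function are hypothetical names invented for the illustration. The default stage of "as introduced" mirrors the assumption described above for a reader with no other context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingContext:
    """Hypothetical record of where and when a short name was encountered."""
    jurisdiction: str
    year: int
    stage: str = "as introduced"  # assumed default when no stage is stated


def expand(short_name: str, ctx: ReadingContext) -> str:
    """Expand a short name into the fuller description it stands in for."""
    return (
        f"{short_name} {ctx.stage} in the {ctx.jurisdiction} "
        f"state legislature, {ctx.year}"
    )


print(expand("HB2145", ReadingContext("Tumbolia", 2010)))
# HB2145 as introduced in the Tumbolia state legislature, 2010
```

On this view the short name carries no meaning of its own: change the context and the same alias expands to a different description.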

Kripke (very broadly speaking again!) disagreed. The details need not concern us here. Suffice it to say that Kripke coined the term "rigid designator" to mean a name that picks out the same thing in all possible worlds.

In all possible worlds...What a great thing to have! If you read my earlier KLISS post about the worryingly quantum mechanical nature of digital data you will see why I find the notion of a rigid designator so appealing.

If I had a rigid designator for each unit of text in my corpus of law (or source code):

I would not need any other context to get at the unit of text (the "referent" as it is known in the vernacular)

I would not need to worry about who is referencing, when they are referencing, where they are doing the referencing from etc. The same unit of text will be yielded every time.
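The two properties above can be approximated in code with a name that leaves nothing for context to supply. This is a rough sketch only and says nothing about how KLISS itself does it: the `Designator` fields, the `CORPUS` contents and the `dereference` function are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Designator:
    """Hypothetical fully qualified name: nothing is left for context to supply."""
    jurisdiction: str
    session: str
    document: str
    stage: str


# Hypothetical store keyed by the complete designator.
CORPUS = {
    Designator("Tumbolia", "2010", "HB2145", "as introduced"):
        "text of the bill as introduced",
    Designator("Tumbolia", "2010", "HB2145", "as amended by committee"):
        "text after committee amendment",
}


def dereference(name: Designator, *, reader: str = "", when: str = "", where: str = "") -> str:
    """Look up the text; reader, when and where are accepted but deliberately ignored."""
    return CORPUS[name]


hb2145 = Designator("Tumbolia", "2010", "HB2145", "as introduced")

# Who asks, when and from where makes no difference to what comes back.
first = dereference(hb2145, reader="clerk", when="2010-06-19", where="statehouse")
second = dereference(hb2145, reader="citizen", when="2030-01-01", where="elsewhere")
print(first == second)  # True
```

The sketch only holds up if the store never changes the text held under a given key, which is the hard part that the code does not address.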

That sounds just perfect for legislative informatics! Next time, I'll talk about how we incorporate rigid designation into KLISS.

For now, let me finish by mentioning a conversation I had with Bertrand Russell once. I think I have the link here...try this or maybe this...

Do both links bring you to the same place? Are both links the same? :-)

6 comments:

Great post; it will be interesting to see how you don't end up creating sub-atomic particle physics to end up with your rigid designator. Without shorthand for context, you must consider all possible contexts, no matter how nuanced. Perhaps if you dive deeper you will find true quantum meaning.