… in more ways than one. Back in May of 1994, a week or so before the annual CALI conference for legal technologists, three of us circulated the document you see below. Composed by me, Peter Martin, and Will Sadler, it set out what we saw as the next steps for the legal web. It’s scary how much of this stuff we’re still trying to work out — admittedly with better technology.

Although (gotta say it) I’m hearing exactly the same bitching about the difficulties of Linked Open Data, in more or less the same words, that I used to hear from Gopher adherents about how hard that newfangled WWW hypertext stuff was. Plus ça change, and après moi la parapluie.

[ Author note: I’d like to start by saying thanks to all of our pals at Justia, and especially Tim Stanley and Nick Moline, for their help with all that is described in the following, and to the denizens of law-lib and contest winners Scott Vanderlin and David Curle in particular for help with testing images.]

Some weeks back, our friends at Justia.com very generously arranged for us to have access to Google Glass. We weren’t at all sure how Glass fits into the legal-information world. We still aren’t, and that’s what motivates this post. But Glass is very, very cool, and it or something very like it will be transformative.

Because it was very, very cool, we wanted to develop an app for it. Like all garage-bound experimenters, we looked to see what we had lying around that might be made to work with it. It turned out that Wayne Weibel had done something very smart a while back (and that is not meant to imply that there is anything remarkable about Wayne doing smart things; he does that all the time). We have a tool called Citationer that extracts citations from documents, and it has a Tesseract-based OCR component that we use with image-based PDFs. Turns out that it’ll work with any image format, and Wayne had built that capacity out a bit, in the expectation that people would want to use it with document images sent from phones. And, as it turns out, from Glass.
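Citationer’s own code isn’t shown here, but the basic idea of pulling citations out of OCR’d text and turning them into links can be sketched with a couple of regular expressions. This is a minimal, hypothetical sketch, not Citationer’s implementation: it handles only tidy forms like “36 CFR 261.50” and “16 U.S.C. 551”, and the link builder assumes LII’s /cfr/text/ and /uscode/text/ URL layout.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for two tidy citation forms. A real extractor has to
# cope with many more variants (ranges, "et seq.", OCR noise, and so on).
CFR_RE = re.compile(r"\b(\d{1,2})\s*C\.?\s*F\.?\s*R\.?\s*(?:§+\s*)?(\d+(?:\.\d+)?)")
USC_RE = re.compile(r"\b(\d{1,2})\s*U\.?\s*S\.?\s*C\.?\s*(?:§+\s*)?(\d+[a-z]?)")

Citation = Tuple[str, str, str]  # (collection, title, section)


def extract_citations(text: str) -> List[Citation]:
    """Return (collection, title, section) tuples in the order they appear."""
    hits = []
    for collection, pattern in (("cfr", CFR_RE), ("uscode", USC_RE)):
        for m in pattern.finditer(text):
            hits.append((m.start(), collection, m.group(1), m.group(2)))
    hits.sort()  # order by position in the text
    return [(coll, title, section) for _, coll, title, section in hits]


def to_link(cit: Citation) -> str:
    """Build a link, assuming LII's /cfr/text/ and /uscode/text/ URL layout."""
    collection, title, section = cit
    return f"https://www.law.cornell.edu/{collection}/text/{title}/{section}"


ocr_text = "Closed to entry under 36 CFR 261.50 (a) and (b). See also 16 U.S.C. 551."
for cit in extract_citations(ocr_text):
    print(to_link(cit))
```

The OCR step itself (Tesseract, in our case) is left out; the sketch starts from whatever text the OCR engine hands back.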

So Wayne and Sara and I took two days out of the office to see if we could whack something together that would let you take a picture with Glass and send it off to a server-based application that would send you back a link or links to anything cited in the image. The result, almost entirely Wayne’s work, was an app called “Signtater”. It works well with documents and some signage, and it raises a lot of questions.

But let’s talk about its limitations first. I had a rush of brains to the head and realized that we needed things other than documents to test with. There is a lot of signage out there with US Code and CFR citations on it. I’ve collected a few pictures of such things myself, and a quick look at Google Images and Flickr suggested that others might have done likewise. We ran a little contest in which we enlisted law librarians to help us collect some of these images on Pinterest. [ parenthetical note: if I had this to do over again, I wouldn’t use Pinterest, because the apparatus for random participation by multiple individuals is terrible. I’d either ask people to tweet them with a particular hashtag, or collect them on Flickr.] A lot of people helped out, and we collected a lot of very good images of signs with citations, some of which appear in this post.

What we didn’t know was that we’d stumbled into a very hard problem in computer vision (“hard” in the “over one hundred research papers written last year on tiny aspects of the problem” sense of “hard”). Extracting text from natural scenes, as it’s known, is something that many people would like to be able to do. But it presents a lot of challenges, and the field is not yet very far advanced. We were able to improve Signtater’s performance somewhat with some simple image preprocessing in ImageMagick, and I’d like to do more when time permits. But for right now Signtater’s capabilities are limited to documents, and to signage that can be made to look like documents in the image: mostly dark lettering on a white background, in an area that fills most of the frame. We were disappointed by the lack of range… but Signtater is still really, really cool.

But then we started thinking

Some of the images were problematic for other reasons. Lots had incomplete context, like the one here that is missing its CFR Title number. It’s an example of what Robin Wendler has called the “on a horse” problem. Robin once worked on metadata for a collection of images of Ulysses S. Grant. The original cataloger had written descriptions every one of which assumed that you knew that the context was a collection of photos of Ulysses S. Grant. So you’d find descriptions like “On a horse”, implying “Ulysses S. Grant on a horse”. Such context-dependent descriptions and identifiers are the bane of Linked Data people. And before you start making fun of the hapless Park Service employee who spends his whole life in Title 36 and assumes that everyone else does too, stop and think about Congressional bill numbers, which are no different, or the nest of snakes called SuDoc numbers, which change whenever the structure of government changes.
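In code, one defensible way to handle an “on a horse” citation is never to guess silently: parse what the sign actually says, fill a missing title only from context that the caller supplies explicitly (the fact that we’re standing in a park, say), and record that the title was inferred. A minimal sketch; the context title is a hypothetical input, and where it comes from (geolocation, a user’s answer, a collection-level default) is left open.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

# Matches both "36 CFR 261.50" and the context-dependent "CFR 261.50".
CFR_RE = re.compile(r"(?:\b(\d{1,2})\s+)?CFR\s+(\d+\.\d+)")


@dataclass(frozen=True)
class CfrCitation:
    title: Optional[int]        # None when the sign omits the title
    section: str
    title_inferred: bool = False


def parse(text: str) -> Optional[CfrCitation]:
    """Return the first CFR citation in text, with or without a title."""
    m = CFR_RE.search(text)
    if m is None:
        return None
    title = int(m.group(1)) if m.group(1) else None
    return CfrCitation(title, m.group(2))


def resolve(cit: CfrCitation, context_title: Optional[int]) -> CfrCitation:
    """Fill a missing title from caller-supplied context and flag the guess.

    A citation that already names its title is returned unchanged.
    """
    if cit.title is not None or context_title is None:
        return cit
    return replace(cit, title=context_title, title_inferred=True)


bare = parse("No camping. CFR 261.50")
print(bare)                                # title is None: the "on a horse" case
print(resolve(bare, context_title=36))     # title filled in, flagged as inferred
```

Keeping the inferred flag lets a downstream application show the reader that “Title 36” was an assumption rather than something the sign actually said.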

At a much more fundamental level, what do any of these signs even mean? Until we came along with Signtater (which, by the way, is really, really cool), there was almost zero possibility that anyone would actually dereference any of these public-notice citations to find out what the law actually said. Well, OK, I guess I should be more generous: you don’t need anything as really, really cool as Signtater; you could do the same thing from a phone. But until 5 or 6 years ago there was almost no chance that anyone would ever look to see what the law actually said. I’m intrigued by the idea of a backpacker in a national park lugging around the CFR so she could check on such things, but I’d say there’s little chance that ever happened.

So what’s really being said when somebody puts a citation on a sign in a public park? One of two things, really:

a) We claim to have the authority to do this, and here it is, so sit down and shut up, or

b) We live so much in our own world that we don’t even realize that most people don’t live there with us.

In the first, uncharitable explanation, the citation is just something official-looking that is put there to tell you that officials are official and are telling you something that you are officially required to obey. It says what, but not in any language that anyone other than the author and a few others operating in the same narrow context are likely to understand. As it happens, 36 CFR 261.50 (a) and (b) are, more or less, simple assignments of authority that don’t specify particular behaviors. You could argue, and I would agree, that some things like “501(c)(3)” and “S corporation” are sufficiently part of the culture that they can, in fact, be dereferenced by most of the affected populations, but that ain’t the case most of the time. That raises the question of how much authority we should grant to any reference that can’t be followed and read by those who are supposed to obey it.

In the second, more charitable interpretation, the author is simply someone who lives and breathes Title 36, has done so for much of his working life, and has committed the somewhat more pardonable sin of assuming that everybody else does too. Of course it’s Title 36 — we’re in a park, silly.

You might find such behaviors bad in varying degrees depending on what you think about the motivations of those involved. A cynic would see the goal of a) as intimidation, where another might see b) as having its roots in something akin to the behavior of absent-minded professors. Fact is, technical specialists of many kinds are vulnerable to having their b)-style absent-mindedness read as a)-style intimidation. That’s why I’ve always found lawyers and law professors who complain about the poor explanatory skills of computer technicians to be so ridiculous. Talk about holding a mirror up to nature. But I digress.

Wait, where are we?

There’s a still more intriguing question here. The signs are an attempt to associate some part of the law with a particular place. Glass has geolocation capability, and could do all that through an application that understood something I guess you’d call the law of where I’m standing right now. But what is that, really? We’d need to know a lot more in order to find out, even though theoretically a lot of the data is retrievable. Apart from simple (or sometimes complex) questions of jurisdiction, places are regulated in many ways for many purposes. A person standing in a Weis supermarket somewhere near Altoona, PA might want:

Food safety and inspection regulations for supermarkets, either at the local, state, or Federal level.

The local zoning ordinances, signage regulations, or whatever law relates to any commercial establishment (follow the link and see how many local ordinances of whatever kind are in fact location-dependent).

A Supreme Court case dealing with free speech. (interestingly, three of the first four paragraphs of the majority opinion in this case amount to drawing a map)

That suggests that before a legal-information retrieval application asks “Where are you?” it should ask, “Who are you?”. And for those purposes I might be different people at different times.

But leaving that aside, all the new location-aware devices raise the same question: “What’s the law of where I’m standing, right now?” And while that is not a question that information-retrieval systems can respond to directly, at least not without some further context, it is one that is helpful in thinking about what the design issues for such systems really are.

Karl Llewellyn once said, “Each concrete fact of the case arranges itself, I say, as the representative of a much wider abstract category of facts, and it is not in itself but as a member of the category that you attribute significance to it.” That this is a problem for legal information retrieval has been pointed out by many, most notably Dan Dabney in his piece on the “Universe of Thinkable Thoughts” (curiously, Dabney does use at least one location-dependent example, saying that there are no laws in any jurisdiction forbidding cruelty to fountain pens. This claim is not substantiated by research). The point, though, is that any use of geolocation in information retrieval needs to be accompanied by a lot of contextual information about the asker and about the problem; location is differentially material. Finding that context, and determining whether location is material, is a particular problem for non-lawyers and it is one where we might give them a good bit of help.

I’m proud to announce the debut of the Journal of Open Access to Law, a multidisciplinary journal that will publish the work that its title suggests: research related to legal information that is made openly available on the Internet. Conceived in a series of meetings over the last five years, and put finally into motion at the last two LVI conferences, it has been a long time coming. But I think you’ll see that the wait has been worth it. We should all give a big round of applause to Pompeu Casanovas and Enrico Francesconi for long and diligent work, and most especially to Ginevra Peruginelli, who has suggested, demanded, coaxed, scolded, edited, negotiated, and otherwise caused the new publication to come into being. A host of others — visible on the masthead — have served as section editors, reviewers, and (of course) as authors. A special acknowledgement goes to Jon Bing and to Peter Suber, who have written forewords framing our collective effort.

Two ideas motivate JOAL. The first is that there should be a place to present work about open access to law that can stand on its own. Because it is so often imagined as “law-and” research, our work is communicated via the journals of other disciplines, and sometimes its unique flavor has been lost. Moreover, open access to law touches and is touched by research on a number of levels: work in information science that provides practical publishing, organizing, and retrieval techniques; policy research that addresses the “why” of open access to law; and research on open access as a new-found agora in which the public is encountering legal information and, as a result, acting in ways that are very poorly understood. The second idea is that academic research needs, most of all, to find an audience within the community of legal publishers who can make good use of it for practical ends.

Two fundraising-related things went across my radar screen this morning. The first was a post in Jeff Brooks’s excellent “Future Fundraising Now” blog. The second was a TV ad for a child-hunger-relief charity. They crashed together with a loud clang.

I have learned a lot from Jeff Brooks over the last year, almost all of it absolutely on-target and helpful. I’ve been trying to educate myself about fundraising and fundraisers, so I went from reading 20 or so fundraising blogs intermittently, to steadily reading the best and most useful. His blog really stands out from the pack. The ones I read attentively are selected both for usefulness to a novice like me, and because they often promote fundraising ideas and messaging that I find at least counterintuitive and sometimes very difficult to agree with. That’s good for me, and for the LII’s fundraising appeals. I lack experience, and I know it (and I’m about to demonstrate that).

You should read the Brooks piece here. The thrust of “Who’s destroying your fundraising messages?” is that inexperienced executive directors are gutting fundraising efforts by insisting on dry narratives that contain only facts, figures, and program descriptions that lack emotional appeal for donors. Of course fundraising messages need to connect with donors at an emotional level — that’s just good sense. But the hyper-emotional appeal is not the only available strategy, and I don’t believe that it is equally effective for all people, all organizations, all causes, or all donor cultures.

It can surely be taken too far. About a half-hour after I read the Brooks piece, I saw a TV spot for an organization that works on child hunger. A Very Well Fed Fellow In A Bush Jacket, wandering around in an Impoverished Place That Is Clearly Not The United States, shows us a parade of impoverished children. Pictures of the starving are accompanied by a grandfatherly narration about how much they’re suffering. He ends with a description of “little Daniela” who will “be hungry again tonight”. That’s when the bell went off. Why, after her stint as a spokesmodel, is she going to be hungry? They couldn’t leave a tip? I really hope the sound guy or the cameraman or Bush Jacket Grandpa gave little Daniela a Clif bar and a bottle of water. Or maybe they just threw the gear in the Land Cruiser and tooled off to find more kids who are starving in an appropriately photogenic way.

That’s an extreme example and — let me say it again — I’ve got no beef with the idea that good causes have to find ways to connect with their supporters. However challenging it may be, we need to demonstrate the impact of what we do in ways that are effective and meaningful at an emotional level. Of course we do. We want people to support us. We want to show both the value that our work has for real people, and that their support is being used in ways that are actually accomplishing something in the world — and that means show, and not just tell. But to insist that every mission and every program provide a 100-kilonewton tug to the heartstrings is sometimes inappropriate and sometimes — very infrequently, but sometimes — counterproductive for the mission of the organization.

To the fundraiser who found his ED so uncooperative, I’d say this:

a) Many EDs, as people, are simply not comfortable making what they see as hyperbolic, hyper-emotional statements about what they do. They don’t do it in person and they don’t like seeing their organizations do it. They feel like they’re showing their underwear in public, making claims and being manipulative in a way that is fundamentally immodest. Most of them are deeply (and emotionally!) committed to the cause they work for, but they’re shy about how they say so. They believe in appeals to reason; they find them highly motivating. And by the way, that’s true of the geek culture here at the LII and of the donors (who tend to be geeks, lawyers, or both) who support us. The very popular book, “Kiss, Bow or Shake Hands”, a collection of crash-courses for those doing business with other cultures, spills a lot of ink over the question of what other cultures accept as evidence; it’s worth thinking about. Some of those cultures are closer to home than you think.

It’s not that EDs who don’t like hyper-emotional appeals don’t respect the work of fundraisers, or that they don’t understand it (though very few of us actually do; I certainly don’t). It’s that they believe in policy and they believe in technical and structural solutions, and in order to be successful program directors they have learned to channel their own emotional energy into the dispassionate place where administration, evaluation, and strategic thinking have to take place. The best fundraisers I know are completely schizophrenic: warm on emotional appeals and personal connections, dead cold on the numbers and on evaluation.

Besides, there’s a way to deal with this. How come the copywriter didn’t suggest to his ED that they simply do an A/B test? The ED is persuaded by numbers. Give him some.

b) Somebody I once did theater with used to insist that the first rule of comedy is this: If you bring a paper shredder onstage, somebody’s necktie *has* to go into it.

In other words, you have to deliver on implied promises. I’m not sure what the stewardship implications of highly emotional appeals are. My guess is that those who write them are figuring that that’s somebody else’s problem, or that rationality and practicality can come later. If I call the children’s charity and say, “Hey, how’s Daniela doing?”, are they going to have an answer for me? What if I decide that Daniela’s the only kid in the entire world that I’m willing to sponsor? The deliberate impracticality of strong emotional appeals raises practical issues. They can, and do, implicitly overclaim. Just as the heart responds to the emotional appeal, the heart envisions a happy solution that may or may not be possible for the organization to deliver. If we imply a promise to change the world overnight, what happens when we don’t?

c) I’ve seen copywriters who can’t find an emotional hook for their message give up and look no further, or worse, fabricate something. They assume that if they can’t summon up an emotionally appealing beneficiary in the first hour they work for the organization, then there is no message to be had. If they stop there — and some do — they end up knowing far less than they ought to about the mission, operations, and impact of the organizations they work for. If they can’t find an “if it bleeds, it leads” story to tell, they think there’s nothing to say.

Trust me, every organization has a story or two to tell that will make a connection with donors, and I suspect that in dismissing everything that doesn’t have immediate and obvious emotional impact a lot of very valuable stuff gets lost, including, sometimes, the true appeal of the organization (one of ours is objectivity, see below). The answer to that ED and his problems with your heart-rending story is to simply ask him why he does what he does. And ask the rest of the staff. And ask the donors why they give. Find out what your organization does, how it does it, and what the people who do it feel about what they do. Be creative.

d) Overfocus on emotional appeals can, in some fairly rare cases, distort the organization’s mission and dilute its effectiveness. The LII is, I think, one of those rarities. I’m well aware that Special Snowflake Syndrome is a risk for all EDs, but I really do think that what we do here is a little different and more challenging.

Our job here is to make legal information available to people — Federal law and regulations, and the writings of the Supreme Court — all products of public institutions that at any given moment may be more or less popular with potential donors. We are based only on the Internet, where we attract more than 24 million unique visitors every year. We know very little about all but a very few of them. But every donor or audience survey we have ever done in our 20-year history lists objectivity as a key component of our value in the minds of those we serve. We don’t want to compromise that with over-emotional appeals that would, inevitably, try to make their case by invoking partisan sentiment about government. We’ve seen this happen in other organizations. We have many colleagues and allies at organizations that promote government transparency. And we’ve watched over the years as some of those organizations have rallied their troops and raised money by taking deliberately oppositional stances that diminish their effectiveness with the very people in government whose help they most need. Whipping up the emotions rallies the supporters and it brings money in the door, but it can also make a lot of people less cooperative.

Bottom line: the right answers are negotiated between the two poles represented by the ED and the copywriter. And then they’re tested. I wonder how much of the money that was lost by the organization in the Brooks story was lost because the copywriter was trying to teach the ED a lesson.

We’re grappling with all these questions here at the LII. Going into this season we’ve thought a lot about our impacts, our message, and what we want to say to the thousands of people who give us a little bit of money because we helped them, or because they want to help someone else. Because we deliver our services in what is essentially an anonymous, broadcast medium, it can be hard to know what concrete benefits we have for a particular individual in a particular place. We know that we help a lot of them for not very much money (about 5 cents for every person we serve in a year). They have their own reasons to need access to law, and the law is itself, in turn, a tool for accomplishing something in their lives, for solving problems that they have. To borrow a bit from Harvard Business School marketer Ted Levitt, we sell electric drills to people who want quarter-inch holes. And, come to think of it — they don’t really want the holes, either. They want to hang something on the wall, and we don’t know what.

We’re learning a lot more about what some of those purposes are and how we can change and improve our collections and our technology to help them find and understand what they need. We don’t have very many dramatic stories to tell — yet. Our biggest job is to help people reduce the amount of drama in their lives, by helping them solve problems that involve getting a little knowledge of what the law says. Among the people that we help are lawyers at non-profit organizations (literally hundreds of them) who would rather spend money on their mission than on access to the tax laws. So we are saving the world — we’re just doing it through others, one statute at a time.

We offer that — all 500,000 pages of information — freely, to 24 million people each year, at a cost of about a nickel apiece. We do it without drama and with as much objectivity as we can manage. We’d like your help. Please give by clicking here.

And if you’ve got a good story to tell about how we helped you, please send it along.

Steve Young, an academic law librarian at Catholic University, has written a short but fascinating piece that ponders law-library complicity in the legal education crisis. (those who follow the link in the previous sentence will need to scroll past the pictures of law librarians, food, and beverages to page 7, because LLSDC only publishes in PDF, presumably to ensure authenticity). He dares to suggest that perhaps law libraries are showing too much deference to non-essential demands on the part of law faculty, participating in a collection-development arms race (to be fair, not one of their own making), and wasting a bunch of energy feeling sorry for themselves. I suspect that most everybody could spin a tale or two that goes well beyond what he describes in his piece. Some of our higher-end compatriots might, for example, speak of the role that add-on library services play in the recruitment of highly specialized faculty. I would argue that the teaching of legal research as an uncompromising and limitlessly-funded activity is likely not such a great idea either.

But whether Steve covered all the bases or got the facts exactly right is not the point. For one thing, there are very few situations or remedies for them that are universal across institutions. But that doesn’t really matter. There’s a broad truth in what he’s saying, and we should listen up. Will we? Any time anyone suggests that maybe law schools or their libraries haven’t been doing such a great job, and might need to change, there will be platoons of serial-spotters-of-one-more-detail-he-neglected and threatened-rice-bowl-owners who want to represent the author as uninformed, slipshod, embittered, or simply Not One of Us. A profession interested in developing sustainable models for its own continued existence might collectively suggest that that crew sit down and shut up, at least long enough for some pragmatic soul-searching to take place.

Res ipsa, homies.

[ PS: anyone interested in the nest of snakes that inhabit “professional deference” might want to take a look at Andrew Abbott’s “The System of Professions”. Unfortunately, while he deals extensively with subordinated professions in (e.g.) medical settings, he doesn’t carry that analysis into his otherwise useful discussion of librarians, but the notions apply nevertheless. Selah. ]

What

This might be the dumbest idea we’ve ever had, or the best, but either way it has high Fun Potential. We’re collecting photographs of signs, notices, and other public postings that contain citations to statutes and regulations — on Pinterest. Contest closes next Thursday at midnight, but you should feel free to post pix whenever.

How

Entering is simple. Just pin something at http://www.pinterest.com/liicornell/public-notices-with-citations/ . We’ve put a few examples there that will show you what we have in mind. As for where you get the photos, you can either take them yourself, or scour them from the likes of Flickr, Google Images, or Pinterest itself. Because you have all those mad search skilz, yo.

[ OH NOES: We have to add people as contributors before they can pin to the board. What a pain in the butt (see infra, “sneering at Pinterest”). That said, mail me and I’ll add you. Because I am committed to the success of this effort, dammit. ]

Prizes

Can I win a $100 gift card for Barnes and Noble?

No

Can I win a dream vacation to Hawaii?

Yes, if you also agree to visit our new condominium development, “Underwater Dreams”.

Can I win a free law librarian?

Yes, if we can find a volunteer to act as the “prize”. What you do with them is entirely between you, and them. There are sections of Title 18 that deal with such things. Which reminds me: no pornographic selfies with criminal-code citations superimposed, you scamps, you.

C’mon, it’s a contest and I went to law school. I have to win something.

Bragging rights, mostly, but if you want, our Esteemed Director will Skype with you and tell knock-knock jokes for a while.

Knock it off. What are the prizes, really?

You can place first, second or third in any of the following categories:

Extra credit for having location information either in the photo itself or in the description.

Extra credit for CFR and US Code citations — not as interested in municipal regs

What are you guys really up to?

Three things:

a project so secret that if we told you about it we’d have to kill you. That’s because a) we want it to be a surprise and b) it is outrageous enough that it may fail in some highly embarrassing manner.

such a collection of photos might have something interesting to say about where the law shows up and how — useful for making a variety of points about policy, and also something somebody might actually want to study.

we want to know if there is any compelling reason to stop sneering at Pinterest.

[ UH-OHS: As many have pointed out, you can’t pin to our board. Hell, I thought they were group-sharable, and they are, but I have to add all youse guise as contributors. That’s OK – just mail me and I shall do so. ]

linking agency guidance and interpretations to the regs. Want to know how the agency thinks about whether or not you’re affected?

All of that stuff would be vastly easier to build if the issuing agencies would do a few simple things. As always with agencies, the “simple” is in scare quotes, because, well, very little is actually simple, but bear with me. I know you have no budget for this. I know you have no mandate for this. I know you have no time for this. But I am pretty sure that if you do it, your compliance and enforcement costs will go down and the quality of the commentary that you get in notice-and-comment periods will go up. And it ain’t that expensive, and, what the hell, let’s go wild here: public understanding is really part of your mission.

Here’s another thing: the perception of excessive regulatory burden is most often about costs, but for political purposes it is just as often expressed in terms of the difficulty of finding and understanding obligations. Two minas, a shekel, and two parts, if you want to talk about where the costs really are. Ask the folks at the Obamacare site.

Everything that follows is based on a single idea: the stuff you write and put on your websites is now reaching your regulated communities almost exclusively through an intervening technology layer that is only possible if your information is consumed and arranged by machines. Google is a sophisticated example of such a layer — one that functions pretty well without much work on your part. But you and I both know that for some purposes it is a blunt instrument. We can build better stuff, more helpful and more aware of context and the substance of what you do, if you help us. Machines don’t do language all that well, and while they’re getting better at it, they need your help. Your audience needs your help even more.

Here are some concrete suggestions.

Make machine-consumable site maps and clearly identify guidance documents

Most agencies have site maps of some kind. They help human readers discover what’s on the site and navigate to it. There’s a standard, the Sitemaps protocol, for doing the very same thing in ways that help machines find stuff and index it. My suggestion: make two of these, one for the whole site and one that identifies guidance documents specifically. And that means the documents, not some program description that leads to a narrative essay that leads to an index page on which a PDF link has been concealed. That means a map of the documents themselves. Some folks already do a good, concise job of this in various kinds of listing pages meant for humans but easily scraped by machines; IRS is good at it, for example. Many don’t. Common problems include hiding the documents behind “searchable” databases, as Commerce does, scattering them through a welter of program descriptions and disconnected stuff, as EPA does, and so on.

All of those sins can be more than atoned for by providing separate, organized maps using the Sitemaps protocol. Make two: one describing the whole site, called sitemap.xml, and another called guidancemap.xml, and place both in the root document directory for your whole site. The whole site. The one identified with the agency, the one with the agency acronym in the domain name, not some subsite associated with the Office for the Regulation of Left-handed, Red-headed Inheritors of Somebody Else’s Problem.
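A map like that takes very little code to produce. The sketch below emits a document in the Sitemaps protocol format (a urlset of url entries, each with a loc and an optional lastmod); guidancemap.xml would simply be the same format restricted to guidance documents. The agency URLs are hypothetical, and a real generator would pull them from the agency’s own document inventory.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Serialize (absolute URL, optional W3C date) pairs as a <urlset>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc   # ElementTree escapes & and <
        if lastmod:                            # lastmod is optional
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration= requires Python 3.8 or later
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical guidance documents: loc points at the documents themselves,
# not at program pages that eventually link to them.
guidance = [
    ("https://www.agency.example/guidance/widget-labeling.pdf", "2013-11-01"),
    ("https://www.agency.example/guidance/widget-recalls.html", None),
]

print(build_sitemap(guidance).decode("utf-8"))
```

Save the output as guidancemap.xml (and the full-site version as sitemap.xml) in the site’s root directory. The protocol caps a single file at 50,000 URLs, and a “Sitemap:” line in robots.txt is a standard way to tell crawlers where the maps live.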

PS: APIs are not necessarily the answer to harvesting problems like this, but they help. Federal Register 2.0 is a pretty good example.

Use titling to identify guidance and put in links to help find it

A significant amount of interpretive information appears in the Federal Register, either on its own hook or as preambles to final rules or as notices of the availability of various kinds of publications, including print. Final rules are easy enough to find, and separating out the preambles for indexing should not be that hard (that’s next on our list of things to try here). Other interpretive information is titled using the word “Guidance” somewhere. Maybe all of it is, but we can’t be sure.
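To see why “we can’t be sure” matters, consider the kind of title heuristic a harvester is reduced to. This is a hypothetical sketch, not how any particular system works: it flags a document as guidance only if its title contains the word “guidance”, and the sample titles in the usage below are invented.

```python
import re

# Heuristic, not a rule: treat the word "guidance" anywhere in a title as a
# signal that a Federal Register document is interpretive material.
GUIDANCE_WORD = re.compile(r"\bguidance\b", re.IGNORECASE)


def looks_like_guidance(title: str) -> bool:
    """True if the title contains "guidance" as a whole word."""
    return GUIDANCE_WORD.search(title) is not None


def partition_titles(titles):
    """Split titles into (flagged, missed) according to the heuristic."""
    flagged = [t for t in titles if looks_like_guidance(t)]
    missed = [t for t in titles if not looks_like_guidance(t)]
    return flagged, missed
```

Run against “Draft Guidance for Industry; Availability” and “Compliance Pamphlet for Small Entities; Notice of Availability”, it catches the first and silently misses the second, interpretive or not. Consistent titling is what closes that gap.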

One thing we do know is that offline interpretive materials — pamphlets, for example — are often mentioned in a “notice of availability” or some such. These need to be clearly titled as interpretive material. And if — as is often the case — the material is intended for print distribution, but a digital version is available somewhere on the web, please put a URL for the digital version in the notice. Please, please, please.
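A URL in the notice turns “somewhere on the web” into something a machine can follow. As a rough illustration (the pattern and function name are my own assumptions, and the pattern is deliberately crude, not a full URL grammar), pulling it back out of the notice text is trivial:

```python
import re

# Rough pattern for http(s) URLs in running text. Trailing punctuation is
# trimmed afterwards; this is a sketch, not a full RFC 3986 parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def digital_version_urls(notice_text: str) -> list[str]:
    """Return the URLs mentioned in a notice, in order of appearance."""
    return [m.group(0).rstrip(".,;:)") for m in URL_PATTERN.finditer(notice_text)]
```

A notice that only says copies are available from the agency’s office gives it nothing to find, and no amount of cleverness on our end fixes that.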

Put your goddamned ALJ opinions up

Res ipsa, homies. The following fun-size reenactment illustrates the experience so far:

Gummint official: What would the legal community like to see in our open data collection?

Use citation instead of insider dialects and references to the Act

A man from Mars reading guidance documents would wonder if we actually have codifications — of either statutes or regs. References like “section 101 of the Act” or “Regulation M” are not easy for outsiders to follow. We have citations for such things, expressed as references to the US Code or to the Code of Federal Regulations. Where possible, use them. If it’s too hard to scatter them through the text where they appear, add a header to the document called “CFR Parts Interpreted” or some such. FR provides this as “parts affected” information. I know that some of you are using references to the popular name of the legislation to send a your-elected-officials-did-this-to-you-not-us message in your guidance. But really, somebody who is trying to find out what your regulations require her to do is not going to be helped much by repeated references to her “obligations under RCRA”, in toto — but approaches like this one help a lot.
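Here is the practical difference. A few lines of pattern matching can pick CFR citations out of guidance text, while “section 101 of the Act” gives a machine nothing to grab. This is a sketch only: the pattern covers a handful of common forms (“40 CFR Part 261”, “40 C.F.R. 261.4”, “40 CFR § 262.11”) and is an assumption of mine, not a complete citation grammar.

```python
import re

# A few common CFR citation forms: "40 CFR Part 261", "40 C.F.R. 261.4",
# "40 CFR § 262.11". An illustrative pattern, not a complete citation grammar.
CFR_CITE = re.compile(
    r"\b(\d{1,2})\s+C\.?\s?F\.?\s?R\.?\s+(?:parts?\s+|§+\s*)?(\d+)(?:\.(\d+))?",
    re.IGNORECASE,
)


def extract_cfr_citations(text: str) -> list[str]:
    """Return normalized citations such as "40 CFR 261.4" found in text."""
    found = []
    for m in CFR_CITE.finditer(text):
        title, part, section = m.groups()
        found.append(f"{title} CFR {part}" + (f".{section}" if section else ""))
    return found
```

Fed a sentence mentioning “section 101 of the Act” alongside three CFR citations, it returns only the three citations, normalized to one form. Fed “obligations under RCRA”, it returns nothing at all.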

Incidentally, the same is true of other kinds of information, such as enforcement data. For example, EPA categorizes all of its information about enforcement actions using the name of the program under which enforcement is taking place. It does not specifically say which rule(s) are being enforced. That information would be nice to have, in the form of citation to a CFR Part. If that practice risks a short shelf life, or is otherwise too inflexible to be used, providing a cross reference between program names and the chunks of CFR for which the program is responsible would be a good compromise.
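The cross-reference needn’t be elaborate. A minimal sketch follows. The program names and part numbers are placeholders invented for illustration, not EPA’s actual assignments, and a real table would be maintained by the agency as programs and rules change.

```python
from collections import defaultdict

# Hypothetical crosswalk from enforcement-program names to (CFR title, part).
# Program names and part numbers are placeholders, not real assignments.
PROGRAM_TO_CFR = {
    "Example Waste Program": [(40, 900), (40, 901)],
    "Example Air Program": [(40, 950)],
}


def cfr_parts_for(program: str) -> list[str]:
    """CFR parts a program is responsible for, as citation strings."""
    return [f"{t} CFR Part {p}" for t, p in PROGRAM_TO_CFR.get(program, [])]


def programs_by_part() -> dict:
    """Invert the crosswalk: (title, part) -> programs responsible for it."""
    inverted = defaultdict(list)
    for program, parts in PROGRAM_TO_CFR.items():
        for key in parts:
            inverted[key].append(program)
    return dict(inverted)
```

The inverted index is the useful half for outsiders: start from the rule you care about and find the enforcement actions filed under whatever the program happens to be called.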

I see a hand at the back of the room with an objection having to do with the accuracy or (snif!) appropriateness of codifications. Hell, if you want, cite to Stat. L. or the Federal Register, or whatever uncodified version you think is authoritative. But cite to something that is both specific and machine-retrievable with a reference that can be understood outside your immediate community of practice.

A final word

It’s not my intention to cause heartburn here, and if the witticisms are a little over the top, it’s because they’re intended to make the substance memorable but not bruising. I know all too well that anything that can be negatively interpreted in any way by the most bitter enemy of a federal agency will be seen by that agency as causing more harm than good. That’s simply incorrect, and in any case it’s hard to make suggestions about how people can do their jobs better without opening them to charges that they’re not doing their jobs. But we’ve reached the point where a certain amount of boldness is called for. It’s not a bad time for Federal agencies to show some forward-looking mastery of technology. Anybody who’s read a newspaper or a blog in the last three weeks knows what the alternative looks like.

After yesterday’s #calicon13 session on open access operations, I decided that it might be nice to throw together a quick resource guide for folks thinking about going into the business. This is pathetically inadequate, but a start. Please feel free to suggest other sources in the comments, and in any case I’ll be adding more stuff over the next day or two.

A few weeks ago I made a characteristically intemperate remark via Twitter, which drew this response from a friend of mine:

I was struck by your tweet during the Legislative Data Transparency Conference from a couple of weeks ago: “If the gods reached down and banished all PDFs from the face of the earth, I’d be fine with that. #ldtc”

Maybe I don’t understand your tweet. If you believe legislative docs shouldn’t be rendered in PDF, how should they be rendered? For legislation, PDF serves a critical function – it is a publicly available rendition of the legal paper document. The concerns I have with the idea of sole use of non-PDF formats (i.e., XML) are: (1) authenticity, (2) missing page and line numbering for navigation and amendment language, (3) access time to load XML+XSL over the web for large files, and (4) dropped or added text due to XSLT errors and updates. I’d prefer that you ask the gods to include the source documents in all government-created PDF files – regardless of source file format.

What’s your beef with PDF?

This came from a guy who has worked with legislative data much longer than I have, knows Congressional documents cold, and generally has an excellent idea of what he is doing. I have a lot of respect for his judgment. I have four practical responses, one impractical one, and a conclusion that actually agrees with his, though somewhat reluctantly:

1) The idea that authenticity is tied to a particular rendering format is simply wrong, though the use of PDF for this purpose is comforting for those who crave a format that is recognizably print-like. That’s been talked about in many places, notably in recommendations to the House of Representatives Bulk Data Task Force a couple of years back.

2) Missing page and line numbering is indeed a problem during a limited, strongly bounded portion of the legislative process. I believe it has been solved elsewhere using XML (at least if my notes from the UN ICT and Parliaments meeting on open document standards, held at the House of Representatives in 2012, are correct, though they may not be). It is certainly possible to represent page and line numbers in XML, though it is awkward, difficult, and maybe impossible to round-trip XML to and from PDF if the PDF is altered.
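For the skeptical, here is one way page and line numbers can ride along in XML: as empty “milestone” elements in the style of TEI’s `<pb/>` (page break) and `<lb/>` (line break). The `<bill>`/`<p>` wrapper, the sample text, and the helper names are hypothetical, and this sketch says nothing about the round-tripping problem.

```python
import xml.etree.ElementTree as ET

# Page and line numbers kept as empty milestone elements, TEI-style.
# The <bill>/<p> wrapper and the sample text are hypothetical.
BILL_XML = """<bill><p><pb n="2"/><lb n="1"/>SEC. 2. DEFINITIONS.
<lb n="2"/>In this Act, the term "agency" means
<lb n="3"/>each authority of the Government.</p></bill>"""


def numbered_lines(xml_text: str) -> list[tuple]:
    """Return (page, line, text) for every <lb/> milestone, in document order.

    Assumes each line's text sits directly after its <lb/> (the element's
    tail); text inside other inline elements is not handled in this sketch.
    """
    root = ET.fromstring(xml_text)
    page, lines = None, []
    for el in root.iter():  # pre-order traversal, i.e. document order
        if el.tag == "pb":
            page = el.get("n")
        elif el.tag == "lb":
            lines.append((page, el.get("n"), (el.tail or "").strip()))
    return lines


def locate(xml_text: str, phrase: str):
    """Page and line where a phrase first appears (it must not span lines)."""
    for page, line, text in numbered_lines(xml_text):
        if phrase in text:
            return page, line
    return None
```

`locate(BILL_XML, '"agency"')` gives page 2, line 2, which is exactly the kind of address amendment language needs. The awkwardness shows up as soon as a phrase straddles a line break, which this sketch simply doesn’t handle.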

But wait, why? Get rid of page and line numbering, because the new media don’t use pages. Judicial opinions are moving, however slowly, to paragraph numbering for purposes of citation, although those don’t supply the granularity needed for amendment — unless you make all of your amending processes use the paragraph as their minimum chunk. Better still would be to use a point-in-time drafting and publication system for the full lifecycle of the legislative drafting, passage, codification, and publication process. The Australian state of Tasmania has been doing this since 1997. Of course, this is the impractical response I mentioned. Congress is unlikely to change anytime soon (and I have been one of the loudest in saying that application developers are being unreasonable when they expect it to).
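The core idea of point-in-time publication is simple enough to sketch. Keep every version of a provision along with the date it took effect, and answer the question “what did this say on date X?” by lookup. The class and method names below are hypothetical, and this is emphatically not how Tasmania’s system is built; repeals, partial commencements, and retrospective amendments are all left out.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Provision:
    """Every version of one provision, keyed by the date each took effect."""
    starts: list[date] = field(default_factory=list)  # sorted commencement dates
    texts: list[str] = field(default_factory=list)    # text in force from each start

    def amend(self, effective: date, text: str) -> None:
        """Record a new version, keeping versions in date order."""
        i = bisect_right(self.starts, effective)
        self.starts.insert(i, effective)
        self.texts.insert(i, text)

    def as_at(self, when: date):
        """Return the text in force on `when`, or None if not yet in force."""
        i = bisect_right(self.starts, when)
        return self.texts[i - 1] if i else None
```

Because every version carries its own date, nobody needs page and line numbers to say which text they mean; the date and the provision identifier do that work.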

3) Access-time issues are a red herring. The data may bulk larger than PDF (I would want to see that tested over a wide range of samples before I’d buy it completely), but for viewing purposes you’d almost certainly pre-render it as HTML.
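Pre-rendering happens once, at publication time, so readers download ordinary HTML rather than XML plus a stylesheet. The sketch below uses Python’s ElementTree instead of XSLT, purely to stay within the standard library. The `<section>`/`<heading>`/`<text>` element names are hypothetical and not drawn from any real legislative schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup; real legislative schemas are far richer.
SAMPLE = """<bill>
  <section id="sec1"><heading>SEC. 1. SHORT TITLE.</heading>
    <text>This Act may be cited as the Example Act.</text></section>
</bill>"""


def bill_to_html(xml_text: str) -> str:
    """Render each <section> as a <div> with an <h2> heading and a <p> body."""
    src = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for section in src.iter("section"):
        div = ET.SubElement(body, "div", {"id": section.get("id", "")})
        ET.SubElement(div, "h2").text = section.findtext("heading", "")
        ET.SubElement(div, "p").text = section.findtext("text", "")
    return ET.tostring(html, encoding="unicode", method="html")
```

Run it as part of the publishing job and serve the output as static files; the XML stays available for anyone who wants to do more than read.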

4) Problems with dropped or added text owing specifically to XSLT transformations are either a red herring or a truly frightening commentary. Anything wrought by human hands, especially when the power of the human hands in question is augmented by a computer, can be done badly. If we banned all technologies that have the power to alter or drop text from electronic publishing systems, we would have no publishing systems. If the argument is that XSLT is more likely to produce bad results than other technologies, I sort of agree, while still finding the risks acceptable. XSLT feels closer to a declarative programming language like Prolog (remember Prolog?) than to anything else; it’s hard for procedural programmers to master, and yes, it can make a big mess. But there are lots of people who are quite comfortable using it in all sorts of bet-the-farm business operations, and there’s no reason to think it would do any worse in government.

But that’s not really what’s griping me, or him, come to think of it. He asks what my beef is with PDF. Here it is:

This is most certainly not a Congressional document, but it is a perfect example of what is wrong with PDF when it is put in risk-averse hands. It plainly got its start in life as a spreadsheet. And someone thought that publishing it as a spreadsheet, in a way that allowed parsing and repurposing of the data, would be somehow dangerous, so hey presto, PDF. More likely, they didn’t give it a moment’s thought; PDF is just how you do things. In this particular case, it’s all the more painful because the responsible party actually put data in the spreadsheet that would allow you to very usefully link its contents to the relevant parts of the CFR. So close, and yet so far.
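Had the same spreadsheet been released as CSV alongside the PDF, linking its rows to the CFR would take a few lines. This is a minimal sketch that assumes the export has columns named `cfr_title` and `cfr_part` (hypothetical names; the original spreadsheet’s columns aren’t reproduced here) and that LII’s CFR pages follow the `/cfr/text/{title}/{part}` URL pattern.

```python
import csv
import io

# Assumed URL pattern for LII's CFR pages: /cfr/text/{title}/{part}.
LII_CFR_URL = "https://www.law.cornell.edu/cfr/text/{title}/{part}"


def add_cfr_links(csv_text: str) -> list[dict]:
    """Read rows with (hypothetical) cfr_title and cfr_part columns and
    add a cfr_url column pointing at the corresponding CFR part."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["cfr_url"] = LII_CFR_URL.format(
            title=row["cfr_title"].strip(), part=row["cfr_part"].strip()
        )
        rows.append(row)
    return rows
```

From the PDF, the same job means scraping table layouts and hoping the columns line up.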

So I guess I have two beefs with PDF, not just one:

a) It can’t be easily parsed or otherwise processed by computers. It locks up text in a way that prevents anyone but a human reader from doing anything with it (yes, I am aware that authenticity-scolds see this as a feature). That is not news.

b) It presents an almost irresistible temptation for the risk-averse, who see it as a safe, comforting format that is beautifully like their beloved print products and, best of all, prevents recompilation of data. The ability to repurpose data implies the ability to reconsider it in ways that might lead someone to question the conclusions drawn from it, or otherwise do something you might not like, and that is a very scary possibility.

My correspondent is absolutely right that the solution to a) is to publish in parallel formats — PDF for human readers, original manipulable electronic format for those who need to process, mark up, or otherwise work with the text or data in fluid ways. I wholeheartedly agree, and hereby amend my request to the gods. Fortunately, the gods do not use PDF for their Official Request Forms, so I am able to do so in place with no need to reissue it.

But I wonder how likely we are to see a solution to a) given that we have no solution to b), and apparently no hope of one. I would like to believe that a few solid projects and demonstrations would bring people to the point where they would make proper use of PDF, because they had somehow seen enough functionality in non-PDF environments to change their minds about the risks.

I really have no idea whether that is possible without banning PDF altogether. It is an attractive format for those who find print comforting, and for those who worry that somehow allowing any use of data will result in some form of misuse. It is, in fact, too attractive. We have nearly 20 years’ worth of demonstrations that publishing data in formats that allow repurposing is a really, really good idea that saves money and promotes innovation, and there has been no response. That leaves me unsure whether we can get everyone on board with that idea without making it impossible for them to do anything else, because we have seen far too little of the change that would validate a just-build-it-and-they-will-come strategy.

So, sure, I agree. Dual-format release is the way to go, I guess, at least in the areas where PDF can justify its existence. But I would rather see the burden placed on PDF to prove its worth, not on open data formats. Peter Drucker once famously recommended that a large company begin cleaning up its management practices by banning all reports and only allowing individual reports to return if the author could provide compelling written justification for them. That might not be a bad way to go.

These days, people are sticking legislation into GitHub at a furious pace. It is all the rage among the legal-information smart set. The whole thing seems to have started about fifteen months ago with a quote from a toiler in the vineyards of the New York State Senate, written up in Wired, the Boys’ Life of the technorati. Said he: “I’m just in love with the idea of a constituent being able to send their state senator a pull request”.

I could speculate hilariously as to what some of our scandal-ridden New York State Senators might think a “pull request” is, but that’s neither here nor there. A lot more people appear to be just in love with this idea, too, because in the last year there has been a rising tide of legislative gittification here and elsewhere around the globe. I myself am just in love with the name of the German “BundesGit” project, which the smart money is putting at 5 to 1 to win the Greatest Cognitive Dissonance Packaged in a Compound Word category at this year’s Noamy Awards.

Trouble is, I’m not so just in love with gittification as everybody else seems to be. Here’s why:

Git and GitHub are, collectively, a fine revision-control system, and a good system for distributing and managing open-source coding efforts like the ones at https://github.com/unitedstates. Unfortunately, straightforward revision and versioning are not really what happens with most legislation hereabouts. American Federal legislation is not a straightforward revision process at all. That is especially so when post-hoc codification results in an issue-centric bill being splattered all over the topical map of the US Code. Other jurisdictions — notably civil-law countries — at least pretend to have a more rational process for legislative revision, though I am told that in practice it is not so pretty as all that. They are, by and large, having some success with FRBR-based models which closely resemble revision control, but for a number of reasons those don’t work as well as they might for Federal legislation. Simple processes in which a single version of text is successively modified and the modifications absorbed into a series of versions and branches are not quite enough to map the eddies and backwaters of our process, in which multiple competing drafts of a bill can exist at the same time, bills can be reintroduced in later sessions, and so on.

I am far from the first person to make this point. Others have done so very effectively right along, but the story does not end there. The beauties of revision management do not explain why we are hearing so much git-love. There must be more to be just in love with than the idea that you might keep track of changes in the language of a bill.

I think there are three pieces to it, really. One is the idea that somehow the gittification paradigm describes what the system *ought* to be, and represents the aspirations of its proponents; one is the idea that putting law in GitHub somehow magically puts ownership of the law where it belongs; and one is the idea that gittification is somehow democratizing.

As to the first, everybody would like a simpler system. Belief that putting the text into a particular instrument could or would bring that about is a species of wistful, wishful thinking that is the unique province of technicians. Technical people of all stripes believe that about a lot of things — the idea that somehow just having the right tool changes both the materials and the quality of the workmanship is an understandable and appealing part of the romance of geekdom. And sometimes a change in tooling brings about an unmistakable and positive change in the way things really are. I am thinking, for some reason, of the invention of interchangeable parts, which obviously brought about vast changes in manufacturing and ultimately in everyone’s standard of living — and also gave rise to a crop of industrial utopias founded in the belief that virtue would flow from industrial organization: the Ephrata Cloisters and New Harmonies of more than a century and a half ago.

I’m OK with a certain amount of techno-utopianism; a lot of good ideas got their start in those cloisters. But the romance of gittification is part of a more expansive intellectual conceit — the idea of law as code. Lessig brilliantly described the idea of code (and technology generally) shaping behavior and potential in ways traditionally reserved for law. But code-is-law is not reversible into law-is-code. There are lots of reasons why not; some are facets of the process by which law is created, and some have to do with how the language works and what it is expected to do. Law is nowhere near as deterministic or precise as code and the process that creates it is a lot messier. There is often carefully calculated imprecision in statutes and regulations. Geeks don’t like that, because they want law to be more computationally tractable, and they often say so loudly in the same forums where legislative gittification gets a big round of high-+1s. Imprecision has a very valuable purpose in law, where flexibility of interpretation is often desirable, and not so much in code. If you want to know what “law is code” looks like, consider the rigidly precise algorithms of the Federal sentencing guidelines, or the “three strikes” law, and tell me if it looks like Utopia to you.

But I’m just being cranky, sorta. There’s nothing wrong with romance, even if it is unlikely to produce meaningful change, so long as we avoid confusing it with having actually caught the unicorn. There is also a lot that is admirable about a community wanting ownership of the law that it is expected to live by. The slogan of SwaziLII — an open-access legal publisher in Swaziland — is “kwetfu”. It means “it is ours”. In South Africa, visitors to Johannesburg are shown, with great pride, the public monument to the 1994 Constitution. It’s important that people feel ownership of the law. Postcolonial societies feel it strongly, and they celebrate it and they build monuments to it. Geeks stick things in GitHub as a way of claiming it for their culture. In that respect gittification is a symbolic act that says a lot about where we are in 2013 and why. I respect the symbolism, and think it’s a shame that we have been collectively driven to such a need to reassert ownership. I just don’t want to confuse the symbolism with something that improves the substance.

But what about democratization? We may be just in love with the citizenry submitting pull requests, but that doesn’t mean the citizens have any idea how. And it is a little disturbing to think that we might be subconsciously restricting our definition of citizenship to those who can submit pull requests. I say that mostly for effect — I don’t think that anyone is being consciously elitist here. But I do think that many of the same people who celebrate gittification also routinely (and loudly) condemn government behavior that is unwittingly exclusionary in exactly the same way that GitHub is. It’s natural for technicians of all stripes — whether they are legislative, political, and policy wonks, or people who frequent hackathons, or all of the above — to become so acclimatized to their own technical knowledge and environment that they forget that others just don’t have a clue about any of that stuff, and are effectively shut out. The biggest problem with legal information has always been that the people who create it have no reason to realize that there is a problem with access, because they themselves have it. And in that respect the Gitterati are no different.

Once upon a time, regulations.gov was useless if you didn’t know a lot about what agency regulates what. Today, you can’t be a citizen of New GitHarmony if you don’t know how to turn the knobs of GitHub. That’s not an argument against making systems that permit citizen participation in the legislative process. I’ve never liked the sort of don’t-bring-gum-to-school-unless-you-bring-enough-for-everyone, digital-divide arguments that some use to bludgeon Internet projects. You have to start somewhere, and maybe it’s worth remembering that we haven’t shut down the libraries because the basic literacy rate in the US is under 100%. But don’t imagine for one moment that an average constituent is going to submit a pull request. And think about who you’re really speaking to. As of January 2013, the population of the United States was estimated at 315,968,000. GitHub claims 3 million users, not all of whom are in the US. Sounds an awful lot like the 1 percent to me.