Felix Salmon, finance blogger extraordinaire, was inspired by some reporting by Bloomberg to have a look at Treasury’s website. Apparently Tim Geithner visited Jon Stewart back in April, and Felix was understandably interested in seeing the evidence for himself. He went to the Treasury website, and then… well, things took a turn for the worse:

First, you go to the Treasury homepage. Then you ignore all of the links and navigation, and go straight down to the footer at the very bottom of the page, where there’s a link saying FOIA. Click on that, and then on the link saying Electronic Reading Room. Once you’re there, you want Other Records. Where, finally, you can see Secretary Geithner’s Calendar April – August 2010.

Be careful clicking on that last link, because it’s a 31.5 MB file, comprising Geithner’s scanned diary. Search for “Stewart” and you won’t find anything, because what we’re looking at is just a picture of his name as it’s printed out on a piece of paper.

In other words, these diaries, posted for transparency, are about as opaque as they can get. Finding the file is very hard, and once you’ve found it, it’s even harder to, say, count up the number of phone calls between Geithner and Rahm Emanuel. You can’t just search for Rahm’s name; you have to go through each of the 52 pages yourself, counting every appearance manually.

Is this really how Obama’s web-savvy administration wants to behave? The Treasury website is still functionally identical to the dreadful one we had under Bush, and we’ve passed the midterm elections already. I realize that Treasury’s had a lot on its plate these past two years, but a much more transparent and usable website is long overdue.

This all sounds sadly familiar to me. I still remember when Treasury started posting TARP disbursement reports as CSVs instead of PDFs. I was working on Subsidyscope at the time, and had to load those reports on a weekly basis. It’s more than a little sad how much better my life got when they made that change.

But I think it’s important to note that Felix’s frustration isn’t just the product of bad technology. Sure, the rat’s nest of links on treasury.gov could use improvement (I’m sure Ali could sort them out in no time). It would be nice if this PDF file were composed a little more thoughtfully (31.5 MB seems a bit excessive). The file should probably be linked with more descriptive information, so that search engines can find it. It should ideally be cross-linked from more centralized locations — I’m not sure that data.gov is a good fit for officials’ schedule information, but surely the White House’s website could benefit from providing details about what the administration’s highest-profile members are up to.

And of course it would be really nice if the PDF wasn’t a PDF at all — if Geithner’s schedule was released in a structured format that allowed it to be examined, remixed and reused by other tools. It’s clear that the schedule was generated by a program (it looks like Lotus Notes) and presumably that program could have been used to export the data in a useful format. But there’s a reason that it wasn’t.
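To see how much difference a structured release would make: if the calendar were exported as, say, a CSV file, Felix’s count of Geithner–Emanuel phone calls would take a few lines of code instead of a manual pass through 52 scanned pages. A minimal sketch — the column names and entries below are invented for illustration, since no such export actually exists:

```python
import csv
import io

# Hypothetical structured export of the Secretary's schedule. The real
# calendar was only released as a scanned PDF; this format is an assumption.
SAMPLE_EXPORT = """\
date,time,type,participants
2010-04-12,09:00,phone call,Rahm Emanuel
2010-04-12,10:30,meeting,Jon Stewart
2010-04-13,14:00,phone call,Rahm Emanuel
"""

def count_events(export_text, event_type, name):
    """Count schedule entries of a given type that mention a given name."""
    reader = csv.DictReader(io.StringIO(export_text))
    return sum(
        1
        for row in reader
        if row["type"] == event_type and name in row["participants"]
    )

print(count_events(SAMPLE_EXPORT, "phone call", "Rahm Emanuel"))  # 2
```

The point isn’t the code itself — it’s that the question becomes answerable at all, by anyone, the moment the data stops being a picture.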

You might remember me complaining about the format of the data we had to use for Elena’s Inbox. After working with the source documents, it looked to me like digital records had been printed out, then rescanned and OCRed. A few months ago I had the opportunity to testify at the Archives about automated approaches to document declassification. After my presentation, some folks came up to me and told me that I was exactly right about the Kagan emails: they’d been the ones who oversaw that process, which did involve printing digital files and then scanning the printouts. They had to, they said: the documents needed to be reviewed and redacted.

There are better reasons for this approach than you might think. Software to facilitate organized redaction projects isn’t all that common; and of course the government has been burned many times by people who used digital redaction incorrectly (for example, by drawing a black box over sensitive text in a PDF: the text can sometimes still be copied & pasted out of the document). For a process that’s as laborious as document review and redaction, there aren’t obvious downsides to paper. And paper provides peace of mind about redactions “sticking”.
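That copy-and-paste failure mode is easy to illustrate. In a PDF, a “black box” redaction can be nothing more than a filled rectangle drawn on top of the page; the text underneath still lives in the content stream, where any extractor will happily find it. Here’s a toy sketch — this is a simplified stand-in for real PDF syntax, not an actual parser:

```python
import re

# Simplified fragment of a PDF-style content stream: the sensitive text is
# shown with a Tj operator, then a black rectangle ("re ... f") is drawn
# over it. Visually it's a black box; in the file, the text is still there.
CONTENT_STREAM = """
BT /F1 12 Tf 72 700 Td (Meeting with confidential source) Tj ET
0 0 0 rg 70 690 250 20 re f
"""

def extract_text(stream):
    """Pull every string shown with Tj. Drawing operators are ignored,
    which is exactly why box-over-text 'redaction' fails."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)

print(extract_text(CONTENT_STREAM))  # ['Meeting with confidential source']
```

Proper digital redaction tools remove the text from the content stream entirely; printing and rescanning achieves the same thing by brute force.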

But of course there are downsides — Felix listed some of them; I’m sure most of this blog’s readers could name a bunch more. And frequently, that level of security isn’t genuinely necessary.

But one thing I am sure of is that the government classifies way too much stuff. Whatever you think about the organization or the wisdom of what they do, Wikileaks’ recent document dumps have provided a renewed opportunity to consider the problem of overclassification. Yes, some information really is too sensitive for public release. But most of the time government has far too little incentive to choose openness when it’s considering whether or not to keep something secret. Choose secrecy, and suddenly your problem becomes the problem of some poor declassifier who may not even have been born yet, and/or the people who actually want to use the data.

The specific example of Tim Geithner’s schedule is a reminder that we have to tackle these problems in a number of ways. We should keep talking to government about limiting its use of PDF to places where it’s appropriate. We should work to improve digital redaction tools so that well-formed data can stay that way (I haven’t been able to find any open source projects that tackle this problem). And we should engage with the communities familiar with the declassification debate, because that’s where this problem is rooted. It’s a big challenge, but it’s worth it to get our government properly wired: just consider the peace of mind that would come with setting up a reliable Google alert for “government microchip implantation.”

(Well, okay, maybe not that. But being able to automatically parse officials’ schedules for the names of CEOs, lobbyists and celebrities really would be useful.)
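That kind of automated monitoring is trivial once schedules are machine-readable. A minimal sketch — the watchlist and schedule entries are hypothetical, for illustration only:

```python
# Hypothetical watchlist of names worth flagging in an official's schedule.
WATCHLIST = {"Jamie Dimon", "Lloyd Blankfein", "Oprah Winfrey"}

# Hypothetical plain-text schedule entries.
SCHEDULE = [
    "09:00 Staff meeting",
    "11:00 Call with Jamie Dimon",
    "15:30 Lunch with Lloyd Blankfein",
]

def flag_entries(schedule, watchlist):
    """Return (entry, name) pairs wherever a watched name appears."""
    return [
        (entry, name)
        for entry in schedule
        for name in watchlist
        if name in entry
    ]

for entry, name in flag_entries(SCHEDULE, WATCHLIST):
    print(f"{name}: {entry}")
```

Real-world name matching would need fuzzier logic (initials, misspellings, titles), but even this naive version is impossible against a scanned image — which is the whole point.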