In a committee report on the legislative branch appropriations bill H.R. 5882, the subcommittee responded to requests from GovTrack, the Sunlight Foundation, Washington Watch, and other watchdog groups about particular technological measures Congress can take to improve legislative transparency. I wrote about our request most recently in March, though I have been asking for the improvements for more than 10 years. We asked for “bulk data,” which means comprehensive records in a format that is machine-processable.

The committee’s response to our request, which starts at the bottom of page 17, is steeped in technical language, making it hard to quote here. Here are the important parts:

The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact . . . once they are removed from the Government’s domain to private sites.

. . . [How would we pay for] Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML?

What they’re saying is that they fear that if the American public is given more detailed and precise records about Congress, we’ll distort them and, well, hurt ourselves, I guess. And then, according to the committee, Congress will have to go around correcting us. How insulting!

Especially since for the last eight years I’ve been making this sort of information available on GovTrack, and last I checked that was a good thing. Even Congress’s staff uses GovTrack: Crenshaw’s own staff has probably used GovTrack for their research.

A world without bulk legislative data is a world where everyday citizens can’t ask simple questions such as how often is a bill enacted, where in the legislative process is a bill, is my representative moderate or extreme, and what bills are coming up next week. Without bulk data there are no tools for legislative tracking, like email updates, discussion forums, and write-your-rep websites. And in the world Crenshaw envisions there is no need for journalists, since the only information the public needs can be found in whatever “intact” files Congress thinks are sufficient.
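
To make “machine-processable” concrete, here is a minimal sketch in Python of how two of those questions could be answered from a bulk file. The XML layout, the element and attribute names, and the status values are all invented for illustration; they are not the format of any real congressional or GovTrack file.

```python
import xml.etree.ElementTree as ET

# Hypothetical bulk file. The <bills>/<bill> layout and the "status"
# values are assumptions made up for this example only.
SAMPLE_BULK_XML = """
<bills congress="112">
  <bill id="hr1" status="introduced"/>
  <bill id="hr2" status="passed_house"/>
  <bill id="hr3" status="enacted"/>
  <bill id="s1" status="enacted"/>
</bills>
"""


def enactment_rate(xml_text):
    """Return the fraction of bills in the file whose status is 'enacted'."""
    root = ET.fromstring(xml_text.strip())
    bills = root.findall("bill")
    if not bills:
        return 0.0
    enacted = sum(1 for bill in bills if bill.get("status") == "enacted")
    return enacted / len(bills)


def bill_status(xml_text, bill_id):
    """Return the status of one bill, or None if the bill is not in the file."""
    root = ET.fromstring(xml_text.strip())
    for bill in root.findall("bill"):
        if bill.get("id") == bill_id:
            return bill.get("status")
    return None


if __name__ == "__main__":
    print(enactment_rate(SAMPLE_BULK_XML))
    print(bill_status(SAMPLE_BULK_XML, "hr2"))
```

With comprehensive records in a structured format, questions like these take a few lines of code. Without them, the same answers have to be pieced together from web pages written for human readers.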

The report focuses on “digital signatures on XML documents,” a way to be able to distinguish official government documents from fakes. While that’s important, it’s not relevant to the question of bulk data. First, the report claims there is no such thing as a digital signature for an XML document, but that’s simply false. In fact, Congress has already been using digital signatures for XML documents for years. (Thanks to Eric Mill for pointing that out.)

Second, and most importantly, no one actually cares. The millions of individuals who use GovTrack to find the status of a bill are not looking for an official government document. They want an explanation of what is going on with the bill, and that’s not provided by the government. But it’s something we can create more easily with bulk data.

Third, Crenshaw’s colleagues in the House Republican leadership have been putting out all sorts of new bulk data in the last year without digital signatures, and it hasn’t been a problem. A few weeks ago I blogged that the House did good work in making the week ahead’s schedule more available as bulk XML data. I try to look on the positive side of these things. In that blog post I congratulated the House on its achievements with XML. But I can’t find a positive way to look at this committee report.

Daniel Schuman at Sunlight Foundation has been covering the recent developments as well. Check out his blog post for more background on what’s going on here.

What I’m asking Congress and the Library of Congress to do is to share their internal database of legislative information that powers their official THOMAS website. That database would make GovTrack and dozens of other websites more accurate. Over the last six months, GovTrack and its data partners have been used by millions of individuals — again, including Congress’s own staff. More precise data would go immediately toward helping millions of individuals.

“Bulk data” is today considered a core component of any government information dissemination program. In 2009, the Government Printing Office began offering bulk data for bill text and other publications. Executive branch agencies are all now under a directive to embrace data. This is a no-brainer.

I’m sympathetic to other reasons to put off bulk data, such as cost (actually it’s cheap) or there being other priorities for Congress to address. I think bulk data is important, but I can understand if not everyone thinks it’s so important to do right now. Although cost was mentioned in the report, the main gist was the techno-nonsense about digital signatures.

Crenshaw has some explaining to do if he doesn’t think Americans can handle the data. I reached out to his office for a comment but did not receive a reply.

I think Crenshaw can live up to his name. What happens when you step on a melon?
Or maybe he would like a ride to my old neighborhood in LA – on Crenshaw Ave. – A little tour of my old digs…
Too feeble to comprehend the data? We’ll show him feeble…It will be the last thing he sees….

I trust GovTrack, but I can envision misuse and misinterpretation of “bulk data.” And once something is published, it’s hard to correct. Why would bulk data access for all translate to better public information? You don’t explain how the data are filtered between bulk and THOMAS. What judgments are being made, and who should be trusted to process the data? If that is the issue being debated, I don’t think it’s productive to call for a letter campaign.

Gary: That’s exactly the argument Crenshaw is making. But here’s why it’s mistaken: Everything you like about GovTrack and similar sites, including for instance the nice way the New York Times displays congressional votes — that’s all based on having bulk XML data.

So there is bulk XML data already; it’s mostly just not produced by the government. We (at GovTrack, the NYTimes, etc.) have to bend over backwards to assemble all of the information. The bulk data I assemble powers dozens of other websites that help the public understand what’s going on in Congress. And, in fact, the House produces some important bulk XML too.

If there were going to be any negative consequences to bulk XML data, we’d have seen them already. Instead, having bulk XML data would help us make our information more precise, reliable, and timely, and that translates into better tools for our users.

So yes, of course there can be misinterpretations, but there will be a lot *fewer* when the basis of our interpretations — the data — is more precise.

While I agree that consistent data formatting would be a good administrative objective, I don’t agree that it requires any new appropriations to accomplish.
The Committee complaint about data integrity can be resolved by requiring a simple notice of the link to original source material.
I think it’s bad policy to demand taxpayer dollars to make life easier for third-party vendors … even GovTrack!

Westmiller: Like I said in the post, if the committee said it would cost too much I’m open to that objection even if I disagree. But that wasn’t the committee’s objection.

And like you said, their objection can be satisfied easily, although it would be an obvious 1st Amendment problem to require people to say particular things (i.e., requiring a link) whenever we make any use of congressional information.

In the end, it’s not Congress’s responsibility to make sure public records are only used for good (and who defines good?), and neither is there any evidence that the records won’t, on balance, be used for greatness.

It is well past time for Mr. Crenshaw to go home and allow someone else, who has at least read the Constitution, to take his seat. He is not in charge of us; he works for the people, and they have every right to know exactly what this man is doing and what the Congress, president and courts are doing!

The simple truth is this: As long as We The People are not fully informed in a timely manner (something bulk XML data transfer would ensure), congressmen and women can keep their actions distorted, inaccurate, and twisted…to keep us believing our elected officials are working for us. But the majority have all gone to greed for gold and are working only for themselves. Our republic and democracy have been sold out to the highest bidders. Congress passed a law declaring they can engage in INSIDER TRADING without penalty.
Our military defenses are being depleted and our safety put in jeopardy by allowing other nations to manufacture our military offensive/defensive weapons and electronics. Even military clothing and computers are made by other countries. (Walk into any military PX and check out the Made In labels.)
Our jobs are still shipped out of the country (at our expense), which has pushed our tax base into recession as our countrymen are shoved down to lower wages. (Lower wages result in lower tax revenues. Do we need more tax revenues? Yes! Let’s also remember that as wages fall, the well-being of our fellow citizens plummets.) And it is not the responsibility of the United States to feed, clothe, teach, and care for people who ILLEGALLY enter our country. We have real citizens we could use the money to support. Our nation’s schools are in desperate need. And we have starving children, too. (Starving US citizens is a travesty of justice.)
The majority of US citizens do not get fully informed before the laws are passed.
The point is: As long as Congress can keep the issues confused, they keep us confused and fighting among ourselves over half-truths and misleading information. Requiring bulk XML data transfer would increase our full knowledge and in a more timely manner. It is what we need.