Last year, a group of us who work daily with open government data — Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself — decided to stop each building the same basic tools over and over, and start building a foundation we could share.

We set up a small home at github.com/unitedstates, and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government — THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO’s FDSys, and others — that everyone needs to report, analyze, or build nearly anything to do with Congress.

This is an unusual, and occasionally chaotic, model for an open data project. The /unitedstates project is a neutral space; GitHub’s permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other’s work, no one is dependent on, or “downstream” from, anyone else. It’s a shared commons in the public domain.

* We collaborate in public. When we have questions or ideas, we bring them up and talk them out using GitHub’s issue tracker. Questions get answers very quickly, unexpected participants hop in, and (as with other Q&A systems like Stack Overflow and Quora) discussions themselves become valuable long-term artifacts. GitHub is extremely well designed for this.

* Our congressional tools can be used in a standalone, language-agnostic way, with no required configuration. You just need a command line, and data gets placed on disk in bulk. Nothing depends on a database.

* We started using our new data in a live product right away. Instead of waiting for something that felt “1.0”, Sunlight and GovTrack replaced their pre-existing collection infrastructure with our new tools as soon as they were functional. Because of this, we were forced to promptly fix bugs and fill gaps, and create a stable platform to iterate on. This guarantees momentum.

* No brand names. Our organization’s name, “unitedstates”, is harder to describe to someone in an elevator, but it makes it clearer to volunteers that they’re contributing to the public domain and the common good. Repository names project authority by being clear and descriptive, rather than catchy.
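
To make the "data on disk, no database" point concrete, here is a minimal sketch of what consuming bulk output can look like. It is an illustration only: the directory layout, file names, and record fields below are hypothetical stand-ins, not the actual output format of any /unitedstates tool, and the sample data is written to a temporary directory so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_records(data_dir):
    """Yield one parsed record per JSON file found anywhere under data_dir."""
    for path in sorted(Path(data_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def write_sample_data(data_dir):
    """Write two made-up records (hypothetical layout) so the sketch is runnable."""
    samples = {
        "bills/hr1/data.json": {"id": "hr1", "title": "Example bill one"},
        "bills/s2/data.json": {"id": "s2", "title": "Example bill two"},
    }
    for rel_path, record in samples.items():
        path = Path(data_dir) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(record), encoding="utf-8")


def demo():
    """Create sample files, then read them back with nothing but the filesystem."""
    with tempfile.TemporaryDirectory() as tmp:
        write_sample_data(tmp)
        return [record["id"] for record in load_records(tmp)]


if __name__ == "__main__":
    print(demo())
```

Because the only interface is files on disk, the same few lines could be written in any language, which is the sense in which tools like this can stay language-agnostic.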

These projects don’t do anything fundamentally new. People have solved these problems before. But usually, developers will just write these sorts of things quickly to get them out of the way, and leave them tightly integrated into some larger system. Even when this is made open source, it’s tough to reuse code written this way. Newcomers find the learning curve intimidating, and the author rarely feels like re-engineering working code.

Instead, when we notice common problems — even small ones — we’re solving them as independent projects that are easy to share. This is basically all upside; anyone can build and brand anything they want on top of these tools, and benefit from the fixes and improvements of others. It’s a healthy arrangement, and the kind we should see more of in the open government community.