The Politics of Open Sourcing Governance

Carl Malamud is a public domain advocate who heads Public.Resource.Org. His approach is to publish public domain information from local, state, and federal government agencies. Over the years, publishing governmental data has become a surprisingly lucrative business for niche publishers. The gain from open access is spread thinly across the general public, while the loss of a revenue stream is felt sharply by the individual publishers, so they have invested heavily in preventing the government from opening up.

Dear Mr. Kundra and Mr. Chopra:
I am writing to request your assistance in making available at no charge and for bulk
access two of the most important legal databases maintained by the executive branch:

Publications in the Federal Register System, maintained by the National Archives
and Records Administration (NARA).

Patents, maintained by the U.S. Patent and Trademark Office (USPTO).
While the U.S. government maintains a minimal web presence for both databases,
those web sites are only useful for casual browsing. In both cases, the underlying
source code for the documents is only available for substantial fees.
A yearly subscription to the Code of Federal Regulations for bulk access to the “SGML”
source code with images is $17,000/year. The same $17,000 fee applies to other
NARA databases such as the Federal Register. While there are PDF versions of the
Federal Register and text versions of the Code of Federal Regulations available for
browsing, it is impossible to easily download them in bulk, and the underlying source
code which could be used for creating new versions of these documents is prohibitively
priced.
Likewise, the U.S. Patent and Trademark Office makes a web site available for casual
searching and browsing, but the only bulk access to patent data is limited to the first
page of a patent. To get the full text of current and historical patents requires a very
substantial fee. For example, the Patent Grant Data/XML v. 4.2 ICE (Current Calendar
Year Subscription) (EIP-5300P-OL) costs a breathtaking $39,000.
These fees are so substantial that they actively discourage the use of these key U.S.
government databases by public interest groups and scholars, limiting access to a few
well-heeled corporations. In particular, at Public.Resource.Org, we would make much
more extensive use of these databases if we could afford access, helping fulfill our
mission of making America’s primary legal materials available to the public.

Our desire to work with this data is shared by many other groups, including our
colleagues at the Sunlight Foundation, Columbia University, Cornell University,
University of Colorado, Harvard University, Northwestern University, Stanford
University, and GovTrack.US. All of these academic and nonprofit groups have notable
track records for providing innovative uses of government data, and the lack of bulk
access to these databases has greatly discouraged development of new applications.
Patents and “the law” have a very special place in our system of government, being the
only two executive branch databases specifically called out in the U.S. Constitution:
The very purpose of the patent database is to “Promote the Progress of Science
and useful Arts.” The very essence of a patent is publication, and deliberately
restricting access goes against the explicit language of the Constitution. While
we are sympathetic with the desire of the U.S. Patent Office to derive revenue
from the sale of these bulk feeds, such a policy runs directly contrary to their
primary mission. Filing fees imposed on those that seek economic gain from
the public through the issuance of a patent are more than sufficient to make up
any revenue shortfall created by making bulk data available at no cost.
Likewise, the purpose of the Federal Register system is to provide a systematic
vehicle for notification and publication of regulations that are enacted by the
government. Restricting access to this data by putting it behind a series of
$17,000 pay walls yields less than $200,000 in annual revenue to the
government, yet is costly enough that only a few well-heeled corporations have
access. The public interest simply can’t afford to play.
Initiatives such as Data.Gov have been very successful and you are both to be
applauded for the dramatic change in philosophy in the U.S. Government when it
comes to release and dissemination of information. However, it is my worry, a worry
shared with my colleagues listed above, that any progress on releasing the USPTO and
NARA databases in bulk will become entangled in bureaucratic delay, and I am writing
to urge that you make these crucial documents of our democracy available sooner
rather than later.
Respectfully yours,

Carl Malamud

The exciting idea of “bulk access” is to make government data available not in edited form but in machine-readable (XML) formats, so that users can decide for themselves what to do with it. In the words of Ed Felten et al.:

Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.
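
To make the point concrete, here is a minimal sketch of what "the user decides what to do with it" could look like once the underlying XML is freely available. The sample data and its element names (notices, notice, title, and the agency and date attributes) are invented for illustration and do not reflect the real Federal Register or USPTO schemas. The same raw records feed two different views, neither of which the publishing agency had to anticipate.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: the element and attribute names are assumptions
# made for this sketch, not the actual NARA or USPTO bulk formats.
SAMPLE_XML = """
<notices>
  <notice agency="NARA" date="2009-06-01"><title>Records schedule update</title></notice>
  <notice agency="USPTO" date="2009-06-01"><title>Fee schedule change</title></notice>
  <notice agency="USPTO" date="2009-06-02"><title>Examination guidelines</title></notice>
</notices>
"""


def load_notices(xml_text):
    """Parse the raw XML into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "agency": node.get("agency"),
            "date": node.get("date"),
            "title": node.findtext("title"),
        }
        for node in root.iter("notice")
    ]


def count_by_agency(notices):
    """View 1: how many notices did each agency publish?"""
    return Counter(n["agency"] for n in notices)


def titles_on(notices, date):
    """View 2: which notices appeared on a given day?"""
    return [n["title"] for n in notices if n["date"] == date]


notices = load_notices(SAMPLE_XML)
print(count_by_agency(notices))
print(titles_on(notices, "2009-06-01"))
```

A watchdog group might care about the first view and a law firm about the second. With bulk access, both can build what they need from the same feed.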

Is this salient for your work? How is it different in Europe, the Americas, Africa, or Asia? What is your experience with fully opening up databases to the public?

About Philipp

Philipp Müller works in the IT industry and is academic dean of the SMBS. Author of "Machiavelli.net". Proud father of three amazing children. The views expressed in this blog are his own.

Comments (3)

Bulk access, or perhaps “open access at the XML layer,” is a great idea in general, since it would allow many potential audiences and their respective technology enablers to have their own viewpoint on the data. Carl is correct (as usual) in his perception that there is a move away from web sites as the focus for interaction and towards “feeds” of data, content, narrative, etc., which are aggregated, parsed, and presented in different ways for different audiences and purposes.

I have been fostering the open sharing of knowledge, and especially the building of a knowledge hub, since 2005 through the group LeanThinking, http://xing.com/net/lean. Data sharing is just part of the equation; it always needs the people behind it. ONLY human creativity will make effective use of the data :-)