Privatization of GPO, Defunding of FDsys, and the Future of the FDLP

On July 22, the House passed a bill that would remove funding for FDsys, reduce funding for GPO by 20%, and reduce funding for the Superintendent of Documents by 16% (Kelley). The House Report on the bill also directs the Government Accountability Office to conduct a study on “the privatization of the GPO” and the transfer of the Superintendent of Documents and the FDLP to the Library of Congress (page 25).

The bill includes many other changes that are relevant to the dissemination of government information (see House Bill Questions Future of GPO and the comments to that post, and the stories in Library Journal and OMB Watch), but the ones related to FDsys and the privatization of the GPO are the ones which, if ultimately approved, would have the greatest negative impact on long-term free public access to government information.

Passage of even some of these bad ideas would almost certainly result in a catastrophic loss of long-term access to and preservation of government information. These bad ideas are, however, only symptoms of a still bigger problem. There is, luckily, an obvious, logical path around all these threats.

Proposals not new

While these proposed cuts and changes are drastic, they are not new. Similar proposals were considered in 1982 and 2001 by NCLIS, in 1988 by the Office of Technology Assessment, and in 1993 and 1994 by bills in the House of Representatives.

In addition to these official recommendations, the information industry has long argued that the private sector, not government agencies, should disseminate government information. It has characterized almost any government information activity as unfair competition with the private sector. Industry-commissioned reports and official statements in 2000 and 2004 (Wasch) suggested that governments should distribute only raw data and should refrain from making data easier to use if there is even a potential commercial market for such information.

These private sector ideas have re-emerged in the last few years as governments have made raw data more easily accessible and technological mashups of government data have become almost commonplace. Calls for government to limit its role to the delivery of raw data and for a reliance on the private sector to make the data useful have become popular. (Robinson)

Bad Policies

Whether such proposals would turn government information dissemination over to the private sector or commercialize the distribution of information by the agencies themselves, they have been found severely wanting whenever they have been examined from the perspective of the user and from the perspective of information access in a democracy.

An examination of the literature reveals three reasons that proposals to privatize and commercialize government information make bad policy. First, by commoditizing public information, they conflict with the needs of citizens in a democracy for free access to accurate information about the activities of government. Second, they ignore that producing and disseminating public information is an essential role of government, not something that can be left to the whims of the market. Third, it has never been demonstrated that the privatization of major federal information dissemination activities is cost-effective or beneficial for important governmental functions.

It is also worth remembering that GPO was originally created because relying on private printing did not work well. Private printers often delivered jobs late and the printers themselves found that they lost money on public printing contracts (MacGilvray). Today, GPO contracts with many private publishers while maintaining overall control of the entire throughput.

Even information-industry-friendly reports on similar proposals have recognized an important, even essential, role for the government in government information dissemination. The 1982 NCLIS report, while strongly promoting a major role for the private sector, nevertheless said that government information should be openly available without any constraints on subsequent use. It also advocated depositing documents in FDLP libraries so that they would be freely accessible. The 2001 NCLIS report similarly supported private-sector involvement but also concluded that, “…the federal government must continue to have primary responsibility for the entire life cycle of government information, including the dissemination and permanent public availability to public information resources to the American public without restrictions on its use or reuse.” (emphasis added)

Nevertheless, there are still those who promote policies that would rely on market forces to determine what public information would be available to the public and at what cost.

Strong opposition to that approach comes from non-profits, libraries, citizen advocacy groups, scientists, journalists, historians, and government agencies. These groups understand that “there is [a] need to ensure equitable, open access by the public in general to information which has been generated, collected, processed, and/or distributed with taxpayer funds.” (NCLIS p. ix). And in 1988, OTA said that some government information dissemination activities are “inherently governmental” because they “facilitate an informed citizenry [and] assist the mission agencies in carrying out their statutory responsibilities.” (p. 301)

Failed Attempts

Some attempts by the government to commercialize or commoditize its information have failed either economically or functionally. For example, although STAT-USA existed on a “revolving fund” without Congressional funding for many years, charging a cost recovery fee for access to economic and trade information from federal agencies, in the long run it found that its fee-for-service business model was no longer viable and it shut down its operations. (Krasowska)

If you have been around government information issues for less than fifteen years, you may not be aware that the first incarnation of GPO Access attempted a cost recovery model similar to NTIS or STAT-USA. It charged annual subscription fees ranging from hundreds to thousands of dollars for access to the Congressional Record and Federal Register (GPO Access Status Report). This model failed and was abandoned after less than two years (Relyea), partly as a result of libraries creating gateways that made this same information available without charge.

Even when fee-based services, such as PACER (Public Access to Court Electronic Records), seem to survive financially, their functional failure to provide free information is obvious and it is arguably true that they fail to adequately meet the needs of all.

Catch-22

There are Catch-22s implicit in proposals to privatize or commercialize government information. First, the information industry suggests that it should have exclusive rights to information products that are profitable and leave to governments those that are not profitable. This sets up governments for failure when they try to support their activities with income from demonstrably non-profitable information products.

Second, governments set themselves up for failure when they charge for access to public information. If they simultaneously attempt to honor their role of providing information to the public without charge while charging for that same information (as in the early years of GPO Access), they compete against themselves. But if they attempt to protect their ability to charge for their information “products,” they find themselves in the awkward position of attempting to control and restrict access to public information that is in the public domain. (Gellman)

Ultimately, attempts to commercialize government information therefore conflict fundamentally with the essential and inherent duty of government to make public information freely available and usable.

The current threat

Unfortunately, just because a proposal is bad policy, self-contradictory, or doomed to failure does not keep it from being implemented. The current budgetary and political situations in Washington DC create an environment where one or more of these bad ideas is more likely to pass than ever before.

Regardless of how important this issue is, it is unlikely that it will get much media attention. It is also not at all clear that proposals to keep government information free and well-preserved will garner much support politically. When essential government services such as food safety, police, defense, nutritional programs for low-income women and children, nurses, clean drinking water, and much more all face drastic cuts, will there even be room in the budget debates to consider government information? When Congress can seriously consider proposals to cut spending on programs that affect the health and safety of the country, we can hardly assume that it will necessarily provide adequate funding for information access. President Obama’s own government transparency programs have been drastically cut and Obama himself is on record as thinking the printing of the Federal Register is wasteful.

The Big Problems

As serious as the current situation is, it can at least help us see the bigger issues that surround long-term preservation and access of government information and suggest solutions. The big problem is that we lack an adequate preservation and access “ecosystem” for government information. This puts all government information at the mercy of relatively small changes in government budgets. It is somewhat ironic that, if we had addressed the underlying issues earlier and had a rich ecosystem, we would be less vulnerable to the drastic proposals on the table today.

There are lots of issues and challenges that face those who wish to preserve long-term free access to government information. We can boil down a lot of those issues to two big ones:

1. Quantity. Just the quantity of information being produced digitally provides one huge challenge. Any attempt to preserve so much information must be able to scale to sizes that, until recently, were almost unimaginable. As Nicholas Taylor at the Library of Congress wrote recently, the amount of “data stored by the Library of Congress” has become a popular, if unusual, unit of measurement for capacity of storage, network traffic speed, size of digital collections, and so forth. The “End of Term” crawl of the web pages of the George W. Bush presidential administration by the Library of Congress, the California Digital Library, the University of North Texas, the Internet Archive, and the Government Printing Office produced almost 16 terabytes of data. (See more size comparisons here.) And digital preservation and access require duplication and replication and backups that multiply the scale of projects quickly. LOCKSS-USDOCS, for example, says that the approximately 1 terabyte of data it is currently preserving is only a fraction of the 18 terabytes of content in FDsys when all the workflow iterations, copies, and backups are taken into account.

What this means is that providing preservation and access to all government information is a very large, non-trivial task. It is not clear that any one institution or organization will ever have the capacity or resources to do everything on its own.

2. Selection. But, you may well ask, how much of all digital government information is worth preserving? It is almost certainly true that much of the born digital content being produced by the government is of only transient interest or value. It is without question true that the rules have changed in ways that make it more difficult to know what is worth saving. In the past, we knew and could fairly easily define and identify “government publications” and could identify who created and published them. “Publications” were, for the most part, packaged as “books” and “journals” and “pamphlets” and so forth. These qualities made it relatively easy to know what we wanted to preserve and how to preserve it.

But in the digital environment, we find ourselves facing a whole new set of circumstances. It is not always clear who has created digital information, whether or not it is “government” information, or whether or not we have sufficient rights to collect or preserve or provide access to any given piece of information (Peterson). A single web page may display information from many different sources. A dot-gov web page may contain information from a commercial source and government agencies may post original content on dot-com web sites. The very processes that put a wealth of government information a click or two away also make it harder for us to preserve that same information and ensure its usability far into the future.

Apart from some obvious, preeminent series (e.g., Federal Register, Congressional Record, Hearings, Reports, the censuses, and other Essential Titles), lies everything else. Who will decide what of that “everything else” is worth saving? Who will decide if we save the digital equivalent of looseleaf binders, pamphlets, posters, one-off maps, slip laws, drafts, versions, editions, memos, press releases, and so forth? We must consider multi-media formats. We have to decide whether or not the “look and feel” of website presentations of information is important to preserve and, when there are several different presentations of the same information, which we should preserve.

Selection in the world of bad budgets ultimately means de-selection and weeding. Digital objects don’t get preserved by accident. They require constant attention and preservation work. When a repository says, “We can no longer afford to preserve this and this and this,” it is often relegating those things to oblivion.

Despite these big differences between the digital world and the analog world, the big, foundational issues we face are not that different. Specifically, there are two foundational issues: First, different people have different needs. What is important to you may not be important to me and vice versa. Second, the question of selection of what to preserve is a question of who will have the decision-making power and who will have the control over their own decisions (Jacobs).

Some conclusions. There are some inescapable conclusions we can draw from the combination of the issue of quantity and need for selection. First, there is a need for more than one organization to be responsible for preservation and long-term access just to deal with quantity and scale. We cannot rely on any single institution or organization to preserve everything that is of value to everyone; that is just too big a job. The Library of Congress has come to the same conclusion, which is why it has created the National Digital Stewardship Alliance (NDSA).

Second, there is a need for different organizations to be involved in preservation in order to adequately reflect the information needs of different user communities. No single institution that intends to serve “everyone” can afford (literally afford, in terms of money and other resources) to pay sufficient attention to the needs of every small, specialized constituency. Without such attention, information will fall through the cracks and be lost.

Third, each such institution must have the ability to select information for preservation and obtain sufficient control over that information such that it can perform the needed digital preservation activities that will ensure long-term preservation of and access to that information.

Digital Preservation Road Maps

Luckily, we have road maps for digital preservation that help us address the issues and challenges outlined above. The road maps are the Reference Model for an Open Archival Information System (OAIS) combined with the checklist for certifying digital repositories, the Audit And Certification Of Trustworthy Digital Repositories (TDR).

TDR is built upon OAIS. Together they provide the context for long-term preservation and access to any digital collection. They do not describe how to build a repository nor do they define technologies that must be used. Rather, OAIS describes the required functions of a digital repository and TDR provides a checklist of “metrics” for evaluating if a given repository is meeting its own goals and objectives for achieving those functions. OAIS and TDR are just as applicable to small repositories and institutions as large ones.

TDR recognizes that preservation is not just about technology. It is also about continuity over time of the archive itself. TDR describes two essential requirements in this area that are particularly relevant here: the need for long-term financial sustainability and the need for succession planning.

1. Sustainability. TDR says that, to ensure viability, a repository must have business planning processes that ensure its financial sustainability over time.

Viewing sustainability for government agencies is tricky. On the one hand, an agency can claim that it has the full faith and credit of the government, legal mandates, and (in some cases) the historical precedent of its long-term mission. On the other hand, agencies come and go, budgets are cut and reallocated, and missions change.

In fact, as noted above, the current proposals are only the most recent examples of these very issues facing GPO. GPO has always had high hopes and made big promises, but its hopes and promises are limited by what Congress sets as its mission from year to year and how Congress funds it — or denies it funding. In a single budget cycle, “permanent preservation” can change to “temporary storage” and “free” can change to “fee-based.” A single bad-budget year can force GPO to make selection decisions that result in weeding of information that some communities will still consider vital.

While a lot of what affects sustainability is outside the control of the repository, there are many things that each repository can control and many actions it can take to control them. It can also take actions that will provide the best possible context for dealing with events outside its control. TDR enumerates these. But TDR says that a repository must also prepare for the possibility that unforeseen or unavoidable events might make sustainability impossible. For such occasions, a repository needs a succession plan.

2. Succession Planning. Any organization faces the possibility of funding cuts and shortages and unforeseen problems that can result in anything from scaling back to going out of existence entirely. IBM recently made this point clear about private sector companies when it said, “Nearly all the companies our grandparents admired have disappeared. Of the top 25 industrial corporations in the United States in 1900, only two remained on that list at the start of the 1960s. And of the top 25 companies on the Fortune 500 in 1961, only six remain there today.” TDR recognizes this and says that any trusted repository must have a formal succession plan, contingency plans, or escrow arrangements in case the repository ceases to operate or the governing or funding institution substantially changes its scope. These are exactly the threats GPO faces today. I cannot think of a better demonstration of the need for succession planning.

But what does it mean to have a succession plan? It means having a plan that will ensure the long-term preservation of the content for which a repository is responsible even if the repository ceases to exist. In general terms, it means that an organization has a plan for what specific actions it will take if it learns it has to change missions or if it will cease to exist. In extreme circumstances, it means that it has a plan in place to hand over its content to one or more trusted repositories.

We already have some existing projects for government information that may serve as models for viable, long-term, collaborative solutions to succession. These include the Department of State Foreign Affairs Network (DOSFAN) partnership between the U.S. Department of State, the University of Illinois at Chicago, and the Government Printing Office; the LOCKSS-USDOCS partnership between Stanford, Carl Malamud’s public.resource.org, GPO’s FDsys, and more than three dozen libraries; and the CyberCemetery partnership between The University of North Texas Libraries and GPO.

In order for a repository to say that its content will survive the downsizing or elimination of the repository, it needs to be able to show that its content already is in another repository or that it could hand over its content to another repository. For a hand-over to take place, there would have to be another repository technically capable of accepting such a hand-over.
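The check described above can be sketched in a few lines of code. This is only an illustrative sketch, not part of OAIS or TDR: the function name, the repository names, and the holdings data are all hypothetical, and it assumes that each repository can publish a simple list of the collections it holds.

```python
from typing import Dict, List, Set


def collections_at_risk(
    holdings: Dict[str, Set[str]],
    primary: str,
    min_other_copies: int = 1,
) -> List[str]:
    """Return collections held by too few repositories other than `primary`.

    `holdings` maps a collection name to the set of repositories holding it.
    A collection is "at risk" if fewer than `min_other_copies` repositories
    besides the primary one hold a copy.
    """
    at_risk = []
    for collection, repositories in sorted(holdings.items()):
        other_copies = repositories - {primary}
        if len(other_copies) < min_other_copies:
            at_risk.append(collection)
    return at_risk


# Hypothetical holdings data, for illustration only.
holdings = {
    "Federal Register": {"repo_a", "repo_b", "repo_c"},
    "Congressional Record": {"repo_a", "repo_b"},
    "Agency press releases": {"repo_a"},
}

# Which collections would be lost if repo_a ceased to operate?
print(collections_at_risk(holdings, primary="repo_a"))
```

In this made-up example, only the press releases exist nowhere but the primary repository, so they are the content a succession plan would most urgently need to cover. Raising `min_other_copies` expresses a stricter policy.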

This, along with the above conclusions based on quantity and selection, leads us to some solutions both for our current situation as well as for the underlying issues surrounding long-term preservation and access.

Solutions

There is a common theme to the conclusions above. First, to ensure preservation of all that needs to be preserved, we need multiple repositories serving the needs of multiple communities of interest. OAIS calls these “Designated Communities” and both OAIS and TDR make them an essential (non-optional) element of trusted repositories.

Second, in order for repositories to have realistic succession plans, we need an information preservation ecosystem consisting of many repositories capable of cooperating with each other’s succession planning.

In a nutshell: The more repositories we have, the more secure all repositories will be, collectively. The more repositories we have, the better we can ensure that content relevant to all user communities will be selected and preserved by at least one of those repositories.

Visions of the future

What might this look like in practice? There is no single prescription for success, but we can imagine effective, practical, successful scenarios. Success would be achieved if we had a mix of a variety of different kinds of libraries and archives and repositories, each working for the best interests of its own designated user community, but, collectively, providing a national, loosely-coupled “system” of preservation and access. (Does this sound like the traditional FDLP? Yes! The FDLP provides us with a working model of experience in just such a system.)

In such a system, individual libraries (small and large) and consortia of libraries (small and large) could contribute to the long-term free public access to government information — simply by meeting the needs of their own user communities.

I can imagine lots of examples of how individual libraries or groups of libraries might take actions that would benefit their own user communities as well as the information ecosystem as a whole. I am sure that you can add to this list from your own experiences with your own user communities.

A few big repositories like HathiTrust, the Internet Archive, and LOCKSS-USDOCS containing large volumes of easily identifiable and obtainable major series of government digital information.

Regional, state, and local law libraries preserving local jurisdictional legal information and linking their collections through rich metadata and APIs to each other and national collections.

Libraries with a regional focus collecting information relevant to the region from multiple agencies and jurisdictions (e.g., water rights, immigration, trade, agriculture).

Libraries that focus on specific kinds of users with common kinds of information needs (e.g., undergraduates, K-12, practicing physicians, farmers) collecting government information from many sources to build strong, dynamic working collections.

Libraries that want to emphasize a particular kind of information or research (e.g., spatial/GIS data, astronomical data, statistical and raw numeric data from censuses, weather data, textual corpora), combining government information with information from other sources and with computational tools to provide rich research environments for researchers.

Research libraries with institutional repositories of their research output combining government information to supplement, document, and enhance those collections.

The above are just examples, not prescriptions or predictions. The concept I want to illustrate above is that, when lots of libraries and archives and repositories select and acquire digital government information and create rich digital collections for their own communities, the result will be, collectively, better preservation and more focused access than any single institution could create on its own. This rich environment would be much more secure than our current environment in which each library hopes that some other library or government agency will take care of preservation and access to materials that are essential to its own user community.

Next Steps

So what do we do next?

We need to oppose recommendations before Congress that would gut GPO or force GPO to weed or disable FDsys or commercialize government information. The current bill will not be the last; we need to be able to make a convincing, persuasive case that government has an essential, inherent role in the life cycle of government information.

We need to work with existing large digital repositories (e.g., HathiTrust, Internet Archive, LOCKSS-USDOCS) to see if they can host government information and make it freely accessible now, particularly in the event of a scaled-back or discontinued FDsys.

We need to work with the Depository Library Council, GODORT, ALA, GPO, our own local FDLP libraries, and Regional Depositories to plan for an FDLP of the future that includes life-cycle management of digital government information. This will inevitably include, but not be limited to, digital deposit of Title 44 materials into FDLP libraries.

We need to instruct ourselves in the requirements of Trusted Digital Repositories by learning about OAIS and TDR. Where there are learning opportunities we need to take them and where there need to be new opportunities we need to make them. Those of us with influence on the curricula of library schools need to make this a requirement.

Building on our own individual knowledge we can then work at a local level within our own libraries and library consortia and library organizations to build our own digital infrastructures and digital collections that meet the requirements of OAIS and TDR. We need to make sure that the planning process is not overwhelmed by technical considerations to the exclusion of long-term sustainability and succession planning. Sustainability and succession planning need to be integrated into the planning process from the beginning, not addressed later as an afterthought. This will help us have better conversations with our colleagues and will lead to more cooperative projects and better cooperative planning.

We need to work with national and regional organizations and professional associations to plan for a future information preservation ecosystem and infrastructure. Librarians need to work with different kinds of libraries; librarians and archivists and technologists need to work together. The ecosystem doesn’t have to be a huge bureaucratic institution — indeed, it probably should not be — but it will benefit from collaborations and planning that stretch across traditional boundaries.

Ask or Act?

The future of long-term preservation of and free access to government information is in the hands of Congress today. That leaves us with the feeling that all we can do is ask Congress to do the right thing. But we can do more than ask; we can act. Indeed, we must act. We have the power to take that control out of the hands of Congress and put it into our own hands by building our own digital collections. For many libraries, that will mean a change in strategy: instead of relying on someone else to ensure long-term access to the information your Designated Community requires, you will rely on your own actions. This comes with costs, of course, but it also has big benefits. You will be providing the essential services that your community needs. And that means that you will have a built-in, inherent role that no one else has, which will make your library more sustainable for the long run.

Sheketoff, Emily. Letter [MS Word document; also available as a PDF document] to Harold Rogers and Norman D. Dicks, Committee on Appropriations, U.S. House of Representatives, from Emily Sheketoff, Executive Director, ALA Washington Office (July 21, 2011). [Includes attached “Resolution on Government Printing Office FY 2012 Appropriations,” adopted by the Council of the American Library Association, June 28, 2011.]