Universities race to safeguard government data under Trump

In an era when Trump administration officials label unwelcome tidings as fake news, endorse “alternative facts,” and favor conspiracy theories, many see another looming threat. They fear federal data may become the next casualty of a post-truth world.

The Trump administration’s early actions have sounded the alarm. First there was the dispute with the National Park Service over the inauguration crowd size. Then the USDA removed animal welfare information from its website. There have been other disputes about statistics, such as the president’s false claim that the current US murder rate is the highest it’s been in 47 years. Given the administration’s skepticism of climate science, its dim view of government regulations, and its proposed budget cuts, many suspect data may become the enemy in the administration’s effort to “deconstruct the administrative state.”

The nation’s colleges and universities aren’t taking any chances. Data is their stock-in-trade, and they know how vulnerable it can be. Fearing worst-case scenarios where original data disappears, they are spearheading a remarkable, self-organized movement to safeguard and protect federal data.

If non-partisan government data casts doubt on Trump’s political agenda or the effectiveness of his policies, the administration could deal with the problem in several ways. The cheapest, simplest, and most effective method is not to collect any data at all. Budget cutting does the job nicely. Alternatively, survey questions and data collection could be skewed. “All this can be manipulated and altered at the outset,” warns American University professor Chuck Lewis, founder of the Center for Public Integrity, a journalism nonprofit that specializes in data-based investigations. “The methodology of data gathering is key.” Government agencies can also drag their feet, essentially ignoring Freedom of Information Act requests for data that often become the basis of journalistic investigations.

Much of this happened during the administration of conservative Canadian Prime Minister Stephen Harper, who led Canada for nine years from 2006 to 2015. He focused on boosting economic growth, cutting government spending (except for the military), and reducing environmental regulations. His government closed data-gathering institutions, tossed historical scientific data into the dustbin, and prevented scientists from speaking to the media if their research ran counter to the government’s agenda. “It was like an iron curtain was drawn across communicating research to Canadians,” a Canadian scientist told the magazine Nature. These measures ended only after Justin Trudeau succeeded Harper as Prime Minister in November 2015.

Even before the Trump administration took office, the US government restricted politically sensitive data collection. For more than 15 years, Congress has prevented the Centers for Disease Control and Prevention from studying the impact of guns on human health. At the state level, the Wyoming legislature passed laws in 2015 imposing criminal and civil penalties on citizens, including academic researchers, who collected water, soil, and other “resource data” on public or private lands to report violations of environmental laws to government agencies. In 2016, the legislature tweaked the language to make it more acceptable to the courts. Now, an individual can collect resource data on public lands, but if he or she inadvertently sets foot on private land on the way there or back (a strong possibility given Wyoming’s patchwork of unmarked public and private land), the penalties apply, and none of the data can be used by government agencies. In an amicus brief supporting an appeal, a group of First Amendment lawyers argues that collecting data falls under First Amendment protections, citing several Supreme Court rulings. The brief’s opening line: “Data is the lifeblood of speech.”

Instances like those at the CDC and in Wyoming were once considered outliers. Now, they could become the new normal. Academics have decided to act, starting a remarkable social movement led by librarians and digital archivists who know the ins and outs of data better than almost anyone else.

Shortly after the November election, a team at the University of Pennsylvania launched the Data Refuge Project to harvest and secure government scientific and environmental data that they considered vulnerable and that standard government archiving projects do not cover. Soon after, a more comprehensive sister project, the Libraries Network, launched to rescue and preserve data at 15 federal agencies, including the departments of Labor, Commerce, Health and Human Services, and Justice.

Data Refuge events have spread to colleges and universities in 17 states and the District of Columbia, including Harvard, Yale, and MIT. More are planned. Every weekend, students from different disciplines sit next to scientists, researchers, artists, librarians, digital archivists, and community members learning to use automated web-crawlers and other software to scrape, mirror, harvest, copy, authenticate, and secure government data in multiple, virtual “safe houses” in the US and abroad. Canadian scholars are lending a hand.

In a six-minute YouTube video, a Libraries Network narrator speaks of data as “shared cultural memories.” To her and her colleagues, data is also a historical snapshot of how we as a society go about investigating and describing the world around us, and how those methods and observations change over time. She describes the challenge they faced. “Without a trusted original source,” she asks, “how can you even trust the copy?” Unless they solved this problem, “the Internet could become flooded with fake data.”

To address this issue, they are designing ways to develop and rapidly scale protocols to verify the authenticity of copies and secure them in multiple locations. They have created a virtual “chain of custody” to track changes, and procedures to spot bad actors trying to game the system. To navigate the vast oceans of federal data, they are asking researchers and local communities to tell them which data sets they value most and why, inviting even more civic involvement.
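As a rough illustration of what verifying copies and keeping a “chain of custody” can mean in practice, here is a minimal Python sketch. It is not the Data Refuge or Libraries Network implementation, which the article does not describe in detail; the function names, the log format, and the sample data are all assumptions made for this example. The idea is that each custody entry records a cryptographic hash of the data plus the hash of the previous entry, so a later edit to the data or to the log can be detected.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def add_custody_entry(log: list, actor: str, action: str, data: bytes) -> dict:
    """Append an entry recording who did what to which content.

    Hypothetical log format: each entry stores the hash of the previous
    entry, so changing an earlier entry breaks every later link.
    """
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {
        "actor": actor,
        "action": action,
        "content_hash": sha256_hex(data),
        "prev_hash": prev_hash,
    }
    # Hash the entry body (before the entry_hash field is added).
    entry["entry_hash"] = sha256_hex(
        json.dumps(entry, sort_keys=True).encode("utf-8")
    )
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every link and report whether the chain is intact."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash:
            return False
        recomputed = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
        if recomputed != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


def copy_matches(log: list, copy: bytes) -> bool:
    """Check a copy against the most recently recorded content hash."""
    return bool(log) and log[-1]["content_hash"] == sha256_hex(copy)


# Made-up sample data, for illustration only.
original = b"station,year,temp_c\nA1,2016,14.2\n"
log = []
add_custody_entry(log, "harvester", "downloaded", original)
add_custody_entry(log, "checker", "verified", original)

print(verify_log(log))                            # chain is intact
print(copy_matches(log, original))                # faithful copy
print(copy_matches(log, original + b"tampered"))  # altered copy
```

A scheme like this only shows that a copy matches what was recorded at harvest time. It cannot show that the harvested file matched the government’s original, which is why storing copies and logs in multiple independent locations matters.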

Beyond rescuing existing data, there are ways to create data on issues government officials would rather ignore. Civil society can step into the breach. For instance, problems with high lead levels in water and children’s blood in Flint, Michigan, first came to light when community members, journalists, and academics joined together to independently test the water.

Unelected city managers had insisted everything was fine. When the complaints became too numerous to ignore, state and local authorities tested the water and children’s blood lead levels using biased sampling and flawed methodologies, and reported back that there was no problem. Independent testing, however, showed extremely high levels of lead in both. Civil society’s role in revealing the extent of Flint’s lead problem was formally acknowledged in the report on Flint’s water problems by the Michigan governor’s task force.

Data journalists can also help ensure government data is solid. They learn to spot red flags, anomalies, and claims of progress that seem too good to be true and often are. Even if court actions ensure federal data remains accessible, these journalists provide an additional safeguard. So do the students and other volunteers at Data Refuge events, whose crash course in data integrity and preservation also strengthens civil society’s oversight of data.

The scientific community, computer scientists, and others have developed reliable methods to collect and analyze large data sets. In some areas, such as criminal justice, these methodologies have at times proven superior to the government’s methods and have eventually been adopted by government agencies.

Truthfully, the challenge is huge. But we’re already seeing how inventive and energetic civil society can be in tackling it. By democratizing data collection and gathering facts—the real, regular kind—it can raise one hell of a ruckus.


Louise Lief is Scholar in Residence at the American University School of Communication Investigative Reporting Workshop.
