Does the big data era demand new rules of the road?

By Adam Mazmanian

Apr 22, 2014

As policymakers wrestle with the emerging science of big data, the White House is expected to issue a report in the coming weeks – spearheaded by senior advisor John Podesta – that will detail how government and the private sector can take advantage of opportunities while being mindful of privacy risks.

One tricky element of any big data discussion is moving past the buzzwords and 30,000-foot vantage points to get some precision on what is meant by the term. Maureen K. Ohlhausen, a commissioner at the Federal Trade Commission, offered the "three V's" approach in an April 22 speech at Georgetown Law School, defining big data by its volume, its variety of structured and unstructured datasets drawn from different sources, and the high velocity at which it can be produced and analyzed. (Others have added a fourth and even fifth V to the definition.)

As a regulator, Ohlhausen approaches the big data issue from a few different perspectives. When it comes to data privacy, the FTC's jurisdiction is well established. A federal court recently affirmed in the case of the Wyndham Hotels data breach that the FTC has the authority to bring cases against companies for inadequately protecting customer data. This has big implications for big data, Ohlhausen noted.

"The FTC's data security enforcement framework isn't perfect," she said. "I would like to develop more concrete guidance to industry, for example. But I haven't seen anything that suggests that big data technology raises fundamentally new data security issues."

The real gray area in regulating big data lies in the purpose behind a data collection, and the issue of subsequent, unforeseen use. According to the data-collection best practices framework known by the acronym FIPPs (Fair Information Practice Principles), data should be collected for stated purposes, with the consent of the consumer, and with a minimum of retention.

The problem of consent was highlighted by former Census Director Bob Groves, currently provost of Georgetown University, at a panel discussion following Ohlhausen's speech. Big data sets are often generated by data ecosystems such as sensors, rather than designed by statisticians or researchers. "They're often just a single observation with a time stamp and a location stamp," he said. These datasets are often held by proprietary organizations that lack clear rules on sharing them with researchers.

As Ohlhausen pointed out, there are obvious tensions between the FIPPs framework and the way big data is used in the real world, where researchers, firms and governments are looking to combine and reuse information in ways not contemplated when consumer consent was given. "Companies cannot give notice at the time of collection for unanticipated uses," she said.

"Strictly limiting the collection of data to the particular task at hand and disposing of it afterward would handicap the data scientist's ability to find new information to address future tasks," Ohlhausen said. "Certain de-identification techniques such as anonymization, although not perfect, can help mitigate some of the risks of comprehensive data retention while permitting innovative big data analysis to proceed."

From a policy point of view, she said, the Fair Credit Reporting Act might provide some useful guidance. The 1970 law puts some restrictions on how and with whom credit bureaus can share personal information. Putting restrictions on "clearly impermissible uses" of consumer data could allow the FTC to maintain its traditional enforcement role in data privacy and protection, while allowing private sector innovators to pursue big data applications. "The FTC should remain vigilant for deceptive and unfair uses of big data, but should avoid preemptive action that could preclude entire future industries," she said.

The stakes are high, at least from a public policy standpoint, Groves said. "I firmly believe that the country that's able to fashion a privacy environment and a statistical and computer science environment that allows the country to learn how these data can inform multiple big policy issues will be the country that wins in the end," he said.

About the Author

Adam Mazmanian is a staff writer covering Congress, the FCC and other key agencies. Connect with him on Twitter: @thisismaz.

Reader comments

Wed, Apr 23, 2014
RayW

This reminds me of a situation I came across back in the 1980s. A Top Secret program was divided up into multiple parts of varying classifications. You could work on any one of the parts at that part's clearance level, but if you worked on more than (as I recall) five parts, you had to have a Top Secret clearance. Of course, knowing how many of which parts made up the trigger level was itself part of the TS clearance, which meant someone often got the sixth project before the oversight folks realized that an engineer at the Secret level was now required to have TS.

From the espionage point of view, some folks may remember the training flick back in the '80s where the bad guys were collecting data from various phone conversations and then killed the brother of one of the callers on a mountain road to get a special device that was to correct the grounding of a certain aircraft. This is the same thing, just more modern.

My personal thought: we have gotten desensitized to the requirement to share all your personal information just to buy a candy bar. And look at the hype on "social" media and the "news" blurbs that claim if you do not get a "social" media account (a BIG data collection system) and use it, you are going to be ignorant of everything. And this paragraph is just the tip of the iceberg because, as the article points out, there are others that collect data from sources beyond your control unless you go to the extreme of changing disguises several times a day.
