Unlocking Big Government Data: Whose Job Is It?

It's not just a good idea for private-sector organizations to help open up the treasure trove of government big data. It's a necessity.


As storage pundit Jon Toigo pointed out last week, "big data," like "the cloud" before it, actually meant something when the term was first coined, but it's quickly becoming meaningless. But I'm less concerned about imprecise definitions--that happens with all new technologies--and more concerned with making the reams of publicly owned data more widely available and easily accessible.

Privately owned data, such as tweets and Facebook posts, isn't under government control, so we can expect the owners of that data to make it available--but at a price, and only to select partners. That is, probably not you. In contrast, most large government datasets are open to anybody via Freedom of Information Act requests, often at a nominal processing cost.

The problem with public data is that it can be an excruciating exercise to actually GET it. Exceptions include Data.gov on the federal level and Open Data Philly on the city level. But in most cases it's a painful process for both the government workers and the requester, because the workers don't have automated processes to categorize and extract the open data (for example, business license data) while leaving out the closed data (for example, Social Security numbers).
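What would such an automated process look like? Here is a minimal sketch in Python, not a description of any agency's actual system. It copies records while leaving out closed fields, then writes the open remainder to CSV. The field names (`ssn`, `date_of_birth`), the sample records, and the function names are all hypothetical, made up for illustration; a real schema and real redaction rules would differ.

```python
import csv
import io

# Hypothetical names of closed fields; a real agency's schema will differ.
CLOSED_FIELDS = {"ssn", "date_of_birth"}


def redact_closed_fields(rows, closed_fields=CLOSED_FIELDS):
    """Yield a copy of each record with the closed fields left out."""
    for row in rows:
        yield {
            key: value
            for key, value in row.items()
            if key.lower() not in closed_fields
        }


def export_open_csv(rows):
    """Return CSV text containing only the open fields of the records.

    Assumes every record has the same set of fields.
    """
    open_rows = list(redact_closed_fields(rows))
    if not open_rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(open_rows[0].keys()))
    writer.writeheader()
    writer.writerows(open_rows)
    return buffer.getvalue()


# Made-up sample records with placeholder values.
licenses = [
    {"business_name": "Acme Bakery", "license_type": "Food Service", "ssn": "000-00-0000"},
    {"business_name": "Main Street Hardware", "license_type": "Retail", "ssn": "000-00-0000"},
]

if __name__ == "__main__":
    print(export_open_csv(licenses))
```

The sketch lists the closed fields explicitly for brevity. An agency might prefer the reverse, an explicit list of fields cleared for release, so that a newly added sensitive column stays closed by default.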

Depending upon the volume of requests and workload, the requester can wait a long time for the data while the government employee does lots of fun manual extraction. And while many government employees cheerfully deliver what they're supposed to, a dirty little secret is that some bad apples don't want people to get the data, so they throw up roadblocks such as printing out (and charging you for) records in boxes full of dot matrix tractor paper instead of offering FTP or something cheaper and more convenient. Obviously, this all leads to a massive clog of the big data flow that could be coming out of government agencies.

Open data is unquestionably good for society, but, assuming you're not a socialpreneur, why does this matter to you and your company? Well, as a "data economy" article in Slate magazine recently noted, "If big data is a strategic resource, as has been suggested, then many national and state governments have public reserves that can be tapped for the public good in this young century's version of the industrial revolution."

You don't think government data can be leveraged to create value for your company or community? Two words: Google Maps. Having integrated federal data with local community information and created a good user interface, Google is now dominant in mapping and location-based services. Buildfax.com aggregates building permit data, adds analysis of the data, and sells it to mortgage companies. Your organization can make money, too, while creating jobs.

So whose responsibility is it to open up this data? Government employees'? Yes, to a point.

After a seven-year tour in government IT, where turning a sow's ear into a silk purse is standard practice, I have witnessed the inventiveness of some of the best and brightest in government IT. Some of these folks have won awards for innovating, improving customer service, saving money, you name it. They're mission-focused and want to do the right thing. But when it comes time for elected boards to choose between funding "fix the bridge," "buy new patrol cars," "hire more firefighters," and "buy tools for IT," buying tools for IT usually gets short shrift. Point is, assuming a non-hostile government IT shop, and assuming a rational elected board, there are still challenges. Government IT pros can't always do it alone.

That's where organizations such as Code for America come in. It's sort of a Peace Corps for programmers who want to make a difference in civic life. The Code for America Brigade appeals to all walks of life--to coders, yes, but also to people interested in liberating the civic data necessary to power those apps. There's a reason the director of the Brigade is keynoting Oct. 16's Open Data Day in North Carolina.

It's also about companies that leverage open government data, like those in the media industry. Some of them understand the importance of supporting organizations such as the Knight Foundation, which recently recognized developers at six ventures that bring data to the public, including The New York Times and Washington Post, among the winners of the Knight News Challenge.

I don't think private-sector organizations getting involved in this big open government data mess is just a good idea. I think it's a necessity. It's not government that's responsible for open data--it's us. All of us.

Jen Pahlka, in a TED talk that generated more than half a million views on TED.com, notes that government is "this thing that we own and pay for," and if we consider it as something that's working against us, we're disempowering ourselves.

The idea of private enterprise learning to leverage big data from government in new and innovative ways looks like a budding industry to me. The science, research, and other data freely available from the government, if you just know where to get it, could be used to produce services on the same level as Google Maps.

But we'd never get those services and products if we waited on government to provide them. Jonathan Feldman is right.
