Will a Crackdown on Privacy Kill Big Data Innovation?

The issue of data privacy on the web gets a lot of attention, thanks to the practices of sites such as Facebook and Google, but the positive aspects of those companies’ data practices tend to get overlooked. Not only does data drive the overall experience of our favorite sites and services, but it also drives innovation in broadly valuable technologies such as Hadoop and advanced analytics tools. As the government and policymakers in general strive to build a framework for online data practices, they’d be wise to look at the issue from all angles.

The McKinsey Global Institute released an interesting report on big data last week, identifying key strategies for specific vertical markets that could save them hundreds of billions of dollars. The report highlights industries (e.g., health care and the public sector) and technologies (e.g., Hadoop and data warehousing) that we’ve covered before and that already are big data stars, as well as one very important issue to the future success of big data efforts: finding the appropriate balance between consumer privacy and business innovation.

The authors don’t delve into too much detail on this topic, but I give them credit for mentioning it at all, because it’s a deep, multi-faceted issue that could fill an entire report of its own, and that has broad implications beyond the world of big data.

As the report’s authors note, policymakers will play an important role in enabling future big data advances, both technologically and strategically. They point out and briefly discuss six issues facing policymakers:

1. Build human capital for big data

2. Align incentives to promote data sharing for the greater good

3. Develop policies that balance the interests of companies wanting to create value from data and citizens wanting to protect their privacy and security

I’ve given this issue a lot of thought over the past few months, and I think No. 3 is the key issue — not just for the future of big data, but for the future of the web in general. Unless there’s a well-reasoned balance developed between consumer privacy and business interests, goals such as information sharing and an increased pace of innovation could fall victim to the federal government’s heavy hand. As I explained in January, Congress is considering its strategy for regulating online privacy, but it’s an issue strewn with pitfalls. Here are a couple of thoughts I’ve been mulling lately:

Proposed federal regulations could hamstring technological innovation: For example, two proposed federal regulations, the Federal Trade Commission’s Do Not Track policy (which has just been endorsed by several senators in the form of the “Do-Not-Track Online Act of 2011”) and the Department of Commerce’s Fair Information Practice Principles, have the potential to seriously hamper big data and analytics innovations, illustrating the importance of striking the right balance. The regulations are fairly complex in their current states, but they strive for two separate but interrelated goals, respectively: giving consumers the ability to proactively opt out of certain data-tracking practices and giving consumers all the information, upfront and crystal-clear, about how sites are using their data. Both limit, to some degree, what sites can track and how they can track it, and both impose penalties for violations. My concern, and one echoed by Google in its recent opposition to California’s proposed Do Not Track legislation, is that customer data has driven the innovation of numerous key big data technologies by major web sites, including Hadoop (within Facebook and Yahoo, especially), NoSQL databases and many of Google’s tools and projects. McKinsey highlights many of these among the list of technologies enabling big data. Will putting companies’ analytics efforts at the mercy of consumers, and under the thumb of the federal government, reduce companies’ desire to innovate, either because they fear penalties or because they simply don’t have the data required to do so?

Social media and the personalized web could be jeopardized: This is directly related to the above concern, but is more wide-reaching. Social media sites such as Facebook, Twitter and Foursquare, and larger-scope web sites such as Google, innovate on big data technologies because their services rely on data. The only way to optimize and create a better user experience is to draw better insights into customers’ activities, interests and connections. And the only way (or, at least, the primary way) to make money from such services is via targeted advertising. It’s the data that drives Google’s huge advertising revenues, which pay for its myriad free services, and that has driven Facebook to an $80 billion valuation. I’m not suggesting Facebook or Google are going to fold in the face of proposed regulations, just that their services could suffer. Less data and more regulations mean less innovation and fewer risks taken. This might be a boon for privacy, but it’s a hindrance in the fast-moving web world, where major changes come from rewriting code as opposed to physically building a new project, and where services can be improved on the fly as issues arise.

Don’t get me wrong: consumers deserve more information, and the federal government is right to attempt to give it to them. But everyone needs to get educated on the connection between data collection and usage and the benefits they provide. If consumers value their social media and personalized web experiences, and if the government is serious about pushing analytics as a major skill set for the next-generation economy, both need to consider the issue of big data in terms of its pros as well as its obvious cons, such as privacy and security implications. It might be tempting to clamp down on data practices or to click “do not track” and shut off the personal-data firehose, but such decisions could have far greater implications than meet the eye.