May 30, 2012

Does Data Have Gravity?

Once I got past the extreme (and enjoyable) production values -- and 42 new products -- I found myself thinking hard about what Pat had to say.

He presented a case for a fundamental shift in the "physics" of information technology.

Data has mass; big data has big mass -- and the resulting gravitational forces are driving a shift from the familiar application-centric world to a data-centric one.

The more I thought about it, the more I started to realize some of the very cool implications of Pat's observation.

To Begin With

To be absolutely precise, information (1s and 0s) doesn't really have mass in the traditional sense. But storing information does require energy to set state and resist entropy, and -- thanks to Einstein -- mass and energy are two forms of the same phenomenon.

But Pat's claim isn't meant to be literal; it's intended to be metaphorical. So let's evaluate it on that basis.

Gravity is caused by mass curving space in such a way that an attractive force is exerted: proportional to the masses involved and inversely proportional to the square of the distance between them.
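As a refresher, the relationship being borrowed here is Newton's law of universal gravitation:

```latex
F = G \, \frac{m_1 m_2}{r^2}
```

where F is the attractive force, G the gravitational constant, m_1 and m_2 the two masses, and r the distance between them. One way to read the metaphor (my interpretation, not necessarily Pat's): bigger data sets exert a stronger pull, and the pull weakens rapidly with "distance" -- latency, bandwidth, and the cost of moving bits.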

Here on Earth, we tend not to pay much attention to gravity, but -- once you leave our planet -- gravity becomes much more interesting. Interplanetary navigation, for example, is fundamentally all about overcoming, avoiding and leveraging large gravity wells.

At a cosmological scale, gravity is a force that shapes the structure of the universe.

Not to mention, current gravity theory appears to be incompatible with what we know about the rest of the universe via quantum mechanics.

Back to information: as a society, we've now started to amass mind-bending amounts of data for the first time in our history.

And -- as a result -- I think we're starting to see data gravity work in new and interesting ways.

Data Gravity And Viscosity

Pat used the term "viscosity" to describe gravitational effects. I tend to think of the effect in terms of gravity wells and needing to overcome physical forces.

A simple example would be a data migration from an old array to a new array. As we move from gigabytes to terabytes to petabytes (and beyond), we end up with ridiculously long transfer times -- and enormous effort -- simply to move a pile of data from one physical location to another.

Using a 10 gigabit Ethernet link as an example, an exabyte (1,000 petabytes) would take roughly 10,000 days -- about 27 years -- to move. And there are more than a few members of the "Exabyte Club" today.
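As a sanity check on that figure, here's a back-of-the-envelope calculation -- a sketch assuming an idealized, fully saturated 10 Gb/s link with zero protocol overhead or contention:

```python
# Back-of-the-envelope: time to move a pile of data over a dedicated
# 10 Gb/s link, assuming ideal, sustained throughput.

LINK_GBPS = 10                        # 10 gigabit Ethernet
BYTES_PER_SEC = LINK_GBPS * 1e9 / 8   # -> 1.25 GB/s

def transfer_days(num_bytes: float) -> float:
    """Days needed to push num_bytes over the link at full speed."""
    return num_bytes / BYTES_PER_SEC / 86_400  # 86,400 seconds per day

for label, size in [("1 TB", 1e12), ("1 PB", 1e15), ("1 EB", 1e18)]:
    print(f"{label}: {transfer_days(size):,.1f} days")
# The exabyte case works out to roughly 9,300 days
```

The idealized result lands a bit under the ~10,000-day figure quoted above, which presumably allows for real-world overhead -- either way, it's decades, which is the whole point.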

It's not hard to visualize a deep, strong gravity well forming around all that data; one that requires enormous effort to overcome.

The more data, the stronger the forces at work around it.

Data Gravity Attracts Money ... And Talent

Pat shared some quick examples of newer, information-based businesses that have amassed stupefying amounts of data in the course of their activities.

These are just the tip of the iceberg -- there are many hundreds to examine if you're interested.

Some are relatively "pure plays"; others are information businesses embedded in more traditional ones.

We, as investors, can only value these newer enterprises in terms of traditional measures like revenue, profit and margin. We have no tools or frameworks to assess the value of the mountains of information they're sitting on.

It's almost as if they've acquired mineral exploration rights to a million square miles of unexplored territory -- no one has any idea what might be lying beneath the surface.

I mean, how do you go about valuing 100 petabytes of social data, or 50 petabytes of health care claims, or 200 petabytes of consumer retail behavior data?

These very large "gravitational wells" are attracting some of the brightest people you'll ever meet. They're drawn into a world of massive, diverse and uncorrelated data sets. They want to explore and innovate in a way that human beings haven't been able to in the past.

My kids are college aged now, and I end up spending a lot of time looking at universities. Many of them promote the size of their library (e.g. 17 million volumes) as a reason to attend their institution. Personally, I can't remember the last time I went to a library to get a book.

How long will it be before universities advertise the size and breadth of their mashable data sets, ready for exploration by bright researchers? And how long will it take for us to develop the frameworks to measure the economic value of large information bases: both individually and aggregated with others?

Data Gravity Inverts The Relationship Between Applications And Information

Closer to home, there's a clear argument to be made for the growing importance of large, diverse information bases -- and the shrinking importance of the applications that access them.

The move to SOA -- service oriented architectures and its web-oriented derivatives -- was likely the first massive wave of decoupling of applications from information. The advent of virtualization further separated applications and information.

Consider the advent of big data analytics, and there's a complete and intentional decoupling; indeed, an argument can be made that the value of information tends to increase when it's considered outside the context in which it was originally created.

From my perspective, older themes become relevant once again in this new light.

Hey Chuck! With respect to the library, I think you are hitting on something that drives me nuts about educational institutions--they are terrible at marketing! And to think, all they would have to do is throw an "e" in front of that Library name and it would make sense.

The Library, in its traditional sense, is where my son gets books for his weekly reading and where we take the kids to get books during the summer.

In my case, my university library is both a Single Sign-On (SSO) login that gets me electronic access to those same gravity wells you describe and a collection of links to external data sources for use in research. That said, most of the linked data sources are public, which I think is closer to your point. When will valuable data sets be available for better analysis?

The observation is that more frequent measurements increase the noise level, not the signal. If data sets are big because they contain more frequent measurements, you may find it a lot harder to understand the data than if the samples were taken less frequently. Or, to continue the "gravity" metaphor: if you collect too much data, you get so much gravity that you end up with a black hole from which nothing escapes, including information.

In the end, businesses need to keep cost in mind. The fact that you can sample data every second may increase the cost of actually understanding the data.

Jer Thorpe gave a TEDx talk recently about the weight of data. He comes at it from a visualizer's perspective -- how to get a different kind of worth from the information. Worth a few minutes to check out: http://www.youtube.com/watch?v=Q9wcvFkWpsM