2011 IDC Digital Universe Study: What to do with Big Data?

IDC’s annual Digital Universe study (sponsored by EMC) has published its key findings on just how much information we’re all creating and using. Every year, the study is a powerful springboard for discussion across the industry. We like it because it makes people think.

This year, though, there’s more than just big numbers to contemplate. The study went further and made some interesting observations on the nature of the information being created, how enterprises will be involved, and the impact on IT organizations. If nothing else, the findings make a powerful case to IT organizations large and small that simply doing what you’ve been doing in the past probably won’t keep up for long, and that IT and the business at large need to change the way they handle their most precious asset: information. And, unlike previous years, there’s a definite note of optimism that, yes, this all might be an opportunity in disguise.

A Bit of Background
This is the fifth year of EMC and IDC collaborating on the Digital Universe study. Each year, the numbers get bigger. Each year, we realize that the forecasts from previous years were somewhat conservative. Each year, we all take a good stab at the “what does it mean?” questions.
So let’s start with the big numbers. It would not be an exaggeration to say that we’ve clearly entered the “Zettabyte Era”. A zettabyte is a trillion gigabytes, or a billion terabytes, as you prefer. This year (2011) we’re forecast to generate and consume 1.8 zettabytes of information as a society. That’s up from an estimated 1.3 zettabytes in 2010, with a forecast 35 zettabytes by the end of this decade.
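
To put those figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above, takes decimal (SI) units for the zettabyte, and assumes that “the end of this decade” means 2020; the variable and function names are ours, not IDC’s.

```python
# Figures quoted in the article (zettabytes per year).
ZB_2010 = 1.3
ZB_2011 = 1.8
ZB_2020 = 35.0  # assumption: "end of this decade" is read as 2020

# Decimal (SI) conversions: 1 ZB = 10^21 bytes.
GB_PER_ZB = 10**12  # a trillion gigabytes
TB_PER_ZB = 10**9   # a billion terabytes


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


growth_2010_2011 = ZB_2011 / ZB_2010 - 1
growth_2011_2020 = compound_annual_growth(ZB_2011, ZB_2020, 2020 - 2011)

print(f"2011 volume: {ZB_2011 * GB_PER_ZB:.2e} GB ({ZB_2011 * TB_PER_ZB:.2e} TB)")
print(f"Growth 2010 to 2011: {growth_2010_2011:.1%}")
print(f"Implied yearly growth 2011 to 2020: {growth_2011_2020:.1%}")
```

On these inputs, both the one-year jump and the implied nine-year average come out at a bit under 40% per year.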

Indeed, the most fascinating statement comes from the subhead of the press release: the rate of information growth appears to be exceeding Moore’s Law — a powerful argument for scale-out architectures if there ever was one.

Impact at Ground Zero
It’s easy to think about the explosive growth of the digital universe as something “out there”, i.e. not something that dramatically affects the average IT organization. IDC makes a strong case that, yes, the brunt of digital growth will fall directly on just about every IT organization, almost disproportionately.

For starters, IDC asserts that, while 75% of the digital universe is created by individuals, 80% of all information is touched by enterprises at some point in the information lifecycle. You touch it, and there’s some sort of implied responsibility, at least from my perspective. Just ask any ISP or cloud storage service.

Consider IDC’s forecast that organizations will need to deal with 50x more information by 2020 than they’re managing today. Go ahead, you need to visualize this. Take the total storage capacity you’ve got today, and throw a multiplier of 50x against it. Contemplate that, just for a moment. And remember, that’s just an average; information-intensive businesses will likely see far more. Or consider that all this wonderful information will be stored in 75x more “containers” (files, objects, etc.) than we are dealing with today. IDC uses the number of 500 quadrillion to describe just how many containers we’ll have to deal with come 2020. Object storage, anyone? Or that, by the end of the decade, we’ll have 10x as many servers to deal with, both physical and virtual.

It makes a certain sense: more information, and ever more uses for it, means vastly more servers of all types sloshing around than before, putting all that information to work. We’ll leave it to our good friends at Cisco to weigh in as to what all this means from a networking perspective.

And A Forecast For IT Professionals
One of the numbers that scored a direct hit on me was the following: if you’re an IT professional, there will be only 50% more IT professionals to deal with it all by the end of the decade. We suppose that’s good news and bad news. The good news? You’ll be very popular. The bad news? You’ll be very busy as well.

All joking aside, the numbers taken together make a strong case that there will be a forcing function that will change the way we do IT. Simply making everyone work harder, giving them marginally better tools, etc. doesn’t appear to be able to close this gap.
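
A rough way to see the size of that gap is to divide each growth multiplier quoted above by the expected growth in IT staff. The Python sketch below does just that; it assumes “50% more IT professionals” means 1.5x today’s headcount and that all multipliers share the same baseline, which the summary here doesn’t spell out.

```python
# Growth multipliers quoted in the article, relative to today.
GROWTH_BY_2020 = {
    "information": 50,
    "containers (files, objects)": 75,
    "servers (physical and virtual)": 10,
}
STAFF_GROWTH = 1.5  # assumption: "50% more IT professionals" read as 1.5x


def load_per_professional(growth, staff_growth=STAFF_GROWTH):
    """How many times more of something each IT professional would handle."""
    return growth / staff_growth


for label, growth in GROWTH_BY_2020.items():
    print(f"{label}: {load_per_professional(growth):.1f}x per professional")
```

Read this way, each professional ends up responsible for roughly 33x the information and 50x the containers they handle today, which is why working harder alone doesn’t look like a plan.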

The Positive Elasticity of IT?
In classical economics, a good is considered “positively elastic” if a lower price incents people to spend even more than before. It’s one thing to say that; it’s something else entirely to see clear evidence of the phenomenon.

Consider this IDC finding: between 2005 and 2011, the cost of acquiring, storing, managing, etc. a gigabyte of information fell to an eye-opening 1/6th of what it was. Theoretically, this should represent a huge boon to IT departments everywhere. But no, collectively, IT organizations are spending 50% more (an amazing $4 trillion) in 2011 vs. six short years ago. The same could be said about server growth. As virtualization makes it far easier and more cost-effective to create “servers”, more servers are inevitably created. Call it virtual server sprawl, call it what you will: there’s a good reason it’s happening.
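
As a back-of-the-envelope check, the sketch below works out what those two figures imply together. It assumes, as a simplification of ours, that total spending is simply unit cost times volume, and it uses the standard midpoint (arc) formula for price elasticity; the names are illustrative.

```python
# Figures quoted in the article, 2011 relative to 2005.
UNIT_COST_FACTOR = 1 / 6  # cost per gigabyte fell to 1/6th
SPENDING_FACTOR = 1.5     # total spending rose by 50%

# Assumption: spending = unit cost x volume, so volume = spending / unit cost.
volume_factor = SPENDING_FACTOR / UNIT_COST_FACTOR


def arc_elasticity(q1, q2, p1, p2):
    """Midpoint price elasticity: % change in quantity over % change in price."""
    pct_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_quantity / pct_price


# Normalize 2005 to quantity 1.0 at price 1.0.
elasticity = arc_elasticity(1.0, volume_factor, 1.0, UNIT_COST_FACTOR)

print(f"Implied volume growth: {volume_factor:.1f}x")
print(f"Arc price elasticity: {elasticity:.2f}")
```

Under that simplification, the implied volume is about 9x the 2005 level, and the elasticity has a magnitude above 1, which is the condition for total spending to rise as the unit price falls.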

Make something easier to consume; people will tend to consume more. The more cost-effective we vendors make IT, the more you’re consuming. A lot more, it seems … Sobering food for thought for those who think that the primary mission of IT is to reduce costs. While that line of thinking makes absolute sense on a per-unit-consumed basis, it looks like a losing long-term proposition when it comes to overall IT expenditures.

Are You Protected?
As you might expect, the IDC folks took a close look at information protection. How much of this information has any sort of minimal protection? And, of the information that ostensibly should be protected, how much actually is? Not surprisingly, we collectively come up short in both regards. A scant 1/3 of all information has any sort of safeguard involved. And only about 1/2 of all the information that should be protected is actually being protected. This shouldn’t be an abstract number to many of you reading this. Think, just for a moment, about your own IT environment. How much of the information you should be protecting is adequately protected? Indeed, the people chartered with protecting said valuable information often aren’t given the inputs (e.g. economic value, risk, etc.) they need to come up with a decently optimized approach. We all have our work cut out for us, don’t we?

The Cloud Angle
Yes, there’s a cloud angle here — in this case, cloud refers to “external service providers”. IDC estimates that only 2% of IT spending today can be classified as “cloud”. Personally, we think this is an extremely low estimate, simply because we have a much broader definition of cloud that includes many forms of external IT service providers. That being said, IDC estimates that 20% of all information will be touched by cloud providers in the coming decade, and that perhaps as much as 10% of enterprise information will be maintained by clouds. Again, this is using IDC’s rather narrow definition.

We look at it differently. IT organizations are already in the process of reacting to this bottomless demand for information and processing power. Some are re-engineering their internal operations to look more like an internal service provider, e.g. a private cloud. Others are very interested in using external specialists (e.g. external service providers or public cloud operators) to handle some of the load.

However, the vast majority of larger IT organizations will most likely use a combination of internal and external cloud resources, resulting in what we all call a hybrid cloud. Put differently, there’s no way that we can see legacy IT approaches keeping up with this tidal wave. Something’s got to give …

A Note of Big Data Optimism?
So often in past years, we’ve felt that the results of the annual IDC/EMC survey have cast a note of distinct doom and gloom — almost as if we were heading for some sort of an information apocalypse. This year is different.

Indeed, the title of this year’s study, “Extracting Value from Chaos”, reflects the thought that, yes, all of this might be an opportunity in disguise. The authors point to the advent of “big data” as a shift in perspective toward extracting value from the new digital wealth that now surrounds us all. Hardly a day goes by when we don’t hear of yet another IT organization that has begun to pivot from “lots of data as a problem” to “lots of data as an opportunity”.

Indeed, fostering and nurturing this progressive mindset seems to be one of the central tenets to ultimately coping with — and profiting from — the exploding digital universe.

Storage Implications
If you’ve been following EMC for a while, you’ve probably noticed that many of our storage platforms have embraced the scale-out model: VMAX for scale-out block storage. Isilon for scale-out file services. Atmos for scale-out object services. And, of course, Greenplum’s scale-out shared-nothing architectural model for big data analytics.

Why? Besides the more pragmatic efficiency and simplicity drivers, we’re now firm believers that these scale-out architectures have an inarguable advantage in coping with a world where information growth outpaces processor growth, not to mention growth in storage media capacity and performance.

Indeed, the recent Isilon SPEC benchmark seems a bit less esoteric in this context. Who needs a million SPEC IOPS against a near-petabyte of data in a single file system? You might — if not today, then perhaps before too long. The storage (or database, or file system, etc.) architecture of the future is clearly headed towards many cooperating peer nodes built on mainstream technologies that collectively tackle big data in all its forms.

The management model is changing quickly as well: the era of hand-carving and hand-optimizing IT resources (storage included) is giving way to vast, automated pools of resources where policies decide the vast majority of actions, and having human beings directly involved in the day-to-day workflow means that there’s something amiss.

Call it what you will — cloud, IT-as-a-service, pervasive virtualization, automation, etc. — the label matters less than the concepts embodied.

Organizational Implications
When it comes to organizations that really understand the value of their information, there’s an incredibly broad spectrum. At one end, you’ll find organizations that have categorized their information repositories and associated flows, and can tick off with enviable precision what’s important, and what’s not. At the other end, unfortunately, you’ll meet many organizations that look at it all as largely undifferentiated 1s and 0s, all of which has to be stored, backed up, managed, etc. Information, and all the technology around it, becomes an ever-growing burden. Some of these folks wish there were some sort of a silver technology bullet from vendors to tell them what they need to know about their information. Sorry, we don’t hold out much hope for this, except in very narrow use cases. Instead, the best recipe seems to be a relatively new organizational function, information governance, that categorizes different information sets and assigns value and risk to them, which in turn drives intelligent IT decisions on what’s important and what’s not. Certainly, if you’re going to be the proud owner of 50x more information before 2020, there’s a case to put such a function in place if you haven’t already.
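
What might such a function hand to IT in practice? Here is a purely hypothetical Python sketch: the information sets, the 1 to 5 scoring scale, and the handling tiers are all invented for illustration and do not come from the IDC study or any EMC product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationSet:
    """A category of information scored by a (hypothetical) governance team."""
    name: str
    business_value: int  # 1 (low) to 5 (high), illustrative scale
    risk: int            # 1 (low) to 5 (high), illustrative scale


def handling_tier(info):
    """Map value/risk scores to an IT handling tier (thresholds are made up)."""
    score = max(info.business_value, info.risk)
    if score >= 4:
        return "protect-and-replicate"
    if score >= 2:
        return "standard-backup"
    return "archive-or-delete"


# Example catalog; names and scores are placeholders.
catalog = [
    InformationSet("customer orders", business_value=5, risk=4),
    InformationSet("internal wiki", business_value=3, risk=2),
    InformationSet("old build logs", business_value=1, risk=1),
]

for info in catalog:
    print(f"{info.name}: {handling_tier(info)}")
```

The point is less the particular thresholds than the flow: people assign value and risk once, and policy turns those scores into storage, backup, and retention decisions.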

Putting It All Together
For us, this year’s study is yet more compelling evidence that we’ve clearly entered the information age: one where the foundation of economic value is largely derived from information vs. physical things. A century ago, we used to think in terms of big factories. Now we think in terms of big data. IT organizations everywhere appear to be bearing the brunt of this economic and societal transition. To the extent that their leaders recognize the new game, and are prepared to invest appropriately in new ways of doing things, great things await.

We wonder what next year’s survey will bring.

About the Author
Michael Becker is the Marketing Technologist, I.T. Project Manager at ChicagoMicro.