When Data Joins The Dark Side

A big data stockpile may contain dark data -- unstructured, unclassified information that you can't put to good use. Maybe it's time to find it.

Quick, how much of your big data is dark?

Sure, the word "dark" is open to interpretation, so let's clarify things a bit. Gartner's IT Glossary offers this definition of dark data: Information that an organization collects, processes, and stores in its day-to-day operations, but which it largely fails to use for other purposes, including analytics or business relationships.

But even if you know what dark data is, managing it can be tricky, said Julie Colgan, director of information governance solutions for Nuix, an enterprise software company that helps organizations manage growing volumes of unidentified, unstructured data tucked away in archives, email and collaboration systems, hard drives, and other places.

Nuix's customers include government, law enforcement, and regulatory agencies. Organizations also use the company's software for e-discovery, to proactively govern their information, and to seek out potential legal threats and opportunities.

"Dark data is the data that an organization retains, often unknowingly, that lacks any substantive control or classification," Colgan told InformationWeek in a phone interview.

As a result, organizations often are unable to benefit from it.

"Data is dark when we don't know it exists, when we can't find it, when we can't interpret it, and when we can't share or interface with it," said Colgan.

But how does data join the dark side?

"Sometimes data goes dark because we're simply too busy to deal with it, so we push it to the side and ignore it," Colgan said. "Maybe we don't have the right tools to address the scale or speed, or to shine a light on the data."

Alternatively, data can go dark when it's trapped in a repository -- a legacy archive, for instance -- that renders it difficult to access or analyze.

"We have a lot of customers interested in migrating off legacy archives," said Colgan. "They're doing so for a couple of reasons: One, a number of archives are at end of life, and (customers) want to go to a more modern platform; two, they want to migrate to the cloud."

As is often the case with big data implementations, companies may find themselves with information hoards that are needlessly large. Knowing which data to keep can prove challenging.

"They find they have more information than they need, and they want to ... make some good decisions about what to keep, how to keep it, and how to get rid of the stuff they don't need," said Colgan.

She offered this advice for companies dealing with dark data:

"Take a step back and think strategically about how information is an asset, and (how it) presents new and different kinds of risks to your organization," said Colgan. "Align that to what your risk tolerance is ... and then apply the right tools."

The goal should be to create an environment where data "isn't a constant tsunami that's drowning everyone," she added. "The old methods for managing information need to be examined and realigned."

Of course, this process includes making good decisions about "what data to keep, how to keep it, and how to get rid of the stuff you don't need," said Colgan.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

The challenges are obvious. Dark data raises eDiscovery costs: if the organization is in litigation, reviewing that data for the case can only increase them. It also consumes a great deal of IT resources. This can be time-consuming and stressful for IT personnel, given that they may have to restore or identify files that are hard to locate.

Many organizations are dark, as you describe it, because of the silos you mention. Now recognizing the value, costs, and legal protections that consolidation creates, many organizations are slowly but surely pulling together their data repositories. It's challenging, but the payoffs -- as those who have accomplished the task can often attest -- are many and rich.

On the consumer side, I'm sure we can all recall instances where our data is housed multiple times within a business. Often, that results in multiple emails, calls, or letters, sometimes using different information. Multiply that across millions of people, and the savings from eliminating the duplication alone add up. On the legal front, not knowing what you have (and, therefore, being unable to correctly secure it at times) is a hazard for many industries.

Isn't the point of big data technology that it's possible to hoard data more greedily and tease useful information out of it? Maybe you want to root out duplication or reduce the amount of data that adds liability without any compliance-oriented justification for retaining it. But if there is some potential value left in the information, don't you want to be a hoarder these days?

If data is collected and not used, shouldn't the first reaction be to stop collecting it?

This seems to go against the Big Data "goal" of collecting everything and trying to find something (or possibly anything). But it may be a lot more costly to keep data that has little to no value, considering that this dark data may get stolen (since it is dark data, would you even know the data was stolen?), resulting in potential legal fines and loss of trust (i.e., loss of customers, investors, partners).

As another commenter posted, silos are a serious contributing factor to organizations accumulating dark data. However, as a recent IDG SAS survey showed, the surprising lack of a data strategy plays a key role here as well. Without a serious understanding of what you want to get out of data, and an understanding of how to do it, data will fail to fully realize its potential.

So true. Using data without a long-term vision for where it can be used, the metrics and analysis that can be derived from it, and a proper data strategy means just one thing: plenty of blind spots and chaos. The price of acquiring data that isn't merely stale but dead or dark is often paid by IT data strategy personnel (though the cost is billed to the company). You need a tool like HAVEN to make more sense of it (goo.gl/HFdxfV).
