Data Discovery

Data discovery is a process for identifying and providing visibility into the location, volume, and context of structured and unstructured data stored in a variety of data repositories.

The Need for Data Discovery

It’s not uncommon for an organization to store terabytes (or more) of data in a variety of data repositories:

Heterogeneous databases located on premises, in legacy systems, and in the cloud

Big data platforms

Data-rich collaboration systems such as SharePoint and Office 365

Cloud-based file-sharing services such as Box, Dropbox, and Google Docs

Spreadsheets, source code, PDFs, emails, or other documents

The sheer volume of stored data and data repository types means that many organizations do not really know what data they store or where it’s located.

Combine that situation with the exponential growth of a global information economy. Driven by new technologies and disruptive business models, that economy requires an ever-increasing amount of sensitive data to be collected, used, exchanged, analyzed, and retained.

An unintended consequence of this information economy is that all of the sensitive data being collected becomes a prime target for accidental or intentional compromise, exfiltration, or destruction.

Now, combine these situations (data volume, repository types, and sensitivity) with industry-specific and regulatory mandates, such as SOX, HIPAA, PCI DSS, and GDPR. Most of these mandates demand that organizations ensure the accuracy, security, and privacy of the data they hold.

Some, such as the GDPR, go even further and require organizations to allow EU residents to view, correct, or delete their collected data.

How Data Discovery Helps

Before you can protect data from compromise, exfiltration, or destruction threats, before you can ensure data accuracy, before you can comply with various privacy and security mandates, you need to know what data you hold, where it’s located, and its context.

Data discovery provides you with that information.
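As a rough illustration of what "location, volume, and context" can mean in practice, the following Python sketch inventories a single file system tree. It is a minimal example under stated assumptions, not how any particular discovery product works: the function names discover and summarize, the record fields, and the single email pattern are all hypothetical, and a real tool would also have to cover databases, cloud services, and many more content detectors.

```python
import os
import re
from collections import Counter

# Hypothetical detector, for illustration only; real tools use many patterns.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def discover(root):
    """Walk `root` and return one inventory record per readable file."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                # Simplification: reads the whole file as text into memory.
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            inventory.append({
                "location": path,                                # where it is
                "bytes": size,                                   # how much
                "extension": os.path.splitext(name)[1].lower(),  # context
                "email_hits": len(EMAIL_RE.findall(text)),       # context
            })
    return inventory


def summarize(inventory):
    """Return total bytes per file extension."""
    totals = Counter()
    for record in inventory:
        totals[record["extension"]] += record["bytes"]
    return dict(totals)
```

Calling discover on a directory returns a list of records that can be sorted by email_hits to see which files most likely hold personal data, while summarize gives a quick view of volume by file type.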

With that information in hand, you can plan and then implement a data classification process that tags data according to its type, its sensitivity or confidentiality, and its cost or value to the organization if altered, stolen, or destroyed. With classification information in hand, you can then implement security controls to protect data from accidental or intentional compromise, as well as compliance controls to ensure accuracy, visibility, and adherence to other mandates.
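The classification step described above might be sketched as follows. This is only an illustrative example: the detector patterns, the label names (unclassified, confidential, restricted), and the classify function are assumptions made for the sketch, not a standard scheme. An actual classification policy would be defined by the organization and would also weigh cost and value, which this example does not model.

```python
import re

# Hypothetical detectors and labels, for illustration only.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
SENSITIVITY = {"email": "confidential", "us_ssn": "restricted"}

# Labels ordered from least to most sensitive. "unclassified" only means
# that no detector matched, not that the data is safe to publish.
RANK = ["unclassified", "confidential", "restricted"]


def classify(text):
    """Tag `text` with the data types found and the highest sensitivity."""
    types = sorted(name for name, rx in DETECTORS.items() if rx.search(text))
    level = "unclassified"
    for data_type in types:
        candidate = SENSITIVITY[data_type]
        if RANK.index(candidate) > RANK.index(level):
            level = candidate
    return {"types": types, "sensitivity": level}
```

For example, classify("SSN 123-45-6789") returns a tag with types ["us_ssn"] and sensitivity "restricted". Tags like these are what security and compliance controls can then key on.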

But it all starts with data discovery—knowing what data you have and where it’s located.