Wrangling your unstructured data

Quick and easy tips to gain control over that sprawl of files that won't fit in a database

InfoWorld | Jan 5, 2010

Unstructured data is one of the biggest contributors to the data explosion. Defined as just about any kind of data that lacks a strict data model -- essentially, any data that isn't in a database of some kind -- unstructured data includes log files, documents, audio files, and images. This kind of data is difficult to manage because it comes in a wide range of formats and lacks standardized metadata.

Here are some quick tips that will help you monitor and control how this data is created and stored in your environment.

Put someone in charge

Managing this data usually comes down to one of two strategies: either draw the data into a database where it can be easily mined, archived, and eventually discarded, or apply an organizational structure to the way mixed data is stored. The former is often used with data that has a somewhat consistent format, such as log files. The latter is often the only avenue open for general file-sharing data, short of a comprehensive document management system.
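As a rough illustration of the first strategy, the sketch below pulls log lines into a SQLite table so they can be queried and later discarded. The log format (timestamp, level, message), the table name `log_entries`, and both function names are assumptions made for this example, not something prescribed by the article; real log files will need their own parsing rules.

```python
import re
import sqlite3

# Assumed log format: "2010-01-05 12:00:01 ERROR disk full on /data"
LOG_LINE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def load_log_lines(conn, lines):
    """Insert parseable log lines into a table; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entries "
        "(logged_at TEXT, level TEXT, message TEXT)"
    )
    rows = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # skip lines that do not fit the assumed format
            rows.append(match.groups())
    conn.executemany("INSERT INTO log_entries VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def purge_older_than(conn, cutoff):
    """Delete entries logged before the cutoff timestamp; return how many were removed."""
    cur = conn.execute("DELETE FROM log_entries WHERE logged_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Once the lines are in a table, mining them is an ordinary SQL query, and retention becomes a single `DELETE` with a cutoff date rather than a hunt through rotated files. The string comparison in `purge_older_than` only works because the assumed timestamps sort correctly as text.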

The first thing to do is make someone responsible for every piece of data your organization generates and maintains. Never allow data to be created on your network without knowing precisely who is responsible for it. For example, never, ever make a "public" file share that anyone can dump data into. These shares are disasters because no one can claim ownership of anything, which makes it almost impossible to determine what should be archived and what should be deleted. Worse still, the loose permissions structure a public share requires all but guarantees that privileged data will eventually be exposed to users who should not see it.
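One way to start auditing for this kind of exposure is to look for files and directories that any user can write to. The sketch below assumes a POSIX-style file system where permission bits and numeric owner IDs are available; the function name `find_world_writable` is made up for this example, and Windows shares with ACLs would need a different approach.

```python
import os
import stat


def find_world_writable(root):
    """Yield (path, owner_uid) for entries under root that any user can write to."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # entry vanished or cannot be read; skip it
            if stat.S_ISLNK(info.st_mode):
                continue  # permission bits on symlinks are not meaningful
            if info.st_mode & stat.S_IWOTH:  # "other" write bit is set
                yield path, info.st_uid
```

The owner ID in each result gives a starting point for asking who is responsible for the data, though on a share that has been open for a long time the recorded owner may simply be whoever copied the file there last.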