Most security professionals know that metadata often contains sensitive information about documents that is hidden from obvious view but easily extracted with the right tools. We often discover too late that users are not sensitized to these risks and don’t know the proper steps to follow to ensure information they release outside the organization doesn’t carry a hidden, sensitive payload.

Fortunately, there are a number of technologies available to assist in this effort. In this tip, we examine the source of metadata security issues and look at two specific ways you can reduce the threat that document metadata poses to your organization.

Metadata, quite literally, means “data about data”. In the security field, we usually think of it as the data stored (sometimes in hidden form) as part of our productivity files. This can take the form of revision history and comments made between document authors and editors, such as the familiar Microsoft Word “Track Changes” functionality shown in Figure 1.

Figure 1: Microsoft Word Track Changes feature

The unintended leakage of this type of information can pose many risks to an organization. For example, a potential supplier who receives a marked-up copy of a contract draft may learn sensitive details about internal deliberations that put the supplier in a stronger negotiating position. Opponents in a legal dispute may discover internal discussions about the weak points in an argument.

Figure 2 illustrates another type of metadata: the data the operating system itself retains about files. In this example, taken from a Mac, you can easily see who created the document, when it was created and last modified, the tools used to edit the document and more. This type of metadata proved embarrassing to Microsoft a few years ago when investigative reporters used it to uncover that Macs were used to create materials for their “I’m a PC” campaign against Apple.

Figure 2: Macintosh document metadata

These are just two examples of metadata – there are many others. For example, a database might include hidden columns containing timestamps of modifications or a photo might include metadata with the GPS coordinates where the photo was taken.
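As one concrete illustration of how easily such metadata can be extracted, the sketch below reads the author-related properties stored inside a Word .docx file. It assumes the file follows the Office Open XML layout, where a .docx is a ZIP archive and the core properties live in a part named docProps/core.xml. The function names and the sample data are hypothetical and are not taken from any particular tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PROPS_PART = "docProps/core.xml"  # OOXML part holding author, dates, etc.


def read_core_properties(docx_file):
    """Return {property_name: value} from a .docx file's core properties part."""
    with zipfile.ZipFile(docx_file) as archive:
        if CORE_PROPS_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PROPS_PART))
    props = {}
    for element in root:
        name = element.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if element.text and element.text.strip():
            props[name] = element.text.strip()
    return props


def make_sample_docx():
    """Build a tiny in-memory stand-in for a .docx (hypothetical sample data)."""
    core_xml = (
        "<cp:coreProperties "
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Alice Example</dc:creator>"
        "<cp:lastModifiedBy>Bob Example</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CORE_PROPS_PART, core_xml)
    buffer.seek(0)
    return buffer


# Print every core property found in the sample document.
for name, value in read_core_properties(make_sample_docx()).items():
    print(f"{name}: {value}")
```

To inspect a real file, pass its path instead of the in-memory sample, for example read_core_properties("contract.docx"), where the file name is only a placeholder.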

Once we acknowledge the risk that metadata poses to our organizations, we can turn to two ways to control this threat: redacting metadata and using data loss prevention technology.

How to remove metadata before releasing documents

Many of your users may be familiar with the old paper-based redaction process of blacking out sensitive information and photocopying the result before release, and they may try to apply a digital equivalent. It’s important to explain to them that this approach is not effective in the digital world and that they need to take extra steps to ensure metadata is removed from the document. Without added precautions, the reviewer may not notice sensitive information stored in metadata and so fail to redact it. In fact, the metadata might contain the revision history that includes the actual redacted content! Because of this, you may wish to consider a second-level review process in which a qualified technician examines redacted documents for metadata before their release.

Use the Sanitize Document tool in Acrobat Professional as a second check before releasing the redacted PDF.

While somewhat cumbersome, this fail-safe process provides two degrees of assurance that you have removed sensitive metadata before releasing a document.
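For Word documents, part of the technician's second-level review could be supported by a small script. The sketch below is a minimal illustration and not a replacement for a vendor's document inspection tooling. It assumes the Office Open XML layout, in which tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml and comments are stored in word/comments.xml. It checks only those two parts (headers, footers and footnotes are not examined), and the function names and sample content are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_revision_residue(docx_file):
    """Count tracked changes and comments left in a .docx main document."""
    findings = {"insertions": 0, "deletions": 0, "comments": 0, "deleted_text": []}
    with zipfile.ZipFile(docx_file) as archive:
        names = set(archive.namelist())
        if "word/document.xml" in names:
            body = ET.fromstring(archive.read("word/document.xml"))
            findings["insertions"] = sum(1 for _ in body.iter(f"{{{W_NS}}}ins"))
            findings["deletions"] = sum(1 for _ in body.iter(f"{{{W_NS}}}del"))
            # Deleted wording is still stored in the file until changes are accepted.
            findings["deleted_text"] = [
                node.text for node in body.iter(f"{{{W_NS}}}delText") if node.text
            ]
        if "word/comments.xml" in names:
            comments = ET.fromstring(archive.read("word/comments.xml"))
            findings["comments"] = sum(1 for _ in comments.iter(f"{{{W_NS}}}comment"))
    return findings


def is_clean(findings):
    """True when no tracked changes or comments were found."""
    return not (findings["insertions"] or findings["deletions"] or findings["comments"])


def make_marked_up_docx():
    """Build an in-memory sample with one deletion, one insertion and one comment."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Price: </w:t></w:r>"
        '<w:del w:id="1" w:author="A"><w:r><w:delText>$90 floor</w:delText></w:r></w:del>'
        '<w:ins w:id="2" w:author="A"><w:r><w:t>$120</w:t></w:r></w:ins>'
        "</w:p></w:body></w:document>"
    )
    comments_xml = (
        f'<w:comments xmlns:w="{W_NS}"><w:comment w:id="0" w:author="A">'
        "<w:p><w:r><w:t>We can go lower</w:t></w:r></w:p>"
        "</w:comment></w:comments>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
        archive.writestr("word/comments.xml", comments_xml)
    buffer.seek(0)
    return buffer


report = find_revision_residue(make_marked_up_docx())
print("clean" if is_clean(report) else f"residue found: {report}")
```

A document that fails this check would go back to its author to accept or reject the tracked changes and delete the comments before it is released.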

Use data loss prevention technology

As part of a defense-in-depth approach to information security, you should also introduce a second layer of control designed to prevent the leakage of sensitive information when proper redaction efforts fail. The easiest way to do this is with a data loss prevention (DLP) product.

DLP products are already widely deployed to monitor endpoints, networks and data stores for accidental leakage of sensitive information. If you’re already using such a product, there’s probably not much else you need to do if you’ve already properly configured it to understand the key words and phrases that constitute sensitive information in your environment. Most DLP products scan the entire contents of a file, including the metadata, when they perform an inspection of outbound documents. If you haven’t already deployed a DLP product, you may wish to consider doing so.
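The idea of inspecting the whole file rather than only the visible text can be sketched in a few lines. The example below is a deliberately simplified illustration and does not represent how any commercial DLP product works. It assumes the outbound file is a ZIP-based Office document and searches every part of the archive, including the metadata part docProps/core.xml, for a list of sensitive terms. The term list, function names and sample content are hypothetical, and a raw text search like this can miss phrases that Word has split across several formatting runs.

```python
import io
import zipfile


def scan_archive_for_terms(docx_file, terms):
    """Map each archive part to the sensitive terms found in it (case-insensitive)."""
    hits = {}
    with zipfile.ZipFile(docx_file) as archive:
        for part_name in archive.namelist():
            text = archive.read(part_name).decode("utf-8", errors="replace").lower()
            found = [term for term in terms if term.lower() in text]
            if found:
                hits[part_name] = found
    return hits


def make_outbound_sample():
    """Sample whose visible body is harmless but whose title metadata is not."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<doc>Quarterly summary</doc>")
        archive.writestr(
            "docProps/core.xml", "<props><title>Project Falcon pricing</title></props>"
        )
    buffer.seek(0)
    return buffer


SENSITIVE_TERMS = ["Project Falcon", "internal only"]  # hypothetical keyword list

for part, found in scan_archive_for_terms(make_outbound_sample(), SENSITIVE_TERMS).items():
    print(f"{part}: {found}")
```

In this sample the visible body passes, and the only hit comes from the metadata part, which is the kind of leak a body-only check would miss.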

Conclusion

Document metadata poses a significant risk to your organization’s confidential information because users are often not aware it exists. Driven by the WYSIWYG approach used by office productivity tools, users may not realize that what you see isn’t always everything you get. Through the use of proper document redaction and data loss prevention technology, you can mitigate this metadata security risk in your environment.

About the author: Mike Chapple, Ph.D., CISA, CISSP, is an IT security professional with the University of Notre Dame. He previously served as an information security researcher with the National Security Agency and the U.S. Air Force. Mike is a frequent contributor to SearchSecurity.com, a technical editor for Information Security magazine and the author of several information security titles, including the CISSP Prep Guide and Information Security Illuminated.

1 comment

Clearswift have technologies that can automatically strip document metadata (including revision history) across their email, web and Exchange products. They can also strip active content and redact sensitive information, and then deliver the result rather than "stop and block".