Big email: How to effectively find the needle in the haystack (Part 1)

Consider a common scenario: Your legal department has been alerted by a whistleblower to potentially fraudulent activity happening within the organization. There are a few clues to use as a starting point, but the key facts, including whether fraud actually occurred, remain unknown. With hundreds of millions of emails within the enterprise, and the need to uncover the facts as quickly as possible, the team cranks up the e-discovery process. This effort is undertaken through the same tried-and-true approach used for all matters involving electronically stored information (identify some key custodians and review most of their documents), and everyone involved knows how time-consuming and potentially expensive this process can be. It could take months and precious internal resources to comb through millions of electronic documents to get to the bottom of the whistleblower’s accusations.

This scenario can incite migraines in even the most prepared general counsel. And while the standard “comprehensive review of a large volume of documents” e-discovery process is often unavoidable in responding to discovery demands, there is a better way to respond to, or get ahead of, those demands in situations like the one described above. Counsel need a process that builds on existing processes and technology but leverages emerging analytics capabilities at the outset of a matter to find facts upfront. For a legal department, what matters at the end of the day is finding the facts, for any number of reasons: intercepting behaviors within the organization that are illegal or violate compliance regulations; responding to a whistleblower as described above; determining whether or how to dispose of a matter; and planning case and e-discovery strategy in the event of a lawsuit. But with ever-increasing data volumes, getting to the facts the traditional “standard review” way is becoming more and more expensive and time-consuming for legal teams.

Legal departments therefore need a better way to find facts quickly, without going through the “standard review” motions of the entire e-discovery process.

Comprehensive review alone isn’t the best solution

Because legal teams are accustomed to complying with and planning for production obligations to opposing counsel under the Federal Rules of Civil Procedure, or to an investigating agency, there is a tendency to default to the standard e-discovery process as a means of finding information. However, when every document in the set is reviewed in detail and in serial order, the key documents tend to be scattered throughout the review population and are uncovered only gradually over time. Furthermore, since discovery obligations are very broad and designed for producing everything even remotely relevant to a matter, not just specific facts, there is a tendency to cast a wide net in both the document population and the review instructions, which means that review can run for a long time before the facts come to light. This comprehensive review process has its place in litigation and complex investigations, where it is essential that no document of even slight relevance be overlooked. For establishing the key merits of an issue quickly, however, this broad approach was not designed for that goal and is not the best way to achieve it.

The alternative is to rely on data mining and visualization techniques developed for Big Data. Big Data is top of mind for all key stakeholders within large enterprises today, and data mining is a current hot topic. Everyone from IT and compliance to marketing and legal is talking about how it will change the way businesses and our government are run, and about the many challenges, benefits and implications that come into play. Legal departments are increasingly aware of how Big Data impacts the enterprise, but what remains to be discussed is how data mining affects the practice of law.

Data mining is designed to visually expose trends within a data population that would otherwise go unnoticed, and to enable large sets of data points (such as documents) to be understood quickly. Previously unknown information surfaces quickly as a large data set is explored at a high level. This gives legal departments and their counsel an alternative to finding the facts through the standard comprehensive review e-discovery methodology.

Even in circumstances where the legal team will eventually be required to go through the entire e-discovery process, data mining can be leveraged to uncover key facts first and bring the most pressing evidence to light quickly. By using this as a starting point, legal teams will have a better understanding of their next steps and overall strategy for any given issue. The second article in this series, coming May 2, will discuss the key principles of this new approach and practical ways legal teams can begin to use data mining to save time, resources and money.