A Predictive Analytics Primer with Harvard Business Review

We love the Harvard Business Review.

If you’ve never spent any time with the Harvard Business Review, you really should check it out. There’s a wealth of information on technology, processes, analytics, and really everything a ‘current’ business executive needs to survive in today’s market, whichever market that may be.

Today I’d like to share a link that I’ve offered several co-workers and clients as a way to help them get their feet wet in predictive analytics, such as predictive coding or predictive review. In the article, Tom Davenport describes the conditions under which predictive analytics is most (and least) effective. What’s cool here is that the ‘best case scenario’ he describes is basically what we’re provisioning in Xera by iConect.

The legal industry has an amazing opportunity to harness the power of predictive analysis against a highly organized data set, the kind created as a byproduct of modern eDiscovery processes.

Most businesses work tirelessly to collect ‘good data’ (i.e., actionable data with complete metadata) to drive regression analysis and concept/behavioral iterations, and to ‘feed the seed’ for predictive analysis. In modern eDiscovery, we typically present anywhere between 7 and 50 extracted fields of accurate, actionable metadata, along with the full text of the record, which is then reviewed by highly trained subject matter experts to build relevancy on a few key issues.

The short story is that modern predictive analysis engines eat this stuff up.
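To make that concrete, here is a minimal sketch of the idea behind predictive review: reviewers’ relevance calls on the full text of a handful of records train a simple classifier, which then scores unreviewed records. This is a toy Naive Bayes model in plain Python, not the actual engine used in Xera; all documents, labels, and names below are hypothetical.

```python
from collections import Counter
import math

# Hypothetical training set: full text of reviewed records plus the
# subject-matter expert's relevance call. Illustrative only.
reviewed = [
    ("merger agreement draft attached for counsel review", "relevant"),
    ("quarterly merger negotiations update and term sheet", "relevant"),
    ("lunch order for the office party on friday", "not_relevant"),
    ("fantasy football league standings and picks", "not_relevant"),
]

def train(docs):
    """Count word frequencies per label (a toy Naive Bayes model)."""
    counts, totals = {}, Counter()
    for text, label in docs:
        words = counts.setdefault(label, Counter())
        for w in text.split():
            words[w] += 1
        totals[label] += 1
    return counts, totals

def score(counts, totals, text):
    """Return the most likely label for an unreviewed record."""
    best, best_lp = None, float("-inf")
    n_docs = sum(totals.values())
    vocab = len({w for c in counts.values() for w in c})
    for label, words in counts.items():
        lp = math.log(totals[label] / n_docs)
        n_words = sum(words.values())
        for w in text.split():
            # Laplace smoothing so unseen words don't zero the score.
            lp += math.log((words[w] + 1) / (n_words + vocab))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

counts, totals = train(reviewed)
print(score(counts, totals, "revised merger term sheet for review"))
# → relevant
```

Real engines use far richer features and models, but the workflow is the same: expert decisions on a seed set become training signal for scoring the rest of the population.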

Bottom line: even though litigation support technologies are a bit late to the predictive analytics game, they’re right on time, arriving at a level of technological maturity that allows applications to build user-friendly workflows and software to drive the process.

Check out the article linked below. We hope you’re as excited about predictive review as we are. It’s a new dimension of review that deserves a place in your toolbox: a mature, defensible, and cost-effective review technology that’s here to stay.

Author: Sid Newby

2 Comments

Thank you for reminding us about that (September 2014) HBR blog framing the underlying aspiration of predictive analytics: discerning future trends from historical data.

In Computer Assisted Review (CAR) we all want to automate as much as possible (but not more). That is why I offer two key cautions about that HBR blog and your observations about it:

This HBR blog discusses employing “regression analysis to see just how correlated each variable is; this usually requires some iteration to find the right combination of variables and the best model… [or] regression coefficients—the degree to which each variable affects the purchase behavior…”

This HBR blog mentions that “Lack of good data is the most common barrier to organizations seeking to employ predictive analytics.”

To clarify, you mention that, “In modern eDiscovery, we typically present anywhere between 7 and 50 extracted fields of accurate, actionable metadata, including full text of the record that is then reviewed by highly trained subject matter experts to build relevancy in a few key issues. The short story is, modern predictive analysis engines eat this stuff up.”

Yes, while the predictive analysis engines may “eat this stuff up,” it is not altogether clear (or demonstrable to a judge) what the engines digest. That is, the engines may have consumed the data, but it is not altogether clear what the engines learned from the experience or can teach the researcher about the smorgasbord.

Bill’s comment is spot on and focuses on my biggest complaint about the TAR marketplace: the tendency (strategy?) of many TAR vendors to point to the success of commercial analytics and claim the same potential for document review.

Analyzing millions of discrete commercial transactions to find suspected patterns or correlations, and then carefully adjusting variables and tracking the resulting outcomes (another couple of million transactions), has as much to do with predictive coding as your monthly credit card statement has to do with The Adventures of Tom Sawyer. Text analytics is a mature branch of statistics and linguistics and is useful for a wide range of applications. However, the lack of controls around its use for eDiscovery, the failure to assess the accuracy of the algorithms, the inability to differentiate between significant and duplicative information, and the inability to validate assumptions against the “real world” universe of source data fatally compromise the claims TAR vendors make about recall rates.
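For readers unfamiliar with the recall claims at issue, here is a minimal sketch of one common way parties estimate recall: draw a random validation sample, have humans review it, and compare the engine’s calls against those judgments. All counts below are made up for the example; this is the arithmetic, not an endorsement of any vendor’s methodology.

```python
# Illustrative only: estimating recall from a random validation sample.
sample_size = 1000          # randomly drawn, human-reviewed documents
relevant_in_sample = 50     # of those, judged relevant by reviewers
found_by_engine = 40        # of the relevant ones, also flagged by the engine

# Estimated recall: share of truly relevant documents the engine caught.
recall = found_by_engine / relevant_in_sample
print(f"estimated recall: {recall:.0%}")
# → estimated recall: 80%
```

The dispute in the comments above is not about this arithmetic but about whether such point estimates, without controls and confidence intervals, actually support the “reasonableness” claims made to a court.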

There’s absolutely no reason why TAR won’t work, and it may work very well in a number of cases. But vendors do not give prospective buyers a way to select the right tool for each case, or a way to explain to a skeptical requesting party and court why they should believe that a “reasonable” percentage of significant documents was produced.