Last week Barry Murphy and I recorded an ESIBytes podcast on the rationale behind the eDiscoveryJournal and Review Less predictive coding boot camp national “tour,” which begins in Washington, DC on April 17th. Our special guest was Judge Nora Barry Fischer, a Federal District Court Judge from the W.D. of Pennsylvania, who is one of the judges assisting with the judicial roundtable portion of the boot camp.

Judge Fischer kicked off the podcast by reiterating the need for more e-discovery education, sharing some recent observations from the e-Discovery Special Master program in the W.D. of PA. Those observations showed that eDiscovery is occurring more frequently in cases and highlighted the need for lawyer education in the field. The problem is especially acute with newer technologies such as predictive coding, where some grounding in statistics is needed to understand how the results of any predictive coding tool are validated.

At least a dozen friends have inquired about my health after the big push of Global Aerospace news came out last week, since there were no blog posts by me. As a participant in the case, I figured it was more important to let others weigh in on the significance of this decision as a precedent that predictive coding can be defended in a court.

The ruling came despite fierce opposition by a very well respected (and deservedly so, I might add) Amlaw 20 law firm, which ultimately settled the case after the judge issued his order allowing the use of predictive coding. It is also worth mentioning that Judge Chamblin was an unknown quantity going in, unlike Judge Peck in Da Silva Moore. As many of you know, Judge Peck had been embracing predictive coding, and educating lawyers and judges about it, for at least a year before he had the chance to author an opinion. With the decision, all of our prior research pointing to broad judicial acceptance of TAR was validated, and the case is now in the books.

On September 9, 2011, I presented to a blue ribbon panel of in-house lawyers, outside counsel and judges at the Committee Meeting on Preservation and Sanctions in the Western District of Pennsylvania. The panel agreed with my thesis that we over-preserve, over-collect and over-process ESI relative to what is eventually used. My views on this topic have been evolving over time, and I outlined that evolution for the panel by contrasting the Rand Study, “Where the Money Goes,” with the Microsoft Letter dated August 29, 2011.

The Rand Study reported that, for the seven corporate participants, collection costs were estimated to be only $0.08 of every dollar spent on eDiscovery, with the bulk of the expenses going to processing and review. The study’s authors and I tended to diminish the importance of these more minor collection costs, and I believed we needed to focus on search and retrieval to lower eDiscovery financial burdens. But my initial conclusion failed to take into account all the silos of data being preserved under legal hold. I realized that legal hold preservation is much more frequent than active litigation, and the Microsoft Letter supported this conclusion with data showing how vastly the average Microsoft matter over-preserves compared with what is actually produced and then used in litigation.

I have been writing a series of posts about the “ever-elusive” metrics that many eDiscovery professionals seem to be waiting for when it comes to driving mainstream adoption of TAR. Does the elusive challenge of finding and providing these TAR metrics mean we are doomed not to be able to use TAR? No. It means that lawyers who have that rare blend of statistical training and courtroom experience have a huge advantage in arguing discovery disputes. The fact is there are boatloads of metrics that a smart litigator can use to defend their process. One analogous observation is the old joke about statisticians: two of them can argue forever that they are right on opposite sides of an issue, each by using statistics. We are going to have these arguments because the richness of collections varies and disputes have different values, so these metrics will be moving targets. But since lawyers lead this argument, it is easy to see that if you put a lawyer who understands this subject matter in a room against a lawyer who doesn’t, guess who is going to sound more reasonable? Sounding reasonable is often the underlying standard being aimed for in discovery today. The analogy I have been giving people is that this matchup tends to make the math-averse litigator look very much like a person on their first date: very unsure of how things are going and of what to do when it comes time to present an adverse argument.

My last two posts have focused on the predictive coding metrics that so many eDiscovery professionals are waiting for with bated breath. What is the real problem here? It’s not that we don’t have standards, which are reasonableness and proportionality, or that we don’t have metrics, which are present almost everywhere you look when considering these tools. The problem is that lawyers don’t think in large enough numbers to understand the meaning of the basic metrics they have in front of them. Most of the basic metrics can be validated by simple sampling principles, which lets a lawyer not only argue that their approach is reasonable but also know when to disagree with an opponent in the heat of battle. It is much easier for an opponent of TAR to make a blanket statement about not wanting to miss any documents and about having smart people, instead of computers, look at the documents because that is the surest way not to miss responsive ESI. By the way, there are no reasonable metrics in that position, because the search for relevant ESI is never perfect, and not a single number can be shown in practice that supports it!
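To make “simple sampling principles” concrete, here is a minimal sketch of how a proportion measured on a random sample (for example, the share of responsive documents) can be reported with a confidence interval. It uses the Wilson score interval, which is one common choice among several; the function name and the sample figures (150 responsive documents in a random sample of 1,500) are hypothetical and only for illustration.

```python
import math


def proportion_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a sample proportion.

    hits        -- documents in the random sample coded responsive
    sample_size -- total documents in the random sample
    z           -- z-score for the confidence level (1.96 is roughly 95%)
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = hits / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2))
        / denom
    )
    return center - half_width, center + half_width


# Hypothetical numbers: 150 responsive documents in a random sample of 1,500.
low, high = proportion_interval(hits=150, sample_size=1500)
print(f"Point estimate: {150 / 1500:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```

The point for a litigator is less the formula than the habit: any percentage offered in a discovery dispute should come with the sample size behind it, because the interval narrows as the sample grows.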

My recent piece, “Predictive Coding Metrics are for Weenies – Part I,” looked at how those who want metrics that will suddenly “validate” predictive coding are going to get left behind waiting for that validation. To examine the fence sitters’ concerns more closely, I agree it would be nice to know in advance whether the number of random sample documents your TAR system uses is enough to train it adequately. If the system is looking at 5,000 documents as a training set, is that enough? Or should it be something smaller, such as 2,000 documents? It would also be nice to know whether the final recall rate should be an estimated 70, 80, or 90 percent of the total responsive documents in the collection (recall measures what percentage of the responsive documents were found out of the total estimated number of responsive documents in the population). Some TAR systems rank documents based on their likelihood of being responsive, so another helpful metric would be whether documents that score above X in your predictive coding system are presumptively responsive and, conversely, whether documents that score below Y are presumptively not responsive. These types of metrics ARE NOT LIKELY to emerge for a number of reasons.
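A small sketch shows why a universal cutoff score is hard to pin down: the recall you get at a given cutoff depends on the documents in front of you. The scores and responsiveness calls below are invented for illustration, and the function is a hypothetical helper, not any vendor’s scoring method.

```python
def recall_and_precision(scored_docs, cutoff):
    """Treat every document scoring at or above `cutoff` as presumptively responsive.

    scored_docs -- list of (score, is_responsive) pairs, where is_responsive
                   is the human reviewer's call on that document
    Returns (recall, precision) at that cutoff.
    """
    total_responsive = sum(1 for _, responsive in scored_docs if responsive)
    selected = [(score, responsive) for score, responsive in scored_docs if score >= cutoff]
    found = sum(1 for _, responsive in selected if responsive)
    recall = found / total_responsive if total_responsive else 0.0
    precision = found / len(selected) if selected else 0.0
    return recall, precision


# Invented sample: ten documents with a system score and a reviewer's call.
sample = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.55, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, True), (0.05, False),
]

for cutoff in (0.75, 0.50, 0.25):
    recall, precision = recall_and_precision(sample, cutoff)
    print(f"cutoff {cutoff:.2f}: recall {recall:.0%}, precision {precision:.0%}")
```

Lowering the cutoff raises recall and pulls in more non-responsive documents. Where that tradeoff should land depends on the collection and the value of the dispute, which is one reason a single presumptive X or Y is unlikely to hold across matters.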

Judicial Activism with Predictive Coding – All In The Name of “Just”, “Speedy” and “Inexpensive” Discovery

I had a spirited discussion on Friday with a litigator about whether it was appropriate for the Honorable J. Travis Laster from the Delaware Chancery Court to push the parties to use predictive coding when apparently neither party sought to use it in EORHB, Inc., et al. v. HOA Holdings, LLC, C.A. No. 7409-VCL (Del. Ch. Oct. 15, 2012). I decided to take some time to think about this outcome, since I have been very public in my support of using predictive coding to reduce many of the ills of the litigation system, e.g., lengthy discovery disputes, excessive costs and uneven quality of ESI productions. Yet I am also on record supporting the outcome of Kleen Products, where the plaintiffs failed to force the defendants to use predictive coding because they faced what appeared to be an uphill battle to impose their discovery will on their adversary. I also generally believe parties should be able to chart their own course for how to proceed with discovery.

I recently read an assessment from a morning networking meeting held in Chicago about Technology Assisted Review (TAR). According to the assessment, those in attendance believed we need more metrics around predictive coding tools to help end users grasp how to use them effectively. This struck me as bizarre, because metrics are the very essence of TAR.

Common metrics encountered with TAR are the “richness” of a collection (i.e., how many documents are relevant in a collection being reviewed versus the total population of documents in the collection) and “recall rates,” or how many documents the TAR system is finding when compared to the benchmark rate of expert human reviewers on the same sample of documents. Beyond these, two more pieces are based on math or metrics of some form. One is the decision of when to stop or continue training a TAR system, which is likely based on some combination of objective metrics and human intuition. The other is the set of underlying algorithms that take the training and apply the results to create complex searches that select more potentially responsive documents without additional human review.
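As a rough illustration of the two definitions above, the sketch below computes richness and recall on a small sample using sets of document IDs. The IDs, the set names, and the helper functions are all hypothetical; the expert reviewers’ calls are treated as the benchmark, as described above.

```python
def richness(relevant_ids, all_ids):
    """Share of the reviewed documents that are relevant."""
    return len(relevant_ids) / len(all_ids) if all_ids else 0.0


def recall(tar_ids, benchmark_ids):
    """Share of the benchmark (expert-coded) relevant documents that TAR also found."""
    if not benchmark_ids:
        return 0.0
    return len(tar_ids & benchmark_ids) / len(benchmark_ids)


# Hypothetical sample of 20 documents, identified by number.
sample_ids = set(range(1, 21))
expert_relevant = {2, 5, 9, 14, 17}      # coded relevant by expert reviewers
tar_flagged = {2, 5, 9, 11, 14, 19}      # flagged as relevant by the TAR system

print(f"Richness of sample: {richness(expert_relevant, sample_ids):.0%}")
print(f"Recall against expert benchmark: {recall(tar_flagged, expert_relevant):.0%}")
```

Both functions are plain ratios, which is the point: the arithmetic is within reach of anyone willing to count documents in a sample.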

This is a continuation of my post from last week discussing “The E’s” of Predictive Coding.

Before I get a string of emails from my fellow experts and friends in the field, I must add that creating new processes in law is never easy because of the barriers pointed out in my last post. Also, the additional skills of understanding technology, statistics and law do make each of us experts, even if we are throwing out competing preferred routes to complete a review and confusing the marketplace. Just as with your GPS, customers end up with three suggested routes to choose from, and all of them get you from A to Z while crossing different types of terrain. But since lawyers are trained to avoid risk and would rather use “precedent” than come up with new solutions, there is a risk of blindly following a single route as gospel without doing some homework about the different routes and their applicability to the specific matter at hand.

Nothing with law and technology combined comes with ease (or “E’s,” as I put in the title). It should come as no surprise, then, that one of the biggest challenges in the technology-assisted review (TAR) space is how to educate a critical mass of lawyers about how to work with newer technologies.

TAR methodologies require some level of comfort with statistics in order for lawyers to validate results and be comfortable certifying the completeness of their review and production. After two years of pushing hard to get lawyers to accept these tools, I am convinced we need less improvement in the tools today and more improvement in the competency of lawyers to successfully run TAR projects.