Last week Barry Murphy and I recorded an ESIBytes podcast on the rationale behind the eDiscoveryJournal and Review Less predictive coding boot camp national "tour," which begins in Washington, DC on April 17th. Our special guest was Judge Nora Barry Fischer, a Federal District Court Judge from the W.D. of Pennsylvania and one of the judges assisting with the judicial roundtable portion of the boot camp.

Judge Fischer kicked off the podcast by reiterating the need for more e-discovery education, sharing recent observations from the e-Discovery Special Master program in the W.D. of PA. Those observations showed that eDiscovery is arising more frequently in cases and highlighted the need for lawyer education in the field. The problem is especially acute with newer technologies such as predictive coding, which require some grounding in statistics before a lawyer can understand how any predictive coding tool is validated.

Unlike most CLE programs I speak at, these boot camps draw on about eight weeks of work crafting materials: a program that "opens the kimono" on the more than 14 predictive coding tools we have examined and synthesizes the common aspects behind them. It also draws on my experiences in Global Aerospace and other predictive coding cases I am working on, as well as lessons learned from some of the other large cases that have hit the press. We are trying to go much deeper into statistics and terminology than most CLEs go.

A critical observation from my consulting in large cases, on both the producing and receiving ends of predictive coding productions, is that participants need to understand the differences between tools rather than just knowing the one tool they want to use. It is difficult to meet and confer effectively about predictive coding when both sides are speaking different languages but think they are speaking the same one. This problem usually surfaces later in the project, after a protocol is in place.

Another observation is that transparency is extremely important. Clinging to the rules and traditions of past document review standards, which say we do not need to show training sets, misses the general proposition of Rule 1: balancing speed, cost and results. It also misses the psychology of the side being asked to trust a black box; they need to see something more than pure output at the end to get comfortable.

This is more about common sense than anything else, especially when the documents to be shared often number only a few thousand, fewer non-responsive documents than are typically turned over in any large production because tired reviewers miscode documents and drag down precision. It is an exciting space with lots of room to grow. We intend to offer a good view of the field, share some new survey statistics on national usage, and bring in local judges and facilitators who are using these tools to create a town hall feel, and hopefully do it in an entertaining way.

My apologies to ESIBytes listeners that my free podcast shows have slowed in frequency; this is a short-term aberration. While ESIBytes has always been a free resource for eDiscovery education, there is a small fee of $150 for the eDiscoveryJournal Boot Camps. The fact is that the time commitment of designing the materials, the expense of traveling to cities, designing document review tests for the audience, and securing space with Internet access require some modest revenue. By comparison, predictive coding events I am speaking at within the next 30 days cost $750 to $2,000, not including transportation. Given the need for more lawyers who are conversant with predictive coding, we felt it was important to take this program into communities, keep the cost manageable, and "seed" the development of more lawyers who can work with these tools. It is also cheaper to pay for our small team to travel than to require all attendees to travel to see us speak.

Lastly, we are scheduled to present this program in Washington, DC on April 17th; Chicago on April 24th; Pittsburgh on May 6th; and Boston on May 14th. Please sign up to support our initiative; space is limited at each venue. And feel free to reach out to us if you are interested in bringing this training boot camp to your city. I hope to see you all in your home town sometime in 2013.

At least a dozen friends have inquired about my health since there were no blog posts from me after the big push of Global Aerospace news last week. As a participant in the case, I figured it was more important to let others weigh in on the significance of this decision as precedent that predictive coding can be defended in court.

The order came despite fierce opposition from a very well respected (and deservedly so, I might add) AmLaw 20 law firm, which ultimately settled the case after the judge allowed the use of predictive coding. It is also worth mentioning that Judge Chamberlin was an unknown quantity going in, unlike Judge Peck in Da Silva Moore. As many of you know, Judge Peck had been embracing and educating lawyers and judges on predictive coding for at least a year before he had the chance to author an opinion. With the decision, all of our prior research pointing to broad judicial acceptance of TAR was validated, and the case is now in the books.

So what is there left to say? The eDJ Group and Review Less survey data on predictive coding use thus far suggest that many participants are culling substantially less data than was culled in the high-profile Global Aerospace case. That case culled out 87% of the data, placing it in the upper quartile of the preliminary research we have compiled.

One would think that if there were ever a case to be careful with, and perhaps to review more than necessary, this would be it. But the Global Aerospace predictive coding team decided early on to use the tools as they were designed and drive them in fifth gear instead of first to gain a true sense of the defensibility of this approach. This point was made in the Wall Street Journal article I was quoted in last week.

These results, compared to the data we are seeing, call out for substantial legal education on how to use these tools. eDJ Group and Review Less are taking a strong step to fill this void and anticipate providing 8 to 12 Predictive Coding Boot Camps in cities around the country. These CLE programs will be vendor-neutral, with significant judicial involvement. We will also offer perspectives from pioneers and test attendees to give us feedback on how well they grasp how to use and defend predictive coding results.

We hope to seed the industry with predictive coding users in a fair and neutral manner, preparing lawyers and corporations to understand different approaches and validate the results. For more information contact Marilyn@edjgroupinc.com.

On September 9, 2011, I presented to a blue-ribbon panel of in-house lawyers, outside counsel and judges at the Committee Meeting on Preservation and Sanctions in the Western District of Pennsylvania, who agreed with my thesis that we over-preserve, over-collect and over-process ESI relative to what is eventually used. My views on this topic have been evolving over time, and I outlined them for the panel by contrasting the Rand Study, Where the Money Goes, with the Microsoft Letter dated August 29, 2011.

The Rand Study reported that, for the seven corporate participants, collection costs were estimated at only $.08 of every dollar spent on eDiscovery, with the bulk of the expense going to processing and review. The study's authors and I tended to diminish the importance of these minor collection costs, and I believed we needed to focus on search and retrieval to lower eDiscovery financial burdens. But my initial conclusion failed to take into account all the silos of data being preserved under legal hold. I realized that legal hold preservation is much more frequent than active litigation, and the Microsoft Letter supported this conclusion with data showing the vast over-preservation in the average Microsoft matter versus the amount actually produced and then used in litigation.

Microsoft reported that its average case starts with 48,431,250 pages preserved. Of that, 12,915,000 pages are processed, 645,750 pages are reviewed, 141,450 pages are produced and 142 pages are actually used (Microsoft Letter, page 5). This scary breakdown highlights the incredibly wasteful effort of handling a vast majority of ESI that isn't even remotely relevant to the active matter. Additionally, from the predictive coding standpoint, when the percentage of responsive documents in the preservation and collection is small, it is logically harder to find responsive documents regardless of whether a predictive coding application uses random sampling or a seed set for training.
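To see just how steep that funnel is, here is a quick back-of-the-envelope calculation, a minimal Python sketch of my own using only the numbers from the Microsoft Letter:

```python
# Funnel from the Microsoft Letter (page 5): pages remaining at each
# stage of an average Microsoft matter.
stages = [
    ("preserved", 48_431_250),
    ("processed", 12_915_000),
    ("reviewed",     645_750),
    ("produced",     141_450),
    ("used",             142),
]

baseline = stages[0][1]
for name, pages in stages:
    print(f"{name:>9}: {pages:>10,} pages ({pages / baseline:.5%} of preserved)")

# Roughly 0.29% of preserved pages are produced, and only about
# 0.0003% are ever used in the litigation.
```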

This is such a systemic problem that it really calls for outside intervention, so I've decided to use Rule 1 of the Federal Rules of Civil Procedure as the basis for my letter to Santa.

Dear Santa:

This year, I would like "Just," "Speedy" and "Inexpensive" eDiscovery, which Rule 1 offers as a framework for interpreting the entire Federal Rules of Civil Procedure. These concepts are even more basic than "reasonableness" and "proportionality," which eDiscovery lawyers often point to as the more important standards because they appear specifically in the eDiscovery rules. That would be acceptable, as long as "just," "speedy" and "inexpensive" are in your sack of presents, because they are easier concepts for lay people to grasp and they frame how "reasonableness" and "proportionality" are decided.

Santa, we’re obviously in the perfect eDiscovery storm with this blizzard of preservation and over collection so I need you to bring in your secret weapon, Rudolph! We need him at the head of the sleigh guiding the other ESI reindeer: Keeper – the Records Manager, Geeky – In-house IT, Traveler – in-house eDiscovery, Worried – outside counsel, Manager – In-house attorney, and Deeply Concerned the Client. Only Rudolph can shine his bright light on the process and satisfy any judge who turns into the abominable snowman and sanctions the team for poor preservation or spoliation.

Rudolph will help deliver a "speedy" and "inexpensive" solution by focusing, initially, on the areas where responsive ESI is most likely. This targeted collection can be fully analyzed during processing and TAR, and if analytics show the data leads to other custodians, more data can be collected and added to the evolving TAR population. Rudolph's "speedy" and "inexpensive" sleigh ride brings us the third Rule 1 requirement, "just," because when litigation teams find ESI more quickly, they still have time to follow additional trails of information: new custodians, facts and theories about the case. Rudolph's laser-beam approach means parties can begin producing ESI on a rolling basis much earlier in discovery, and supplemental searches for more ESI can be conducted if necessary. This method even follows much of Sedona's Best Search Practices via "iterative" predictive coding.

Santa, we all know that Rudolph is real because we've seen his hoof prints in the snow of several recent cases. In Global Aerospace v. Landow Aviation (the "Virginia Case"), a more "just," "speedy" and "inexpensive" result was derived cooperatively among opposing parties and ultimately presented to the Court. During discovery, the selected software was trained by coding only 5,000 documents, which were used to cull nearly 90% of the roughly 1.3 million documents without expensive and time-consuming manual review. This made Global Aerospace the first and only predictive coding case to be effectively implemented and concluded under judicial scrutiny in an active litigation. We can also nominate Vice Chancellor J. Travis Laster for the "Rudolph eDiscovery Award" for ordering both sides to use TAR, and to share the same vendor, in EORHB, Inc., et al v. HOA Holdings, LLC, C.A. No. 7409-VCL (Del. Ch. Oct. 15, 2012) (the "Delaware Case").

Getting back to over-preservation and collection: we need Rudolph to bring the sleigh earlier in the litigation process to have a more powerful impact on the discovery burden than merely leveraging predictive coding during review. We need one of the eDiscovery reindeer to don the red nose and ask: why are we doing this so expensively? Why are we preserving so much data when we ultimately use so little? Worried (outside counsel) or Manager (the in-house attorney) would be a perfect choice for guiding the sleigh because they are lawyers and understand the legal process. Deeply Concerned, Geeky, Traveler, or Keeper could certainly raise objections to over-collection and over-preservation, and even an outside consultant who is also a lawyer (call that reindeer Thinker) can assist teams in deploying Rudolph's laser-targeted method. It doesn't take a computer scientist to realize the hole we dig ourselves into when we over-process data for search and retrieval, even when we try to use technology to aid in the effort. So maybe you could leave the North Pole a little early this year.

Thank you, Santa.

Your ESI Pal,

Karl

With that in mind, I would like to wish everyone in the eDiscovery community a happy and healthy holiday season, and a Happy New Year to my good friends at the eDiscoveryJournal, a fun and resourceful group who have allowed me to conduct useful research with them and to blog to a growing audience of readers over the past six months. So let's stop playing reindeer games and all put on our red noses to try to solve this problem. "Now, KEEPER! now, GEEKY! now, WORRIED! and MANAGER! On, THINKER! on, DEEPLY CONCERNED! on, TRAVELER! and RUDOLPH!" Hopefully we have the makings of an everyday litigation classic in 2013 and beyond.

I have been writing a series of posts about the "ever-elusive" metrics that many eDiscovery professionals seem to be waiting for to drive mainstream adoption of TAR. Does the challenge of finding and providing these TAR metrics mean we are doomed never to use TAR? No. It means that lawyers who have that rare blend of statistical training and courtroom experience have a huge advantage in arguing discovery disputes. The fact is there are boatloads of metrics a smart litigator can use to defend their process. An analogous observation is the old joke about statisticians: two statisticians can argue forever, on opposite sides of many issues, each using statistics to prove they are right. We are going to have these arguments because the richness of collections varies and disputes have different values, so these metrics will be moving targets. But since lawyers lead the argument, it is easy to see that if you put a lawyer who understands this subject matter against a lawyer who doesn't, guess who is going to sound more reasonable? That is often the underlying standard being aimed for in discovery today. The analogy I have been giving people is that this match-up makes the math-averse litigator look very much like someone on a first date: unsure of how things are going, and of what to do when it comes time to present an adverse argument.

My advice to corporations that want to do TAR, and the recent FTI white paper suggests that over 60% of in-house attorneys want to, versus 30% of outside lawyers (Advice From Counsel: Can Predictive Coding Deliver On Its Promise?, FTI Consulting, by Ari Kaplan and Joe Looby), is to find a lawyer who can do this effectively. That is where organizations that want to use TAR can best invest their time. In contrast, most organizations vet tools ad nauseam, but for me it's less about the tool and more about the lawyer who can discuss the tool and its approach intelligently. Even though tools differ and workflows differ, if the lawyer running the project cannot advocate for the tool selected, there is a pretty fair chance they will have problems making coherent arguments for their approach, whether in meet and confers, in front of the court, or even in using the selected tool in the most efficient manner.

Two good examples sit on opposite ends of the spectrum. First, the 600 pages of testimony in Kleen Prods. LLC v. Packaging Corp. of Am., N.D. Ill., No. 10-cv-05711 (complaint filed 9/9/10), which never seemed to focus intelligently on the important issues because the lawyers could not elicit clear testimony from their experts to support their position. Perhaps this was just strategy to confuse the judge and the interested parties reading the materials; it might also be an example of lawyers discussing topics they are not comfortable with. Contrast that with Global Aerospace Inc. v. Landow Aviation LP, Va. Cir. Ct. (Loudoun Cty.), Consolidated Case No. CL 61040 (Order Approving the Use of Predictive Coding in Discovery entered 4/23/12), where a Carnegie Mellon-trained chemical engineer, who happens to be a first-chair trial lawyer, argued that statistics and sampling were a more reasonable way to approach the 2 million document collection and conduct a relevance review. If you read the transcript or listen to the audio of the argument, you will see it was a fairly one-sided discussion, as the technology-trained litigator comfortably used metrics and industry statistics to make a compelling argument for TAR.

So the conclusion of this series of articles is that, of course, metrics are important in TAR and are not for weenies. The real weenies might be the parties who do not understand that TAR is chock-full of metrics; when they encounter a lawyer who does, guess who is going to end up with sand kicked in their face? Given the pace of change in the field, this might take a few years, but law firm litigation departments might want to control this risk by hiring a few litigators who can comfortably argue positions based on statistics and sampling.

My last two posts have focused on the predictive coding metrics that so many eDiscovery professionals await with bated breath. What is the real problem here? It's not that we lack standards; we have reasonableness and proportionality. Nor that we lack metrics; they are present almost everywhere you look when considering these tools. The problem is that lawyers don't think in large enough numbers to understand the meaning of the basic metrics in front of them. Most of the basic metrics can be validated by simple sampling principles, not only to argue that an approach is reasonable but also to know when to disagree with an opponent in the heat of battle. It is much easier for an opponent of TAR to give a blanket statement about not wanting to miss any documents and about having smart people, instead of computers, look at the documents because that is supposedly the surest way not to miss responsive ESI. By the way, there are no reasonable metrics behind that position, because there is never perfection in the search for relevant ESI, and not a single number can be shown in practice to support it!
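As a concrete illustration of those simple sampling principles, here is a minimal Python sketch, with hypothetical numbers of my own and standard normal-approximation math rather than any particular tool's method, estimating a collection's richness from a random sample:

```python
import math

def prevalence_ci(hits: int, n: int, z: float = 1.96):
    """Point estimate and ~95% confidence interval for the fraction of
    responsive documents, from a simple random sample (normal approx.)."""
    p = hits / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical validation sample: reviewers code 1,500 randomly drawn
# documents and find 120 responsive.
p, lo, hi = prevalence_ci(120, 1_500)
print(f"estimated richness: {p:.1%} (95% CI {lo:.1%} - {hi:.1%})")
```

Numbers like these, an estimate plus a defensible margin of error, are exactly what a blanket "we can't miss anything" position never supplies.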

Even without metrics, we all know there is far too much information to review in most collections, and we need a way to decide what to look at, because litigation holds are cesspools of unstructured data. Intuitively, using technology is the only practical way to winnow the data for review. As a result, lawyers have used date ranges, file types, custodian selection, and keywords to cull what gets reviewed; it is the only way to avoid looking at far too much unresponsive data. These are all forms of technology-assisted review, because the underlying metadata or keyword hit is revealed by using technology to filter the documents.

We do have plenty of metrics showing that this approach does not work very well. Maura Grossman and Professor Gordon Cormack's groundbreaking article, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011), http://jolt.richmond.edu/v1713/articlee11.pdf, revealed how much more precise machine-assisted review was at TREC in identifying responsive documents. It also found that machine-assisted review was marginally better than traditional manual review at finding responsive documents.

The Electronic Discovery Institute's study, overseen by Herb Roitblat, Anne Kershaw and Patrick Oot, showed similar results but, to be less controversial, concluded that TAR was at least as good as human review, given TAR's huge advantage in time and money. See Herbert L. Roitblat, Anne Kershaw and Patrick Oot, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, 61(1):70-80 (2010).

The infamous Blair and Maron study from 1985 showed just how poorly keyword searching performs in the aggregate. Seasoned litigators and paralegals using keyword searches estimated they had found 75% of the relevant documents in a collection but, in fact, had found only 20% (Blair and Maron, Communications of the ACM, 28, 289-299 (1985)). There is also the question of consistency within a set of reviewers: studies show that reviewers agree with each other only about 50% of the time, a coin toss, and that the number drops to roughly 30% when a third reviewer is added. Roitblat, H. L., Kershaw, A. & Oot, P. (2010), Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, 61(1):70-80.
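The reviewer-consistency problem is easy to quantify. One common way to measure it is a Jaccard-style "overlap" between two reviewers' responsive calls; a minimal sketch with hypothetical document IDs of my own:

```python
def overlap(a: set, b: set) -> float:
    """Agreement between two reviewers' "responsive" calls:
    documents they both flagged over documents either flagged."""
    return len(a & b) / len(a | b)

# Hypothetical coding of the same collection by two reviewers: each
# flags six documents, but they agree on only three of them.
reviewer_1 = {101, 102, 103, 104, 105, 106}
reviewer_2 = {104, 105, 106, 107, 108, 109}
print(f"pairwise overlap: {overlap(reviewer_1, reviewer_2):.0%}")  # 33%
```

On this measure, even reviewers who each do a plausible-looking job can land in the 30-50% agreement range the studies report.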

Add to the mix the Rand Institute's recent study on economic costs, with $.73 of every eDiscovery dollar being spent on review, dwarfing the costs of collecting and processing the data, and the reasonableness argument for using TAR rests on an avalanche of metrics against the status quo that impresses almost everyone I speak to. Everyone, that is, except litigators in the trenches and their clients, who need to be able to argue whether they have done enough with TAR but perhaps don't understand the metrics well enough. See the not-for-profit Rand Institute for Civil Justice study, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, by Nicholas M. Pace and Laura Zakaras, Santa Monica, CA: RAND Corporation, 2012, http://www.rand.org/pubs/monographs/MG1208.

I will explore predictive coding metrics and some of the real world cases in my next post.

My recent piece, "Predictive Coding Metrics are for Weenies – Part I," looked at how those who want metrics that will suddenly "validate" predictive coding are going to be left behind waiting for that validation. To examine the fence-sitters' concerns more closely: I agree it would be nice to know in advance whether the number of random sample documents your TAR system uses is enough to train it adequately. If the system uses a 5,000-document training set, is that enough? Or should it be smaller, say 2,000 documents? Should the final recall rate be an estimated 70, 80, or 90 percent of the total responsive documents in the collection? (Recall measures what percentage of responsive documents were found out of the total estimated number in the population.) Some TAR systems rank documents by their likelihood of being responsive, so another helpful metric would be whether documents scoring above X in your predictive coding system are presumptively responsive and, conversely, whether documents scoring below Y are presumptively non-responsive. These types of metrics ARE NOT LIKELY to emerge, for a number of reasons.

First, lawyers rely on published opinions for precedential guidance, but most cases eventually reach some form of agreement on discovery issues, which provides little guidance to the legal community as a whole. When lawyers can't agree and a judge decides the issue, very few appellate opinions emerge to test that judge's or special master's ruling, relative to the volume of litigation, because discovery issues are seldom appealed. Even if more opinions emerged, a more important factor is that the quality of collections and the richness of the underlying data vary with factors that differ across organizations and people.

I can't see how uniform metric standards can easily emerge here to turn TAR into the equivalent of an "easy button." What we are left with is lawyers' least favorite standard, "reasonableness," and its close eDiscovery cousin, "proportionality," applied to the particular case, the types of data being evaluated, and the math-oriented results that are emerging. You then need to argue to the other side, and to the court if necessary, that your chosen strategy is "reasonable." So the metrics will likely remain nebulous and will depend on the case.
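To see why the numbers stay case-specific, consider a minimal Python sketch, my own illustration rather than any vendor's method, of how the sample size needed to pin down a prevalence estimate shifts with the richness of the collection:

```python
import math

def sample_size(margin: float, prevalence: float, z: float = 1.96) -> int:
    """Documents to sample so a prevalence estimate lands within +/- margin
    at roughly 95% confidence (normal approximation to the binomial)."""
    return math.ceil(z ** 2 * prevalence * (1 - prevalence) / margin ** 2)

# The "right" sample size shifts with the richness of the collection,
# which is one reason no universal number can be handed to lawyers.
for p in (0.01, 0.05, 0.20, 0.50):
    print(f"richness {p:>4.0%}: +/-2% margin needs {sample_size(0.02, p):>5,} docs")

# Note: at low richness the absolute margin shrinks the required sample,
# but the *relative* error on the responsive count balloons, which is
# why sparse collections are genuinely harder to validate.
```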


Originally posted on October 31, 2012

Judicial Activism with Predictive Coding – All In The Name of “Just”, “Speedy” and “Inexpensive” Discovery

I had a spirited discussion on Friday with a litigator about whether it was appropriate for the Honorable J. Travis Laster of the Delaware Court of Chancery to push the parties to use predictive coding when apparently neither party sought to use it in EORHB, Inc., et al v. HOA Holdings, LLC, C.A. No. 7409-VCL (Del. Ch. Oct. 15, 2012). I took some time to think about this outcome because I have been very public in my support of using predictive coding to reduce many of the ills of the litigation system, e.g., lengthy discovery disputes, excessive costs and uneven quality of ESI productions. Yet I am also on record supporting the outcome of Kleen Products, where the plaintiffs failed to force the defendants to use predictive coding because they faced what appeared to be an uphill battle to impose their discovery will on their adversary. I also generally believe parties should be able to chart their own course for how to proceed with discovery.

So what makes the Delaware case different from Kleen Products? It appears that discovery is about to start in this case, so, unlike Kleen Products, neither party has expended tremendous resources pursuing a discovery strategy. In addition, the rules of civil procedure strongly support the judge's suggested course of action. The Delaware Court of Chancery's rules state:

“Rule 1. Scope and purpose of Rules. These Rules shall govern the procedure in the Court of Chancery of the State of Delaware with the exceptions stated in Rule 81. They shall be construed and administered to secure the just, speedy and inexpensive determination of every proceeding.”

This language mimics Rule 1 in the Federal Rules of Civil Procedure. Rule 1 is important as it sets the tone for the entire set of Rules of Civil Procedure. So in any federal court, as well as state courts which start their rules of civil procedure with a general structure focused on “just”, “speedy” and “inexpensive”, the use of, and experimentation with, advanced TAR techniques like predictive coding should be encouraged by both parties and judges when it is appropriate (not all collections are suitable for methods like predictive coding; for example, audio and CAD files do not cluster).

Judges are lawyers too, charged with neutrally "administering" the Rules of Civil Procedure and helping to resolve disputes fairly in accordance with those rules. Taken broadly, it comes as no surprise that a judge might encourage parties to use predictive coding to secure a "just, speedy, and inexpensive" discovery outcome. Predictive coding, used correctly, promises to reduce costs and turn over at least as much responsive ESI, and less unresponsive ESI, than our current eyeballs-on-every-document approach.

I reach this conclusion with a strong caveat, however. I have significant concerns that if this result goes beyond encouraging sporadic experimentation, it could become an unintended disaster; imagine if every judge ordered the use of predictive coding. As someone who spends a good deal of time working with lawyers and educating them on predictive coding, I can say there is not enough knowledge in most law firms to run these processes and tools without significant vendor support. Using technology is seldom a smooth path in litigation, and lawyers producing ESI need more experience to understand how to react to the issues that will occur in predictive coding projects. Review teams will need to know how to assess validation statistics, use common-sense judgment, and be ready to show that they have provided a reasonable discovery outcome. Yet there are also not enough vendor resources to handle every case in the country; most vendors do not have a deep bench of project managers, and their technical experts are stretched thin covering several hundred cases around the country. While judicial encouragement of predictive coding is great and absolutely necessary, blind encouragement could be dangerous.

A final thought about this case: in addition to ordering the use of predictive coding unless the parties object, the judge also mandated that they choose a single vendor to host the data. This "go use predictive coding" directive without a defined process is a recipe for unfair or unjust results. Who does the vendor really work for? How can that vendor provide any real support without favoring one side over the other? In the majority of cases I have seen, the vendor plays a significant role in helping the lawyers assess the predictive coding results, so there is a real challenge in finding a neutral vendor who can help both sides. Who pays for the vendor? Here, the entire hubbub over predictive coding stems from one last-second, ad hoc order (pages 66-67) with no room for rebuttal or disagreement. There is an argument to be made that this is a well-intentioned but possibly undereducated bench forcing parties to use and pay for an undefined, black-box marketing label.

Encouraging experimentation with predictive coding was the exact conclusion of the RAND Corporation study by Nicholas M. Pace and Laura Zakaras, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, Santa Monica, CA: RAND Corporation, 2012. After building a compelling case that growing volumes of ESI make traditional review, with eyeballs on every document, impossible, the study concluded that the only way to solve these issues is for lawyers to experiment with more technology solutions like predictive coding. That same type of experimentation is what Judge Laster asked the parties to undertake in discovery in his case. I have no problem with this outcome as a single case, or in a handful of other instances, if other judges are interested in fostering the use of more technology to reduce discovery burdens and the questions I raised above about neutral vendors are answered. To the parties involved: good luck using the tools, or crafting an argument on why you shouldn't have to. I am sure many of the country's litigators will be interested in watching how discovery proceeds in this case and whether its results are "just," "speedy" and "inexpensive."

I recently read an assessment from a morning networking meeting on Technology Assisted Review (TAR) held in Chicago: those in attendance believed we needed more metrics around predictive coding tools to help end users grasp how to use them effectively. This assessment struck me as bizarre, because metrics are the very essence of TAR.

Common metrics encountered with TAR are the "richness" of a collection (how many documents in the collection being reviewed are relevant versus the total population) and "recall," or how many documents the TAR system finds compared with the benchmark of expert human reviewers on the same sample of documents. Beyond these, the decision of when to stop or continue training a TAR system, which is typically based on some combination of objective metrics and human intuition, and the underlying algorithms that take the training and apply the results as complex searches to select more potentially responsive documents without additional human review, are all based on math or metrics of some form.
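To make those two definitions concrete, here is a minimal Python sketch, with hypothetical document IDs and counts of my own rather than any vendor's implementation:

```python
def richness(responsive_in_sample: int, sample_size: int) -> float:
    """Estimated fraction of the collection that is responsive."""
    return responsive_in_sample / sample_size

def recall_vs_benchmark(machine: set, experts: set) -> float:
    """Share of expert-coded responsive documents the TAR system
    also found (expert coding treated as the benchmark)."""
    return len(machine & experts) / len(experts)

# Hypothetical control set: experts marked 10 documents responsive;
# the system flagged 8 of them, plus 2 the experts did not.
expert_responsive = set(range(1, 11))
machine_responsive = {1, 2, 3, 4, 5, 6, 7, 8, 42, 99}

print(f"richness: {richness(120, 1_500):.1%}")  # 8.0%
print(f"recall:   {recall_vs_benchmark(machine_responsive, expert_responsive):.0%}")  # 80%
```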

Thus, I am not convinced that a lack of metrics is what the TAR fence-sitters are objecting to. What I hear in my own discussions with the majority of lawyers on the fence about TAR, both in-house and outside counsel, is that we do not know when we have done a good enough job to convince the other side, and even ourselves, that we can stop. No one wants to make that decision and be scrutinized without agreed-upon metrics or benchmarks to look to.

Put another way, the desire for metrics can also be seen as risk aversion, because there is a general tendency among lawyers not to be the first to try a new technique. In a world where there is not enough tough love, unless you were raised by a "Tiger Mom," I want to say emphatically: waiting for metrics to appear that will turn predictive coding tools into an easy button is a futile exercise. It is a great example of what is wrong with lawyers when risk aversion impedes progress. Looking for black-letter law in an area that can never have black-letter law is the kind of approach that keeps your eDiscovery efforts stuck in the dark ages. It also puts your organization at greater risk if you encounter an opposing lawyer who can comfortably work with TAR's metrics to make their own reasonableness arguments.

This is a continuation of my post from last week discussing "The E's" of Predictive Coding.

Before I get a string of emails from my fellow experts and friends in the field, I must add that creating new processes in law is never easy because of the barriers pointed out in my last post. The combined skills of understanding technology, statistics and law do make each of us experts, even if we are throwing out competing preferred routes to complete a review and confusing the marketplace. Just like with your GPS, customers end up with three suggested routes to choose from, all of which get you from A to Z across different terrain. But since lawyers are trained to avoid risk and would rather use "precedent" than come up with new solutions, there is a risk of blindly following a single route as gospel without doing some homework about the different routes and their applicability to the specific matter at hand.

Short of waiting until a project is imminent, peering into this fragmented market for answers, and training on the job, how does a lawyer get educated on TAR and reduce the risk of mistakes? Some organizations centralize the eDiscovery function and hope the firm's litigators will call the expert when eDiscovery arises. Success with centralization varies and depends on how diligently the organization's litigators engage the eDiscovery or litigation support department early in the process. Importantly, those litigation support departments face the same challenge of getting educated on these tools. Others go to education groups and take certification tests to create an aura of expertise; but I always wonder who taught these new experts, an important factor to consider in a new field. Probably the best positioned to help are the consultants who are compiling data and studying the field across multiple vendors and clients. Consultants can identify different approaches because their job is to take a broad view of the industry without bias. This may be the first time I have seen in litigation where consultants are so critical to eDiscovery. It has been easy to hire technologists to assist with preservation, collection and processing, but predictive coding fits right in the EDRM box where the litigator's initials are.

Thus, the entrepreneurs among us are critical to the development of new processes. That is why I admire my colleagues, friends and competitors who are pushing the field forward and experimenting. But before we anoint someone as "The Expert" whom lawyers everywhere should blindly follow, the legal community should look broadly and catalogue practitioners' actual experiences in the field to figure out what has been working, and compare that to what has instead produced goofy outcomes. This is something consultants can do. Then we can iteratively push these experiments in other directions. That is what I have been trying to do. I am not ashamed to drop the biases from my own experiences and say that these tools are GPSes: there are many ways to solve a problem using them.

Ultimately, I disagree with Craig's witty proposal to create this association, because an association needs some consensus to create the standards Craig seeks, standards I agree would be helpful. From my perspective, talking to companies and lawyers, working on cases and evaluating technologies, we are not close to that point yet. Frankly, there are not yet enough lawyers using the tools to come up with clear standards. The desire for standards is made even more challenging by differing data sets, tools, richness levels, and case profiles. The best standard may be common sense and reasonableness. Personally, I prefer the federalism approach of experimenting, educating and evangelizing to get the word out.

One last pet peeve, which I expanded on in a Forbes article, Legal Hydra: Top Ten Tips To Become More Proficient With Machine-Assisted Review, is that we need to look more widely for expertise than just lawyers in law firms. If you try to get technology advice from most lawyers, you might as well ask for stock tips at the same time; the quality of the advice will vary tremendously. Look beyond the lawyers at CLEs. CLEs should actively encourage vendors and consultants to participate in the dialogue, as we add skills clearly relevant to this debate. Good luck, everyone. It's time for class to begin and to start getting Educated!

Nothing with law and technology combined comes with ease (or "E's," as I put it in the title). It should come as no surprise, then, that one of the biggest challenges in the technology-assisted review (TAR) space is educating a critical mass of lawyers on how to work with newer technologies.

TAR methodologies require some comfort with statistics for lawyers to validate results and confidently certify the completeness of their review and production. After two years of pushing hard to get lawyers to accept these tools, I am convinced we need less improvement in the tools today and more improvement in lawyers' competency to successfully run TAR projects.

Educated lawyers will be able to use advanced TAR methodologies and better deal with uninformed adversaries, potentially Luddite judges who hate discovery disputes, and the issues that arise from the technology's bugginess or quirks. We all know there are times when data doesn't process smoothly or people make mistakes. I am completing a more detailed position paper on this topic within eDJ Group's subscription research site, eDiscoveryMatrix.com, but want to share some of its important aspects in this and the follow-up article.

If the most pressing need to push TAR forward is more education (our first “E“), a logical question to ask is: where should one get educated in this field?

Ordinarily, one would get educated by looking to the experts to hold lawyers' hands through initial projects as ideal TAR protocols are developed. To date, we have a number of proclaimed experts who have spent the last few years touting the virtues of predictive coding and TAR.

Craig Ball recently discussed the education dilemma facing the legal community. One of the dilemmas he pointed out, after identifying many of us by first name, is that we (the “experts”) appear to have biases toward different approaches and technologies. Craig says we should model our industry after the milk association and come up with “easy to understand” standards for a confused marketplace.

It’s hard to argue with the wholesomeness of milk, but I see a real problem with this approach. While we may all be experts to some degree, I would classify us instead as largely entrepreneurs with limited perspectives in trying different approaches and tools.

We understand how poor the current model is:

Guessing at keywords,

Throwing bodies at review, and

Producing the curds (to use our milk analogy) that come out.

So the "experts," armed from our various positions with data from sources such as TREC, Blair and Maron, and related follow-up studies, went out into the field and have been trying to educate our brethren while simultaneously working on projects experimenting with new techniques. Usually each "expert" works with only a single preferred tool.

The result is that we are risk takers in the legal field because we are willing to take a stand when we already know the model has to change. However, we need to be careful to recognize that the expertise is still based on a narrow set of experiences across very few differing systems.

It's my personal opinion that this is what Craig was pointing out, and why confusing, diverse messages are coming from the experts on the best approaches, workflows and tools. To hear the differences of opinion, one need go no further than the excellent panel I put together and moderated at the Carmel Valley eDiscovery Retreat on July 24, 2012. Five other experts, and fellow potential milk association members, joined me in a discussion of best practices with predictive coding. Maura Grossman, Tom Gricks, Bennett Borden, Herb Roitblat and Dave Lewis had a spirited dialogue on different workflows and approaches using analytical review tools, including predictive coding. The panel's preferences and biases come out clear as day in this podcast: hear the Predictive Coding Power User Panel from Carmel.

I will continue to discuss the “E‘s” of predictive coding in my next post.