Oot opens by pointing out that the real goal in e-discovery is justice. Rule 1 of the FRCP references securing “the just, speedy, and inexpensive determination of every action and proceeding.”

Start with the notion that assessing relevancy is difficult. Oot references his involvement in Verizon’s acquisition of MCI. They used a traditional Second Request review process with much manual review: 83 custodians, 2.3 million documents, and 2 law firms, one deploying 115 lawyers and the other 110 lawyers to conduct privilege and relevance review. It took four months of long days. The cost of document review was just shy of $13.5 million. Note that this matter was not big by today’s standards. FTC would not allow the parties to use keyword searches to narrow the document review. “There’s got to be a better way to do this than all the human review.”

Oot and Kershaw started the eDiscovery Institute (EDI) to study whether there is a better way to conduct document review. Kershaw now summarizes the Institute: The idea started a few years ago with a private review Kershaw did comparing two approaches to document review. Judges and others wanted more data to compare approaches. Work to date has just scratched the surface – much remains to be done. The Institute is a not-for-profit and is set up to do additional studies to ease the pain of conducting litigation.

EDI’s first study compared traditional doc review with an electronically assisted process. EDI will publish a white paper in early 2008; it will be peer-reviewed and freely available. Kershaw views EDI as a unique organization that provides factual information (Sedona focuses on principles). Pfizer and Verizon are current sponsors but EDI seeks additional sponsors. EDI will not be a vendor or process certification organization – it will report on factual findings.

QUESTIONS EDI WILL ADDRESS
– Should a party consider alternative methods to brute force review?
– Is computer assisted relevancy assessment reasonable under the Rules?
– Is any process reasonable?

Roitblat describes the study: Quantitative measurement is key. He references the seminal Blair-Maron 1985 study, which found that researchers were only 20% accurate in finding docs but thought they were 80% accurate. The way to measure accuracy is to measure actual performance against “the truth.” You have to approximate the truth. [Editor: in medicine, this might be called the gold standard.] How do you define the “baseline,” that is, the objective or widely accepted relevance call for each document? Must consider both false positives and false negatives. Precision is the percent of docs selected that are truly relevant. Recall is the percent of relevant docs actually retrieved. Elusion is the percent of docs not retrieved that are relevant.
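[Editor: the three measures defined above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name review_metrics, the document IDs, and the toy numbers are invented and do not come from the study. It treats the “truth” as a known set of relevant docs, which in practice can only be approximated.]

```python
def review_metrics(all_docs, retrieved, relevant):
    """Compute precision, recall, and elusion for one review pass.

    all_docs:  every document ID in the collection
    retrieved: IDs the review (human or computer) marked as relevant
    relevant:  IDs that are relevant according to the baseline "truth"
    All names here are hypothetical, chosen for illustration only.
    """
    all_docs = set(all_docs)
    retrieved = set(retrieved) & all_docs
    relevant = set(relevant) & all_docs

    true_positives = retrieved & relevant   # selected and truly relevant
    not_retrieved = all_docs - retrieved    # everything the review left behind
    missed = not_retrieved & relevant       # false negatives

    def ratio(numerator, denominator):
        # Avoid division by zero when a set is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(len(true_positives), len(retrieved)),
        "recall": ratio(len(true_positives), len(relevant)),
        "elusion": ratio(len(missed), len(not_retrieved)),
    }


# Toy example: 10 docs, 4 truly relevant, the review selects 3 of the 10.
metrics = review_metrics(
    all_docs=range(1, 11),
    retrieved={1, 2, 5},
    relevant={1, 2, 3, 4},
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# precision: 0.667  (2 of the 3 selected docs are relevant)
# recall:    0.500  (2 of the 4 relevant docs were found)
# elusion:   0.286  (2 of the 7 docs left behind are relevant)
```

[Editor: note that a review can score well on precision while recall stays low, which is why both false positives and false negatives have to be counted.]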

Key question: what can we actually measure? What are the appropriate “power tools” for e-discovery (versus manual review)? To answer, start by looking at the ESI review process: training, case background, and examples combined with experience lead to judgments of whether a document is responsive or not. In a 2nd tier review, reviewers typically look only at what the first round designated as responsive. So two tier review has the problem that first-round calls of non-responsive are not necessarily rechecked.

How does a computer get experience to separate responsive from non-responsive docs? It’s all just mathematics. The competition among vendors is who has the better math. The process with computers is based on rules, text, and math applied to docs. Computer approach may “recurse,” that is, adjust its process based on feedback from human reviewers. For study, “true” designation of document is based on original work of MCI-Verizon team.

Roitblat describes the famous Turing test for artificial intelligence: can a human tell the difference between a computer and a human in an interactive, text-only conversation? By extension, a computer aided review should be comparable to a human review.

PROVISIONAL RESULTS OF STUDY: Agreement between each of 4 computer systems and the original attorney review ranged from 72% to 88%. (For this comparison, the original review is considered as the “truth.”) Note that in human reviews with multiple reviewers, the rate of agreement among the humans is typically lower.
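[Editor: the panel did not say exactly how agreement was calculated. A minimal sketch, assuming it is simple percent agreement (the share of documents on which two reviews make the same responsive / non-responsive call), might look like the following. The function name agreement_rate and the sample calls are hypothetical and are not data from the study.]

```python
def agreement_rate(review_a, review_b):
    """Share of documents on which two reviews made the same call.

    Each review is a dict mapping a document ID to True (responsive)
    or False (non-responsive). Only documents coded by both reviews
    are compared. Assumed definition: simple percent agreement.
    """
    shared_ids = review_a.keys() & review_b.keys()
    if not shared_ids:
        raise ValueError("the two reviews have no documents in common")
    same_call = sum(1 for doc_id in shared_ids
                    if review_a[doc_id] == review_b[doc_id])
    return same_call / len(shared_ids)


# Invented calls for five documents, for illustration only.
original_review = {1: True, 2: False, 3: True, 4: False, 5: True}
computer_review = {1: True, 2: False, 3: False, 4: False, 5: True}

rate = agreement_rate(original_review, computer_review)
print(f"agreement: {rate:.0%}")  # agreement: 80%
```

[Editor: percent agreement does not correct for agreement that would occur by chance, so it can look high when most documents are non-responsive.]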

Question to Judge: what happens when the issue of the best method comes before the court? The Judge says it’s better for the parties to collaborate to come to a shared view on this topic. Disagreement should be aired at the Rule 16(b) conference. Legal system requires reasonableness, not precision. Plus it requires reasonable cost. [Editor: this raises the question of how precise is precise enough to be reasonable.]

Brickell: it is no longer reasonable to presume humans should review all the documents.
Craig Ball: If parties cooperate, they can agree on a reasonable method. Cautions that studies show that human review, by some measures, is only 40% accurate. Should computers be designed to make the same errors that humans make? Compares issues here to Google ranking, where links, which are made by humans (at least in theory), are a form of group voting.
Kershaw: Many discovery requests presume way too much is relevant. These studies may help narrow scope of what we generally consider as relevant. Low accuracy of human review reflects inconsistent judgment and fatigue.

[Editor’s note: In Thoughts on Full Text Retrieval (a KM and litigation support topic) (July 2003), I noted that “What we need as a profession is a mechanism to perform real-world tests, both on how the search tools perform under the most favorable conditions and how they work when actual users operate them. Unfortunately, this is costly and the incentives and structures to do so just do not exist.” It’s great to see that this is finally beginning to happen.]