AI in Law: Never? Some day? Now?

I’m referring to the widespread attitude towards artificial intelligence in law, to the effect that we don’t have to do anything about it yet because the technology has yet to prove itself. Some day, maybe, and we’ll worry about it then. Maybe.

Time’s up.

Across my desk this week came a just-released report from Blue Hill, the Boston-based research and advisory firm specializing in enterprise technology. Their particular foci include AI, cloud, neuroscience and machine learning, mobile, and security technologies, and perhaps not surprisingly they conduct research and write reports on legal technology on a fairly regular basis.

The growing availability and practicality of artificial intelligence (AI) technologies such as machine learning and natural language processing within the legal sector has created a new class of tools that assist with legal analysis in activities such as legal research, discovery and document review, and contract review. The promised value of these tools is often significant, while lingering cultural reluctance and skepticism within the legal profession can lead to hyperbolic reactions to so-called “robot lawyers,” both positive and negative.

To bridge this chasm, Blue Hill did what any serious researcher would do: set up, conducted, and reported the results of an experiment comparing AI-enabled tools to conventional ones on about as level a playing field as it’s possible to imagine.

Specifically, they compared the ROSS Intelligence tool to Lexis and Westlaw, using both Boolean and natural language search. (“Boolean” is basically the familiar keyword approach, allowing you to specify required words, impermissible words, and combinations using “and” and “or.” Natural language is what you’re reading.)
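To make the distinction concrete, here is a toy sketch of the two search styles. This is my own illustration, not the actual query syntax or ranking logic of Lexis, Westlaw, or ROSS: a Boolean search that filters on required and excluded words, and a crude natural-language search that ranks documents by word overlap with a plain-English question.

```python
# Toy corpus of three hypothetical case summaries (invented for illustration).
documents = {
    1: "debtor filed a chapter 7 bankruptcy petition",
    2: "the trustee objected to the discharge of student loans",
    3: "chapter 11 reorganization plan confirmed by the court",
}

def boolean_search(docs, required, excluded=()):
    """Return ids of documents containing every required word and none of the
    excluded words -- the familiar 'X AND Y AND NOT Z' keyword style."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.split())
        if all(w in words for w in required) and not any(w in words for w in excluded):
            hits.append(doc_id)
    return hits

def natural_language_search(docs, question):
    """Rank documents by crude word overlap with a plain-English question --
    a stand-in for the statistical ranking real NL engines use."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(text.split())), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]

# Boolean: "chapter" AND NOT "trustee" -> documents 1 and 3.
print(boolean_search(documents, required=["chapter"], excluded=["trustee"]))  # [1, 3]

# Natural language: a question, no operators -- document 2 ranks first.
print(natural_language_search(documents, "can student loans be discharged in bankruptcy"))  # [2, 1]
```

The point of the contrast: Boolean search returns everything that matches the operators and nothing else, while natural-language search accepts an ordinary question and returns a ranked list, which is why the quality of the ranking (not just the matching) matters in the Blue Hill comparison.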

While I’m not formally trained in objective research protocols, their setup of the research struck me as scrupulous:

Sixteen experienced legal research professionals were randomly assigned into four groups of four apiece.

Each received a standard set of seven questions designed to emulate real-world queries practicing lawyers would pose.

For consistency, the subject matter was limited to US federal bankruptcy law.

Each of the sixteen was asked to research and provide a written answer to the legal question posed, within a two-hour time limit.

Although experienced in legal research generally, none of the sixteen had more than passing acquaintance with bankruptcy law.

And each was assigned to a tool (Lexis, Westlaw, or ROSS) that they were largely unfamiliar with. (Westlaw mavens were sicced on Lexis and vice versa; presumably none were familiar with ROSS out of the box.)

To evaluate the results, Blue Hill measured the time each spent (a) researching; and (b) writing their responses. They also compared:

Information retrieval quality: what portion of the total results retrieved were drawn from truly relevant sources, what portion of the items presented were themselves relevant, and whether the most relevant were placed at the top of the list.

User satisfaction: how easy the tool was to use and how much confidence the researcher had in the results.

Efficiency: the time it took to arrive at a satisfactory answer.

Finally, for simplicity and ease of comparison in evaluating quality and relevance of the search tools, Blue Hill took into account only the first 20 results produced in response to each research query.
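The "first 20 results" cutoff corresponds to a standard information-retrieval measure, precision at k. The sketch below is my own illustration of that idea, not Blue Hill's actual scoring methodology, and the case names and relevance labels are invented:

```python
def precision_at_k(results, relevant, k=20):
    """Fraction of the top-k retrieved items that are truly relevant.
    `results` is the ranked list a tool returned; `relevant` is the
    (here hypothetical) ground-truth set of relevant items."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

# Hypothetical ranked output from one tool, and hypothetical ground truth.
results = ["case_a", "case_b", "case_c", "case_d"]
relevant = {"case_a", "case_c"}

print(precision_at_k(results, relevant, k=20))  # 0.5
```

Capping the evaluation at 20 results rewards tools that put relevant material near the top rather than burying it deep in a long result list, which mirrors how a time-pressed researcher actually uses a search tool.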

Recall our contestants: the new entrant and challenger ROSS, versus the composite Lexis/Westlaw incumbents and reigning champions, each of the latter in Boolean and natural language flavors.

1 Comment

A reader who prefers anonymity (but who knows to a fare-thee-well whereof he speaks) wrote me as follows, verbatim:

I read the Blue Hill study as so flawed as to be meaningless. The follow-up in Bob Ambrogi’s blog wasn’t particularly comforting either.

ROSS alone was not one of the protocols – only ROSS in combination with Wexis. Impossible to distinguish the relative contributions of each.

Arguably an unrealistic scenario – a busy lawyer would not do both.

The researchers, though generically “experienced” were specifically ignorant of the tools they were assigned.

We don’t know the searches done or results found – i.e., none of the experimental data was revealed. When I ran ROSS’s standard demo search (as in the Vanderbilt video) I got more and better results from Google Scholar than are shown in the video.

ROSS doesn’t make the case for AI in legal research. It’s a nice enough search engine, on a trivially small data set, with intensive investment in training. As a commercial matter, Wexis will blow past whenever they get around to it.