We can talk to our phones and they will open a website, search the web, dial a number, and more. Both Alexa and Siri will change the channels on our televisions, find a movie for us, and even respond with a wry, witty comment. Certainly, we must be on the verge of a Star Trek-like computer?

We’re not.

For all the glitz and gee-whizziness, our computers do not understand us. Those fancy voice interfaces are little more than immense lookup tables guided by complex statistics. It makes for a good show, but Star Trek remains many light years away.

Researchers at the Allen Institute for Artificial Intelligence (AI2) looked closely at recent progress on Winograd Schemas, a standard test of an AI’s ability to mimic common-sense reasoning. Some systems claim 90% accuracy.

A Winograd Schema is a pair of sentences that differ by just one word, and that one word changes the intended meaning. For example:

– The trophy doesn’t fit into the brown suitcase because it’s too large.

– The trophy doesn’t fit into the brown suitcase because it’s too small.

In the first sentence, the pronoun “it” refers to the trophy; in the second, to the suitcase. Machine learning typically has difficulty making such a common-sense distinction. The Winograd Schema Challenge contains 273 such carefully crafted sentence pairs.
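For readers who think in code, the structure of a schema can be sketched in a few lines of Python. This is only an illustration: the names `WinogradSchema`, `first_mentioned`, and `accuracy` are invented here and do not come from the Winograd Schema Challenge or any real library. The sketch shows why the paired design matters: a resolver that ignores the one changed word, such as “always pick the first noun mentioned,” can get only one sentence of each pair right.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class WinogradSchema:
    """Illustrative container (hypothetical, not an official format)."""
    template: str                # "{word}" marks where the special word goes
    candidates: Tuple[str, str]  # the two possible referents of the pronoun
    answers: Dict[str, str]      # special word -> correct referent

    def sentences(self) -> List[Tuple[str, str]]:
        """Return (sentence, correct referent) for each special word."""
        return [(self.template.format(word=w), ref)
                for w, ref in self.answers.items()]


def first_mentioned(sentence: str, candidates: Tuple[str, str]) -> str:
    """Naive baseline: choose the candidate that appears first."""
    return min(candidates, key=sentence.find)


def accuracy(resolver: Callable[[str, Tuple[str, str]], str],
             schemas: List[WinogradSchema]) -> float:
    """Fraction of sentences for which the resolver picks the right referent."""
    results = [resolver(sentence, schema.candidates) == referent
               for schema in schemas
               for sentence, referent in schema.sentences()]
    return sum(results) / len(results)


# The trophy/suitcase example from the text.
TROPHY = WinogradSchema(
    template="The trophy doesn't fit into the brown suitcase "
             "because it's too {word}.",
    candidates=("trophy", "suitcase"),
    answers={"large": "trophy", "small": "suitcase"},
)

print(accuracy(first_mentioned, [TROPHY]))  # 0.5: right on "large", wrong on "small"
```

A score of one half is no better than guessing, which is the point of the design: to do better, a system has to use what the changed word implies about trophies and suitcases.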

AI2 researchers tested state-of-the-art systems against a much larger dataset, one containing roughly 44,000 sentence pairs. They found that common sense was still missing:

When they tested state-of-the-art models on these new problems, performance fell to between 59.4% and 79.1%. By contrast, humans still reached 94% accuracy. This means a high score on the original Winograd test is likely inflated. “It’s just a data-set-specific achievement, not a general-task achievement,” says Yejin Choi, an associate professor at the University of Washington and a senior research manager at AI2, who led the research.
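A toy sketch can show how a score ends up being “data-set-specific.” Everything in it is an assumption made for illustration: the shortcut rule, the rephrased sentences, and the function names are invented here, and none of it is taken from the AI2 dataset or from how any real model works. The shortcut maps a cue word to a noun position. That rule is perfect on the original trophy/suitcase pair and fails completely once the sentence is reworded so the nouns swap order.

```python
# Hypothetical shortcut "learned" from the trophy/suitcase pair alone:
# the cue word decides whether the first or second noun is the referent.
CUE_TO_POSITION = {"large": 0, "small": 1}


def shortcut_resolver(cue, candidates):
    """Pick a referent by noun position only; no understanding involved."""
    return candidates[CUE_TO_POSITION[cue]]


def accuracy(examples):
    """Fraction of (cue, candidates, answer) examples the shortcut gets right."""
    hits = [shortcut_resolver(cue, candidates) == answer
            for cue, candidates, answer in examples]
    return sum(hits) / len(hits)


# Candidates are listed in order of mention.
# "The trophy doesn't fit into the brown suitcase because it's too large/small."
ORIGINAL = [
    ("large", ("trophy", "suitcase"), "trophy"),
    ("small", ("trophy", "suitcase"), "suitcase"),
]

# Invented rewording with the nouns swapped:
# "The suitcase can't hold the trophy because it's too large/small."
REPHRASED = [
    ("large", ("suitcase", "trophy"), "trophy"),
    ("small", ("suitcase", "trophy"), "suitcase"),
]

print(accuracy(ORIGINAL))   # 1.0
print(accuracy(REPHRASED))  # 0.0
```

A person reads both versions the same way because the meaning has not changed. The shortcut never had the meaning, only a regularity in one set of sentences.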

Often the amazing results of AI depend on something called “dataset-specific bias.” In lay terms, the programmers see a pattern in the data that helps solve the problem. For example, IBM’s Watson performed well at Jeopardy! in part because researchers discovered that stored Wikipedia entries contained the answers to many questions:

Infoboxes, the (now well-known) tables of facts that accompany Wikipedia pages, for instance, are generated automatically from an IE system, applied to Wikipedia pages, populating the DBPedia Knowledge Base. Watson makes use of these existing DBPedia relations in DeepQA, which is sensible because much of the information used by Watson to answer Jeopardy! questions comes from online sources like Wikipedia (more on this later). The structured resources like DBPedia can then be exploited without the need to manage separate efforts developing knowledge resources for use by the system.

This is the difference between computers and humans: We understand; computers regurgitate. We read, evaluate, and make decisions. Computers operate according to patterns and rules. While I expect those patterns and rules to improve, without a conscious mind, computers will not get past regurgitation. It takes a mind to know things and to have common sense.

Note: It’s just as well we are not heading for Star Trek computers. This episode of the original series, “The Ultimate Computer,” aired in 1968: “Kirk and a sub-skeleton crew are ordered to test out an advanced artificially intelligent control system – the M-5 Multitronic system, which could potentially render them all redundant.”

Further reading on Winograd schemas:

AI is no match for ambiguity: Many simple sentences confuse AI but not humans (Robert J. Marks)

and

Computers’ stupidity makes them dangerous: The real danger today is not that computers are smarter than us, but that we think computers are smarter than us

Also: Why did Watson think Toronto was in the USA? How that happened tells us a lot about what AI can and can’t do, to this day.

Brendan Dixon

Fellow, Walter Bradley Center for Natural & Artificial Intelligence

Brendan Dixon is a Software Architect with experience designing, creating, and managing projects of all sizes. His first foray into Artificial Intelligence was in the 1980s when he built an Expert System to assist in the diagnosis of software problems at IBM. Since then, he’s worked both as a Principal Engineer and Development Manager for industry leaders, such as Microsoft and Amazon, and numerous start-ups. While he spent most of that time on other types of software, he’s remained engaged and interested in Artificial Intelligence.

Mind Matters features original news and analysis at the intersection of artificial and natural intelligence. Through articles and podcasts, it explores issues, challenges, and controversies relating to human and artificial intelligence from a perspective that values the unique capabilities of human beings. Mind Matters is published by the Walter Bradley Center for Natural and Artificial Intelligence.