The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated. But the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts — especially in the area of search-over-business-intelligence — are at least mildly encouraging. Emphasis added.

For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.

*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.

Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.

*I loved the product personally. But I was sadly alone.

In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.

Even worse, NL actually had a couple of clear disadvantages versus traditional interfaces. First of all, it required (ick!) typing, often more typing than the forms and menus did. Second, forms and menus tell the user exactly what he can do. Natural language, however, lets him give orders the computer doesn’t know how to follow. This is inefficient, not to mention frustrating.

However, even in 1983, it was obvious that the typing objection would go away some day because of speech recognition, once desktop computers reached 100 MIPS or so. (Effective keyboard-replacement speech recognition, as opposed to true natural language understanding, is mainly a matter of processing power.) Fifteen years later, standard PCs exceed 100 MIPS (assuming that 1 MIPS = a couple of megahertz for these purposes), and speech recognition is indeed getting practical.

In fact, as has become increasingly evident recently, speech recognition is now a hot technology. Bill Gates has been talking it up for a couple of years. Increasingly, the press has swung to believing him … And my parents just bought a PC with two speech recognition products on it.

That said, speech recognition is as misunderstood (no pun intended) as most artificial intelligence technologies. Yes, it beats typing, in a number of circumstances:

People simply reluctant to type (e.g., anybody with sufficient wrist or back problems, and many males over the age of 45)

But before our computers talk back and forth with us in the voice of Majel Barrett Roddenberry, applications are going to have to add several important elements required for truly functional natural-language interfaces:

Intuitively clear names for everything on (or just behind) the screen

Application-specific disambiguation logic

For most practical purposes, the latter requirement equates to

A new generation of document selection technology

THE RULE OF NAMES

According to legend, knowing something’s name gives you power over it. When that “something” is a button or menu choice on a speech-enabled computer, the legend is literally true. But when a feature doesn’t have an obvious name, you can’t easily invoke it.

When applications consisted mainly of forms and menus, this was rarely a problem. Everything had a clear role and label. But web pages are less organized. Hyperlinks can be scattered all over the place, with little rhyme or reason.

Frankly, I don’t think this is a hard problem to solve. It wouldn’t take a lot of XML to divide the page into clear regions, so that commands like “Show me article #3” (on a search results list) could be interpreted in the obvious way. But it does take at least some discipline; random web pages will not necessarily be easy to “talk” to.
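To make the idea concrete, here is a minimal sketch of how named page regions could let a spoken command be resolved to a link. Everything specific in it is an assumption made for illustration: the page, region, and item element names, the name attribute, the sample links, and the resolve function are hypothetical, not part of any real markup standard or product.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page markup: the <page>, <region>, and <item> element names
# and the "name" attribute are assumptions made for this sketch.
PAGE_XML = """
<page>
  <region name="navigation">
    <item href="/home">Home</item>
    <item href="/help">Help</item>
  </region>
  <region name="article">
    <item href="/a/101">Speech recognition on the desktop</item>
    <item href="/a/102">Why natural language interfaces failed</item>
    <item href="/a/103">Text search stages a comeback</item>
  </region>
</page>
"""


def resolve(command, page_xml):
    """Map a command like 'Show me article #3' to (link text, href), or None."""
    # Look for "<region name> <number>", with an optional "#" before the number.
    match = re.search(r"(\w+)\s+#?(\d+)", command.lower())
    if match is None:
        return None
    region_name, position = match.group(1), int(match.group(2))
    root = ET.fromstring(page_xml)
    for region in root.findall("region"):
        if region.get("name") == region_name:
            items = region.findall("item")
            # Positions are 1-based, the way a person would say them.
            if 1 <= position <= len(items):
                item = items[position - 1]
                return item.text, item.get("href")
    return None


print(resolve("Show me article #3", PAGE_XML))
```

The point of the sketch is the discipline, not the parsing: once every region has a name, "article #3" has exactly one sensible meaning. A page whose links are scattered with no regions gives the resolver nothing to hold on to.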

CYBERNETIC LISTENING SKILLS

The bigger challenge is to make sure that the application can respond in some useful way, no matter what command it’s given. This is even more difficult than it was 15 years ago, because of the radical increase in “casual” computer usage. In the old days, we could assume the user had some clear business reason for using the application, and if necessary that s/he had time to be trained (even if people rarely sat still for as much training as they really needed). Therefore, we could assume that the users had at least a general idea of what the application did, and hence of which commands the computer could obey. From an NL standpoint, we could assume that what they actually “said” (which in those days meant “typed”) was at least reasonably close to what they were “supposed” to say.

Now, however, some of the most important applications are internet e-commerce and portals, competing and begging for the user’s attention. The user is there strictly on a voluntary basis, and if he doesn’t get immediate gratification, he’s gone, history, hasta la bye-bye. Site-specific training isn’t even a consideration. And even if somebody did actually take a class on “How to use Excite,” the knowledge would be obsolete in six months. So applications, if they are to have natural language interfaces that please and respond to users, have to be able to respond pretty much to any command.

Ideally, voice-enabled systems would be like the computers on Star Trek, which can return information from vast archives, brew a pot of Earl Grey tea, play three parts of a quartet, create self-aware life forms, or answer questions like “Computer, what is the nature of the universe?” More realistically, they should be able, for example, to respond to a command like “Tell me about flights to Miami” by automatically giving the user a travel-reservation application or web page, and entering Miami in the appropriate form field.
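The "flights to Miami" case can be sketched in a few lines: pick an application from the wording of the request, and pull out the value that belongs in its form field. The routing table, application names, field names, and the route function below are all hypothetical, and pattern matching like this is only a stand-in for whatever language analysis a real system would use.

```python
import re

# Hypothetical routing table. The application names, trigger patterns, and
# form field names are assumptions made for this sketch.
ROUTES = [
    ("travel_reservations",
     re.compile(r"\bflights?\s+to\s+(?P<destination>[a-z .'-]+)", re.I)),
    ("weather",
     re.compile(r"\bweather\s+in\s+(?P<city>[a-z .'-]+)", re.I)),
]


def route(utterance):
    """Return (application, pre-filled form fields), or None if nothing matches."""
    for application, pattern in ROUTES:
        match = pattern.search(utterance)
        if match:
            # Each named group in the pattern corresponds to one form field.
            fields = {name: value.strip() for name, value in match.groupdict().items()}
            return application, fields
    return None


print(route("Tell me about flights to Miami"))
```

Note how little it takes to break this. The pattern grabs everything after "to", so "flights to Miami tomorrow" would put "Miami tomorrow" in the destination field, and a request that matches no pattern gets no response at all.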

If one thinks about the complications in such a system, it becomes clear that there are only two possible ways an application system can be designed to respond meaningfully to an enormous range of reasonable possible requests.

1. It can do the equivalent of saying “I’m sorry, I didn’t understand that,” “I’m sorry, I can’t do that,” and so on.

2. It can interpret many commands as text-search strings, and return appropriate results.

The first strategy (application-specific disambiguation logic, clear responses to “errors,” etc.) is absolutely necessary. No software is perfectly intelligent; the user will have to be asked for disambiguation help from time to time (just as clerks today ask customers to repeat their requests!). I’m not going to go into much detail about how that works because, frankly, it’s a tricky thing to get right. Users hate unnecessary disambiguation steps. But they also hate the incorrect responses that result from ambiguity, and they do tolerate being asked for help when it’s truly needed. In short, whatever you build the first time around will probably be wrong. So build something fast; then run, don’t walk, to the nearest usability lab, find out how you screwed up, and redo your system until you get it right.

I’m convinced that the second strategy — heavy reliance on text search technology — is a requirement as well. Just try to name a major web site that doesn’t use text search. True, text search has gotten a bad rap recently, mainly because a whole generation of search engines didn’t really work. But it will stage a comeback.
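Here is a rough sketch of how the two strategies might fit together in one dispatcher: run a command when exactly one matches, ask the user when several do, fall back to text search when none does, and apologize only as a last resort. The command names, keyword sets, stopword list, and sample documents are all made up for illustration, and the word-overlap scoring is a deliberately crude stand-in for a real search engine.

```python
import re

# Everything below is hypothetical: command names, keywords, and documents.
COMMANDS = {
    "book_flight": {"flight", "flights", "fly"},
    "book_hotel": {"hotel", "hotels", "room"},
    "book_car": {"car", "rental"},
}

DOCUMENTS = [
    "Miami travel guide: beaches, hotels, and nightlife",
    "How to change or cancel a reservation",
    "Baggage rules for international flights",
]

STOPWORDS = {"a", "an", "the", "to", "of", "and", "or", "for", "me", "about"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def text_search(words):
    """Rank documents by how many query words they share; drop non-matches."""
    scored = [(len(words & tokenize(doc)), doc) for doc in DOCUMENTS]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]


def respond(utterance):
    """Return a (kind, payload) pair describing how the application reacts."""
    words = tokenize(utterance)
    matches = sorted(name for name, keys in COMMANDS.items() if words & keys)
    if len(matches) == 1:
        return ("run", matches[0])      # unambiguous command
    if len(matches) > 1:
        return ("ask", matches)         # strategy 1: ask the user to choose
    hits = text_search(words)
    if hits:
        return ("search", hits)         # strategy 2: treat it as a search string
    return ("apologize", "I'm sorry, I didn't understand that.")


for request in ("Tell me about flights to Miami",
                "I need a flight and a hotel",
                "What are the baggage rules?",
                "Brew a pot of Earl Grey tea"):
    print(request, "->", respond(request))
```

The ordering is the design choice that matters. Search sits between "I know what you mean" and "I give up," so most requests get some useful answer. Where exactly to draw the line between running a command and asking first is the part that has to be tuned in the usability lab.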

Comments

[…] Actually, there’s a pretty well-known example of BI near-perfection — the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn’t have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek’s computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted. […]

I read your essay about why NLP UI’s had not succeeded, and I think it’s a much more thoughtful analysis than most I’ve ever seen. But here are a few comments.

When forms work just fine, there is probably no need for using natural language. You’re right that forms have the big advantage of letting you know what things the system does and does not know how to represent and manipulate (an excellent point that I’d actually never heard before). In the foreseeable future, nobody should try using NLP UI’s where something tried-and-true works well.

As you predicted, the need for typing is less of an issue now, and speech recognition has progressed quite a bit.

Applications do indeed need ways to refer to things. You might want to take a look at Kim Patch’s company, http://www.redstartsystems.com, which uses speech recognition (but not NLP) to let you refer to things in a computer UI very easily.

I think the biggest problems are still the classical ones: it’s hard to parse natural language statements because human languages are so complex by their very nature, and you have to be able to relate the sentences to an underlying knowledge base in order for them to have meaning, and that’s all very hard.

One reason there hasn’t been much progress in this area is that so many of the needs for NLP have been met by systems that are statistical in nature, rather than actually understanding human language. The success of statistics-based systems has pushed the need for “real” NLP back so far that industry hasn’t been working on it. I believe the pendulum will swing back; we’ll want abilities that statistical approaches inherently cannot do, and the time will come again for real NLP work in industry.

It’s been commonplace to say that “Artificial Intelligence” has not succeeded. But if you look at what I was taught about A.I. when I was an undergrad, successful A.I. is all around us. My cell phone and even my car can recognize speech. Genuine robot applications have succeeded and walking robots work fine. Deep Blue beat Kasparov, and on and on. The usual thing happens: people say that what we’ve really learned is that speech recognition and robots and chess don’t require intelligence, after all. In this way, A.I. can never succeed, by definition. We all knew that even back in 1975: the bar gets moved back so you have to run as fast as you can to stay in the same place. But when you look from a historical perspective, the pattern is clear. We’ll have NLP. And it will “turn out” that you didn’t need “intelligence” for NLP after all.
