Tech Time Warp of the Week: IBM Speech Recognition, 1986

A pair of carefully painted lips, a feathered head-of-hair, pearl earrings, a pink button-down beneath a blue sweater, synthesizer music, a personal computer with a floppy drive, and a monochrome display. Welcome to the 1980s.

At first, the display is blank, except for a flashing green cursor. Is it waiting for someone to type? No, it’s not. This PC is different from the average DOS machine. When those carefully painted lips say the word “speaking,” the letters arrive on the display, as if by magic.

Then we see that those lips belong to a perfectly coiffed spokesmodel. “I talk,” she says, “and the words appear on the computer screen.” And then these words appear on, well, you get the idea.

This little piece of computer history appears in an IBM promotional video that says as much about the culture of the times as about the technology. The year is 1986. IBM is still the king not only of the PC world, but of the tech world as a whole, and with the video (and a spokesmodel who could come from no other decade) Big Blue shows off its early efforts in speech recognition software.

IBM’s perfectly coiffed spokesmodel says her words with unusual care. “On. The. Computer. Screen.” But what do you expect? It’s 1986. This was the first version of IBM’s “isolated” speech recognition technology. It was smart enough to recognize human speech, but only if humans spoke very slowly — if they temporally isolated each word from the next.

As the spokesmodel says, the system had a vocabulary spanning a few thousand words, and if you buy into the video, it could distinguish between words like “write,” “right,” and “Wright.”

“It took a powerful language model to be able to do that,” says David Nahamoo, the current chief technology officer for speech at IBM Research. “At the time, these PCs were very low on resources, so we needed special hardware to run the algorithms.”
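To get a feel for what a language model does with sound-alike words, here is a minimal sketch in Python. It is not IBM's 1986 method, which the video and Nahamoo don't spell out. It is a toy bigram model, with an invented six-sentence corpus and hypothetical function names, that picks among “write,” “right,” and “Wright” by looking at the words on either side.

```python
from collections import Counter

# Invented toy corpus (an assumption for illustration only);
# a real system would train on vastly more text.
CORPUS = [
    "please write a letter",
    "write a memo today",
    "turn right at the corner",
    "that is the right answer",
    "the wright brothers built a plane",
    "orville wright flew first",
]

# Words that sound identical to a recognizer.
HOMOPHONES = {"write", "right", "wright"}


def train_bigrams(sentences):
    """Count single words and adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def pick_homophone(prev, nxt, unigrams, bigrams):
    """Choose the spelling that best fits between its two neighbors."""
    def score(candidate):
        return (bigram_prob(prev, candidate, unigrams, bigrams)
                * bigram_prob(candidate, nxt, unigrams, bigrams))
    # sorted() keeps the choice deterministic if two scores tie.
    return max(sorted(HOMOPHONES), key=score)


unigrams, bigrams = train_bigrams(CORPUS)
print(pick_homophone("please", "a", unigrams, bigrams))
print(pick_homophone("turn", "at", unigrams, bigrams))
print(pick_homophone("the", "brothers", unigrams, bigrams))
```

The audio is the same in all three cases, so only the neighboring words can settle the spelling. That context is what the language model supplies.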

Each experimental computer was loaded with four custom-made “Albert” cards, a nod to Albert Tangora, a champion typist once billed as the fastest in the world. The cards carried enough memory to store the entire speech recognition model, letting the machine search for spoken words in real time. According to Nahamoo, the system could also learn to adjust to a person’s voice, resulting in increased accuracy.

The system would eventually evolve into IBM’s first real speech recognition product: the Speech Server Series, which arrived in 1992. The idea was to help businesses speed up dictation.

As we all know, this never really caught on — at least not in a big way. But more than 15 years later, Siri arrived. She didn’t work all that well either, but at least she took voice recognition into the mainstream.

The problem is that speech recognition systems are never as effective as we want them to be. Nowadays, they’re OK at dictation, but we want more. We want to search the web with voice commands, and Siri doesn’t quite do that. She often pumps out little more than a long list of links, and then it’s our job to sift through all that and figure out what’s what. Very often, the links are useless.

The big challenge today, says Nahamoo, is translating questions into actionable answers. If you ask Siri how to get to the closest open Rite-Aid right away, she should give you an action plan. That’s why companies like Google, Baidu, Microsoft, and yes, IBM, are doubling down on deep learning, a branch of machine learning built on multilayered neural networks loosely inspired by the human brain. Using deep learning methods, companies may be able to better process natural language and help you find exactly what you want, or even new things you didn’t know existed.

What it can’t give you is a feathered head-of-mid-80s hair. But there are always old IBM promotional videos.

An ad for IBM's speech recognition work, which began in the 1950s. Photo: IBM.

In the 1960s, IBM unveiled an early voice recognition system called Shoebox. The machine could do simple math in response to voice commands, recognizing just 16 words. Photo: IBM.

Shoebox, what's five plus three plus eight? Photo: IBM.

In 1980, true speech recognition was still on the drawing board. But Big Blue offered a talking typewriter. Photo: IBM.

In the '80s, the vocabulary of IBM's speech recognition system expanded from 5,000 to 20,000 words. Photo: IBM.