Jaron's World: Heads-Up

Why your next telephone may come mounted on a neck.

I've been thinking lately about two seemingly unrelated questions that have a hidden and, I suspect, significant connection:

1. Why do you have a neck?

2. Why hasn't videoconferencing ever caught on?

Let's start with the second question. Nearly a century ago, early science fiction authors and futurists predicted that airships and videoconferences would someday be ubiquitous. Air travel became ordinary by the mid-20th century, but today visual telecommunication remains a marginal technology at best. E. M. Forster, who anticipated the World Wide Web and many other aspects of the Internet in his 1909 story "The Machine Stops," assumed that two-way video transmissions would inevitably become the most common form of communication. It was the principal mistake in his otherwise amazingly prescient vision.

Long-distance videoconferencing was demonstrated way back in the late 1920s, but the idea flopped even as television soared. In the 1960s, AT&T introduced videophones once again, and once again they arrived with a thud.

In every decade since, the pattern has been repeated, despite what seem to be irrefutable reasons for people to seek out videoconferencing. Travel is time-consuming and expensive, and the planning needed to bring a group of people together can be tricky. In recent years, motivations for developing viable videoconferencing have multiplied. Now we need to worry about global warming and the high price of jet fuel. And dangerous new viruses being distributed by air travel. And terrorism.

So once again a variety of videoconference technologies are being introduced, and once again something seems to have gone wrong. At first users are enthusiastic, but over the course of a few months usage drops off, and soon the devices are abandoned. Why?

There is a world of opinions. A perennial speculation is that although people initially think they want a visual connection, they ultimately prefer to be hidden, except when they go to the trouble of traveling to attend a meeting in person. Perhaps.

A community of "tele-immersion" researchers suspects a deeper answer: Maybe there is something about how our brains are fine-tuned to perceive other people that video telecommunications have simply not picked up.

The most famous unsolved problem in videoconferencing involves eye contact: Since the camera and the display screen are separate objects, each time you look at the screen you shift your eyes from the camera. Someone watching you in a videoconference notices that you constantly look away. If the camera is above the screen, you always appear to be looking down. Studies show that this lack of eye contact reduces trust, collaboration effectiveness, and satisfaction with the interaction.
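To get a feel for the size of the mismatch, here is a back-of-the-envelope sketch. It assumes an idealized setup: the camera sits some distance from the point on the screen you are looking at, in the plane of the screen, and your eye is straight in front of that point. The function name and the sample distances are illustrative assumptions, not measurements from any study.

```python
import math


def gaze_offset_degrees(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle, in degrees, between 'looking at the screen' and 'looking at the camera'.

    Assumes the camera lies camera_offset_cm away from the on-screen gaze point,
    in the plane of the screen, and the eye is viewing_distance_cm directly in
    front of that gaze point.
    """
    if viewing_distance_cm <= 0:
        raise ValueError("viewing distance must be positive")
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))


# Hypothetical example: a camera mounted 15 cm above the face shown on screen.
for distance_cm in (50, 100, 300):
    angle = gaze_offset_degrees(15, distance_cm)
    print(f"viewer at {distance_cm} cm: gaze appears off by {angle:.1f} degrees")
```

With a fixed camera offset, the angle shrinks as the viewer sits farther from the screen, but it reaches zero only when the camera and the gaze point coincide.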

Whole libraries could be filled with accounts of the zany ways people have tried to overcome the eye-contact conundrum. There have been cameras mounted in holes in the display, plenty of tricks with mirrors, and lots of computer-graphics schemes to create the illusion that each person in a videoconference is looking in a different direction than is actually the case.

The more you study the phenomenon, the more subtle the human-factor requirements turn out to be, because the amount of eye contact maintained by people varies enormously for social and situational reasons. For instance, high-status people tend to seek eye contact more often than low-status people, and various cultures, including some Muslim ones, avoid prolonged eye contact in certain settings. Even while eye contact is not happening, the problem doesn't necessarily go away, and technological requirements are not necessarily reduced. My colleagues and I have noticed that subjects who tend to avoid eye contact still work better using tele-immersion rigs that at least make it possible.

Eye contact isn't the only issue. People are keenly sensitive to variations in facial skin tone, changes in pupil dilation, tiny head motions, even subtle delays in response, and perhaps to other cues not yet identified. The more researchers study nonverbal aspects of conversation, the more they appreciate their importance. Researchers like Alex Pentland at MIT have shown that observing certain nonverbal signals can often make it possible to predict the outcome of a date or a job interview before the participants themselves know what's going to happen.

So if videoconferencing technology could convey all these subtleties, would people finally welcome it into the mainstream? There's only one way to find out: Build it and see if they come.

In the 1990s, I had the good fortune to run a project called the National Tele-Immersion Initiative out of the Internet2 engineering office. With several colleagues (among them Henry Fuchs of the University of North Carolina at Chapel Hill; Ruzena Bajcsy, now at the University of California at Berkeley; and Andries van Dam of Brown University), I built and tested the very best tele-immersion rigs possible at that time. We were able to display people to one another holographically in full 3-D. We couldn't address all the known subtleties of interpersonal communication, but we probably got closer than anyone had before, close enough that I was able to develop a theory about the kind of device that might convey people to one another well enough to make the century-old predictions of presence-at-a-distance come true.

But before I tell you about that, let's backtrack to the first question. Why do you have a neck?

Evolution has produced a startling variety of configurations of eyes, ears, and other sensory organs. The most startling collection of eyes is found on the starfishlike brittle star, Ophiocoma wendtii. It has a bejeweled body almost completely covered by a compound eye, a multitude of crystalline lenses.

Animals closer to the top of the food chain have only two eyes, however. This is true not just for humans and other vertebrates but for squids and other cephalopods as well. It is easy to see why two eyes are better than one (enhanced depth perception and redundancy, for example). But why didn't evolution produce any large animal with more than two eyes? The answer is that two eyes that are highly maneuverable can see better than any number of eyes that are stuck in place—so well, in fact, that adding extra eyes would not add much performance.

It is natural to think of the brain as a computer and an eye as a simple camera connected to it. Here is a more accurate metaphor: The head is a spy submarine sent on a mission to perform a multitude of little experiments to learn more about its environment. These microexperiments are often carried out by constant, subtle changes of the position of the head.

By continually moving our heads around in order to scan the scene, we simulate the effect of having far better eyes than we actually do, and in a far wider variety of placements. The motion of the head increases the quality of the image available to us: The brain integrates images seen from different positions to see more detail than is projected on the retina in a single moment. This motion is fundamental to human sight. If you immobilize your head in a vise, you will see far less well. If you also stop the motion of your eyeballs, you will soon cease to see at all. The world seems to vanish into gray.

There's a related problem concerning the screen. Fixed-position videoconference screens create a virtual tunnel through which people see each other. Faces fall out of sight if the people on the other end move their heads much, and as I've argued, heads naturally want to be in motion. I suspect this simple problem is one of the reasons videoconferencing has never caught on.
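The tunnel can be made concrete with a little trigonometry. The sketch below assumes an idealized fixed camera with a symmetric horizontal field of view, pointed straight at a person who starts out centered in the frame. The function name and the sample figures are illustrative, not the specifications of any real system.

```python
import math


def sideways_room_cm(horizontal_fov_degrees: float, distance_cm: float) -> float:
    """How far a centered head can move sideways before its center leaves the frame.

    Assumes an idealized fixed camera with a symmetric horizontal field of view
    and a person sitting distance_cm away along the camera's optical axis.
    """
    if not 0 < horizontal_fov_degrees < 180:
        raise ValueError("field of view must be between 0 and 180 degrees")
    if distance_cm <= 0:
        raise ValueError("distance must be positive")
    half_angle = math.radians(horizontal_fov_degrees) / 2
    return distance_cm * math.tan(half_angle)


# Hypothetical example: a person sitting 60 cm from a fixed camera.
for fov in (30, 60, 90):
    room = sideways_room_cm(fov, 60)
    print(f"{fov}-degree field of view: about {room:.0f} cm of sideways room")
```

Widening the field of view buys more room, but the face then fills a smaller share of the frame, so the fine detail that matters gets fewer pixels.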

Now here's the interesting part that ties the two questions together. Everything changes if the whole telepresence device is put on top of a neck. And that's exactly what we are trying next.

The new experiment is called Cocodex, for "compact cooperative desktop explorer." I started work on it while I was a visiting scientist at Silicon Graphics. Oliver Staadt at the University of California at Davis and others are helping me develop prototypes of a compound gadget, including cameras, microphones, and a holographic display, that sits on top of a robotic arm. As you move your head, it follows you, giving your interlocutor a high-resolution view of all the subliminal details that matter. (When you look away, on the other hand, the Cocodex screen with its cameras doesn't stay in front of your face, but the other person will still see you from the side, able to judge the precise manner in which you are not maintaining eye contact.)

As it happens, putting the whole rig in motion solves a plethora of other problems as well. For instance, it becomes easy to assemble a lot of people from different locations into a single virtual space for lectures or meetings, something that is hard to do with fixed displays and cameras because they don't let people look around. It also frees you from needing a videoconference room with built-in screens and cameras.

It might sound strange at first to design a communication device that bobs around just as our heads do when we talk and listen. But when nature settles on the same solution to a difficult problem more than once, as it did with vertebrates and cephalopods, it is worth paying attention.

Although the initial prototypes of Cocodex will rest on metal robotic arms, we hope eventually to use soft organic designs, perhaps based on an elephant's trunk. In the late 19th century, technologies of the future were imagined to be cold and rigid, like steamships. Maybe the road to successful videoconferencing will require devices that not only move like graceful animals but look like them too.