Why do we even need robots? Aren’t chatbots and avatars enough?

Why are social robots better than chatbots or voice assistants? Here are four powerful reasons.

We live in a world where almost everyone has a smartphone with enough processing power and pixels to display a high-quality digital avatar. Most of these phones also have built-in voice assistants, and smart speakers are becoming more and more common in homes and offices alike. So why do we need robots? Is it really easier to interact with something with a physical presence? And if so, why? All great questions – and questions that we get a lot at Furhat Robotics. The answer is intuitive, and surprisingly simple.

Fundamentally, this is about the user experience.

The difference between working with pen and paper, a keyboard and a mouse, a touch screen, a smart speaker, an animated character, VR, AR, or a social robot is all about the user experience. We can likely express any “information” through any of the media above, but each has its own benefits in how it engages us in different tasks. And there are multiple elements of normal human interaction that a voice assistant or chatbot simply cannot convey.

The importance of co-presence

But why? What is it about a social robot that makes it so much more engaging, that makes an interaction feel so much more natural?

Let’s start by exploring the concept of co-presence. Co-presence is the effect of sharing the same experience with another person due to them being in the same physical space. Relationships are based on shared experiences, and throughout human evolution, we have always been “co-present” with each other – until relatively recent technological advancements such as radio, TV, phones, and the internet allowed us to communicate with each other over distances.

We are “designed” to interact with people who share the same space as us – which is why we find it easier to trust or build an emotional connection with people when we see them in person than when we speak to them over the phone or Skype.

Co-presence makes interactions feel more natural. After all, that’s how humans were made to interact.

If you’ve ever tried a set of VR goggles, you’ve experienced how immersive technology can recreate that feeling. A good VR experience takes over our senses and we become co-present with whatever lives in that virtual space – helping us engage with the content on a much deeper level.

The same thing happens when interacting with a Furhat robot – except that it happens in the real world, making it potentially even more powerful. The Furhat robot has a face and eyes, and it can actively show that it is aware of the user: it can look at them, smile, and create a sense of shared experience and connection that would be extremely difficult to replicate in a voice call, or even a video call.

Proximity: how close is close?

The other side of the co-presence coin is proximity. While co-presence is sharing the same physical space, proximity is how close in that space two things are – and the effects of that closeness on the interaction.

Simply put, the study of physical distance and formation between people is known as proxemics. Sociologists also use the proximity principle to describe the tendency for people to form interpersonal relationships with those who are close by.

Humans are highly sensitive to physical distance, and the space between two people when sitting in the same room is a key factor in determining the type of interaction.

For example, sitting very close to someone signifies trust and can even signal a romantic relationship, while keeping someone “at arm’s length” usually signals the opposite. If someone walks towards you on the street, it sends very different signals if they give you a wide berth than if they pass close by.

There are numerous day-to-day interactions where having a robot in the same room means that we apply much the same cognitive principles as when we interact with other people. For most people, it’s much easier to buy a car at a dealership than by talking to a digital assistant. There’s no doubt a digital assistant is capable of describing the features of a car, but there’s something about standing close to someone in the same physical space, looking the dealer in the eye and shaking hands, that makes it easier to make that final purchase decision.

The Mona-Lisa Effect

Let’s go back to video. The reason video conversations can feel awkward – especially when there is more than one person on either end – is something known as the Mona-Lisa effect. If you’ve ever been to the Louvre and seen the Mona Lisa in person, you may have noticed that wherever you stand in the room, it still looks like she’s gazing straight at you.

In general, humans are experts at decoding where another person is looking. We know when we are the target of someone’s gaze (thanks to evolution).

But evolution doesn’t always keep up with technology, and this skill doesn’t work over video.

When someone’s face is displayed on a 2D screen, their gaze is perceived the same way regardless of the direction from which you view the screen. So if it seems like the person is looking right at you, they also seem to be looking “right at” everyone else in the room (just like the Mona Lisa).

This doesn’t just apply to video calls. This limitation is inherent to any 2D display – be it a video, an animated avatar or – indeed – a Renaissance painting.

A physical robot, on the other hand, inhabits the same 3D space as we do. It can easily convey gaze towards objects, or establish eye contact with people in the same room.

The potential for multi-party interactions

And this ability, in turn, comes with its own perks.

Think of a group discussion – for example a teacher and a group of students seated around a table. Who determines who should speak, and when?

In fact, the turn-taking between the participants is carefully coordinated and negotiated, mostly using eye gaze. And when someone does speak, how do we know who the message is addressed to – just one person, or the whole group?

Again, in most cases this is communicated using gaze and glances. And this is impossible to convey with a voice alone or with a digital avatar on a flat screen.

Voice assistants and digital avatars are great tools for individual people to use for quick informative requests – but that’s not enough. As we get more and more used to speaking to machines, we will soon expect – and need – them to be able to engage in more complex exchanges involving more than one person.

Summary

Numerous scientific studies back up the idea that physical robots have advantages over avatars and chatbots. Making the user experience more engaging means users get more out of the interaction – emotionally and cognitively. Take a 2012 paper by Brian Scassellati and colleagues at Yale University, which studied the effects of physical embodiment on learning gains.

Participants were asked to solve a set of puzzles while being provided occasional gameplay advice by a robot tutor. Each participant was assigned one of five conditions: (1) no advice, (2) robot providing randomized advice, (3) voice of the robot providing personalized advice, (4) video representation of the robot providing personalized advice, or (5) physically-present robot providing personalized advice. The study found that participants in the group with the robot physically present solved most puzzles faster on average, and improved their same-puzzle solving time significantly more than participants in any other group.

There is a time and place for voice assistants and digital avatars, which have already had a huge impact on society.

But without sharing the same physical space, handling group conversations, and communicating eye gaze, there will always be something missing. These interactions will never feel as rich, be as engaging, or inspire trust and connection the way a physical, social robot can.

So the next time you are wondering why you fly a thousand miles to close a business deal or spend time with your family – or why your company should invest in a social robot rather than a chatbot – remember: it’s the human thing to do.