Delving into the uncanny valley with the mind behind the first Arabic-speaking robot

We humans are fine with robots that look like us, up to a point. The repulsion we feel when their appearance gets a little too close for comfort, something known as the uncanny valley, continues to confound scientists looking for the reasons why and how we might move past it. Dr Nikolaos Mavridis has made a career out of building intelligent robots and studying how they interact with humans. In 2009, he introduced to the world Ibn Sina, the world's first Arabic-speaking humanoid robot. New Atlas sat down with the scientist to pick his brains on this phenomenon, and whether we actually even need human-like robots at all.

Mavridis was a research assistant at MIT Media Lab throughout the early 2000s, investigating human-robot interactions through language, vision and facial recognition and earning his PhD in 2007. Since 2008, he has continued this work as the founder and director of the Interactive Robots and Media Lab, a research laboratory that started at the United Arab Emirates University and then moved to New York University (NYU) Abu Dhabi.

Other robots Mavridis has worked on include "FaceBots," which are physical robots that are Facebook-connected and designed to create more meaningful relationships with humans by referencing shared memories and friends. The robot that has earned him the most recognition in academic circles, however, is Ripley – a conversational table-top manipulator arm equipped with vision and language capabilities. Ripley can help humans carry out tasks on the table, and can also learn the meanings of basic words by listening.

But with its freaky human likeness and pioneering speaking abilities, Ibn Sina garnered the most media attention when revealed to the world in 2009, appearing on television in more than twenty countries. Named after the famous 11th century Islamic polymath, Ibn Sina was the first Arabic-speaking dialogic humanoid robot, designed to potentially become part of an innovative interactive theatre installation with humans and robots, and most importantly, to explore human responses to robots in the Middle East.

Soon after, Ibn Sina was brought to venues in the United Arab Emirates and beyond, as Mavridis and his team sought to learn more about the interplay between Middle Eastern culture and humanoid robots. Ibn Sina interacted with thousands of people around the Gulf region. It was in the Emirati city of Dubai that we caught up with Mavridis. The following transcript combines that interview with written responses to follow-up questions we had for the scientist on what we can expect from anthropomorphic robots that increasingly look like us, for better or for worse.

What were some of the interesting things you learned from the Ibn Sina experience?

In extreme variants of Islamic culture, but not in the United Arab Emirates, certain Imams propagate a prohibition against anthropomorphism that goes beyond the environment of the mosque. Of course, prohibitions of depictions of humans inside holy places also exist in other religions, such as Judaism. However, in traditional Islam, as it is practiced for example in the UAE, these prohibitions do not extend beyond the religious sphere. For example, people are allowed to use cameras and phones to take photos and make videos of their friends.

In Christianity, in the Orthodox variation, statues are not allowed in churches. However, icons, which are thought of not as accurate depictions but as "representations of the essence" of the holy figures, are allowed. But such prohibitions very rarely move outside the religious sphere, into the secular.

The interesting thing is that in some of the most extreme Imams' interpretations of Islam, pictures and statues should certainly not exist in mosques, but they should not even exist outside. So in that respect, a robot which is anthropomorphic, under certain conditions, can be blasphemous. These Imams, for example, do not allow their followers to have cell phones with cameras, because even family photos can be blasphemous, in a sense. And these Imams are usually basing their theories on an extreme interpretation of a hadith (a story involving the prophet Mohammed as reported by others).

So, when the Ibn Sina robot was exhibited and was interacting with the crowd, we saw a very small yet persistent group of people who would come in with a totally negative attitude, although they were not expressing it directly, and complete the questionnaires we had regarding robots and what they thought of them and their potential applications. We intentionally made the first five questions have constant polarity answers, so that by clicking on the right one could choose the positive polarity, and by clicking on the left one could choose the negative polarity.

What we observed was that when the Ibn Sina robot would greet them, the members of this specific group of people would complete the questionnaire, starting with a negative response to the first question, and then just going down on the same side, thinking that this way they would just respond again negative, negative, negative, negative, without even reading the questions that they were responding to further down.

And we could be sure that they did not even read, because the polarity of questions would change further down, but they just kept ticking the same side. It is important to note, though, that this was a very small percentage of people. It was actually interesting to see how even within the Islamic context there was quite a lot of acceptance of robots, and in some respects, certainly more than in the Western context.

When it comes to the theological justification of why robots should be acceptable in the Islamic world, usually there is an argument based on utility. If they can be used in a way to make the lives of people better, then, unless of course they do other things that are not good, they should be accepted.

What role do you see humanoid robots playing in our lives? How might they make our lives better?

There's actually an even simpler question that we have to answer first, which is, when should robots be humanoid? And the answer is that there are certain applications and roles where it is better for them to be humanoid and there are others where it is not.

Of course, one of the negative aspects of having a robot that is realistically humanoid is the so-called "uncanny valley" phenomenon, first identified by the Japanese roboticist Masahiro Mori. But of course there are many ways out of the uncanny valley. A simple one is a robot can be humanoid without being strongly realistically human-like. Then you can still get many of the advantages of it being humanoid, without the negative, eerie emotional response predicted by the uncanny valley.

What are the advantages of a humanoid? Number one, that we as humans, have cognitive apparatus that has evolved towards maximizing our interactive and communicative effectiveness with other human-like creatures. So it's not just voice, it's not just general purpose vision or sound, we are actually, often unconsciously, reading many kinds of meaningful signs in the face and body of our conversational partner, and they play a very important role in our interaction.

So if you have a machine that has the capability to perceive non-verbal signs like nods, facial expressions and head angle changes, this means we can engage much more closely, more naturally and more effectively with it. So if you want to have this face-to-face spoken dialogue with non-verbal elements in it, it might be important to have something that looks human-like. Of course it might not even need to have a physical 3D presence, it might even be a virtual character on a 2D screen, an avatar.

And there is also existing research on the conditions under which a physical robotic humanoid might be preferable to a 2D avatar on a screen, and vice versa, and there is quite a spectrum of possible choices across these two cases.

Another reason why you might want to have human-like robots is that the environments which we live in, for example this cafe, have configurations and ergonomics that are optimized for humans. So for example, the objects you might want to grasp are usually placed at a height or a specific position that is accessible to human hands. They're not very low, they're not very high, they're not very deep inside, and so on.

So in that respect, if you want robots to be able to operate within environments optimized for humans, they might need to share some physical characteristics with humans. Thus, these are two of the classic reasons why humanoid robots might be preferable to other shapes, for certain applications.

First, if you want to maximize the naturalness and effectiveness of the interaction with humans, they have an advantage and second, if you want them to be able to operate easily within manmade environments that have been already implicitly optimized for the human body dimensions and capabilities, humanoid robots might again be better suited.

Of course, they might not always need to be very human-like, and they might even contain combinations of elements. For example, a platform with four wheels which holds a human-like torso without legs, such as the innovative robotic policeman that was roaming Dubai Mall recently and helping humans with services.

What do you think the underlying reasons are for the uncanny valley?

If you go from non-anthropomorphic towards more and more anthropomorphic robots, although you expect and actually initially see an increase in the subjective attractiveness of the robot, after a specific limit there's this very big negative dip, where it becomes repulsive. And there are many possible explanations for why that is the case.

One of the simple ones is that you have something that starts to be convincingly similar to being human, but combines strongly human-like elements with certain very non-human-like elements in its appearance or its behavior. And in that respect it gives you a zombie-like perception.

Another interesting relevant idea is this whole concept of "anthropomorphization." There is a strong inherent tendency for humans to anthropomorphize a lot of different things that they see. For example, when somebody sees the moon, they might often hallucinate eyes and a smile or something like that, in order to make it fit their innate prototype of a smiley face.

Also, designers often take advantage of this tendency to not only construe what you see as a human face or body, but also to assume a "theory of mind" for such entities, that they also have emotions, perceptions, beliefs and other things that we assume other humans, or even our pets, have. For example, in order to create this idea of animacy, almost like having a "soul" in layman's terms, for something like a car, a designer might even add elements that look like eyes or a smile and then it becomes much easier to relate with the cartoon car, and even feel empathy for it.

One of the mechanisms by which we mentally represent entities perceived as animate, like humans, pets and cartoon characters, or entities perceived as inanimate, is the "mental models" that we create in our mind, representing the world around us. One specific kind of mental model is what is known as a situation model in cognitive psychology, and these are usually filled either with representations of the situation we are currently in, or a situation we are remembering, or even a situation we are imagining.

Such "situation model" representations usually contain objects of two sorts. First, there are passive, physical objects for which the situation models just contain descriptions of their appearance and also models of kinetic properties, predictions of the effects of physically interacting with them, and so on.

And second, there are "agentive" entities for other humans, animals, and increasingly, artificially intelligent entities that go beyond the physical and kinetic. They also contain mental properties such as emotions, beliefs, intentions, goals. For example, when we are interacting with another person or a cat, our mental models of them contain their estimated emotional state – do they seem happy? Do they seem excited or bored? Also, they contain estimated beliefs of the other person: for example, has my friend seen where the tomatoes are (while we are making a salad together) and thus he holds a belief regarding their existence and position, or should I inform him through an appropriate verbal statement?

So there are these two different, qualitatively separate classes of objects that populate our "situation models": objects that are agentive entities (which contain mental attributes too), and objects that are just passive (containing just descriptions of appearance and kinetic behavior).

The important thing to note, though, when it comes to how we mentally construe robots, is that we did not historically seem to have a third, special category for artificially intelligent devices: for example, something between humans and passive objects. Thus, usually when we think of an intelligent robot, especially if it is anthropomorphic, we might well be re-using a variation of the mental representations that we possess for classic animate entities, such as humans.

How has our understanding of the uncanny valley changed over the years?

Another interesting aspect of the uncanny valley is that there has been a lot of follow-up to the original research by Masahiro Mori, both by him as well as by other researchers. Mori first identified the phenomenon in 1970. Later he wrote a book called The Buddha in the Robot and for quite a while, he was trying to create bridges between ideas from Zen Buddhism, Shinto, other Japanese traditions (such as the Karakuri ningyo), and Robotics.

So, let us consider something like Japanese theatre, for example. Interestingly enough, in Europe when we have puppet theatre we try to hide the strings, but this is not the case in Japan. You can see this black velvet behind and the actual person that is moving the puppet. One might ask, is there a cultural basis of this difference?

Also consider Shinto, the traditional "indigenous" Japanese religious system, which was canonized by the emperor, and which is usually followed alongside Buddhism by a large percentage of the population of Japan. In Shinto, everything is conceived of as being animate, as having a soul: the trees, the mountains, the stones, as well as the puppets.

So there is this widely infused animism within the culture and it might be making interesting connections to why robots might be so favored in Japan. If puppets are thought of as being animate, why shouldn't robots? This, therefore, is a simplistic explanation of why robots are given social ceremonies when they enter a company, or when they leave a company, or even sometimes given "funerals" when they have to be disposed of, and so on.

It's also interesting to notice that there is this bilateral influence between the Japanese predecessors of robots, the Karakuri ningyo, and actors in Japanese theatre. During the 1600s in the Edo period, you started having Karakuri ningyos (mechanical "automata" dolls) in a number of contexts such as religious, artistic, secular.

Weirdly enough, Japan was effectively also isolating itself during the Edo period: no foreigners were allowed to set foot on the island except for those on merchant ships from the Netherlands, who were allowed to disembark only on an isolated islet. And there are special families of "automata maker master craftsmen," where the knowledge and the tradition is carried along from father to son, across many generations. And interestingly enough, the later generations of some of these families also became connected to the Japanese industrial revolution, and to the mega-companies that arose.

But what could the Karakuri ningyo do? In the secular context, they could, for example, take the form of an archer and mechanically shoot small arrows in order to entertain the guests. In the religious context, the automata would become part of large wagons that would be part of parades, for example taking place during "matsuri" festivals. Most importantly, in the theatrical context, where in different genres human actors as well as puppets would participate, the Karakuri ningyo robot-predecessors would also take part in the action.

Note though that these automata didn't have the capability to do strong facial expressions. Usually they weren't actuated in the face, but they could move their head slightly. So this new language of emotional facial expressions was created, which was based on slight movements of the head and changes in illumination.

Interestingly enough, there is a theory that these emotional display mannerisms (with slight movements of the head and illumination) started being copied by the actors, too. So the actual affordances for emotional expression that these automata had, also became the prototype for what human actors would do. And this might be one of the explanations for the subtlety of emotional expression in Japanese actors. So this is an example of how not only the machine often mirrors the human, but also the human starts to mirror the machine, in a sense.

Can robots have emotions?

The obvious answer to that question is another question, which is what does it mean for an entity to "have" emotions? Certainly, I think that machines can "appear" to have emotions. People can read happiness in the face of a robot, they might even be able to read emotions in highly non-anthropomorphic devices. For example, an intensely blinking red LED light with the right rhythm and duty cycle can appear as expressing frustration, anger or alertness and danger.

So we don't necessarily need anthropomorphic machines in order for them to appear to have emotions. Of course, there is still quite some active research in how emotions are conveyed by different configurations and it is worth bearing in mind that a capable artist can also "teach" humans to connect specific observables with associated emotions.

And we also have the ability to do affective speech synthesis, and synthesize voices that appear to be angry or sad and so on. We can also play with many more degrees of freedom of the apparent behavior of a system. Thus, there is a lot of interesting research and open opportunities for machines and robots that display or appear to have emotions.

But machines displaying emotions is only one side of the coin. The other is machines recognizing human emotions. And there is quite some progress happening there too in areas like facial expression recognition and voice recognition, affect recognition through body posture and walking style, along with physiological measurables correlated with emotions such as heart rate, breathing rate and galvanic skin response.

And to read human emotions is extremely important, for a robot salesman or persuasive robot, for a robot educator that needs to monitor and maximize the attention and excitement of its students, for the automated analysis of human responses to advertisements or to political speech and for a large number of other applications.

And then there is the possibility to enter "emotional dialogues," in which a machine is modulating its apparent displayed emotions in response to the human partner's apparent emotion in order to reach a goal. This could include goals that are simple to express, for example to calm him or her down or make them happy, and other goals that are more complex like maximizing engagement, learning effectiveness and implicitly teaching emotional responses.

So, slowly, robots are starting to get some of the basic components that are required for the wide range of skills and competencies that fall under the umbrella of "Emotional Intelligence." Therefore, in analogy to the classic Turing Test that aims to discover whether a machine is indistinguishable from a human while communicating to it via text chat for five minutes, one could also envision and design a special Emotional Turing Test, during which we would test the robot's indistinguishability of emotional responses from a human by exposing it to appropriate stimuli, events and situations.

The Turing test was originally quite generic, focusing on written natural language abilities through which it was arguably assessing a subset of general intelligence. But there have been proposals to create special Turing tests for other specific types of intelligence.

So, for example, there is a "capability for moral agency" Turing test that has been proposed for machines in which you give moral dilemmas to the machine, then you check what its responses and answers are. If you cannot distinguish the responses from a human's for long enough, then maybe this machine has effectively passed a moral Turing test, and some argue that this might be a pre-requisite for it to legally have agency, and thus for the machine (or robot) to be able to be responsible for its own actions.

In this case the robot will not be viewed as a tool or pet for whose actions the legal responsibility usually falls to the owner or the manufacturer. But rather, as it has proven capabilities for moral reasoning, the responsibility might be attributed to the machine itself. Of course, there is a lot of open discussion about whether this is appropriate and also, regarding what the implications of adopting such a stance would be.

It is important to remember, though, that philosophically we are just talking about subjective "appearances" here. This is more of an approach that doesn't involve any kind of essential properties, regarding what "having emotions" might mean. It is more of an interaction-based approach.

Of course, you're also not checking for "mappings" of emotions to biologically-specific hardware, you're not checking if there is a specific neuron inside the head of a machine, or specific neurotransmitters being used or if a certain sequence of events is taking place within a biological nervous system. Rather, we're just talking about what the subjective impressions and measurable responses of humans are, when observing or interacting with the apparently "emotional" machines.

When could a robot sit here next to us and you wouldn't even know?

Oh, have I convinced you that I am human, despite all these metal parts inside my cranium? I am just joking. This is quite an interesting, as well as an important question. One further question that follows is what would count as a satisfactory answer to such a question?

One first possible answer is, when we have realistic materials and construction techniques. But that is not enough if it means only in terms of materials as it should also go beyond appearances, and also hold in terms of interactive behavior. And this is the most difficult part. In Madame Tussauds museum we can find wax replicas that are very realistic, but they are not really moving or interacting.

In certain respects you can imitate adequately passive behaviors, but the difficulty is in fully-fledged unconstrained interaction. So you can have an automaton that looks quite realistic and seems to be moving quite realistically in some kind of a repetitive cycle of movement, for example, a moving "bread baker" automaton such as the ones that you see decorating French bakeries. But animatronic automata are not usually very rich when it comes to their sensory capabilities, in most cases they don't have any such capabilities – beyond an on/off switch.

So animatronics might look like humans when used in museums and so on, but they don't respond to you, they just go through a cycle of motion and don't have the richness or variation of realistic responses tied to appropriate sensory inputs such as vision or hearing, which are required in order to engage in fluid interaction with humans.

There are a lot of different arguments and estimates for when we might get to a point where they do. I would say instead of giving a precise timeframe, what's more interesting to ask is whether indistinguishability from humans, in terms of physical and mental properties, is a real requirement for many applications.

And also, it is important to bear in mind that with cloud computing and the internet of things, not only could the "minds" of intelligent robots be composed of a large number of cognitive services spread globally over the cloud, but their "eyes" could also observe millions of humans and have access to real-time big data from whole buildings and cities.

And furthermore, the distributed minds that will be controlling robots will have a degree of "sharing" experiences, skills, perceptions, memories, which will be unprecedented in terms of extent and speed, as compared to the "bandwidth" limitations in sharing among humans. Thus, the effective "collective" aspect and the sensory, world-effecting capabilities of the robots of the future will most probably be beyond what most people normally imagine.

Thus, getting back to the previous point, absolute indistinguishability of robots from humans is not necessary. Even if future robots cannot really fool you in the physical sense, you can have AI that has such capabilities that it can really change our lives immensely, even without a very convincing embodiment. If you have, for example, programs that are handling resource allocation in cities, or making very important subjective decisions regarding policies or even short term actions within a car, a building, a city or nation, they could potentially have a much bigger beneficial effect on our lives than a robot that is directly indistinguishable from us. And of course, the above programs will not be isolated but would rather be a part of a large network of software and hardware, cloud services, devices, iPhones, sensors and robots which would be effectively tightly connected to our physical as well as human mental environment.

Thus, although variations of the Turing Test exist, such as the total Turing test, which also covers the physical appearance aspect, I must admit that beyond its theoretical appeal, it probably should not and will not be the main focus of our goals in the mid-term future. Rather, massive internet of things-derived big data and large collective intelligences with humans and machines promise to significantly transform our lives for the better, and to give us new capabilities that would hardly have been imaginable before.

Anyway, who could even have thought of TV across continents in the 1800s? The situation will be similar for the children of our children. And robots, in human or other forms, with their distributed minds and special emotional, social, and cultural capabilities, will form an important part of these hopefully peaceful and empowering futures.