Using Wizard-of-Oz for user studies with Furhat

Wizard-of-Oz is a great method for collecting data, iterating quickly on prototypes, and running controlled experiments. Here we outline how our Wizard-of-Oz tool works, along with the fun history behind the term.

by Gabriel Skantze, Co-founder & Chief Scientist

Social robots (like Furhat) provide a new way of interacting with machines, similar to how we interact with each other face-to-face. Unlike voice assistants, social robots can look you in the eyes, greet you with a smile, and create that sense of presence that we value in physical meetings. Social robots can be used for many tasks where we typically talk to each other, such as serving in a reception, interviewing job candidates, or teaching children. However, we are not yet at the point where you can simply instruct the robot how to do these things, in the same way you would instruct a human. Instead, we have to program specific applications, or what we call skills, for these tasks. How do we do that?

Let’s say you want to build a skill for Furhat, such as a robot receptionist. To do this, you have to figure out what the robot should say and how it should behave in the different phases of the interaction. You also have to “teach” it what kind of things people might say to the robot (by providing examples), how the robot should interpret these things, and how it should react to them. The Furhat Software provides a powerful SDK that helps you build such skills, even if you are not an expert on human-robot interaction or conversational AI.

However, it can be very hard to know beforehand what people might ask the receptionist, how different people might formulate their questions, or how they will behave in unexpected situations. All of this is essential to understand in order to build a really engaging and robust interaction. Thus, we need user data. The question is how to get this data. Once we have built our skill, we can of course use that to collect data, but since we haven’t built it yet we end up in a catch-22 situation.

A common solution to this is to use a so-called Wizard-of-Oz setup, where a hidden person controls the robot initially. The name comes from the 1939 film in which Dorothy and her companions seek out the mysterious Wizard of Oz, hoping he can grant their wishes. However, when they finally arrive, the Wizard turns out to be an illusion controlled by a man behind a curtain who operates a machine and speaks into a microphone.

The same trick was used for the famous Mechanical Turk, a chess-playing automaton invented by Wolfgang von Kempelen in the 18th century. While the machine was certainly a very impressive piece of engineering, constructing something that could play chess autonomously was clearly not possible at the time. It seemed too good to be true. And it was. A human chess player was in fact hidden inside the cabinet and controlled the automaton.

This setup turns out to be very useful when developing social robot interactions. Typically, a person (the “Wizard”) sits at a remote location, can see what the robot sees (on a display), can hear what the robot hears, and presses buttons to make the robot say or do things.

Using a Wizard-of-Oz setup, you can collect data on how users react to the robot behaviours you have in mind before you have automated them, and use this data to develop your skill.

Another situation where Wizard-of-Oz is useful is if you are a human-robot interaction researcher and you want to do controlled experiments on how the robot’s behaviour affects the user. Let’s say you want to investigate whether a robot that smiles more often also makes the user smile more often. You might want to set up a simple interaction where this can be tested, such as a robot interviewer which asks the user a couple of questions. This can then be done without building a fully autonomous system.

However, anyone who has set up a Wizard-of-Oz system knows that this is not trivial either. It typically involves a lot of programming to set up a video streaming solution, a GUI for the Wizard, etc.

With the Furhat Wizard-of-Oz tool, you define a set of buttons and the corresponding robot behaviours, either through programming (using the SDK) or with our graphical Blockly tool.

If you run such a definition as a skill on the robot and log into the dashboard, you can start Wizarding immediately! In a more complex configuration, the buttons can be organized with colors and grouped. The camera feed from the robot is shown to the left. If you click in the camera view, you can make Furhat attend to a specific location or a user (and then automatically follow that user as they move around).

Simple button definitions only hint at the potential of the platform. The code that defines the Wizard state is actually the same type of code that is used to program actual dialog flows with Furhat. In these code blocks, you can make Furhat perform not just simple actions (such as speaking, attending and gesturing) but also complex behaviours. A very powerful feature (not often found in Wizard tools) is that you can transition between different dialog states, where different sets of buttons are available to the Wizard. These states can be made hierarchical, so that a certain set of buttons is always available while others depend on the current dialog state.
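To make the idea of hierarchical button sets concrete, here is a minimal sketch in plain Python. It is not Furhat SDK code: the `Robot`, `WizardState` and `press` names, the button labels and the spoken lines are all made up for illustration. It only shows how a child state can inherit the buttons of a parent state, and how pressing a button can move the Wizard to a different state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class Robot:
    """Stand-in for a robot (hypothetical): it only records what it is asked to say."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def say(self, text: str) -> None:
        self.log.append(text)


# A button action may return the next state, or None to stay in the current one.
Action = Callable[[Robot], Optional["WizardState"]]


@dataclass
class WizardState:
    """A dialog state with its own buttons plus those inherited from its parent."""

    name: str
    parent: Optional["WizardState"] = None
    buttons: Dict[str, Action] = field(default_factory=dict)

    def available_buttons(self) -> Dict[str, Action]:
        inherited = self.parent.available_buttons() if self.parent else {}
        return {**inherited, **self.buttons}  # child buttons override parent ones


def press(state: WizardState, label: str, robot: Robot) -> WizardState:
    """Run the action behind a button and return the state the Wizard is now in."""
    next_state = state.available_buttons()[label](robot)
    return next_state if next_state is not None else state


# Buttons in the base state are available in every state that inherits from it.
base = WizardState("Base")
base.buttons["Could you repeat that?"] = lambda r: r.say("Sorry, could you repeat that?")

greeting = WizardState("Greeting", parent=base)
directions = WizardState("Directions", parent=base)


def greet(robot: Robot) -> WizardState:
    robot.say("Hello, welcome!")
    return directions  # transition: the Wizard now sees the Directions buttons


greeting.buttons["Greet"] = greet
directions.buttons["Elevator"] = lambda r: r.say("The elevator is to your left.")

robot = Robot()
state = greeting
state = press(state, "Greet", robot)
state = press(state, "Could you repeat that?", robot)
```

After these two presses the Wizard is in the Directions state, where both the inherited repeat button and the state-specific Elevator button are available.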

The robot’s behaviour doesn’t have to be totally controlled by the Wizard either; parts of it can be automatic.

Since the code that makes up the Wizard interaction is the same as the code you use for dialog flows, you can mix Wizarding with autonomous dialog behaviour, involving speech recognition and natural language understanding. The Wizard could, for example, just oversee the interaction and step in when the user says something that the robot does not understand (yet). Or the Wizard could control the dialog while the robot’s attention and gestures are controlled automatically.
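The "step in when the robot does not understand" pattern can be sketched roughly as follows. Again, this is plain Python rather than Furhat SDK code, and everything in it is an assumption made for illustration: the keyword matcher in `understand` stands in for real natural language understanding, and `ask_wizard` stands in for the Wizard pressing a button.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy "understanding": keyword matching instead of real NLU.
INTENTS: Dict[str, str] = {
    "elevator": "The elevator is to your left.",
    "restroom": "The restrooms are down the hall.",
}


def understand(utterance: str) -> Optional[str]:
    """Return an automatic reply if a known keyword is found, otherwise None."""
    text = utterance.lower()
    for keyword, reply in INTENTS.items():
        if keyword in text:
            return reply
    return None


def respond(
    utterance: str,
    ask_wizard: Callable[[str], str],
    unhandled: List[str],
) -> str:
    """Answer automatically when possible; otherwise hand over to the Wizard."""
    reply = understand(utterance)
    if reply is not None:
        return reply
    unhandled.append(utterance)  # keep the utterance as data for later automation
    return ask_wizard(utterance)


unhandled: List[str] = []
wizard = lambda utterance: "Let me check that for you."  # simulated Wizard choice

auto = respond("Where is the elevator?", wizard, unhandled)
manual = respond("Can I park my bike here?", wizard, unhandled)
```

Here the first question is answered automatically, while the second falls through to the Wizard and is stored in `unhandled`. Utterances collected this way are exactly the kind of user data that helps you decide which Wizard actions to automate next.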

As we have seen, these tools are very versatile and flexible, and we believe they will empower designers and researchers working on human-robot interaction in many ways. You can start designing your skill using Wizard-of-Oz and try it out on users to see how they will react, and collect data. This will help you to figure out if the flow of the dialog is working and if users are engaged in the interaction, but also to catch unexpected situations (there tend to be many!). Then you can progressively replace the Wizard’s actions with automated actions. Finally, you will have a completely autonomous robot that is designed with the users in mind. Happy Wizarding!


Gabriel Skantze is Chief Scientist and Co-founder of Furhat Robotics. He is also a Professor in Speech Technology with a specialization in Conversational Systems at KTH. He leads several research projects and has published 100+ papers on conversational systems and human-robot interaction.