Robots that Communicate with their “Ears”

Since its establishment, HRI-JP has been researching sound recognition technology for robots, or “robot audition.” In conventional speech recognition systems, the robot recognizes voice input coming from a microphone close to the user’s mouth. This is unnatural, because the user must wear a microphone headset whenever conversing with the robot. “Robot audition” research aims to develop technology that recognizes and understands surrounding sounds with microphones mounted on the robot itself, that is, with the robot’s own ears.

Humans are able to filter out ambient noise and hear only the sounds they want. This task, however, is difficult for a machine, and the same problem arises when a robot tries to listen to and understand surrounding sounds with its own ears. To address this difficulty, HRI-JP is doing research and development to extract and identify auditory information, particularly who made what sounds, when, where, and why, so that the robot can filter out noise and hear only the desired sounds. We are focused on microphone array signal processing. This processing enables us to locate the “positions” of sound sources (sound source localization), “extract” only the sounds that are intended or desired (sound source separation), “identify” the kind of sound (sound source identification), “recognize” human language in the sounds (automatic speech recognition), and understand “who” is speaking (speaker identification).
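As a minimal sketch of the kind of processing behind sound source localization: the widely used GCC-PHAT method estimates the time difference of arrival (TDOA) of a sound between two microphones, from which a direction of arrival can then be derived. This is a generic textbook illustration, not HRI-JP's implementation; the function name, parameters, and toy signal are assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=0.005):
    """Estimate the time difference of arrival (TDOA) between two
    microphone channels with the GCC-PHAT cross-correlation."""
    n = sig.size + ref.size
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = min(int(fs * max_tau), n // 2)
    # rearrange so that lag 0 sits in the middle of the window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                         # delay in seconds

# Toy check: white noise reaching the second input 25 samples earlier.
fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(1600)
mic_far = np.concatenate((np.zeros(25), src))    # delayed copy
mic_near = np.concatenate((src, np.zeros(25)))   # undelayed copy
tau = gcc_phat(mic_far, mic_near, fs)            # recovers the 25-sample delay
```

Given the microphone spacing d and the speed of sound c, the delay maps to a bearing via arcsin(tau * c / d); repeating this over microphone pairs in an array yields a position estimate.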

The microphone array processing mounted on the robot makes it possible to understand multiple sounds coming simultaneously from a group of people. It can also identify the locations and types of sounds in the surrounding area.

Applications for Rescue Operations in Disaster-Stricken Areas

HRI-JP, in collaboration with Kyoto University, has released open-source software for robot audition based on microphone array processing, called HARK (HRI-JP Audition for Robots with Kyoto University). This software is actively maintained, and we offer free tutorials for users both in Japan and overseas. HARK provides processing modules for sound source separation, localization, and identification, and users can build their own signal processing systems by combining them.
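HARK composes its modules into processing networks. As a rough, conceptual illustration of that modular idea (the stage names and data shapes below are placeholders for exposition, not HARK's actual API), a system can be modeled as a chain of independent stages:

```python
from typing import Callable, List

class Pipeline:
    """Chain independent processing stages, in the spirit of combining
    modules into a network. Stage names are illustrative, not HARK API."""
    def __init__(self, stages: List[Callable]):
        self.stages = stages

    def __call__(self, data):
        for stage in self.stages:
            data = stage(data)
        return data

# Illustrative stand-in stages operating on a toy "signal".
def localization(x):
    # stand-in: tag the data with a mock direction estimate
    return {"signal": x, "direction_deg": 30}

def separation(d):
    # stand-in: "extract" the target by rescaling the signal
    d["signal"] = [s * 0.5 for s in d["signal"]]
    return d

def identification(d):
    # stand-in: label the separated source
    d["label"] = "speech"
    return d

pipe = Pipeline([localization, separation, identification])
result = pipe([1.0, 2.0])
```

The point is the architecture, not the toy math: each module has one job, so swapping in a different localizer or recognizer does not disturb the rest of the chain.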

One application of HARK is detecting sound origins from a drone (sound source localization). Drones generate significant noise from their propellers and from wind. HRI-JP is independently researching sound source localization techniques that can determine the positions of sound sources even under such noisy conditions. This technique was adopted in a research project under the Japanese Cabinet Office’s Impulsing Paradigm Change through Disruptive Technologies (ImPACT) program to test whether drones equipped with a microphone array can assist rescue efforts in disaster areas.

[Figure: microphone array]

It is challenging for a drone equipped only with cameras to locate people buried under rubble, and standard microphones on a drone cannot pick out human voices calling for help from the surrounding noise. A test using HARK to locate people calling for help, conducted as part of a rescue operation exercise, successfully identified the location of the sound even in a noisy outdoor environment.
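The separation step that makes a voice stand out above drone noise can be illustrated with a delay-and-sum beamformer: each channel is shifted by the steering delay for a chosen direction and the channels are averaged, so coherent sound from that direction adds up while uncorrelated noise partially cancels. This is a generic sketch with assumed toy delays and noise levels, not the processing used in the actual test.

```python
import numpy as np

def delay_and_sum(mics, delays_samples):
    """Delay-and-sum beamformer: align each channel by its steering
    delay (in samples) and average across the array."""
    out = np.zeros(mics.shape[1])
    for ch, d in zip(mics, delays_samples):
        out += np.roll(ch, -d)   # advance the channel to undo its delay
    return out / len(mics)

# Toy demo: one source reaches 4 mics with known integer delays,
# and each channel is corrupted by independent noise.
rng = np.random.default_rng(1)
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
src = np.sin(2 * np.pi * 440 * t)
delays = [0, 3, 6, 9]
mics = np.stack([np.roll(src, d) + 0.8 * rng.standard_normal(src.size)
                 for d in delays])
beam = delay_and_sum(mics, delays)
# Averaging 4 aligned copies keeps the source while cutting the
# independent noise power by roughly a factor of 4.
```

With M microphones, the noise power drops by about a factor of M while the target stays intact, which is why larger arrays help in loud outdoor conditions.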

The experimental drone is equipped with microphone array processing capabilities that localize and separate a sound source even when the source is under debris or outside the camera’s field of view.