Sunday, 21 May 2017

A startup on Kickstarter is touting the world's first voice mask for smartphones. That said, Hushme has been compared to Bane from Batman and to Dr. Hannibal Lecter. Engadget has a good write-up of Hushme here.

This is an interesting concept that has come back into the news after a long gap. We are well past the point of 'Peak Telephony' because we now use text messages and OTT apps for non-urgent communications. Voice will always be around, though, not only for urgent communications but also for things like audio/video conference calls.

Back in 2003, NTT Docomo generated a lot of news on this topic. Their research paper "Unvoiced speech recognition using EMG - mime speech recognition" was the first step in trying to find a way to speak silently while the other party still hears a voice. This is probably the most quoted paper on this topic. (picture source).

NASA was working in this area around the same time. They referred to this approach as 'Subvocal Speech'. While the approach was originally intended for astronauts' suits, the idea was that it could also be made available for commercial use. At the time, NASA was effectively working with only a limited number of words using this approach (picture source).

For both of the approaches above, there isn't much recent information. While it has been easy to recognize certain characters, it takes a lot of effort to recognize whole speech. It's also a challenge to play your own voice, rather than a robotic voice, to the other party.

To give a sense of how big a challenge this is, look at YouTube videos with automatically generated captions. Even though you can understand what the person is saying, it's always a challenge for the machine. You can read more about the challenge here.

A lot of research in similar areas has been done in France and is available here.

Motorola has gone a step further and patented an e-Tattoo that can be emblazoned over your vocal cords to intercept subtle voice commands — perhaps even subvocal commands, or even the fully internal whisperings that fail to pluck the vocal cords when not given full cerebral approval. One might even conclude that they are not just patenting device communications from a patch of smartskin, but communications from your soul. Read more here.

Another term used for this research has been 'lip reading'. While the initial approaches to lip reading were the same as the other approaches, attaching sensors to facial muscles (see here), the newer approaches are looking at exploiting the smartphone camera for this.

Many researchers have achieved reasonable success using cameras for lip reading (see here and here), but researchers from Google’s AI division DeepMind and the University of Oxford have used artificial intelligence to create the most accurate lip-reading software ever.

The challenges in using a smartphone camera for speech recognition will be high-speed data connectivity and the ability to see lip movement clearly. Indoors, this can be solved with Wi-Fi connectivity and by looking at the camera. It may be a bit trickier outdoors, or when you cannot look at the camera, for example while driving. Who knows, this may be a killer use case for 5G.

By the way, this is not a complete survey of the research in this area. If you have additional info, please help others by adding it in the comments section.