Post navigation

Accepting Speech Input in HTML5 Forms

The way that we interact with computers has changed dramatically over the past decade. Touch-screen devices and laptop trackpads have enabled a much more intuitive form of interaction than is achievable using a traditional mouse. These changes haven’t been limited to just hardware. Gestures, predictive text, and speech recognition are all examples of software innovations that have improved the way in which we interact with our devices.

Speech recognition has somewhat eluded innovators for decades. Many organizations have tried (with varying levels of success) to create reliable speech recognition technologies. There is one company however that looks to have cracked the problem – Google.

In this post you’ll be learning how to take advantage of Google’s speech recognition technologies to enhance your web forms. You’ll learn how to give Chrome users the ability to fill-in text fields using speech, and how to detect support for this new speech input capability in browsers.

Lets get started.

Enabling Speech Input

Enabling support for speech input is as simple as adding an attribute to your <input> elements. The x-webkit-speech attribute will indicate to the browser that the user should be given the option to complete this form field using speech input.

<input type="text" x-webkit-speech>

When speech input is enabled the element will have a small microphone icon displayed on the right of the input. Clicking on this icon will launch a small tooltip to show that your voice is now being recorded. You can also start speech input by focussing the element and pressing Ctrl + Shift + . on Windows, or Command + Shift + . on Mac.

In JavaScript, you can test to see if an element has speech input enabled by examining it’s webkitSpeech property. This is a boolean property and will therefore be set to true or false. You can override this property to enable or disable speech input on an element.

A Caveat About Input Types

Speech input is not available for all the different HTML5 input types. In my testing I found that the text, number, and tel types do support speech input whereas the email, url, date, and month input types don’t.

If you apply the x-webkit-speech attribute to an <input> element with an unsupported input type, the webkitSpeech property on that element will still be set to true. You therefore cannot rely on this property to tell if the browser is displaying the speech input controls, only that the browser supports speech input in general.

Detecting Browser Support

A simple way of checking if the user’s browser supports speech input is to look for the webkitSpeech property on an <input> element. An example of how to do this is shown below.

Google Chrome is the only browser that currently supports speech input. We’ll examine the reasons for this in the next section.

How Speech Recognition Works

Speech-to-Text with a Web Service

The browser relies on an external service to handle speech-to-text conversion. The recording of your voice is sent to this service which then analyses the audio and constructs a textual representation. The text is then sent back to the browser which populates the <input> element to complete the process. Many speech-to-text services incorporate machine-learning algorithms that allow them to get more accurate over time.

Note: A side effect of using an external service to handle speech-to-text is that you will need an internet connection for speech input to work. This is something to keep in mind if you plan for your web application to work offline.

The Chrome browser relies on Google’s proprietary speech recognition technology to provide the functionality behind x-webkit-speech. Google has had a team working on speech recognition and natural language processing for a long time. It’s this team that’s been responsible for developing the complex systems needed to provide a reliable speech-to-text service for products like Google Translate and Voice Search.

Note: If you’re interested in learning more about how speech-to-text works check out the research papers published by Google engineers.

Developing speech-to-text services is incredibly difficult and requires a significant amount of investment. This is probably the main reason why no other browser vendor has implemented speech recognition yet. However, now that Apple has acquired Siri, I’m interested to see if speech recognition will make it’s way into Safari some time soon.

Summary

In this post you’ve learned about the x-webkit-speech attribute and how it can be used to add speech input capabilities to your web forms. There is also a more advanced Web Speech API that we haven’t covered in this post. This API allows developers to add speech recognition functionality to more aspects of their applications, and even synthesize speech from text.

Whether it’s in the computer on your desk, or the phone in your pocket, software innovations like Google Voice Search and Siri are paving the way for a revolution in how we interact with computers. Welcome to the future my friends, now if only someone could figure out the whole teleportation thing.

I’m not sure. I think it’s mainly down to when other browser vendors can develop/license the speech recognition technology. Safari seems like the next contender for this feature after Apple acquired Siri. That said, Microsoft has done a lot of work on speech recognition too.

Speech is going to be so important in the future, using it on form inputs in your example can really open up and improve the mobile experience.

I got really excited a few months back by learning about the JavaScript Web Speech API, Ian Devlin has already shown real world examples of controlling video through speech – can you imagine how accessible the web is going to become when users start to navigate around just by talking. Exciting times ahead.

I’m not sure if `x-webkit-speech` is supported in Chrome for Android yet. Unfortunately I don’t have a nexus 5 to test :/

I know that this doesn’t work on Chrome for iOS at the moment though.

Many mobile operating systems include their own speech input technologies that are more part of the keyboard than the browser. Perhaps this is why Google haven’t seen the need to implement this feature on mobile.

Thanks matt for the article.would like to create my next app around speech recognition technology

Hello! We're the teachers here at Treehouse. We produce video courses on everything from web design and web development to iOS and business skills. You can browse our full library of content to find the course that's right for you.

In the meantime, explore the free features, tips, tricks and videos here on our blog. Tell us what you think, we'd love to chat: blog@teamtreehouse.com

Stay Updated

Sign up for our newsletter, and we'll send you news and tutorials
on web design, coding, business, and more! You'll also receive these
great gifts:

checkArt and the Web: Line, Shape, and Form - An eBook by Treehouse Teacher Nick Pettit.

checkOn Freelancing - An audiobook about running your own business by Simon Collison.

Swift is a new programming language created by Apple to program iOS apps. If you are new to programming or to Swift then this course is for you. Learn about programming concepts like: variables, types, collections and control structures.

Ruby is a programming language with a focus on simplicity and productivity, and it's used to create some of the biggest websites in the world. Learn how to work with Ruby and write simple Ruby programs in this introductory course.

Interested in creating Android apps? Learn the Java programming language, a tool for Android development called Android Studio, and some very basic concepts of the Android Software Development Kit, or SDK.

Bring your big idea to life! Learn how to start a company on the right foot with an introduction to basic business concepts, including corporate structure, marketing, finance, and accounting. Then you’re ready for more advanced business strategies.