
How to Add Text-to-Speech Feature on Any Web Page

The text-to-speech feature refers to the spoken narration of text displayed on a device. Devices such as laptops, tablets, and mobile phones already have this feature, and any application running on them, such as a web browser, can make use of it and extend its functionality. Narration can be a useful aid for an application that displays a lot of text, as it gives website visitors the option of listening to the content.

The Web Speech API

The Web Speech JavaScript API is the gateway through which a web browser can access the text-to-speech feature. So, if you want to add text-to-speech functionality to a text-heavy web page and let your readers listen to the content, you can use this handy API or, more specifically, its SpeechSynthesis interface.

Initial code & support check

To get started, let’s create a web page with some sample text to be narrated and three buttons.

The buttons will serve as the controls for the narration. Next, we need to make sure that the user agent (UA) supports the SpeechSynthesis interface. To do so, we use JavaScript to quickly check whether the window object has a speechSynthesis property.

If speechSynthesis is available, we first store a reference to it in the synth variable. We also initialize a flag with the value false (we’ll see its purpose later in the post), and we create references and click event handlers for the three buttons (Play, Pause, Stop).
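A minimal sketch of the support check and setup step, under a few assumptions of ours: the buttons carry the hypothetical IDs play, pause and stop (the keys of the handlers object), the flag lives in a small state object instead of a global variable, and each click function receives synth and that state object as arguments. The window and document objects are parameters that default to the browser globals, only so the sketch can also run outside a browser.

```javascript
// Sketch of the setup step. Button IDs ("play", "pause", "stop") are
// hypothetical: they are simply the keys of the `handlers` object.
function setUpNarration(handlers, win = globalThis, doc = win.document) {
  // Support check: does the window object have a speechSynthesis property?
  if (!('speechSynthesis' in win)) {
    return null; // No support, so the buttons are left unwired.
  }

  const synth = win.speechSynthesis; // reference to the SpeechSynthesis object
  const state = { flag: false };     // the flag starts out false

  // Create a reference to each button and attach its click handler.
  for (const [id, handler] of Object.entries(handlers)) {
    const button = doc.getElementById(id);
    button.addEventListener('click', () => handler(synth, state));
  }
  return { synth, state };
}

// In a browser, wire up your three click functions (names are illustrative):
// setUpNarration({ play: onClickPlay, pause: onClickPause, stop: onClickStop });
```

Returning null when the API is missing lets the page decide what to do instead, for example hiding the buttons.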

Create the custom functions

Now let’s build the click functions of the three individual buttons that will be called by the event handlers.

1. Play/Resume

When the Play button is clicked, we first check the flag. If it’s false, we set it to true, so that if the button is clicked again later, the code inside the first if block won’t run until the flag is reset to false.

Then we create a new instance of the SpeechSynthesisUtterance interface, which holds information about the speech, such as the text to be read, the volume, the voice, the speed, the pitch, and the language. We pass the article text as the constructor’s parameter and assign the new instance to the utterance variable.
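The speech settings listed above map to properties of the utterance object. The configureUtterance() helper below is our own hypothetical function, not part of the Web Speech API; the property names (lang, volume, rate, pitch) and the ranges in the comments come from the SpeechSynthesisUtterance interface, where "speed" is called rate.

```javascript
// Hypothetical helper (not part of the Web Speech API) that copies optional
// settings onto a SpeechSynthesisUtterance-like object and returns it.
function configureUtterance(utterance, { lang, volume, rate, pitch } = {}) {
  if (lang !== undefined) utterance.lang = lang;       // BCP 47 tag such as "en-US"
  if (volume !== undefined) utterance.volume = volume; // 0 to 1 (default 1)
  if (rate !== undefined) utterance.rate = rate;       // speed: 0.1 to 10 (default 1)
  if (pitch !== undefined) utterance.pitch = pitch;    // 0 to 2 (default 1)
  return utterance;
}

// Browser usage:
// const utterance = configureUtterance(
//   new SpeechSynthesisUtterance('Hello, reader!'),
//   { lang: 'en-US', rate: 1.2 }
// );
```

Settings that are left out keep the browser’s defaults, so the article text alone is enough to create a working utterance.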

We use the SpeechSynthesis.getVoices() method to pick a voice for the speech from the voices available on the user’s device. Since this method returns an array of all the voices available on the device, we assign the first one with the utterance.voice = synth.getVoices()[0]; statement.
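One caveat with synth.getVoices()[0]: some browsers (Chrome, for example) load the voice list asynchronously, so getVoices() can return an empty array at first, and the SpeechSynthesis object fires a voiceschanged event once the list is ready. Here is a minimal sketch of a hypothetical loadVoices() helper that waits for that event when needed; the helper name is ours.

```javascript
// Hypothetical helper: resolves with the device's voices, waiting for the
// "voiceschanged" event if the list is still empty. Note that the promise
// never settles if that event never fires (e.g. a device with no voices).
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices); // voices are already available
      return;
    }
    // Wait for the browser to populate the list, then read it again.
    synth.addEventListener('voiceschanged', () => resolve(synth.getVoices()), { once: true });
  });
}

// Browser usage:
// loadVoices(window.speechSynthesis).then((voices) => {
//   utterance.voice = voices[0];
// });
```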

The onend property represents an event handler that is executed when the speech finishes. Inside it, we set the flag variable back to false so that the code that starts the speech can run again the next time the button is clicked. Finally, we pass the utterance to the SpeechSynthesis.speak() method, which adds it to the speech queue and starts the narration.
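Put together, the Play/Resume steps could look like the sketch below. It assumes synth is passed in as an argument and the flag lives on a small state object; the 'article' selector used to fetch the text is hypothetical, and the final resume branch reflects the "Resume" half of the section heading. The text getter and the utterance constructor are optional parameters only so the function can be exercised outside a browser.

```javascript
// Sketch of the Play/Resume click function. `synth` is the SpeechSynthesis
// object and `state.flag` is the flag described above. The 'article'
// selector is hypothetical; the last two parameters exist only for testing.
function onClickPlay(
  synth,
  state,
  getText = () => globalThis.document.querySelector('article').textContent,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!state.flag) {
    state.flag = true; // ignore further starts until the speech ends

    // The utterance holds the text plus voice, volume, rate, pitch and lang.
    const utterance = new Utterance(getText());
    // First available voice; null makes the browser use its default voice.
    utterance.voice = synth.getVoices()[0] || null;
    utterance.onend = () => {
      state.flag = false; // allow a new narration on the next click
    };
    synth.speak(utterance); // queue the utterance and start speaking
  }

  // Resume branch: continue a narration that was paused.
  if (synth.paused) {
    synth.resume();
  }
}
```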

2. Pause

Now let’s create the onClickPause() function in which we first check if the narration is ongoing and not paused. We can test these conditions by making use of the SpeechSynthesis.speaking and the SpeechSynthesis.paused properties. If both conditions are true, our onClickPause() function pauses the speech by calling the SpeechSynthesis.pause() method.
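A minimal sketch of onClickPause(), assuming synth (the SpeechSynthesis object) is passed in as an argument rather than read from a global variable:

```javascript
// Pause only when something is being spoken and it is not already paused.
function onClickPause(synth) {
  if (synth.speaking && !synth.paused) {
    synth.pause(); // the speech can later be continued with synth.resume()
  }
}
```

Both checks matter: speaking stays true while an utterance is paused, so the paused test is what stops a second click from pausing again.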

3. Stop

The onClickStop() function ends the narration by calling the SpeechSynthesis.cancel() method, which removes all utterances from the queue, and sets the flag back to false.

Note that when speech is cancelled, the onend event fires automatically, and we have already put the flag-reset code inside its handler. However, a bug in Safari prevents this event from firing, which is why we also reset the flag in the onClickStop() function. You can skip this if you don’t need to support Safari.
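A minimal sketch of onClickStop(), again assuming synth is passed in and the flag lives on a state object. Resetting the flag here, and not only in onend, is the Safari workaround described above.

```javascript
// Stop the narration and reset the flag so Play can start over.
function onClickStop(synth, state) {
  // Reset the flag here as well, because Safari may not fire onend on cancel.
  state.flag = false;
  // cancel() removes every queued utterance and stops the current one.
  synth.cancel();
}
```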

Browser support

The latest versions of all modern browsers support the speech synthesis API, either fully or partially. WebKit browsers have a few quirks, though: they don’t play speech from multiple tabs, pausing works but is buggy, and speech isn’t reset when the user reloads the page.

Working demo

Have a look at the live demo, or check out the full code on GitHub.