Making Chatbots Talk with IBM Watson Text-to-Speech

Voice is becoming a dominant bot interface, and adoption of speech-enabled devices continues to grow. Digital assistants, or bots as they are now widely known, increasingly speak like humans. Their popularity is growing along with a multitude of apps that make everyday life easier, in domains such as customer service, home automation, chat, and news readers, to name a few.

In this tutorial, we’ll show you how to build a basic text-to-speech chatbot. In this specific example, we’ll make a chatbot that analyzes a stream of price updates for the stock market, and talks based on certain conditions. The TradeBot, as we’ll call it, will speak out a message when selected stock prices cross certain predefined threshold levels.

Overview of Watson Text-to-Speech

IBM Watson Text-to-Speech is part of the offerings on the IBM Bluemix platform. It provides an API to convert written text to natural-sounding speech.

The service supports a choice of languages, voices, and accents. It also supports customizable cadence and tone, as well as expressiveness for conveying good news, an apology, or uncertainty. You can check out a demo of the service here.
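As a rough illustration of the expressiveness feature, the sketch below wraps a message in an SSML express-as element. It assumes the element name and the three type values (GoodNews, Apology, Uncertainty) as they appeared in Watson's expressive SSML for selected voices; check the current service documentation before relying on them. The helper names expressAs and escapeXml are hypothetical.

```javascript
// Assumption: these type values mirror Watson's expressive SSML options
// (good news, apology, uncertainty). Verify against the current docs.
const EXPRESSIVE_TYPES = new Set(["GoodNews", "Apology", "Uncertainty"]);

// Escape characters that would otherwise break the SSML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap plain text in an <express-as> element.
function expressAs(type, text) {
  if (!EXPRESSIVE_TYPES.has(type)) {
    throw new RangeError(`Unsupported expressive type: ${type}`);
  }
  return `<express-as type="${type}">${escapeXml(text)}</express-as>`;
}

console.log(expressAs("GoodNews", "IBM is trading above your upper threshold."));
```

The resulting string would be sent as the text to synthesize, in place of the plain message.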

TradeBot: A Voice-Automated Stock Trading Agent

One important trait of successful traders is that they always keep themselves updated with stock price movements. Doing that manually is impractical, so we put that burden onto computers: in this case, the TradeBot.

You’ll be able to configure the TradeBot with some stock counters and define upper and lower thresholds for each. The TradeBot will give natural-sounding feedback in a male voice if the lower threshold is crossed and in a female voice if the upper threshold is crossed.
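The threshold rule can be sketched in a few lines. This is a minimal illustration, not the code from the repository: the watch list, the symbols, the limits, the alert fields, and the checkThreshold name are all assumptions made for this example.

```javascript
// Hypothetical watch list: symbols and limits are made up for illustration.
const watchList = {
  IBM: { lower: 140, upper: 160 },
  AAPL: { lower: 150, upper: 175 },
};

// Returns an alert object when `price` is outside the configured band,
// or null when the symbol is not configured or the price is within limits.
function checkThreshold(symbol, price, config = watchList) {
  if (!Object.prototype.hasOwnProperty.call(config, symbol)) return null;
  const { lower, upper } = config[symbol];
  if (price < lower) {
    // Lower threshold crossed: alert is spoken in the male voice.
    return { symbol, price, crossed: "lower", voice: "male",
      text: `${symbol} has fallen below ${lower}. Current price is ${price}.` };
  }
  if (price > upper) {
    // Upper threshold crossed: alert is spoken in the female voice.
    return { symbol, price, crossed: "upper", voice: "female",
      text: `${symbol} has risen above ${upper}. Current price is ${price}.` };
  }
  return null;
}

console.log(checkThreshold("IBM", 139.5));
```

Keeping the voice choice in the alert object means the later speech-synthesis step does not need to know anything about thresholds.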

App Design

Watson’s speech synthesis is what gives the TradeBot its voice. The service is accessed through its HTTP REST API from PubNub Functions, a serverless microservice environment. PubNub Functions hosts the server-side business logic for the TradeBot and orchestrates client requests with the Text to Speech service.

PubNub works on the publish/subscribe model of communication, where a publisher can publish a message on a channel, and any subscriber subscribed to that channel can receive it.
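To make the publish/subscribe model concrete, here is a small in-memory sketch of the pattern. It is not the PubNub SDK and does not touch the network; the ChannelBroker class and the channel name are hypothetical and only illustrate how publishers and subscribers are decoupled by a channel.

```javascript
// In-memory illustration of publish/subscribe. NOT the PubNub SDK.
class ChannelBroker {
  constructor() {
    this.subscribers = new Map(); // channel name -> Set of callbacks
  }

  // Register a callback for a channel; returns a function that unsubscribes.
  subscribe(channel, callback) {
    if (!this.subscribers.has(channel)) {
      this.subscribers.set(channel, new Set());
    }
    this.subscribers.get(channel).add(callback);
    return () => this.subscribers.get(channel).delete(callback);
  }

  // Deliver a message to every subscriber of the channel.
  // Returns the number of subscribers that received it.
  publish(channel, message) {
    const callbacks = this.subscribers.get(channel) || new Set();
    for (const callback of callbacks) callback(message);
    return callbacks.size;
  }
}

const broker = new ChannelBroker();
broker.subscribe("tradebot-alerts", (msg) => console.log("received:", msg));
broker.publish("tradebot-alerts", { text: "IBM has fallen below 140." });
```

The publisher never references the subscriber directly; both only agree on the channel name.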

The sequence of operations to activate voice alerts for the Tradebot is as follows.

The stock exchange captures the random movement of stock prices and publishes them.

Once a stock price crosses a predefined threshold, the app publishes a message to the PubNub network on a particular channel.

Within the PubNub network, the PubNub Functions microservice calls the IBM Text to Speech service API, which returns a URL of the synthesized speech for the text message.

PubNub Functions adds this URL to the text message payload and publishes the payload to the TradeBot client, which is already subscribed to that channel.

The TradeBot subscriber client downloads the synthesized speech message and then plays it back.
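The middle steps of this sequence, where a speech URL is attached to the message payload, could look roughly like the sketch below. It is one possible reading, not the code from the repository. The query parameters text, voice, and accept follow the Text to Speech synthesize endpoint; the base URL, the speechUrl field, the helper names, and the mapping to the en-US_MichaelVoice and en-US_AllisonVoice voices are assumptions, and authentication is left out entirely.

```javascript
// Assumption: voice identifiers from Watson's en-US voice catalogue.
// Check the service's voice list, since available voices change over time.
const VOICES = { male: "en-US_MichaelVoice", female: "en-US_AllisonVoice" };

// Build a GET URL for the synthesize endpoint. `baseUrl` is supplied by the
// caller because it depends on your service instance (hypothetical here).
function buildSynthesizeUrl(baseUrl, text, voiceKey) {
  if (!Object.prototype.hasOwnProperty.call(VOICES, voiceKey)) {
    throw new RangeError(`Unknown voice: ${voiceKey}`);
  }
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const url = new URL("v1/synthesize", base);
  url.searchParams.set("text", text);
  url.searchParams.set("voice", VOICES[voiceKey]);
  url.searchParams.set("accept", "audio/wav");
  return url.toString();
}

// Return a copy of the alert message with the speech URL added.
// `speechUrl` is a hypothetical field name.
function attachSpeechUrl(message, baseUrl) {
  return {
    ...message,
    speechUrl: buildSynthesizeUrl(baseUrl, message.text, message.voice),
  };
}

console.log(
  attachSpeechUrl(
    { text: "IBM has fallen below 140.", voice: "male" },
    "https://tts.example.com/api" // placeholder, not a real endpoint
  )
);
```

The subscriber can then fetch speechUrl with its own credentials and play the returned audio.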

Since we do not have access to a real stock market feed, the stock exchange environment is simulated with random price variations.
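A simulated feed of this kind can be as simple as a bounded random walk. The sketch below is an assumption about how such a simulator might work, not the repository's script; nextPrice, tick, the step size, and the starting prices are hypothetical. The random source is injectable so the behaviour can be checked deterministically.

```javascript
// Move a price by a random amount in [-maxStep, +maxStep], rounded to cents.
// Prices are floored at 0.01 so they never go to zero or negative.
function nextPrice(price, maxStep = 1, random = Math.random) {
  const delta = (random() * 2 - 1) * maxStep;
  return Math.max(0.01, Math.round((price + delta) * 100) / 100);
}

// Apply one simulated update to every symbol and return a new price map.
function tick(prices, maxStep = 1, random = Math.random) {
  const updated = {};
  for (const [symbol, price] of Object.entries(prices)) {
    updated[symbol] = nextPrice(price, maxStep, random);
  }
  return updated;
}

// Hypothetical starting prices, updated three times.
let prices = { IBM: 150, AAPL: 160 };
for (let i = 0; i < 3; i += 1) {
  prices = tick(prices, 2);
  console.log(prices);
}
```

Each new price map would then be checked against the configured thresholds, and a message published only when one is crossed.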

Building the Voice-Assisted Trading Bot

Setup

Now, you’ll build the TradeBot app. The source code and instructions to run the app are available here on GitHub. Refer to the README file to set up the services required to host and run this app.

Before you attempt to recreate this demo, make sure that you have both an IBM Bluemix account and a PubNub account. Visit IBM Bluemix and PubNub to register. Both services offer a free tier for trying them out.

Software Components

Here are the software components and cloud services used to build TradeBot:

TradeBot app (Node.js) – The TradeBot client app, which runs on the Node.js JavaScript runtime and plays the voice alerts.

Watson Text to Speech – A service deployed on the IBM Bluemix platform. Its API converts text messages to speech.

PubNub – A realtime data streaming network based on a publish-subscribe mechanism. Devices publish messages on a particular channel to the PubNub network, which acts as a broker, and these messages are received by devices subscribed to that channel.

Functions – A lightweight runtime that can execute any business logic within the PubNub network. A PubNub Function can process published messages either before or after they are passed on to subscriber devices.

To demonstrate the app’s functionality, JavaScript scripts simulate the stock market and the TradeBot client. It is also possible to build a web app for TradeBot that directly streams the synthesized audio generated by the IBM Text to Speech service.

The business logic in PubNub Functions is also written in JavaScript.

Demo

After building the app and creating all required services, you can start experiencing a live speech-enabled bot trading environment. Take a look and listen to how the TradeBot behaves in response to stock price variations.

As you can see and hear in the video, two separate scripts are run. The first script runs the TradeBot for generating voice alerts, and the second script simulates the stock exchange.

Enhancements

You have successfully implemented a speech-enabled trading bot using the Watson Text to Speech service and PubNub.

What more can you do to enhance this bot?

As you experienced in the demo, the TradeBot engages your sense of hearing. For intraday traders this is valuable, as the eyes cannot always keep pace with fluctuating stock prices. One notable addition to this app would therefore be to speak periodic price updates without waiting for a threshold to be crossed.

Also, PubNub Functions can be extended to hook this app to a real stock feed from various exchanges across the world. PubNub also makes it easier to scale the TradeBot app to deliver speech messages across a number of devices simultaneously.

Besides that, using PubNub’s Storage & Playback feature, the TradeBot can give audible feedback about the historic price movements of a stock.

Conclusion

The IBM Text to Speech service, with its low-latency audio synthesis, makes it easy to add voice features to your application. The ability to customize the speech further with expressiveness and voice transformation makes it simpler to move away from the monotony of robotic voices.

One of the most impactful use cases of this service can be applications for vision-impaired people. There are a number of other potential applications as well, such as reading the morning news aloud while you get ready for the day, or reading texts and emails aloud while you drive so you can keep your eyes on the road. Another important area is the development of chatbots for customer service interactions.

So what are you waiting for? Gear up and start building awesome voice-enabled applications. Here are the docs to help you get started and learn about the IBM Text to Speech service.