11 : Do You Speak my Language?

“Grandpa, what was life like before computers got so small that
they became invisible?” This cartoon caption sums up our
predicament: how do we communicate with applications as our computers
keep shrinking? On the smaller machines, even
familiar applications may look different and behave oddly. We may
need help to use the unfamiliar interface. However, reading a help
document on a cramped netbook or smartphone screen is not easy.
Wouldn't it be nicer if the “mouse over” text or a help message
could be read out to us instead?

Efforts at text to speech have been around since long before computers
were created (see http://en.wikipedia.org/wiki/Text_to_speech).
Getting the computer to speak in a nice voice has proven to be a
remarkably hard task. Many Interactive Voice Response (IVR) systems
get around the problem by recording phrases spoken by people and
playing back the appropriate sequence of wave files. This is clearly
feasible only when the required vocabulary is small.
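The phrase-concatenation approach can be sketched in a few lines of Python. The clip names (forty.wav and so on) and the 0 to 99 range are assumptions made for illustration; a real IVR system would have its own recordings and its own playback code.

```python
# Minimal sketch of phrase concatenation: map a number to pre-recorded clips.
# All file names are hypothetical; nothing is actually played here.
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def clips_for_number(n):
    """Return the clip file names to play, in order, for 0 <= n <= 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only has recordings for 0..99")
    if n < 20:
        words = [UNITS[n]]
    else:
        tens, units = divmod(n, 10)
        words = [TENS[tens]]
        if units:
            words.append(UNITS[units])
    return [word + ".wav" for word in words]


print(clips_for_number(42))
```

Twenty-eight recordings are enough to cover a hundred numbers this way, but arbitrary text such as a help message cannot be pre-recorded, which is where synthesis comes in.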

If you are willing to compromise and accept a voice which is
comprehensible, even if robotic, there are a few options. The eSpeak
system (http://espeak.sourceforge.net/) should be the first to be
explored, as it is widely used as a
part of the accessibility features and is the platform being used by
the OLPC/Sugar project. It is small and has support for a fair number
of languages.

Getting Started

Speech Dispatcher, a generic server
for text to speech (TTS) applications, can be used from Python through
a wrapper, the speechd module. The server can talk to various TTS
engines, including eSpeak, flite and festival. Festival
has the best voices, but its voices are easily available for English
only.

You will need to ensure that several software packages are
installed. The minimal list for Fedora 10 and Ubuntu 8.10 is:

espeak

espeak-data (additionally, on Ubuntu)

speech-dispatcher

speech-dispatcher-python (on Fedora)

python-speechd (on Ubuntu)

You may wish to enter some text in Hindi. Indic Onscreen Keyboard
(available on Fedora but not on Ubuntu) is a reasonable option. The
on-screen keyboard and the keyboard layouts are provided by:

iok

m17n-contrib-hindi

You will need to configure the default engine in
/etc/speech-dispatcher/speechd.conf:

DefaultModule espeak

Since Ubuntu and Fedora now use PulseAudio, you should
change the audio output method in /etc/speech-dispatcher/modules/espeak.conf
to pulse; otherwise, you may hear only silence (on Fedora):

EspeakAudioOutputMethod "pulse"
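If you want to confirm that the two settings are in place, a small script can scan the configuration text. This is only a sketch: read_setting is a name made up here, and it assumes the files consist of simple "Key value" lines with # starting a comment, which is how the two lines above look.

```python
def read_setting(conf_text, key):
    """Return the value of the last uncommented `key value` line, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            value = parts[1].strip().strip('"')
    return value


# Sample text standing in for the contents of the two files.
sample = '''
# DefaultModule flite
DefaultModule espeak
EspeakAudioOutputMethod "pulse"
'''
print(read_setting(sample, "DefaultModule"))
print(read_setting(sample, "EspeakAudioOutputMethod"))
```

To check the real files, pass in the result of open('/etc/speech-dispatcher/speechd.conf').read() instead of the sample text.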

You will also need to create the default directory for the logs.
Without it, the speech dispatcher will still work, but no log messages
will be stored.

$ sudo mkdir /var/log/speech-dispatcher

Now start the speech-dispatcher service and you are ready to
begin.

$ sudo service speech-dispatcherd start

Verify that the speech dispatcher is working:

$ spd-say 'Hello, Hello, Testing 1 2 3'

If you hear what you expect, you can now proceed further.

Learning the First Steps

You need to learn what the speech dispatcher provides and how it
behaves. Interactive learning is the easiest. So, start your Python
interpreter and try the following:

>>> import speechd

>>> dir(speechd)

>>> help(speechd.Speaker)

A little exploring and you realise that you need to create an
object of the type Speaker. So, create it and see what comes next.

>>> spk = speechd.Speaker('me')

>>> dir(spk)

List commands are always helpful. So, try:

>>> spk.list_output_modules()

('espeak', 'flite')

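This command lists the TTS output modules available. If
the default is not espeak, or you wish to switch between the modules,
you can easily do so with set_output_module:

>>> spk.set_output_module('espeak')

Hindi is an option in eSpeak, though only as a testing version at
present. Select it with set_language and try to say something.

>>> spk.set_language('hi')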

>>> spk.speak('Testing, 1 2 3')

(225, 'OK MESSAGE QUEUED', ('4',))

You should have noticed a difference. Maybe it was too fast, so
let us slow it down:

>>> help(spk.set_rate)

>>> spk.set_rate(-50)

>>> spk.speak('Testing, 1 2 3')

Negative rate values slow down the speech. The
numbers should now be clearly spoken in Hindi.

>>> help(spk.set_voice)

>>> spk.set_voice('FEMALE1')

>>> spk.speak('Testing, 1 2 3')

>>> spk.set_pitch(50)

>>> spk.speak('Testing, 1 2 3')

Continue exploring. The voice isn't very feminine, so there is more
potential for work!
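Once you have found settings you like, it may help to collect them in one place. The sketch below uses only the setter methods tried above; clamp and configure are names invented here, and the -100 to 100 range for rate and pitch is taken from the Speech Dispatcher protocol documentation rather than from anything shown in this session.

```python
def clamp(value, low=-100, high=100):
    """Keep a rate or pitch value inside the range the server accepts."""
    return max(low, min(high, value))


def configure(speaker, language="hi", rate=-50, pitch=0, voice=None):
    """Apply settings to a speechd.Speaker, or any object with the same setters."""
    speaker.set_language(language)
    speaker.set_rate(clamp(rate))
    speaker.set_pitch(clamp(pitch))
    if voice is not None:
        speaker.set_voice(voice)  # symbolic name such as 'FEMALE1'
```

With the spk object from the session above, configure(spk, pitch=50, voice='FEMALE1') followed by spk.speak('Testing, 1 2 3') repeats the last few steps in one call.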

What if you enter a Hindi text message? Try using the on-screen
keyboard, iok, and enter a Unicode message:

>>> text=u"गब्बर सिंह कहते थे, जो डर गया वह मर गया."

>>> spk.speak(text)

Don't be surprised if the Hindi text looks different in various
consoles. Even the synthetic voice seems scared of Gabbar Singh! These
are more areas for potential improvement.

Try changing the TTS language to English and hear the difference.

Read Aloud Application

You can have the computer read aloud one of the first nursery
rhymes that I learnt. Save it in a text file, NurseryRhyme.txt.

मछली जल की है रानी.

जीवन उसका है पानी.

हाथ लगाओ, डर जाएगी.

बाहर निकालो, मर जाएगी.

Your basic program, read_aloud.py, would look like:

import speechd

s = speechd.Speaker('ReadAloud')  # connect to the speech dispatcher
s.set_output_module('espeak')
s.set_language('hi')
f = open('NurseryRhyme.txt')
s.set_rate(-50)
for line in f.readlines():
    sentence = unicode(line, 'utf8')  # decode the UTF-8 bytes read from the file
    print sentence
    s.speak(sentence)  # queues the line and returns immediately
s.close()

In Python 3, all strings will be unicode, but currently, you will
need to interpret the data read as a unicode string.
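For comparison, here is a sketch of how the same program might look in Python 3, where a file opened with an encoding yields Unicode text directly. It assumes a Python 3 build of the speechd module is installed, which may not be the case on your distribution; speak_lines and main are names chosen for this sketch.

```python
import io


def speak_lines(lines, speaker):
    """Print and queue each non-empty line; return how many were queued."""
    count = 0
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines between verses
        print(sentence)
        speaker.speak(sentence)
        count += 1
    return count


def main(path="NurseryRhyme.txt"):
    """Needs the speechd module and a running speech dispatcher."""
    import speechd  # assumption: a Python 3 build of the wrapper
    s = speechd.Speaker("ReadAloud")
    try:
        s.set_output_module("espeak")
        s.set_language("hi")
        s.set_rate(-50)
        # The encoding argument makes each line a Unicode string already.
        with io.open(path, encoding="utf-8") as f:
            speak_lines(f, s)
    finally:
        s.close()
```

Nothing runs until main() is called, so the helper can be tried with any object that has a speak method.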

You will notice that the entire poem is displayed even before the
first line is spoken. You want to display each line as it is being
spoken.

So, you need to think in terms of events. Your program needs to
wait till the TTS engine has finished speaking the line. The
speak method accepts a callback parameter and an
event_types parameter. With these, the speech dispatcher calls
your program back when the events you have requested occur.

The callback function is passed one parameter, the event which
resulted in it being called. The speech dispatcher currently has 2
events: 'begin' and 'end'. Needless to say, you are
interested only in the second event.

Python's threading module contains a class Event which will be
very useful for this application. So, the better version of your read
aloud application will become:

import speechd
from threading import Event

def spoken(event):
    speech_over.set()  # the 'end' event arrived: let the main loop continue

s = speechd.Speaker('ReadAloud')
s.set_output_module('espeak')
s.set_language('hi')
f = open('NurseryRhyme.txt')
s.set_rate(-50)
speech_over = Event()
for line in f.readlines():
    sentence = unicode(line, 'utf8')
    print sentence
    s.speak(sentence, callback=spoken, event_types='end')
    speech_over.wait()  # block until the line has been spoken
    speech_over.clear()  # reset the flag for the next line
s.close()

You have added just 6 lines, modified one line and gained quite a
bit of control over the ability to integrate the playing of sound
with the rest of the application.
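The wait-for-the-event pattern can be tried even without a running speech server. In the sketch below, FakeSpeaker is a hypothetical stand-in written for this illustration, not part of speechd: it waits a moment and then reports 'end' from another thread, which is how this sketch assumes real callbacks arrive. The timeout on wait() is an addition, so that a lost event cannot hang the program.

```python
import threading


class FakeSpeaker(object):
    """Hypothetical stand-in for speechd.Speaker. It 'speaks' for `delay`
    seconds, then reports an 'end' event from a separate thread."""

    def __init__(self, delay=0.05):
        self.delay = delay

    def speak(self, text, callback=None, event_types=None):
        def finish():
            if callback is not None and (event_types is None
                                         or "end" in event_types):
                callback("end")
        threading.Timer(self.delay, finish).start()


def read_in_step(speaker, lines, timeout=10.0):
    """Show each line, speak it, and wait for its 'end' event.

    Returns a log of what happened, in order."""
    log = []
    speech_over = threading.Event()

    def spoken(event):
        log.append(event)
        speech_over.set()

    for line in lines:
        log.append("show: " + line)
        speaker.speak(line, callback=spoken, event_types=("end",))
        if not speech_over.wait(timeout):  # False if the event never came
            log.append("timed out")
        speech_over.clear()
    return log


print(read_in_step(FakeSpeaker(), ["one", "two"]))
```

Each "show" entry in the log is followed by its "end" before the next line is shown, which is the behaviour you want from the read aloud application.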

Dhvani and Festival

Dhvani won an award from LFY last year and is available at
http://dhvani.sourceforge.net/.
Using the source code from the repository and a little help from
Santhosh Thottingal, I could try the above example code. The voice
quality seems more natural than the eSpeak voice. I hope the
package evolves and becomes a part of the Fedora and Ubuntu
repositories.

Debian and Ubuntu have a package for a Hindi voice for festival.
The quality of voice is acceptable but not better than eSpeak's.

Final Words

If more applications used TTS, the speech quality would definitely
improve. If speech quality were better, more applications would
use TTS.

Fortunately, smart cellphones, netbooks and ebook readers are
going to change the dynamics of the above deadlock. Adding language
support to the eSpeak project is likely to gain acceptance the
fastest. (For how to go about it, see
http://espeak.sourceforge.net/add_language.html)

If user applications rely on Speech Dispatcher, as its Python
wrapper encourages, it is easier to use a TTS engine optimised for
specific languages, e.g. Dhvani for Indic languages. Switching engines
is then just a configuration detail.
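An application could even choose the engine per language at run time. The sketch below is only an illustration: the preference table is invented, and 'dhvani' as a Speech Dispatcher module name is hypothetical, since this article does not show such a module being configured.

```python
# Preferred engines per language, best first. The 'dhvani' module name is
# hypothetical; adjust the table to the modules your server really offers.
PREFERRED = {"hi": ["dhvani", "espeak"], "en": ["festival", "espeak"]}


def pick_module(language, available, default="espeak"):
    """Return the first preferred module that the server actually offers."""
    for name in PREFERRED.get(language, []):
        if name in available:
            return name
    return default if default in available else None


print(pick_module("hi", ("espeak", "flite")))
```

With a live connection, the list of available modules would come from spk.list_output_modules(), and the result would be passed to spk.set_output_module().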

It is worth adding TTS to your applications today. If for no other
reason, it attracts attention.