Let Linux Speak

“User root is now on-line”. Words to be
dreaded when one is away from the terminal, and not logging in
otherwise. But how does one know what is going on with one's
machine when not in front of it? If only the machine could tell
you. In this article I discuss a tool which enables your machine to
do just that.

It all started a year back, when, thumbing through one of
those odd electronic magazines, I came across an ad for a little
speech synthesizer. This device was essentially a low cost
serial-based text-to-speech synthesizer using the SPO256-AL2 chip.
I believe this was the same chip used in the original Mattel
“Speak & Spell” toy.

After a couple of months, I thought about it again and
decided I just had to have it. Certainly, the price was right
(about US $50.00), and serial ports grew on my main Linux machine
like branches on a tree. So I ordered one. After a few weeks, I
called and was told my order had just been hand-made and would be
out in a few days. It is a delight to find hand-made electronics in
these modern times—almost like the days when furniture
manufacturing involved real craftsmanship.

In any case, the unit arrived as promised, complete with
schematics, a disk filled with DOS programs and a thin manual. The
disk I have yet to look at; after all, this was for use with a
Linux machine. The board slid into a PC slot easily enough. The
card uses the PC slot for power only. An RS-232 connector in the
back connects to a serial port. A separate stand-alone power unit
and case is available for $29.00 more. But having another power
pack to plug in was enough to keep me awake at night. A slot I
could afford; though I now foresee the time when I will fill up all
eight slots in the machine.

The board has its own built-in speaker and an RCA jack. The
RCA jack I quickly adapted to feed the background music (BGM)
source on my PBX at home. (Okay, so it's really a Panasonic digital
hybrid key system, to be technical, although it has ambitions.) I
connected the serial port and got a brief noise as DTR was raised.
I shortly learned this was supposed to say “Okay”, but the
impedance-matching on the RCA jack was poor.

Next, I changed the stty settings on the port to match the
speed I had selected for the device via dip switches, and, with
high expectations, I tried a simple test:

echo "Hello, my name is Rochester" >/dev/ttyS2

The monotone response I received back sounded a little like
“Hewlo, my name is Rokheestar” and reminded me of my last visit
to Atlanta, where they use a deliberately harsh-sounding cybernetic
voice on the inter-terminal shuttle trains. Hmmm, maybe it is time
to look at the manual, and maybe even that disk...

Several limitations and problems became immediately obvious.
The first was the text-to-speech algorithms handled words only.
Numbers are simply spoken as a series of digits. Hence 91 becomes
“nine one”, instead of “ninety-one”. This can be solved by some
simple look-up tables and text substitution.

Second, while technically the device acts as a
text-to-phonetic speech device, no special means, such as control
or escape sequences, allow direct access to the phonetic elements
and sounds the device can produce; the text-to-speech code hides
them. This second limitation can be resolved by using alternate
spellings, though not necessarily phonetic spellings, that saturate
the internal algorithm toward different phonetic choices. A little
experimentation was required to get a good idea of how the device
actually translated text to speech.

Since extensive table substitution was now needed, I
considered the next logical step; to develop a driver as a front
end for the device. Ideally, any driver should be able to read
straight text the way a person normally would. First, numbers
should be pronounced as numbers and not as digits. Similarly, many
common numeric constructs used in normal text—such as currency
amounts, standard formatted date and time fields, percentages,
telephone numbers, etc.—have pronunciation rules I wished to
encapsulate and emulate properly. The Internet has its own idioms,
like x@y.z, which should be pronounced as “x at y dot z”. I
decided to cover all of these, as well as in-line text substitution
for correct word pronunciation.

In the end, I decided on a server sitting on a TCP socket.
The server would accept a connection from the user application on a
known port and pronounce any text received according to a
reasonable set of rules (as stated above). I added an escape mode
to allow for spelling words out and single-digit announcement
modes. I could establish a simple telnet session with the server,
then test the device by typing text.

The TCP server offered another advantage. Only one
application can be serviced by the device at a time—otherwise
speech would be garbled together from multiple sources. The use of
a TCP session assures that only one connection would be accepted by
the server and kept active until closed by the client. Other client
applications can block as backlog while waiting for the current
application to finish talking. The simplicity offered by
backlogging, over the use of lock files was the reason I chose to
use a full server instead of a task initiated by inetd.

With the server in place, it was only a matter of time before
speech synthesis would pervade other system services. The first use
I made of the server was to monitor my BBS system. By connecting it
to the user login quota manager, I could have the device announce
as users logged in and out. Similarly, the traditional sysop page
can be carried over this device.

Eventually I tied the SPO server into my implementation of
the wall command and then created other
utilities to provide verbal monitoring of my Internet server.
Verbal monitoring would watch for and announce new e-mail for me,
as well as basic system stats such as uptime and disk usage every
hour. As all this speech can be annoying at night, I added a simple
muting schedule to the server. Most curious and entertaining is my
replacement for shutdown, called simply “down”.

For system monitoring, the speech device has proven to be
quite a useful tool—not a nuisance. The server was developed for
the ability to read written text and properly pronounce common
usages and conventions, and while I use this capability minimally,
others might have more occasion for it. The pronunciation
dictionary can be expanded as needed to cover a wider range of
words as they are identified in everyday use.

One use for the device which was suggested to me is as a
screen reader for visually-impaired computer users. Another
application I am looking at is in parking incoming phone calls and
paging or announcing calls through the telephone system. I have
often wished the board included a DTMF tone generator and a SLICK,
so I may look at modifying the schematics provided.

The SPO-256-AL2 text-to-speech board described here may be
purchased through B.G. Micro, P.O. Box 280298, Dallas, TX 75228
(214) 271-5546. The Computalker lists for around $50.00 (U.S.) as a
PC card or $80.00 (U.S.) stand-alone with a power adapter. Chips
are available separately, and I believe the Computalker may be
purchased in kit form.

While the SPO is serial-based and can be used on almost any
machine or OS, I originally obtained it for use on my main server,
which runs Linux. For this reason, the speech server was developed
and tested under Linux. The server was originally developed using
libraries and part of the code base of my BBS package, so these are
included as part of the published source. I am working on a more
portable public source implementation that should be more easily
and widely compatible to non-Linux systems as well. I must go now,
as I am being paged...

David Sugar
Best known for WorldVU, a public BBS system for
Linux, he is currently employed as director of software
engineering for Fortran Corp. and uses Linux for commercial
telephony development. He maintains his own Internet server under
Linux.

Trending Topics

Upcoming Webinar

Getting Started with DevOps - Including New Data on IT Performance from Puppet Labs 2015 State of DevOps Report

August 27, 2015
12:00 PM CDT

DevOps represents a profound change from the way most IT departments have traditionally worked: from siloed teams and high-anxiety releases to everyone collaborating on uneventful and more frequent releases of higher-quality code. It doesn't matter how large or small an organization is, or even whether it's historically slow moving or risk averse — there are ways to adopt DevOps sanely, and get measurable results in just weeks.