Linux as a Telephony Platform

In “Let Linux Speak”
(LJ, January 1997), I demonstrated some fun
applications for the SPO256 text-to-speech board. Buried in that
article was a brief discussion on the potential for using
text-to-speech as a telephony resource and for using Linux as a
telephony services platform.

The term “telephony services”, or computer telephony
in general, can mean many things. The history of computer
telephony, while an interesting subject, is not our primary focus.
Instead, I will discuss the use of Linux as a platform for voice
response and path switching, for PBX integration and switch call
control, and for the extension of traditional voice applications
onto the Internet.

Telephony Overview

Voice response includes many applications such as traditional
voice mail and interactive voice response (IVR) systems utilized as
automatic “dial and survey” machines. These applications are
typically built around multi-channel voice telephony boards that
capture and play back digitized speech and that generate and listen
for DTMF digits and call progress tones. The more advanced of these
boards offer on-board DSP resources, emulate FAX/modem services and
perform speech recognition. The largest single vendor of this kind
of board is the Dialogic Corporation.
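
To make the DTMF handling mentioned above concrete, here is a minimal
sketch in Python of what generating and listening for a digit involves.
It is not how any particular board does it: the 8000 Hz sample rate, the
50 ms tone length, the function names and the choice of the Goertzel
recurrence for detection are all my assumptions for illustration. The
row and column frequencies are the standard DTMF keypad values.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed here as the usual telephony sampling rate

# Standard DTMF keypad: each key sums one row tone and one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_samples(key, duration=0.05):
    """Return float samples in [-1, 1] for a single DTMF key."""
    for r, row in enumerate(KEYPAD):
        if len(key) == 1 and key in row:
            low, high = ROW_HZ[r], COL_HZ[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    count = int(SAMPLE_RATE * duration)
    step = 2.0 * math.pi / SAMPLE_RATE
    return [0.5 * math.sin(step * low * i) + 0.5 * math.sin(step * high * i)
            for i in range(count)]


def goertzel_power(samples, freq):
    """Relative signal power at freq, using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROW_HZ[r]))
    col = max(range(4), key=lambda c: goertzel_power(samples, COL_HZ[c]))
    return KEYPAD[row][col]
```

Calling detect_key(dtmf_samples("5")) on this clean synthetic signal
gives back "5". A real detector would also need thresholds for silence,
noise and speech, which this sketch leaves out.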

PBX integration involves direct computer control of a PBX
switching system. Many vendors have specialized boards and/or
serial interfaces which run proprietary protocols for gaining
access to different switch features. Generally, PBX integration is
implemented as a telephony server or API, such as Microsoft TAPI or
TSAPI (currently supported only under Novell NetWare). Usually,
these APIs implement first-party call control for desktop
applications (such as putting a telephone image and dialer on a
desktop, controlling a digital telephone directly as a “terminal
device”) or third-party call control for server applications
(such as automatic call distributors, or ACDs).

The whole area of Internet telephony is vastly interesting
and intriguing. Most often, the first thing that comes to mind when
one says “Internet telephony” is the set of nifty programs that allow
computer users to place low-grade international telephone calls for
free over the Internet. This same technology, when applied on a
private corporate LAN with sufficient bandwidth, could provide a
cheap means of inter-office switching (much like tie line services
and expensive private T-1 networks) and a better solution for ACD
agent positions.

Traditional low-cost telephony solutions have historically
been implemented either under MS-DOS (with, perhaps, a custom
real-time kernel) or under OS/2. The need for highly specialized
real-time operating systems to drive multi-channel voice
applications has disappeared as a result of increased CPU power,
and equally important, the increased sophistication and power of
add-on telephony boards. Many of these boards now manage I/O and
call state largely on their own with only an occasional need for
direct intervention. In the past, guaranteed maximum interrupt
latency was the mantra for evaluating real-time performance in
complex voice processing systems; I now find support for real-time
predictable scheduling policies more important.
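
As an illustration of what a predictable scheduling policy looks like
from the application side, the following Python sketch asks the kernel
to run the calling process under the POSIX SCHED_FIFO policy. It is a
sketch under assumptions, not code from any voice product: the function
name request_realtime and the default priority of 10 are hypothetical,
and the calls used are the scheduler wrappers in Python's os module,
which exist only on platforms that provide them (Linux among them).

```python
import os


def request_realtime(priority=10):
    """Try to move the calling process to SCHED_FIFO.

    Returns True on success and False when the platform lacks the
    scheduler calls or the process is not allowed to change policy.
    """
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_FIFO"):
        return False
    # Clamp the requested priority to the range the kernel accepts.
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lowest, min(highest, priority))
    try:
        # A pid of 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Typically a permission error: real-time policies are privileged.
        return False
    return True
```

A voice application might call request_realtime() once at startup for
the process that services the telephony board and leave its
administrative processes under the default time-sharing policy. The
request normally succeeds only with suitable privileges, so the
fallback path matters.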

A Look at Operating Systems for Telephony

While Windows NT is commonly touted as a “telephony
operating system” these days, it raises several serious
concerns. The first is simply expense: a Windows NT machine
means a machine with a video display. More often, telephony
applications are deployed on dedicated stand-alone machines which
sit in phone closets and, ideally, require only remote
management.

Some of the same optimizations that make NT work better than
Windows 95 as a desktop machine get in the way of using it for
telephony applications, or for use as a telephony server and
workstation concurrently. For example, one finds strange scheduling
quirks that occur as NT-optimized video drivers, which are now
given the highest priority, update large areas of the
screen.

Finally, even in today's world of cheap RAM, NT requires a
minimum of 32MB, while Linux runs smoothly in 8MB or less. Even in
the low-end commodity voice market, where MS-DOS-based voice mail
systems predominate and a $50.00 change in margins can make or
break a product, these costs are very important. Now, imagine a
$2000 voice mail system, or, even better, a $1000 voice mail
“machine”, with desktop integration, multi-site networking and
voice/email exchange, and what that system would do to the bottom
of the commercial voice mail market.

Why not OS/2? Well, first there is always the question of
“will it be around?” Second, the last non-desktop-optimized
release of OS/2 was 1.3, and it is still the most commonly used and
supported release of OS/2 in voice processing products today. OS/2
driver support still exists in the voice response OEM marketplace,
but it is not a rising star.

Why not DOS? Simply put, one cannot easily run network
services from a DOS machine. In tomorrow's world, voice mail will
have to present voice messages on the desktop, whether through
proprietary means or through a web server and standard e-mail
protocols. Other advanced user applications and networking services
will need to be layered onto these once-dedicated stand-alone
voice processing machines.

And what about Unix? For many years, some variants of Unix
have been used successfully in voice processing, typically in
vertical market applications. The complete failure of the major
Unix vendors to understand the CTI (computer telephony integration)
market and create appropriate software licensing terms or
stripped-down embedded releases has kept the cost of using these
systems prohibitively high for a general purpose CTI platform. For
example, a Unix machine for voice processing may not need NFS, many
user utilities or the X Window System.
However, it does need sockets and a web server for administration
and desktop telephony. No major Unix vendor seems to know how to
properly license such a stripped-down, embedded
configuration.

So what are we left with? An inexpensive operating system
capable of using inexpensive hardware, of running a mix of user and
real-time scheduled processes, of remote management without the
need for a local console, of integrated networking and of giving
months of reliable service unattended. Only Linux and FreeBSD fit
these criteria.