Speech Output for the IBM Personal Computer

by Curtis Chong

(As a general practice the Monitor tends to avoid articles that are technical in nature and aimed at a specific group. However, the computer has poked its terminal into so many corners of our lives that a discussion of its vagaries and mysteries is both timely and relevant. The blind have access to more print material today than ever before in history, largely because of the computer, but many blind persons lack the information they need to obtain the tools and take advantage of the opportunities now open to them. This is why we are carrying this article. Curtis Chong is one of the leaders of the NFB of Minnesota. He is also the President of the National Federation of the Blind in Computer Science. In a recent letter to President Jernigan he said in part: "One of the biggest problems with the speech software market for the IBM PC is that it changes too rapidly. The information printed in Computer Science Update, the newsletter of the NFB in Computer Science, is already out of date.

"In order somewhat to rectify this problem, I have prepared a separate reprint of the article on speech output for the IBM Personal Computer. A copy of that reprint is enclosed for your information.

"It is my belief that readers of the Monitor would find the article of interest. However, I have some concern that the information which it contains may be out of date by the time it is reprinted. If nothing else, however, the article does give the reader a fairly comprehensive list of the vendors of speech software for the IBM PC. If you decide that it is appropriate to use the article in the Monitor, please encourage people to contact the NFB in Computer Science for current information about specific software.")

Author's Note: The speech software market for the IBM Personal Computer is in a constant state of flux. Already, another speech software package has entered the market: Soft VERT from Telesensory Systems, Inc. (TSI). As of this writing, I have not had an opportunity to evaluate the product fully. However, I do know that TSI is selling Soft VERT for $750, which makes it the highest-priced speech software package on the market for the IBM Personal Computer. In return, Soft VERT is loaded with features, some of which people will want and others will not. If you want additional information about Soft VERT, contact Telesensory Systems, Inc., 455 North Bernardo Avenue, P.O. Box 7455, Mountain View, California 94039-7455, Phone: (415) 960-0920.

I have received many calls from people wanting to know about speech output packages for the IBM Personal Computer (PC). Blind people are being exposed to the PC in a wide variety of work and educational environments, and the word has spread across the country that generalized speech output software exists to make the PC talk. What people don't know, however, is who markets these packages, which package is "the best," and whether these packages will allow them to use any program that can run on the PC.

Speech output for the IBM PC consists of two essential items: the speech synthesizer that does the talking and the software that tells the PC what to send to the synthesizer.

SPEECH HARDWARE FOR THE PC:

Two brands of speech synthesizers are worth considering, especially if you are paying for the equipment with your own funds. The Echo brand of speech synthesizer is manufactured by Street Electronics and costs about $250. Votrax currently markets two speech synthesis systems: the Type 'N Talk and the Personal Speech System (PSS). Although the Type 'N Talk is cheaper than the PSS, and although it has a speech rate control knob on the front panel, I do not recommend it, because there is no way for the computer to kill its speech output. The Votrax PSS and the Echo, on the other hand, will accept a command from the computer that instantly stops speech output.

The Votrax PSS and the Echo can both be purchased from a single outlet. Sense-sations (919 Walnut Street, Philadelphia, Pennsylvania 19107; phone (215) 627-0603) carries both the Echo and the PSS as well as other speech synthesis systems. Sense-sations has done what it can to keep your costs down.

Before deciding between the Echo and the PSS, you may want to keep these things in mind:

1. The Echo synthesizer is cheaper than the PSS. The PSS costs about $350, and the Echo costs about $250.

2. The speech produced by the Echo is more metallic and less like a human voice than that of the PSS.

3. This is offset by the fact that the Echo is a more responsive synthesizer than the PSS: it introduces no appreciable delay due to carriage returns, and you can make it start speaking as little as 0.2 seconds after it receives the last character that was not a carriage return.

4. On the other hand, the PSS has more sophisticated speech logic built in. The PSS can store an exception table to control how specific character strings will be pronounced. In addition, the PSS sounds more human-like; and for those of you who like to play with sound effects, you can program the PSS to play tunes.

5. The Echo will automatically spell character strings that do not contain vowels (e.g., NFB). The PSS will not.
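The Echo's habit of spelling vowel-less strings can be pictured as a simple preprocessing pass over the text. This is a minimal Python illustration, not Street Electronics' actual logic: it assumes that only A, E, I, O, and U count as vowels and that words are separated by spaces, and the function name `prepare_for_echo` is hypothetical.

```python
# Assumption: only A, E, I, O, U are vowels; the Echo's real rules may differ.
VOWELS = set("AEIOUaeiou")

def prepare_for_echo(text):
    """Spell out, letter by letter, any word whose letters include no vowel."""
    spoken = []
    for word in text.split():
        letters = [c for c in word if c.isalpha()]
        if letters and not any(c in VOWELS for c in letters):
            spoken.append(" ".join(word))   # "NFB" -> "N F B"
        else:
            spoken.append(word)             # ordinary words pass through
    return " ".join(spoken)

print(prepare_for_echo("The NFB convention"))  # The N F B convention
```

Under this simple rule a word such as "by" would also be spelled, which shows why a real synthesizer needs more refined logic.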

SPEECH SOFTWARE FOR THE PC:

One of the most gratifying results of the rapid proliferation of the microcomputer has been the tremendous development of speech software packages for the IBM PC. Today, no fewer than six speech packages are available for the PC. This means that you, the consumer, can pick the package that best meets your needs instead of having to rely upon one or two companies to provide you with something that really doesn't help you or, perhaps of greater concern, far exceeds your budget.

Before looking at each package, here are a few questions that you should keep in mind:

1. How much does the package cost? How much does it cost to receive future releases of the product? Is there a maintenance fee that must be paid so that you can receive help from the vendor?

2. What speech synthesizers are particularly well suited to the package?

3. Does the package have a "Review Mode"--that is, can you use the package to hear data displayed on the video monitor? If Review Mode is provided, is it possible to set up sections of the screen or "windows" that limit what is spoken? What features are provided, in or out of Review Mode, to help you move your real (system) cursor to the location of your reviewing cursor? What are some of the other significant features provided?

4. How does the package handle punctuation? Are multiple levels of punctuation provided that will allow you to hear some special characters in one level but not in another?

5. How are upper case or capital letters detected?

6. What features are provided to enable the package to speak automatically whenever the computer sends something to the video monitor?

7. What keys on the keyboard are used to control the package, both in and out of Review Mode? For those keys used outside of Review Mode, what is the likelihood that they will interfere with the normal operation of the application program that you are running?

8. To what extent does the package function on IBM PC compatibles?

9. How does the package work with terminal emulation software such as the Irma or Ideacomm systems used for IBM 3278 terminal emulation?

10. Can the package be tailored for specific speech synthesizers, baud rates, and communication ports?

The address and phone number of each company will be listed at the end of this article. In the meantime, let me say that three of the six companies did exhibit at this year's National Federation of the Blind Convention: Computer Conversations, Interface Systems International, and Computer Aids. ARTS Computer Products chose not to exhibit at our convention. Enable Software did not finish the PC version of its Enable Reader in time to make the convention. As for Mark Enterprises, the sole proprietor of the company, Arnie Miller, was unable to attend our convention because of his regular full-time job.

Now, let's look at each package and see how it stacks up against the questions listed above.

ENABLE READER: This package, marketed by Enable Software, is the newest speech output system for the IBM PC. It costs $500. I have no information about costs for future versions or possible maintenance fees.

Enable Software states in its literature that three versions of Enable Reader are available: one for the Votrax PSS, one for the Echo, and one for DECTALK (a high-quality speech synthesis system that costs about $4,000). You, the customer, must tell Enable Software which synthesizer you want the software to work with when you place your order. The Enable Reader software provides fifty separate commands in its Review or "Reader" Mode. In Review Mode, you can move quickly around the screen. Keys are provided to enable you to "jump" to a specific line and column. You can also set up windows and read within a window either one line at a time or the entire window at once. In Review Mode, you can determine the location of your system cursor only when you first enter the mode; once you start moving about the screen, the location of the system cursor is lost. Outside of Review Mode, none of these reading functions is available. To move your system cursor to a possible mistake, you must first find the mistake in Review Mode, remember what row and column the mistake was on, exit Review Mode, and move your system cursor to the desired location. Note also that Enable Reader announces the location of the cursor assuming that the top left corner of the screen is line 0, column 0. The bottom right corner of the screen is referred to as line 24, column 79. This convention is important to understand, because most speech output systems sold today assume instead that the top left corner of the screen is line 1, column 1.

Enable Reader provides you with five levels of punctuation. Also, you can set up something called a punctuation filter which allows you to suppress the pronunciation of specific punctuation characters.
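A punctuation filter of the kind described above can be modeled as a set of characters whose spoken names are dropped. The Python sketch below is illustrative only; the character names in `PUNCTUATION_NAMES` and the function name are assumptions, not Enable Reader's actual vocabulary, and the five punctuation levels themselves are not modeled.

```python
# Hypothetical spoken names; Enable Reader's actual wording is not documented here.
PUNCTUATION_NAMES = {".": "period", ",": "comma", "-": "dash",
                     "!": "exclamation", "*": "star"}

def speak_with_filter(text, suppressed):
    """Say punctuation by name, except characters listed in `suppressed`."""
    pieces = []
    for ch in text:
        if ch in PUNCTUATION_NAMES:
            if ch not in suppressed:                 # filtered characters are silent
                pieces.append(" " + PUNCTUATION_NAMES[ch] + " ")
        else:
            pieces.append(ch)
    return " ".join("".join(pieces).split())         # collapse extra spaces

print(speak_with_filter("Hi, there.", suppressed=set()))  # Hi comma there period
print(speak_with_filter("Hi, there.", suppressed={","}))  # Hi there period
```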

Capital letters are indicated by the word "upcase." You can also turn the detection of upper case letters on and off.

Enable Reader intercepts normal DOS system calls and causes data sent to the screen to be spoken automatically. Many programs, however, write data directly to the video buffer, and Enable Reader does not automatically speak data sent via this method. However, you can use Review Mode to get the information you need from the screen.
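Review Mode still works when a program bypasses DOS because the PC's text screen is simply memory: 80 by 25 cells, each stored as a character byte followed by an attribute byte (at segment B800h on color adapters and B000h on the monochrome adapter). A screen reader can examine that memory at any time. The Python sketch below decodes a 4,000-byte copy of such a buffer into lines of text; how the copy is obtained is outside its scope, and the function name is hypothetical rather than taken from any of these products.

```python
ROWS, COLS = 25, 80
BYTES_PER_CELL = 2  # each cell: character code, then attribute byte

def screen_lines(buffer):
    """Split a copy of the 80 x 25 text-mode video buffer into lines of text."""
    if len(buffer) != ROWS * COLS * BYTES_PER_CELL:
        raise ValueError("expected a 4,000-byte text-mode buffer")
    chars = bytes(buffer[0::BYTES_PER_CELL])   # even offsets hold the characters
    # cp437 is the IBM PC's original character set.
    return [chars[r * COLS:(r + 1) * COLS].decode("cp437") for r in range(ROWS)]

# Build a fake buffer: "HELLO" on the top line, attribute 0x07 (normal text).
fake = bytearray()
for ch in "HELLO".ljust(ROWS * COLS):
    fake += bytes([ord(ch), 0x07])
print(screen_lines(fake)[0].rstrip())  # HELLO
```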

Enable Reader provides a keyboard intercept feature which enables it to echo the keystrokes you type even when data being sent to the screen bypasses the normal DOS function calls. You can either hear each character as you type it or set up a delay factor of as little as 0.5 seconds, meaning that nothing will be spoken until 0.5 seconds after you have stopped typing. With an appropriate delay factor, you can hear what you type as words rather than as individual characters.
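The delay factor amounts to grouping keystrokes by the pauses between them. The following Python sketch works on a recorded list of (time, character) pairs rather than live keyboard input, and it illustrates the general technique under that assumption, not Enable Software's code; `group_keystrokes` is a hypothetical name.

```python
def group_keystrokes(events, delay=0.5):
    """Group (seconds, character) keystrokes into utterances.

    A pause of at least `delay` seconds ends the current utterance, which is
    the effect of the delay factor described above.
    """
    utterances, current, last_time = [], "", None
    for t, ch in events:
        if last_time is not None and t - last_time >= delay:
            if current.strip():
                utterances.append(current.strip())   # typist paused: speak it
            current = ""
        current += ch
        last_time = t
    if current.strip():
        utterances.append(current.strip())
    return utterances

typed = [(0.0, "t"), (0.1, "h"), (0.2, "e"), (1.0, " "),
         (1.1, "c"), (1.2, "a"), (1.3, "t")]
print(group_keystrokes(typed))  # ['the', 'cat']
```

A live implementation would start a timer after each key and speak when it expires; the offline version only shows how the pause length decides what is grouped together.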

Outside of Review Mode, Enable Reader recognizes only the Control-Right-Shift key combination. This combination places the program into "Reader" Mode, which is the mode you use to scan the screen. In this mode, it is impossible for Enable Reader to damage anything on the screen. No function keys or cursor movement keys are used by Enable Reader in Review or "Reader" Mode. Instead, the program recognizes the normal alphabetic, numeric, and punctuation keys as command keys. Of particular interest is the "identify" key, which tells you what each key on the keyboard is expected to do when pressed. As for conflicts with other software, your application program may require Control-Right-Shift. If so, Enable Reader may not function with that application.

Enable Software has stated that it would like Enable Reader to become a standard for all 16-bit machines. This implies that Enable Reader will work on a lot of PC compatibles. However, I have received no definitive information from the vendor regarding this.

No information is available at this time regarding Enable Reader's performance with terminal emulation systems. The product is so new that not enough time has passed for experience to be gained in this area.

As I said earlier, Enable Reader can be delivered in three versions for three different speech synthesizers. However, I do not believe that you can tailor a specific version to work with another synthesizer except by programming certain keys in Review Mode each time you start up the program.

Enable Reader communicates with your synthesizer through COM1 at 9600 baud. No provision is made for you to change these settings.

ENHANCED PC TALKING PROGRAM: Marketed by Computer Conversations, this package costs $500 prepaid and $700 with purchase order. However, the owner of the company, Ron Hutchinson, says he will negotiate a lower price for any blind person whose circumstances do not permit payment of the full price. The Enhanced PC Talking Program is the only one of the six packages that permits you to obtain information about data on the screen without having to use a "Review Mode." This "Interactive Mode," as the vendor calls it, enables you to deal with the display in "real time." Using the Enhanced PC Talking Program, it is rarely necessary to go through the cumbersome exercise of synchronizing the system cursor with a reviewing cursor. In real-time mode, you can determine the location of the system cursor, read the current line (or any other line on the screen, for that matter), read predefined windows on the screen, and read from the cursor position to the right.

The Enhanced PC Talking Program has a feature that enables you to determine the visual characteristics of data being displayed on the screen. With this feature on, you can detect highlighting; low, normal or high intensity characters; and reverse video.
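The visual characteristics this feature reports come from the attribute byte stored with every character in the PC's text-mode video memory. The Python sketch below decodes an attribute byte using the standard text-mode layout (bits 0 to 2 foreground, bit 3 intensity, bits 4 to 6 background, bit 7 blink) and treats black on white as reverse video. It illustrates the idea only; it is not Computer Conversations' implementation, and it distinguishes just normal and high intensity.

```python
def describe_attribute(attr):
    """Describe a text-mode attribute byte, assuming the default blink-enabled mode."""
    foreground = attr & 0x07          # bits 0-2
    intense = bool(attr & 0x08)       # bit 3
    background = (attr >> 4) & 0x07   # bits 4-6
    blinking = bool(attr & 0x80)      # bit 7
    traits = []
    if foreground == 0 and background == 7:
        traits.append("reverse video")   # black characters on a white background
    traits.append("high intensity" if intense else "normal intensity")
    if blinking:
        traits.append("blinking")
    return ", ".join(traits)

print(describe_attribute(0x07))  # normal intensity
print(describe_attribute(0x0F))  # high intensity
print(describe_attribute(0x70))  # reverse video, normal intensity
```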

The Enhanced PC Talking Program works with both the Echo and the Votrax (both PSS and Type 'N Talk). Also, it can be configured to work with the DECTALK.

By using a feature called "Roaming Reader," you can move a reviewing cursor around the screen to hear what is being displayed. You can also place a marker at the location of the cursor and use such markers to define the upper left and lower right corners of a subset of the screen. When you use the Left, Right, Up, and Down Arrow keys, the program speaks each character the system cursor lands on, thereby providing you with a talking system cursor in real-time mode. This makes it easy to correct mistakes, for example, in a word processor. What you hear is where you are.
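Reading a window defined by two corner markers amounts to taking the same column range from each row between them. This Python sketch assumes the screen is available as a list of 80-character strings, uses 0-based (row, column) corners that include both ends, and trims trailing blanks; `read_window` is a hypothetical name, not part of any of these products.

```python
def read_window(screen, top_left, bottom_right):
    """Return the text of each row inside a rectangular window.

    Corners are (row, column) pairs, 0-based and inclusive.
    """
    (r1, c1), (r2, c2) = top_left, bottom_right
    if r1 > r2 or c1 > c2:
        raise ValueError("top_left must be above and to the left of bottom_right")
    return [line[c1:c2 + 1].rstrip() for line in screen[r1:r2 + 1]]

screen = ["Name: Smith".ljust(80), "Balance: 100".ljust(80)] + [" " * 80] * 23
print(read_window(screen, (0, 0), (1, 7)))  # ['Name: Sm', 'Balance:']
```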

The Enhanced PC Talking Program supports one level of punctuation detection: punctuation pronunciation is either on or off. Blanks are not pronounced in this mode. Blanks are spoken either by turning on a special blank pronunciation mode or by using the cursor control keys, that is, the talking cursor.

If the cursor control keys are used to move about the screen, the word "cap" is used to indicate a shift from lower to upper case. When the program encounters a shift in the opposite direction, from upper to lower case, it uses the word "lower." One mode of operation tells you automatically whenever there is a case shift in the middle of a word.

Like the Enable Reader, the Enhanced PC Talking Program intercepts normal DOS function calls, thereby automatically speaking data sent to the screen via this method. The Enhanced PC Talking Program will not automatically speak data sent to the screen by programs that write directly to the video buffer. However, judicious use of the "Roaming Reader" and the other interactive functions of the program can help to alleviate this problem.
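The case-shift announcements described above for the Enhanced PC Talking Program's cursor keys can be sketched as a small state machine that remembers whether the last letter was upper or lower case. This Python version assumes spelling starts in lower case and that digits and punctuation leave the state unchanged; the function name is hypothetical, and the code is not Computer Conversations' implementation.

```python
def spell_with_case_shifts(text):
    """Spell `text`, saying "cap" or "lower" only where the case changes."""
    spoken, in_upper = [], False   # assume spelling starts in lower case
    for ch in text:
        if ch.isalpha():
            if ch.isupper() and not in_upper:
                spoken.append("cap")       # lower-to-upper shift
                in_upper = True
            elif ch.islower() and in_upper:
                spoken.append("lower")     # upper-to-lower shift
                in_upper = False
        spoken.append(ch)
    return " ".join(spoken)

print(spell_with_case_shifts("McDonald"))
# cap M lower c cap D lower o n a l d
```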

The Enhanced PC Talking Program can echo individual keystrokes, as Enable Reader does. You can also set it up to echo individual words instead of individual characters. In word mode, the program still sends each keystroke to the synthesizer as you type, but it does not send the carriage return that forces the synthesizer to speak until you press either a space or a punctuation character.
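Word echo can be modeled by collecting keystrokes until a space or punctuation key arrives, since that key is what triggers speech. In this Python sketch the set of word-ending keys (the space bar plus ASCII punctuation) is an assumption, the break keys themselves are not spoken, and `echo_words` is a hypothetical name.

```python
import string

# Keys that end a word in this sketch: the space bar plus ASCII punctuation.
WORD_BREAKS = set(" ") | set(string.punctuation)

def echo_words(keystrokes):
    """Return (words released for speech, word still waiting for a break key)."""
    released, pending = [], ""
    for ch in keystrokes:
        if ch in WORD_BREAKS:
            if pending:
                released.append(pending)   # the break key triggers speech
            pending = ""
        else:
            pending += ch                  # sent, but not yet spoken
    return released, pending

print(echo_words("Hello, world"))  # (['Hello'], 'world')
```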

The Enhanced PC Talking Program makes use of the function keys, the cursor movement keys and the Tab key (among others) both in and out of the "Roaming Reader." In its default mode, the program makes use of the Function keys to execute various tasks such as the announcing of the cursor location, announcing the contents of the current line, etc. If the application you are running requires the Function keys, it is a simple matter to switch the Enhanced PC Talking Program to use the Alt Function keys. If you would prefer to continue operating the program with the Function keys and your application must receive a Function key to carry out a request, you can hit the F10 Function Key twice, then the Function key you want to be passed to your application. This means that under no circumstances will keys used by the Enhanced PC Talking Program conflict with keys required to run your application.
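The F10 escape can be thought of as a small state machine placed in front of the application. In the Python sketch below, the key names, the choice to treat a lone F10 as an ordinary program command, and the rule that only function keys are program commands are all simplifying assumptions made for illustration; the real program also uses the cursor keys and Tab.

```python
FUNCTION_KEYS = {f"F{n}" for n in range(1, 11)}

def route_keys(keys):
    """Split keystrokes between the talking program and the application."""
    to_program, to_application = [], []
    f10_presses = 0
    for key in keys:
        if f10_presses == 2:
            to_application.append(key)    # the key after F10, F10 passes through
            f10_presses = 0
        elif key == "F10":
            f10_presses += 1
        else:
            if f10_presses == 1:
                to_program.append("F10")  # a single F10 was an ordinary command
            f10_presses = 0
            if key in FUNCTION_KEYS:
                to_program.append(key)
            else:
                to_application.append(key)
    # Any F10 prefix still pending at the end is simply dropped in this sketch.
    return to_program, to_application

print(route_keys(["F2", "F10", "F10", "F2", "a"]))  # (['F2'], ['F2', 'a'])
```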

As of this writing, the Enhanced PC Talking Program works on about 50 IBM PC compatibles. For specific information about which computers the program will run on, you should contact the vendor. Computer Conversations has done a fair amount of work with IDE Associates; consequently, the Enhanced PC Talking Program works well with the IDEACOMM 3278 emulation system. Computer Conversations also has available a package that enables the IBM PC to emulate a DEC VT100 terminal. This, too, works with the Enhanced PC Talking Program.

The Enhanced PC Talking Program will communicate with your speech synthesizer either through COM1 or COM2. However, the baud rate must always be set at 9600.

FREEDOM1: Marketed by Interface Systems International, this program is priced at $499.

Freedom1 was designed to work specifically with the Votrax Personal Speech System (PSS), and a special version has been developed for DECTALK. I am told by the vendor that the newest version of the program will come with an installation procedure that will enable it to work with other synthesizers (e.g., the Echo).

Unlike all of the other packages discussed here, Freedom1's only purpose in life is to act as a highly sophisticated screen reader. All of its functions are executed in Review Mode. In Review Mode, you can define subscreens, that is, areas marked by their upper left and lower right corners, and save their locations in a disk file. The newest version of the program has a feature called "cursor joining" which permits the program to move the system cursor to the location of the reviewing cursor. However, you should be aware that this feature is not guaranteed to work with all programs. Freedom1 comes with a "find" command which permits you to locate a character string on the screen by searching either forwards or backwards from the current reviewing cursor location. In the latest version, you can "program" keystrokes so that multiple Freedom1 commands may be executed with a single keystroke. Like Enable Reader, Freedom1 announces the location of the cursor assuming that the top left corner of the screen is line 0, column 0. Freedom1 comes with a punctuation mode that you can turn on and off. Another function must be turned on if you wish to hear spaces spoken. Freedom1 also comes with a "numerics" mode which converts strings of digits into full words. For example, with this mode turned on, the string "9999999" will be pronounced "nine million nine hundred ninety-nine thousand nine hundred ninety-nine."
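Numerics mode is essentially a number-to-words conversion. The Python sketch below spells out English cardinal numbers in the style of the example above (American usage, no "and"); it is a generic implementation under that assumption, not Interface Systems International's code, and every name in it is hypothetical. It reproduces the "9999999" example exactly.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"), (1_000, "thousand")]

def below_thousand(n):
    """Spell out a number from 1 to 999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]     # e.g. "ninety-nine"
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)

def number_to_words(n):
    """Spell out a non-negative integer below one trillion."""
    if not 0 <= n < 1_000_000_000_000:
        raise ValueError("out of range for this sketch")
    if n == 0:
        return "zero"
    parts = []
    for value, name in SCALES:
        if n >= value:
            parts.append(below_thousand(n // value) + " " + name)
            n %= value
    if n:
        parts.append(below_thousand(n))
    return " ".join(parts)

def speak_numbers(text):
    """Replace each run of digits in `text` with its spoken form."""
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)

print(number_to_words(9999999))
# nine million nine hundred ninety-nine thousand nine hundred ninety-nine
```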

Upper case letters are indicated by a spoken word rather than by raising the pitch of the speech.

Freedom1 is, if nothing else, a sophisticated screen reading program. No provision is made in the software to automatically speak system responses. There is no way to have individual keystrokes or words echoed when you type them in. Everything that Freedom1 speaks is initiated by a specific request from you, the user.

Outside of Freedom1's Review Mode, only the tilde (~) key is recognized; that is, the program remains passive until this key is pressed. Once you have pressed the tilde key, the program gains control, and numerous keys become available for executing its many reading functions and modes. To exit the program and return to your application, simply press Escape.

While under Freedom1's control, commands can be executed from the home keys. With simple keyboard commands, you can, among other things, instruct Freedom1 to find a character string (forwards or backwards), move to and say the next word (also forwards or backwards), or move forwards or backwards one character at a time.

The tilde key is used to put you into Review Mode, which is the only mode in which the program operates. Therefore, the tilde key is not available for use by other applications you may have running on your PC. No other keys used by Freedom1 will conflict with other applications.

I do not know how many PC compatibles can run Freedom1; however, I do know that there are some. If you want more specific information as to which PC compatibles will run the program, you should contact the vendor.

No information is currently available about how Freedom1 runs with terminal emulation systems, other than the fact that it does not now run with the Irma 3278 emulation system. In the first version of Freedom1, data could be sent to your synthesizer only through COM1 and only at 9600 baud. No information is available as to whether the new version (released in July, 1985) will provide greater flexibility.

PC SPEAK: PC Speak is marketed by Mark Enterprises and sells for $475. It was the first speech output package developed for the IBM PC.

PC Speak is designed to work with a number of speech synthesizers including Echo, Votrax, Intex, and Microvox. However, this only means that different control sequences are transmitted to the speech synthesizer when the program is started up. During normal operation, PC Speak behaves the same way for all synthesizers. This means that you cannot use the program to accomplish such tasks as varying the speech rate, raising or lowering the pitch or stopping speech. However, as you will see later, it is possible to overcome this problem to some extent.

PC Speak intercepts normal DOS function calls, thereby providing some form of automatic speech. It has a Review Mode which is entered by pressing Control-Numlock. No provision is made for the creation of windows or subscreens. There is also no simple way to bring the system cursor to the location of the reviewing cursor. PC Speak can be set up to echo individual keystrokes, that is, to intercept data as it is entered from the keyboard. This enables the program to echo individual keystrokes even when you are running a program that writes directly to the video buffer. In Review Mode, you can obtain information about the status of the Insert, Shift, and Numlock keys.

PC Speak has a unique way of handling the pronunciation of special characters (e.g., period, comma, exclamation point, etc.). Ordinarily, no punctuation characters are spoken. However, the program comes with a set of character substitution tables that tell it to convert specific characters into strings that will sound like something meaningful. Using character substitution tables, you can easily control which characters PC Speak will pronounce and how PC Speak should pronounce them. For example, if you are running a word processor, you might want PC Speak to pronounce a "-" as "hyphen." In a spreadsheet program, you might want "-" pronounced as "minus."

The character substitution tables can also be used to send command sequences to your speech synthesizer. You can choose characters above ASCII 128, for example, and set each one up to convert to a string of characters that will cause the synthesizer to perform the desired function.
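A character substitution table is essentially a lookup from single characters to spoken strings, and swapping tables changes how the same character is read. The Python sketch below assumes a simple dictionary per application; the table names and contents are hypothetical, and PC Speak's real table format is not described in this article.

```python
# Hypothetical tables; PC Speak's actual table format is not described here.
WORD_PROCESSOR_TABLE = {"-": " hyphen ", ".": " period ", ",": " comma "}
SPREADSHEET_TABLE = {"-": " minus ", "+": " plus ", "/": " divided by "}

def substitute(text, table):
    """Replace every character that appears in `table`; pass all others through."""
    spoken = "".join(table.get(ch, ch) for ch in text)
    return " ".join(spoken.split())   # collapse the extra spaces the table adds

print(substitute("well-known", WORD_PROCESSOR_TABLE))  # well hyphen known
print(substitute("A1-B1", SPREADSHEET_TABLE))          # A1 minus B1
```

Keeping one table per application, as sketched here, is what lets the same "-" be read as "hyphen" in one program and "minus" in another.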

Upper case letters are indicated by the word "capital" spoken before the letter. You must specifically request that a word be spelled in order to hear the upper case characters within it.

Outside of Review Mode, Control-Numlock is the only key combination really recognized by PC Speak. Control-Numlock places you in Review Mode, at which point the cursor control keys and the Tab key can be used to move you around the screen. You can have individual lines, words, or characters spoken in Review Mode. This design means that the keys used by PC Speak will not conflict with any application you may be running.

Mark Enterprises makes no claim that PC Speak will work on any PC compatible. In fact, Mark Enterprises states specifically that PC Speak will run only on the IBM Personal Computer--which does, by the way, include the IBM PC/XT.

As far as I know, very little effort has been expended to get PC Speak to run with any terminal emulation system. There is some indication that PC Speak may work with the latest version of the IDEACOMM 3278 terminal emulation system.

PC Speak has a highly flexible system for controlling the communications path to the speech synthesizer. Using a control file read at program startup time, you can tell PC Speak to communicate with the synthesizer through any parallel or serial port; and if a serial port is being used, the baud rate, parity, and number of stop bits can be controlled.

PC VOICE: PC Voice is sold by ARTS Computer Products. The software sells for about $500, and you will probably also want to purchase the speech card that works with the program. The card sells for around $200.

PC Voice is designed to work particularly well with a speech card from Artic Technologies called the SP200. However, you should be aware that the card's voice needs a lot of improvement, and its speech rate is still slower than that of better-known synthesizers such as the Echo or Votrax.

PC Voice has a Review Mode which is entered by pressing the F1 function key. (Pressing the F1 function key twice in succession bypasses Review Mode and passes the keystroke to your application.) In Review Mode, you can specify that certain lines on the screen remain silent, that is, that they not be spoken either when data is written to them or in Review Mode. No facility is provided to help you move the system cursor to the position of the reviewing cursor. In Review Mode, a "find" function is available to help you locate character strings on the screen.

By default, PC Voice speaks all special characters. If you want this turned off, you must enter Review Mode and enter a list of characters that PC Voice is not to speak. For external synthesizers such as the Echo or Votrax, PC Voice sends the actual punctuation character instead of converting the character to a word made of alphabetic characters. This means that if the synthesizer does not support a punctuation mode per se (which is the case with the Votrax PSS), you will not be able to hear special characters.

Upper case characters are indicated by the word "cap," and it is possible to detect upper case characters even when full words are being pronounced instead of spelled.

PC Voice uses a technique called "cursor tracking" to automatically speak data sent to the screen. Data can be spoken in character, word, or line mode. For example, if you are in line mode, nothing will be spoken until the cursor is moved to another line. In word mode, nothing will be spoken until the cursor crosses a word boundary (i.e., a space). This technique works for programs that use the cursor to write data to the screen. However, you should be aware that some programs send data to the screen without moving the cursor, in which case, nothing would be spoken. In fact, some programs remove the cursor from the screen altogether.
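Cursor tracking can be modeled as deciding when to release text that has been written at the cursor. The Python sketch below takes a recorded sequence of (row, character) writes, one per character with the cursor advancing, and applies the character, word, and line rules described above. It illustrates the general technique only, not ARTS Computer Products' code; the function name and input format are assumptions. As the paragraph above warns, text a program writes without moving the cursor would never appear in this input at all.

```python
def cursor_tracking(writes, mode):
    """Simulate when text written at the cursor would be spoken.

    writes: (row, char) pairs, one per character written as the cursor advances.
    mode: "character", "word", or "line".
    Returns (chunks spoken so far, text still waiting to be spoken).
    """
    if mode not in ("character", "word", "line"):
        raise ValueError("unknown mode")
    spoken, pending, current_row = [], "", None

    def flush():
        nonlocal pending
        if pending.strip():
            spoken.append(pending.strip())
        pending = ""

    for row, ch in writes:
        if current_row is not None and row != current_row:
            flush()                      # cursor moved to another line
        current_row = row
        pending += ch
        if mode == "character" or (mode == "word" and ch == " "):
            flush()                      # every character, or a word boundary
    return spoken, pending

writes = [(0, c) for c in "HELLO THERE"] + [(1, c) for c in "OK"]
print(cursor_tracking(writes, "line"))  # (['HELLO THERE'], 'OK')
print(cursor_tracking(writes, "word"))  # (['HELLO', 'THERE'], 'OK')
```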

PC Voice is said to work on a number of PC Compatibles. ARTS Computer Products should be contacted for additional information.

No information has been provided regarding the ability of PC Voice to use terminal emulation systems.

You should be aware that PC Voice really does not function well with external speech synthesizers. In addition to the problem with punctuation characters (discussed earlier), the program has no facility to stop the external synthesizer from speaking. Most of the other packages have some method for doing this, but PC Voice does not. This is even more critical for PC Voice in light of the fact that it uses "cursor tracking," which means that more data is likely to be sent to the synthesizer than is sent by other speech packages.

SCREEN-TALK: Marketed by Computer Aids, this package is definitely the most economical. Dollar for dollar, you will get the most for your money if you purchase Screen-Talk. The program currently sells for $395.

Screen-Talk works particularly well with the Echo PC, the Votrax PSS, and DECTALK. Each of these synthesizers is equipped with an "instant stop" feature and the ability to vary pitch--both of which are used by Screen-Talk as a part of its normal operation.

Screen-Talk provides a Review Mode which is entered by pressing the Alternate key. In Review Mode, you can set up a maximum of three windows on the screen, and pressing a single function key permits you to switch quickly between windows. You can mark a specific location on the screen and jump to it with a single keystroke. You can read a specific line, and you can have individual characters pronounced directly or by use of the phonetic alphabet. At this time, Screen-Talk has no cursor joining capability. However, you can turn on a "talking cursor" that will enable you to hear characters that your system cursor lands on as you move it around the screen outside of Review Mode.

Four levels of punctuation are built into Screen-Talk: none, some, most, and all. The All Punctuation level permits you to hear characters not normally visible on the screen (e.g., graphics characters).
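One way to model four nested punctuation levels is to record, for each character, the lowest level at which it is spoken. The character assignments in this Python sketch are hypothetical, since Screen-Talk's actual sets are not listed here; unlisted characters, including graphics characters, are spoken only at the All level, as described above.

```python
LEVELS = ["none", "some", "most", "all"]

# Hypothetical: the lowest level at which each character is spoken.
# Anything not listed, including graphics characters, is heard only at "all".
MINIMUM_LEVEL = {
    "$": "some", "%": "some", "#": "some", "@": "some",
    ".": "most", ",": "most", "-": "most", "(": "most", ")": "most",
}

def is_spoken(ch, level):
    """Return True if `ch` would be spoken at the given punctuation level."""
    if ch.isalnum() or ch.isspace():
        return True   # letters, digits, and blanks are not governed by the level
    required = MINIMUM_LEVEL.get(ch, "all")
    return LEVELS.index(level) >= LEVELS.index(required)

print(is_spoken(".", "some"), is_spoken(".", "most"))  # False True
```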

Screen-Talk uses a higher pitch to indicate the presence of upper case letters. Therefore, it is helpful for your synthesizer to support command sequences that can raise and lower the pitch of the speech.

Screen-Talk intercepts normal DOS function calls so that data sent to the screen via such calls will be spoken automatically. Programs that write data directly to the video buffer can still be used with Review Mode.

A number of keys are recognized by Screen-Talk outside of Review Mode. Control-X is used to kill speech on the synthesizer, which means that Control-X cannot be passed to any application you may be running. The Alternate key (normally used in conjunction with another key) is used to activate Review Mode. The Five key on the numeric keypad will tell you the location of the system cursor. This is not a problem for most applications, because the Five key on the numeric keypad normally has no logical function. In Review Mode, the ten function keys, the cursor movement keys, and the home keys are used to control reviewing functions. In Review Mode, you can raise or lower the speech rate on the synthesizer and you can determine the status of certain keys such as the Insert key, Shift key, or Numlock key.

Control-X could be a problem for some of the programs you might want to run, because Control-X is never passed to the application program. However, the latest version of Screen-Talk permits you to start up the program with a different "quit" key defined.

According to the vendor, Screen-Talk will run on a number of IBM PC compatibles. For more specific information, you should contact Computer Aids.

Screen-Talk is known to work with the IDEACOMM 3278 terminal emulation system, or at least with the latest version of that system. No data is available about how the program works with other terminal emulation systems.

Screen-Talk can be started up to send data to the speech synthesizer through any one of the serial ports. By using the DOS MODE command, you can vary other features of the serial connection such as the baud rate, parity, number of stop bits, etc.