By popular demand, here’s a summary of my experience with using Text-to-Speech (TTS) technology to record an audiobook edition of Irish Firebrands.

As a person with multiple disabilities, I’m acutely aware of the limited options for people like me. In addition, I wanted to make an audible copy of my first novel for my mother, who had gone blind while I was writing it.

My mother was an avid, eclectic reader, who amassed an enormous personal library, read to me from my infancy, and taught me to read. Cataract surgery restored enough of her sight for her to enjoy the landscapes visible from her windows, and to watch television, but because of eye damage from other causes, she can see only parts of pictures. Her brain makes Gestalts to fill in what’s missing, but a related disadvantage is that she also developed Charles Bonnet syndrome (visual hallucinations that can afflict sighted persons who become blind).

It’s also impossible for my mother to read large-print books or even magnified characters on screens, so for many years she’s had to rely on talking books from the National Library Service for the Blind and Physically Handicapped, at The Library of Congress. When I published Irish Firebrands, all she could do was hold a paper copy in her hands and admire the cover art.

There are many TTS software packages, most of which use a combination of operating system voices and proprietary voices from other sources. They cost a fraction of what hiring voice talent would cost, but even so, the programs are still too pricey for my nonexistent Indie budget. So I downloaded Balabolka, free software that uses a computer’s built-in SAPI 4 or SAPI 5 voices. It reads text in 16 formats (including DOC, DOCX, EPUB, HTML, MOBI, PDF, and RTF), and records in formats with these filename extensions: .wav, .mp3, .mp4, .ogg, .wma, .m4a, .m4b, and .awb.

For the basic document, you choose one voice and set its rate, pitch and volume, but you can record different sections separately and combine them, and the software will also combine different recording formats into one audio file. You need to know how to nest HTML-style tags (for temporary changes to rate, pitch, and volume), but no other programming ability is necessary. Balabolka is supposed to be able to accept changes to its pronunciation database, and to add emphasis, but I haven’t been able to get those things to work, though that may be a limitation of the voices rather than the software.
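For anyone who hasn’t nested tags before, here is a minimal sketch of what that looks like. It assumes the tags in question are the SAPI 5 XML tags (rate, pitch and volume), which resemble HTML; the helper name sapi_wrap is made up for this illustration, and whether a given voice honors every tag is something to test by ear.

```python
from xml.sax.saxutils import escape


def sapi_wrap(text, rate=None, pitch=None, volume=None):
    """Wrap a passage in nested SAPI 5 XML tags for a temporary voice change.

    rate and pitch are assumed to run from -10 to 10, volume from 0 to 100.
    sapi_wrap is a hypothetical helper, not part of Balabolka.
    """
    out = escape(text)  # protect &, < and > so they are not read as markup
    if volume is not None:
        out = f'<volume level="{volume}">{out}</volume>'
    if pitch is not None:
        out = f'<pitch absmiddle="{pitch}">{out}</pitch>'
    if rate is not None:
        out = f'<rate absspeed="{rate}">{out}</rate>'
    return out


# A slightly faster, higher-pitched line; the settings revert after the closing tags.
print(sapi_wrap("Warning! Danger!", rate=2, pitch=3))
```

The tagged text can then be pasted into the document in place of the plain passage. Each opening tag needs its matching closing tag, and the innermost pair must close first.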

Similar problems exist with Adobe Acrobat’s Read Out Loud utility, which uses only whatever built-in voices are available. This characteristic makes Read Out Loud of limited use as an audiobook option, because the changes you make to the text to fix pronunciation problems for one computer voice don’t necessarily work when the document is read by another person’s computer. It also has the annoying habit of reading everything on the page, including headers and footers, and it will pause at page breaks and at the end of every line that terminates with a hard return. And depending on the PDF conversion settings, it may read aloud the punctuation, along with the text.

For best results in Read Out Loud, you have to strip out page breaks, headers, footers and apostrophes; then convert the file to PDF, using Standard formatting (no conversion alterations). When you listen to the PDF, take note of any additional pronunciation problems, fix them in your source document, and re-format. Anybody else who listens to the document must use the same voice preference settings you used.
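The stripping step can be done by hand in a word processor, but for a long manuscript saved as plain text, it could be sketched like this. The function name and the sample header and page number are invented for the example, and it assumes page breaks were exported as form-feed characters, which may not be true of every export.

```python
import re


def prep_for_read_out_loud(text, running_lines=()):
    """Remove page breaks, known header/footer lines, and apostrophes.

    running_lines lists the exact header and footer lines to drop.
    Hypothetical helper; run it on a copy, never on the master manuscript.
    """
    drop = {line.strip() for line in running_lines}
    cleaned = []
    # Turn form-feed page breaks into ordinary line breaks before splitting.
    for line in text.replace("\f", "\n").splitlines():
        if line.strip() in drop:
            continue  # skip a header or footer line
        # Strip straight and curly apostrophes.
        cleaned.append(re.sub(r"['\u2019]", "", line))
    return "\n".join(cleaned)


sample = "Irish Firebrands\nShe didn\u2019t know.\f12\nIt's late."
print(prep_for_read_out_loud(sample, running_lines=["Irish Firebrands", "12"]))
```

The cleaned copy is the one to convert to PDF; the original file stays untouched for print and e-book editions.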

From Text To Speech is a free online service, and you can save the files you record. It offers a selection of proprietary voices in American and UK English, as well as pronunciation for other major languages. The proprietary UK male and female voices that they use both sound good, with fewer mispronunciation problems, and the best ability to automatically add emphasis and interrogatory inflection. The website’s drawbacks include a limited number of voice adjustment options, the possibility that it periodically blocks the ISPs of frequent users, and the length of time it takes to generate an MP3, which makes it appropriate only for short reading selections.

After replacing the computer that I used to write Irish Firebrands, I discovered that the Windows 8 OS came with three new SAPI 5 voices: David and Zira (American English) and Hazel (UK English). Hazel is the only one of the three that automatically pronounces “Celtic” properly, with a hard C – but, oddly enough, she can’t say the name of my female main character, Lana. Although they’re afflicted with the same limitations as most other computer-generated voices (they don’t automatically elide, nor can they express emphasis and questions without help), their otherwise lifelike timbre made them a vast improvement over the SAPI 4 generation of voices.

Aside from difficulties due to hearing loss, I find most SAPI 4 voices impossible to listen to for any length of time, although some Sci-Fi writers may like to use them for their hollow, “robotic” qualities. In the Olden Days of cinematic and television sci-fi, it was assumed that robots would express themselves in flat, unfeeling tones – until the advent of the shouting, gesticulating robot in Lost in Space (“Warning! Danger, Will Robinson!”), who struggled with his emotions.

He was followed by the frankly psychotic HAL 9000 (“I’m sorry, Dave…”). Eventually Droids came out of the closet with their feelings: in Star Wars, a machine sounds like a man (C-3PO and his many emotional meltdowns), while a man sounds like a machine (James Earl Jones’s sinister inflection, helped out with a SCUBA respirator, as Darth Vader). R2-D2 still “speaks” only with beeps and boops, but his whistles and squeals are distinctly anthropomorphic.

Before starting on the recording, I had to learn how to use the voices at my disposal. To do this, I recorded a book trailer with a voice-over track. I used all three of the new voices, and MovieMaker software. The work took about a week.

On the basis of this virtual audition, and about six months of additional testing, I decided that I liked Hazel, the UK voice. To me, the enunciation of most British actors naturally sounds more clipped than that of Americans (who elide, or drop, most of their gerund Gs and many middle Ts, and soften lots of terminal Ds). Hazel uses non-rhotic Received Pronunciation (dropping Rs, or, paradoxically, inserting them where they don’t exist, such as between a word that ends with a vowel, and one that begins with one), but I was willing to trade the necessity of creating David and/or Zira’s endless elisions, for Hazel’s non-rhotic-English habits.

Since then, I’ve figured out how to trick Hazel into pronouncing some Rs, which has improved the clarity of a few words, but she definitely doesn’t sound Irish, because like most varieties of American English, Hiberno-English is rhotic: The Irish pronounce their Rs. But Hazel has learned a little bit of Gaeilge, with the help of the synthesizer at abair.ie.

I’ve learned to correct the multitude of bizarre mispronunciations that crop up unexpectedly, by creatively misspelling words, hyphenating syllables, running words together, changing pitch and speed, dropping terminal punctuation – and adding a few elisions. Unfortunately, there are very few changes that can be made with Balabolka’s global find-and-replace function: most of Hazel’s mispronunciations are dependent on syntax.
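Because so many fixes depend on the surrounding words, a plain find-and-replace is too blunt, but a pattern that looks at context can handle some of the repeat offenders. The sketch below shows the idea; the respellings and the rule list are invented for illustration (they are not the fixes actually used for Hazel), and any real rule would have to be checked by ear.

```python
import re

# Hypothetical rules: (pattern, respelling). Both entries are made up for this example.
RULES = [
    # A name the voice stumbles over, respelled with a hyphen between syllables.
    (r"\bLana\b", "Lah-nah"),
    # "read" respelled only when the words that follow mark it as past tense.
    (r"\bread\b(?=\s+(?:it\s+)?yesterday)", "red"),
]


def respell(text, rules=RULES):
    """Apply each respelling rule in order to a narration copy of the text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(respell("Lana read it yesterday."))
print(respell("Lana will read it."))
```

The second sentence keeps its ordinary spelling of “read” because the context test fails. Rules like these belong only in the narration copy of the manuscript, so the misspellings never reach the printed text.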

Many people dislike computer-generated voices, on principle: The owner of an audiobook hosting service refused to accept my recording, when it came out that I was doing it with TTS technology, even though many of the human-read stories on the site are badly performed or ill-recorded (e.g., sloppy diction, uneven volume, background noise). It’s also been difficult to recruit and retain beta readers, so I’m very grateful to those who have stuck with the project. Their feedback has been invaluable, while I’ve worked to whip the narration into shape. When it’s “as clean as humanly (and robotically) possible,” the Irish Firebrands audiobook will be available for distribution to the visually-impaired … beginning with Mama.

…

Readers and writers who decide to try Balabolka are welcome to ask me questions (in comments here, or via the Guestbook page on the Feedback menu) about specific pronunciation problems they’re encountering. I may have already found a tweaking trick that will work for you, too. And anyone out there who has some favorite fixes, please share them with us? No sense in all of us reinventing the wheel! Eyes – ears – even sanity – may be at stake! Thanks!

…

This blog post was recorded in Microsoft Hazel United Kingdom English, edited for rate, pitch, and pronunciation, using Balabolka text-to-speech converter. How many pronunciation edits can you find?

Thanks, Craig. Your encouragement means a lot. It’s been more than a year of screen-reading-induced headaches, and repeatedly tweaking Hazel’s more stubborn mispronunciations, until I’m ready to throttle her … but Mama recently turned 82, so I hope I make it, in time.

BTW, I’ve sneaked in some “eavesdropping” on “Panama,” and the setup is certainly suspenseful!

You’re welcome, Ali. I’d recommend that those who decide to use TTS for their audiobooks work on it part-time, so it’s not as tiring (I have a compelling reason to get it done as soon as I can). Writers with backlists can take their time. But an audiobook project could be made part of the proofreading of a new manuscript. One caveat: Prose for adults is written for the little voice inside the reader’s head, not to be read aloud. I know that flies in the face of the tradition of giving “readings,” but most writers who do that find themselves sweating blood over annotating their selection with second-guessed punctuation, to make their presentation “sound good.” What sounds good aloud is not necessarily what sounds good to the little voice. This is especially true of TTS reading. So, if using TTS to help with proofreading, don’t use it to change the punctuation in the written version. It will goof up the silent reading experience.

Reblogged this on Chris The Story Reading Ape's Blog….. An Author Promotions Enterprise! and commented:
I know several blind and sight impaired authors and readers through my blog and I’m always looking for good text to speech information to share with them – and here’s the best so far 😀
BTW Authors – PLEASE remember to provide TTS on your books when publishing – and why not mention that fact in your book promos as well (HINT!) 😀

While built-in TTS functions are available in e-reader devices, and although any disability accommodation is better than none, I’ve read remarks about the poor quality of the playback. The nature of the criticism seems to suggest that the e-reader features are designed along the lines of the WYSIWYG functionality of Adobe Read Out Loud. The technology will undoubtedly improve with each new generation of voices, but I couldn’t wait that long!

So often I find that we’re thinking about the same things at the same time, Christine. I’ve been thinking about creating an audio version of Jewel, and look, here you have given me all kinds of ways to do it – pre-researched and rated! You are the best!

Thanks, Angela! 🙂 Audiobooks go back a long way, with me, starting in my childhood, with vinyl recordings made by Alexander Scourby, Cyril Ritchard, and Hans Conried. All computer-generated voices have their problems, but the newer ones are pretty good, and over the course of a book, they can “grow” on you.

One time-consuming aspect of prepping a manuscript is having to put in extra dialogue attribution tags, and trying to vary their placement. Another thing to consider is the length: my book is in the “epic” range, but Jewel is much shorter, so you might even consider recording with different voices, and assembling a true “cast of characters.”

Let me know if you decide to go forward with an audiobook – I’m sure it will be a success.

Hazel took some getting used to. At first, she gave me a bit of a headache, but after a couple of chapters, I actually began to like her. I do apologize for being so slow at getting back to you over these months. Unfortunately, life stuff has been getting in the way of other commitments.

Hazel does have her faults, but she means well, and if the road to hell is paved with good intentions, at least she and I will have each other to talk to! 😀

Life doesn’t pull punches. We roll with it. I appreciate every minute you’ve been able to spare for the audiobook, and your comments have been very helpful! Let me know if you still want chapters e-mailed, or if you want to try the cloud storage thingy.