flite_hts_engine is a reasonably good quality Text To Speech (TTS) synthesiser in an incredibly small package considering it includes the voice data (approx 1.6MB download). It is a special research patched version of flite from:

This version of flite_hts_engine has been compiled on DebianDog_jwm wheezy; dotpet packaged using Puppy Slacko 6beta. I have patched it to output to stdout so that it can drive, for example, aplay via a pipe. I am still working on the code, but will upload a stable version of the patches once my testing is complete.

As provided here, this program buffers and plays a maximum of 1024 characters. It could be compiled to use a larger buffer, albeit with a longer delay before speech starts. That is a disadvantage compared to standard flite, which has virtually no startup delay or buffer limitation. However, standard flite has much poorer voice quality. The previous version of flite_hts_engine, which I compiled and released in 2009, tended to garble certain combinations of text, but that problem happily seems to be resolved in this later version, though more testing is required to confirm that.

The provided package includes flitet, flitetf, flitet2wav, and flitetf2wav. These are very small shell scripts which the dotpet installs in /usr/local/bin. They are intended as simple exemplars only, as an aid to developers wishing to create their own more sophisticated driving scripts. However, these scripts are perfectly usable as is. Open them in any text editor to see brief, simple usage instructions, which are repeated in briefer form here (or enter, for example, flitet --help):

Usage:

Example of use: flitet "hello world" "See you later" | aplay

Example of use: flitetf infile1.txt infile2.txt | aplay

Example of use: flitet2wav "hello world" "output.wav"

Example of use: flitetf2wav "infile1.txt" "outfile.wav"

flite_hts_engine is a special version of flite which has been patched for use with HTS voices. The HTS voice provided in the attached package is trained using a Hidden Markov Model (HMM) from the CMU ARCTIC database. Festival TTS could also use CMU ARCTIC, but that required a 100 MB download of the voice data... However, this HMM technique results in pretty much as good quality with less than 2 MB of voice data (uncompressed)! Note that there are quite a number of alternative HTS voice data sets now available via links from the HTS website and by googling. I haven't tried these as yet, however, and don't know how easy or possible it will be to get them to function with the supplied scripts. Some related HTS projects look interesting, however. For example:

Took me a while getting this all together into a 1.4 MByte dotpet, but it is now done: attached to the first post in this thread. A small download for such relatively good speech quality.

Note that this is very different from standard flite. Not only is the quality of speech much better and more natural than that produced by standard flite or espeak, but the usage of the main executable (/usr/bin/flite_hts_engine) is quite complicated in that it involves many options for feeding in the speech parameters. That's why I've supplied the four small scripts (/usr/local/bin/flitet2aplay etc).

You should examine the contents of these scripts if you want to see how to use the main executable on its own, or in your own scripts etc. I didn't find a way of piping the output of flite_hts_engine to aplay, so resorted to using a temporary output file (/tmp/fliteout.wav). Anyway, these small scripts are just simple exemplars, which you are welcome to improve upon.

You might also find the Documentation file for flite_hts_engine a worthwhile read:

The bad news is that I made a wee mistake in the first upload (flite+hts_engine-0.91.pet), so relatively quickly (I'm on dialup) re-uploaded the fixed version: flite+hts_engine-0.91.b.pet

Actually, the main program was fine in both, but I used a $1 in the small script flitet2aplay when a "$@" was much more flexible, making it easy to use as a mod to technosaurus's excellent hotkeys-0.1.pet for text2speech. To do that, simply change technosaurus's code for /usr/bin/text2speech to:

EDIT: Altered the code below so it works with the latest flite_hts_engine...mce... dotpet downloadable from the first post in this thread.

Notice I just commented out the seamonkey bit, though you could delete that line altogether if you wish (thought I'd leave it simply commented out so that I could experiment with both possibilities).
[Note that the # seamonkey -remote "openurl(http etc...)" must be all on one line (the forum makes it look like two lines).]

Remember, the above mod to technosaurus's /usr/bin/text2speech script, requires my fixed dotpet: flite+hts_engine-0.91.b.pet, which is the one attached to the first post of this thread (the b tells you that you have the right one! :-)
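
For anyone wondering why the "$@" matters: $1 hands on only the first argument, while "$@" forwards every argument as its own word. A minimal sketch of the difference, using made-up function names that are not taken from the shipped scripts:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; these are NOT the
# contents of flitet2aplay or text2speech.

# Report how many arguments were received.
count_args() { echo "$#"; }

# Forwards only the first argument, as a script using $1 would.
forward_first() { count_args "$1"; }

# Forwards every argument unchanged, as a script using "$@" would.
forward_all() { count_args "$@"; }
```

Called as forward_all "hello world" "See you later", the inner command receives both strings; forward_first would pass on only "hello world".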

EDIT: The following are no longer required, since the latest flite_hts dotpet can output to stdout and thus straight to aplay via a pipe as in the code above.

(no longer required stuff follows) EDIT: You can now avoid using a temporary .wav file in the above by using the new fifo version of the flite_hts to aplay helper script: flitet_aplay.

Excellent! I had a play with it the other day. I found some other voices that work with it as well. I was disappointed that it doesn't directly speak from the text (I mean that it has to be output to a wav and then played). Maybe we can get one of those students that developed it to patch that capability in there. What they appear to be doing is using the flite text-to-speech engine, but then coupling that to another voice internally before finally outputting it as a wav. flite itself can output wav files instead of speaking directly, so there must be a way to channel the output from the hts_engine directly to the audio device -do I hear MU rustling about and looking into this??? Hint, hint

I've been using flite for a long time and always wanted nicer voices -I have my drag-n-drop source-to-package program notifying me with spoken messages -which can include the name of the package, but it would be un-handy to have to output and then play a wav file.

I've never gotten around to actually trying espeak and don't know how much space it needs, but flite is a thousand times smaller and easier to get going than festival. With these hts voices we can have the same voice quality now as festival -even though it is a bit bigger than flite by itself. If only we could figure out how to get flite or flite_hts to load voices on-the-fly.

Your scripts are great anyway -that's exactly what I was going to do with flite_hts -but you've saved me a bit of time

Another thing to work on is finding out how to create even more voices in different languages -it's all very complex for my old brain. If I were younger, I'd probably be studying or working in this field, though. It makes rocket science look like kids-play. text-to-speech is the easy part -it's speech-to-text that is a bugger!

Edit: I just listened to the amigo.wav file... you'll get a more accurate rendering of my voice by playing that text directly with flite LOL.
I call the standard flite voice 'Bruce' as it sounds a lot like the voice(named Bruce) I used to have on my iMac using some t2s program I found.
MU will get a kick out of this: since I am in Germany, I tried to teach Bruce to speak German by feeding it 'weirdly' spelled English. For a while we had a recording of Bruce on our phone answering machine -I made him speak German but with a distinctly 'Ami' accent. ROFL

New dotpet uploaded, and I've changed the name to get rid of the plus sign, so now: flite_hts_engine-0.91.c.pet

amigo wrote:

there must be a way to channel the output from the hts_engine directly to the audio device

Yes, I wish that were inbuilt too, but in the meantime I've provided a script workaround in the new dotpet: using a fifo (named pipe) to communicate between flite_hts_engine and aplay.

It installs an additional two new tiny scripts in /usr/local/bin:

flitet_aplay and flitetf_aplay, where the underscore indicates that a named pipe (/tmp/flitefifo) is being used for the communication between flite_hts_engine and aplay (I've left in the flitet2aplay and flitetf2aplay as an alternative method).
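
For those curious how a named pipe can stand in for the temporary wav file, here is a minimal sketch of the general technique. It is not the code of flitet_aplay: the function name run_through_fifo and the reader/writer calling convention are assumptions made for illustration, and it creates its fifo in a fresh temporary directory rather than at /tmp/flitefifo.

```shell
#!/bin/sh
# run_through_fifo READER WRITER   (hypothetical helper)
#   READER: command or function that consumes audio on stdin (e.g. aplay)
#   WRITER: command or function called with the fifo path as its only
#           argument; it is expected to write its output to that path
run_through_fifo() {
    reader=$1
    writer=$2
    dir=$(mktemp -d) || return 1
    fifo=$dir/fifo
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # Start the reader first; opening the fifo blocks until a writer appears.
    "$reader" < "$fifo" &
    reader_pid=$!

    # The writer opens the fifo, sends its data, and closes it,
    # which the reader sees as end of file.
    "$writer" "$fifo"

    wait "$reader_pid"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

The writer would be a small wrapper that runs flite_hts_engine with its voice options and the fifo as the output file; those options are left out here, see the supplied scripts for them.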

Yes, an important next stage is to make it easy to add in different voices, including international ones ...

Updated the dotpet to version 0.91.d (I had made a scripting error in 0.91.c sorry). Refer to the first post of this thread for changes, usage details.

EDIT: I note that flite_hts_engine seems to have a bug in that it stops reading a txt file as soon as it encounters a carriage return. I'll have to double check if that is true. I don't think the problem lies in my /usr/local/bin/flite... scripts (but I'll check that too!).

Using flitet_aplay or flitet2aplay with text from the clipboard (called up by Alt-l, i.e. Alt-small L, for example, from technosaurus's hotkeys-0.1.pet text2speech script) doesn't seem to choke on any embedded carriage returns, which is currently mainly how I am using flite_hts_engine: http://www.murga-linux.com/puppy/viewtopic.php?p=365966#365966

Great! I had the same idea of using a fifo to be able to play the output directly -glad to hear it works. That should make it much easier to stream long texts -I mean if you were converting long texts it could be a problem with both time and disk space.

I've been trying to get some other hts voices to work, but they seem to be made for festival, so no luck yet...

Thanks very much for versioning your pets. I missed c, d and e overnight, but at least I know that today's version is different just by glancing.

I've discovered a current limitation: flite_hts_engine will only play back a text of at most 1024 characters, which is fine for some applications, but not so good for reading books. For larger texts you'd need to write a program to send the text to flite_hts_engine in chunks, and for best effect the chunks should stop on word boundaries... [for example, send a chunk, and as soon as aplay closes, send another chunk to reopen aplay etc... Personally, I'd do that in C since it is easy to make aplay block, but then it might be easier to modify flite_hts_engine so that it doesn't have the limitation and also sends its output to stdout so that it can be piped...] That nuisance limitation creates a nice programming project for somebody!
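
As a starting point for that project, here is a rough sketch of splitting text into pieces that fit the buffer and break at spaces where possible. The function name chunk_text is made up, and the sketch assumes plain text on stdin; a single word longer than the limit still gets cut in the middle.

```shell
#!/bin/sh
# chunk_text [LIMIT]   (hypothetical helper, LIMIT defaults to 1024)
# Reads text on stdin and prints it as lines of at most LIMIT
# characters, breaking after a space whenever one is available.
chunk_text() {
    limit=${1:-1024}
    {
        # Turn newlines, carriage returns and tabs into spaces and
        # squeeze runs of spaces, so the text becomes one long line.
        tr -s '\n\r\t' '   '
        # Terminate that line so the last chunk ends with a newline.
        echo
    } | fold -s -w "$limit"
}

# Possible use with the supplied flitet script (untested sketch):
#   chunk_text 1024 < book.txt | while IFS= read -r chunk; do
#       flitet "$chunk" < /dev/null | aplay
#   done
```

Each chunk still pays the processing delay before it starts playing, so there will be a pause between chunks.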

If I am wrong about this limitation, please let me know - but 1024 text characters is the maximum I get processed by flite_hts_engine at a time on my system.

EDIT: On further thought, whilst it would be relatively easy to write a short shell script or C program to send blocks of words to flite_hts_engine, the problem would then be the wait involved in it processing the new block prior to sending it to aplay - that would be a painful delay! So it looks like we have to accept the 1024 character limit for now, or dig into a bit of C programming and modify flite_hts_engine in the hope of removing some of these limits (the -o output.wav bit shouldn't be too tricky at least). Alas, I have no time for that for some months due to selling my house and an international relocation - but I'll check back later to see if the problems have been resolved in the meantime.

I think this is probably the cause of the 1024 character to speech limitation:

#define INPUT_BUFF_SIZE 1024

It is near the top of flite_hts_engine.c (in the bin folder of the untarred source code). So easy at least to increase the INPUT_BUFF_SIZE and recompile. I'll try that as a test, but leave the dotpet as it is for the moment.
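
If you want to try the same experiment, the edit can be scripted. This is only a sketch: the function name set_input_buff_size is made up, and it assumes the define sits on a line of its own exactly as quoted above.

```shell
#!/bin/sh
# set_input_buff_size FILE SIZE   (hypothetical helper)
# Rewrites the "#define INPUT_BUFF_SIZE ..." line in FILE to use SIZE.
set_input_buff_size() {
    src=$1
    size=$2
    tmp=$(mktemp) || return 1
    # Edit into a temporary file, then copy back over the original so
    # the source file keeps its own permissions.
    if sed "s/^#define INPUT_BUFF_SIZE .*/#define INPUT_BUFF_SIZE $size/" "$src" > "$tmp"; then
        cat "$tmp" > "$src"
    fi
    rm -f "$tmp"
}

# Example, run from the top of the untarred source:
#   set_input_buff_size bin/flite_hts_engine.c 2048
```

The program then has to be recompiled for the change to take effect.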

Posted: Fri 27 Nov 2009, 07:06
Post subject: new version on the way soon - won't need a wav or fifo...

Yes, increasing that buffer size from 1024 to 2048, for example, does double the amount of text spoken before the program shuts down. Actually, I couldn't resist going a bit further and made a few other changes to the C source code such that, on recompiling, I can now do the likes of:

flite_hts_engine the_voice_parameters... "hello world" | aplay

i.e. in the new version I am working on you don't need a wav file or a fifo at all - so that bit is solved (though there is some tidying up/extra code to finish off before uploading the new dotpet and scripts). I'll code it in C such that you have a choice of a wav file or a direct pipe to, for example, aplay... but I don't really want to change the buffer from 1024 since the processing delay before it starts playing becomes annoying (on my machine anyway).

@amigo:
How long does a big chunk of text (over 1024 chars) take to start playing (using say flitetf_aplay) on your machine (and what specs has your machine?)

I've been playing with this too. I wrapped the existing scripts into a single app which tries to make sense of any options given. I'll post it below.
But first, flite doesn't have any limitations about file size and starts streaming the output right away AFAIK. Of course I realized right away that the wait time would be long for larger texts using flite_hts_engine, but I hadn't found the limit to total text size -even though I thought I was feeding pretty long text to it. It had me wondering how we could stream the input into the program, but my first attempts at using straight cat without the echo were not working. This:
echo -E $(cat "$@") converts the text from files into a single long line.
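
A rough alternative sketch for the same flattening step, for anyone whose shell echo does not understand -E, or whose text contains wildcard characters (the unquoted $(...) would expand those as file names). The function name flatten_files is made up for illustration:

```shell
#!/bin/sh
# flatten_files FILE...   (hypothetical helper)
# Prints the contents of all given files as one line: carriage returns
# are dropped and line breaks are replaced by single spaces.
flatten_files() {
    cat "$@" | tr -d '\r' | paste -s -d ' ' -
}
```

Unlike the echo version, this keeps any runs of spaces inside a line as they are.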

I think you might find it easier to get flite_hts_engine to output to stdout than trying to 'chunk' the input. A look at the code for flite itself should give some clues on how to do it since it can either write to a file or output speech directly.

I'll get back to you later with my results -I'll need to put together some example text with measured length.

I'll be wanting to include your Copyright info, so email me and give me your name, please
amigo AT ibiblio.org

I'm going to try to contact the devs through their sourceforge mailing list or forum and see if I can find out how to convert other voices -as you can see I've already built in tentative support for using other voices.

Can you post your changes to the code as you go along -it doesn't matter if it isn't cleaned up. A diff would be best, but you can just send me or post the altered files if you like. This is the best thing since sliced bread or drop-back drawers!

It was easy. I just left the buffer at 1024 for now though, since in my experience flite_hts_engine tends to garble text quite frequently when it comes across certain combinations of words, so long files not so good anyway, but I'll take a poll on that. Can easily compile a larger buffer version if wanted.

I only needed to make one change to allow the output to go to either stdout or to a wav. Just needed to set the wav file descriptor (wavfp) to stdout as its default state, in the attached (modified in that way) source file: flite_hts_engine.c (attached as a tar.gz)

I should make a diff, I know, but I'm tired and need to sleep...

Read the first post for new usage instructions. Clearly the scripts have radically changed (only 4 helper scripts needed to get going now...)