[nvda] Re: SAPI, was: Re: Re: IRC Clients, Anyone?

From: "John Greer" <jpgreer17@xxxxxxxxxxx>

To: <nvda@xxxxxxxxxxxxx>

Date: Wed, 2 May 2007 10:18:11 -0500

OK, well, what was ViaVoice then? Was it SAPI 4? Isn't Eloquence based on the same thing? If you had noticed, I said it was based on SAPI, not that it was SAPI. And yes, some hardware synths do use the SAPI framework. SAPI, after all, is a COM interface; in a rough sense it plays the same role in software that the COM port plays in making a hardware synth talk.

I did not post what I did to make you angry, and it was not an answer just for you. There are a lot of people out there who don't understand why SAPI is slower than a hardware synth, and I thought I might explain it a little. No, it did not go into detail; what SAPI is doing is like the cable attached to your hardware synth, only done in software. No, I did not go into the detail that recorded speech is slower because the engine has to access the hard drive to load the voice fragments into memory and then process them into speech, and that equals lag time. But I did not intend to post a college thesis. I only wanted people who are not computer programmers to understand why, when people say "SAPI is slow, blah blah blah," that is so. So if you want to start a movement to eliminate SAPI, doesn't it help if even the people who are not programmers know why? Isn't it helpful for some to know why their AT&T voices are sluggish and can't be understood at 1000 words per minute? Most people who are not programmers couldn't care less about the technical jargon; they just want it explained why it doesn't work for them. So why not give them a few reasons why some voices are better than others, rather than just "because SAPI sucks," which is only a matter of opinion?

Personally, I don't mind SAPI, because it makes it possible for me to have speech on my computer without spending hundreds of dollars on a hardware synth that, in my opinion, sounds like a robot from a 1950s movie. But yes, Eloquence was based on the SAPI standard. Eloquence is essentially the same thing as ViaVoice; the difference is that FS decided to use their own proprietary COM interface for it so they could get paid for selling it, instead of a person being able to use it in any application they choose.

Now, please forgive me; I am not flaming hardware synths. It is just that I have not come across one with the same voice quality as RealSpeak and the like, and I have not come across one I can get for free either, so I continue to use SAPI and continue to wait until that first free hardware synth with a voice like RealSpeak comes along. So yeah, Daniel, if there is something faster than SAPI with voice quality like RealSpeak, let's hear about it. Anyway, the original thread was about a guy asking how to add speech to mIRC; how it got onto this I will never know.
----- Original Message -----
From: "Samuel Proulx" <samuel@xxxxxxxxxxxx>

Please try and keep your facts straight. Most hardware synths do not use
SAPI, and require individual interfaces or drivers (DECtalk, for example,
uses DAPI). In SAPI 5, programs must pass a rate value between 1 (or
0?) and 10. This means that most engines can't go much over 300 WPM, because
they all have only ten values for speed, and those are weighted toward the
slower speeds. JAWS does not, in fact, use SAPI by default. It ties directly
into Eloquence, for exactly the reasons I mentioned: SAPI is slow and
buggy. Same with Window-Eyes: by default, it uses DAPI with DECtalk.
Supernova uses Speech Access Manager, which ties into Orpheus by
default rather than using SAPI. The only screen readers I am aware of
that use SAPI are Thunder and NVDA. Kurzweil uses SAPI 4 by default, but SAPI 4
was a slightly better standard in that it supported passing the rate in
WPM. I am well aware of how synths work, and that is why no concatenative
synths are installed on my computer; they are too large, and much too slow
to get any real work done.
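The ten-step rate ceiling described above can be sketched in a few lines. This is a toy illustration, not the real SAPI API: `rate_to_wpm` is an invented name, and the 120 and 300 WPM floor and ceiling are assumed numbers standing in for whatever an engine vendor fixes internally. It only shows why ten discrete steps below a vendor-fixed ceiling cap the speed no matter what the user asks for.

```python
# Toy model of a SAPI-5-style discrete rate scale (NOT the real API).
# Assumption: the engine maps its steps linearly from an assumed
# 120 WPM floor up to a vendor-fixed 300 WPM ceiling.

def rate_to_wpm(rate, min_wpm=120, max_wpm=300, steps=10):
    """Map a discrete rate step (1..steps) to words per minute."""
    if not 1 <= rate <= steps:
        raise ValueError(f"rate must be between 1 and {steps}")
    return min_wpm + (max_wpm - min_wpm) * (rate - 1) / (steps - 1)

# Even the top step can never exceed the engine's built-in ceiling:
print(rate_to_wpm(10))  # 300.0
```

With only ten coarse values, a user who wants 450 WPM has no step to ask for it; contrast SAPI 4, where, as noted above, a program could request an exact rate in WPM.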

John Greer wrote:

SAPI itself is a standard. If not for SAPI, which stands for Speech
Application Programming Interface, your screen readers would not talk at
all. Even JAWS uses SAPI. Yep, even Eloquence is based on the SAPI
standard. What SAPI does is give programmers a standard interface to
write programs against, so they don't have to add a million additional
lines of code writing some proprietary synth or synth driver. Yep, even
your hardware synth that is able to speak at 2000 words per minute uses
SAPI to know to speak at 2000 words per minute. Frankly, I am amazed that
something like eSpeak exists at all. My hat's off to the developer.

Now, like anything else in life, there are things we like and things we
don't, and if one TTS engine is not what you like, try another. But here
is the reality of the situation. Many of the software synths, like
RealSpeak, are done using concatenative speech. What that means is that
real human voices are chopped up and put back together according to the
speech rules programmed into the engine itself. How fast or slow a speech
engine responds depends on a lot of factors: the quality of the
recordings, the size of the sound files, and the speed the engine itself
can handle while still producing understandable speech. The goal of
concatenative speech is to make speech sound as close to human as
possible. I ask you: when was the last time you heard a human being
speaking at 2000 words per minute? If you ever have, did you go "huh?" at
some point? To make a speech engine that talks at 2000 words per minute
perfectly, the speaker making the recordings would have to be able to
speak at 2000 words per minute. If they can't, it is going to introduce
artifacts, in other words, things that don't sound natural.

Now, there are also engines that are not actual recorded speech, like
Eloquence or ViaVoice. But in a small way they are still based on
concatenation. Instead of actual recorded speech, they use a tone and
manipulate its frequency in a manner that sounds like speech, depending
on the phonetics of the language they are trying to emulate. The engine's
programming then puts those tones together through concatenation in a way
that sounds like words and phrases. That is the reason your non-recorded
speech synth is able to sound like fjklirtujigjdfitjdfitjdfitrji: it is
not dependent on a human speaker. All SAPI does is take that engine that
sounds like skdlrtisutjdfkfsjtdfitgjdftkdfjtdift and make it possible for
screen readers, etc., to use it. All SAPI does is give a framework for
creating an engine that goes foitriujtjdfitjdritdjtiruje.

So I say: if the engine you are using is not understandable at a faster
speed, slow it down to the rate it was intended for. If the engine you
are using is not responsive enough, find someone who can speak at 2000
words per minute and get them a job at RealSpeak or AT&T or NeoSpeech. If
that still doesn't work, invent a faster programming language than C++,
or use more memory, or a solid-state hard drive, etc.
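The concatenative process described above, recorded fragments looked up and joined by the engine's rules, can be sketched with strings standing in for audio. This is a minimal sketch; the phoneme names and fragment table are invented for illustration and much smaller than any real voice database.

```python
# Minimal sketch of concatenative synthesis: prerecorded fragments
# (strings here stand in for audio samples) are looked up per phoneme
# and joined in order. The phoneme set and table are invented examples.

FRAGMENTS = {
    "HH": "h", "EH": "e", "L": "l", "OW": "o",
}

def synthesize(phonemes):
    """Join the stored fragment for each phoneme; fail on a missing recording."""
    missing = [p for p in phonemes if p not in FRAGMENTS]
    if missing:
        raise KeyError("no recording for: " + ", ".join(missing))
    return "".join(FRAGMENTS[p] for p in phonemes)

print(synthesize(["HH", "EH", "L", "OW"]))  # "helo"
```

A real engine joins thousands of recorded units and has to smooth the seams between them, which is exactly where the disk access and processing lag mentioned earlier in the thread come from.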

I second that motion; developers: stop it with the SAPI 5 stuff, please.
I look forward to the day when (if?) NVDA and eSpeak get things sorted
out so NVDA can use the eSpeak DLL directly without going through SAPI.
SAPI is buggy and slow; many "SAPI" engines don't fully support the
standard; SAPI crashes a good deal; it's easy to break (a system where
one new voice can bring down SAPI entirely is not a good system); I have
never encountered a SAPI 5 voice that could go fast enough for me (i.e.,
over 300 WPM); and SAPI will never do braille or hardware synths or
anything else. I'm not trying to flame; SAPI can and should be used
when no other options are available (as in NVDA's case right now), but I
think we as blind users need to take a strong stand against SAPI in
applications like games and software plugins, because if we don't, it
will soon be all we have left. I think the solution is to begin
pressuring screen reader companies to come up with, and all follow, some
kind of screen reader interface standard, so that an application can
send text to be spoken with one clearly defined function and have it
spoken by the screen reader and/or displayed in braille, no matter
what the current reader may be. Perhaps, as an open source solution,
NVDA should set the standard?
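The proposal above, one clearly defined call that the active reader routes to speech and/or braille, could look something like this. It is a purely hypothetical sketch: no such standard exists, and `ScreenReaderOutput`, `output`, and `LoggingReader` are all invented names.

```python
# Purely hypothetical sketch of a common screen-reader output interface.
# No such standard exists; every class and method name here is invented.
from abc import ABC, abstractmethod

class ScreenReaderOutput(ABC):
    """The single entry point an application would call, per the proposal."""

    @abstractmethod
    def output(self, text: str, interrupt: bool = False) -> None:
        """Deliver text via the user's current speech and/or braille device."""

class LoggingReader(ScreenReaderOutput):
    """Stand-in 'screen reader' that records what it would have delivered."""

    def __init__(self):
        self.delivered = []

    def output(self, text, interrupt=False):
        if interrupt:
            self.delivered.clear()  # a real reader would cut off queued speech
        self.delivered.append(text)

# An application (a game, an IRC client) only ever calls output(),
# without caring which reader, synth, or braille display is behind it:
reader = LoggingReader()
reader.output("Hello from mIRC")
```

The point of the abstract base is that mIRC scripts, games, and plugins would all compile against the one interface, while each vendor's reader supplies its own implementation behind it.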

Jim Grimsby JR. wrote:

That is a bad idea: more and more stuff talking to SAPI. Also, what about
braille support? At least with direct interfacing with a screen reader,
the braille and speech can be produced by the screen reader on the needed
device. As an example of this, I was completely deaf the other night. If I
was not using a version of mIRC that interfaced with my screen reader, I
could not have used my Alva, and then I would not have been able to get on
IRC at all. Well, I could, because I have my mIRC JAWS scripts, but I think
you got the

Thanks a lot, everyone. tIRC is exactly what I was looking for. It
integrates much better with mIRC than I expected.

By the way, I would imagine changing the COM code would also allow us
to get All inPlay games working directly with NVDA, although we're
leaning toward getting away from direct screen-reader interfaces and
just going with SAPI 5.

Cheers,
Tim
ace wrote:

Oh, but actually, using IRC under a UNIX shell is probably less
cumbersome than using a Windows client, if you have a login somewhere.
IRC is a straight text protocol, so a lot of the Windows clients are
really too advanced, in my opinion. I do, however, use mIRC and have
developed a script for it that interfaces to JFW, Window-Eyes, SAPI,
and Eloquence. It is unfortunate that I can't interface to NVDA at
this time, but as soon as that is available I will explore it.
www.talkingirc.net

ace/Robby
Tim Keenan wrote:

Greetings All,
I'm going to need to be able to use an IRC client proficiently for an
upcoming job. Has anyone had success with any particular one under
Windows? I've tried quite a few, with very little in the way of usable
results. I used to use IRC under a Unix shell, but that's not quite as
practical nowadays. Any suggestions would be much appreciated.