YAPC was pretty fun. As usual I didn't get enough sleep. My
talk didn't go nearly as well as I'd hoped. I need to bite
the bullet and actually practice these things ahead of time.
I did, however, control my slides with the power of my voice!

I am working on really fun stuff now, namely converting
cluster-based unit selection databases from Festival to work
with Flite. I'm still managing to avoid writing any Scheme
code, and have ended up using Perl to parse Scheme data and
write out C. It works really well, so I don't care...
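
A minimal sketch of that parse-Scheme-data, write-out-C idea, in
Python rather than the Perl mentioned above. Everything specific
here is an assumption made for illustration: the (name start end)
row layout, the `struct unit` type, and the helper names `parse`
and `emit_c` are hypothetical and are not the real Festival or
Flite formats.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare atoms.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse(text):
    """Parse one S-expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip the quotes
        try:
            return int(tok)
        except ValueError:
            try:
                return float(tok)
            except ValueError:
                return tok  # a bare symbol

    return read()


def emit_c(name, rows):
    """Write (symbol, start, end) rows as a C array of structs."""
    lines = ["static const struct unit %s[] = {" % name]
    for sym, start, end in rows:
        lines.append('    { "%s", %d, %d },' % (sym, start, end))
    lines.append("};")
    return "\n".join(lines)


# Hypothetical input: two units with start and end offsets.
units = parse("((a_1 0 120) (b_4 120 310))")
print(emit_c("units", units))
```

The point of the design is that the data never has to be evaluated
as Scheme: it is only read as nested lists and printed back out as
a static C table.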

There's a lot of work to be done in shrinking and optimizing
databases, which should keep me busy for a while.

As a consequence, I'm finally really getting acquainted with
the internals of the TTS engine. It's amazing how much more
straightforward some things become when they're expressed as
well-written code. Of course, I may not have been able to
understand the code as well without the prior explanations.

It's also deceptive because some of the algorithms are very
simple in implementation and "just work", but rely on a
frightening amount of signal-processing-fu or statistics-fu
to explain why exactly they just work. Unfortunately
there's a tendency in some papers and books I've read to
throw down the big scary math *first* and then eventually
derive the practical implementation.

Well, I was hoping to ride at least 50 miles today, but I
had to get some work done in the morning and by the time I
had finished being lazy around the house, it was 4PM, so I
had to settle for a slightly extended version of my daily
15-mile loop (10th -> jail trail -> greenfield -> beechwood
-> fifth -> neville -> boundary -> jail trail -> 10th) ...
this time taking a detour through Hazelwood and finding a more
formidable hill to climb on the way out. The hills around
here never cease to amaze (and exhaust) me; guess I'll just
have to keep going up them until that's no longer true!

Having installed some voice-building machines and
discovering the inherent suck-factor of all existing
command-line audio recording tools, I just had to write yet
another one, which I managed to do on the spot in about 30
minutes. Now if I could just get around to fixing
POE::Component::Audio to do multiple channels and other
trivial things, I'd feel better.

Stepping on the scale this morning I noted that, after
giving myself an extra week of reduced eating to make sure
that I was really actually under 150lbs, I was, well, really
actually (well) under 150lbs, down from 180lbs three months
ago and probably more before that... Hopefully being able to
eat a bit more often will improve my demeanor and mental
clarity... Now I just have to get some pants that fit.

Fun with Sphinx acoustic training. Boy, I wish I actually
understood the math behind all this stuff. I guess that's
what graduate school is for (most likely that's where I'll
be in a couple of years' time).

For now, though, I'm going to enjoy some hacking on the
implementation of it - the process (though it is inherently
massively CPU-intensive) is taking a lot more time than it
should and there are a lot of code cleanups to be done.

Computers are not really a hobby for me anymore. This is
unquestionably a good thing in terms of my mental health,
though I worry that I am not doing enough to maintain and
improve my technical skills.

It was only a few years ago that I was a young liberal-arts
student and self-taught hacker living off of student loans
and struggling for meaningful employment, and I fear that,
should things go wrong, my lack of formal education and
credentials will come back to haunt me. I guess I can always
go back to school.

In reality, things are actually going very well indeed for
me; I work for a company with sensible management and
finances, doing interesting work with real applications and
customers. It's just that I have learned to survive by
planning for the worst case scenario.

I'm worried about the perception that free software doesn't
pay, and about the possibility that the demand for
Linux/GNU/Perl/Apache expertise in the job market may be
evaporating. I really
hope that this is not the case, if not for my sake, then for
the sake of people who might be in the same place I was a
few years back; bright kids learning and using free software
on their own time, who should have their efforts rewarded
rather than repeatedly running into the "no BSCS degree, no
job" barrier.

On the other hand, I am certainly glad that there will be no
more bogus companies burning cash like Dubya, Cheney, and
their oil buddies want us to burn oil, nor instant
millionaires and lazy people expecting huge payoffs. And
speaking as a Canadian ex-pat, I'd be happy to see the
outrageous salaries paid in the US be "corrected", so that
perhaps the rest of the world will get a chance to keep its
best and brightest.

Anyway, as I was saying, I've been splitting my spare time
between my old and neglected interests of cooking, cycling,
and homebrewing, and if you'll excuse me, I've got some bread
to bake.

I have volunteered myself for a million little projects with
unspecified goals and deadlines, and am consequently getting
very little actual work done on any of them. I miss my old
project manager...

On the bright side I'm finally fiddling with the voice
building tools in greater depth, since we're actually, like,
building voices now. I went into the studio to record some
unit selection prompts and I'm slowly working on a French
diphone voice (if I ever get the diphone list generation and
letter-to-sound stuff done...)

Arr. Once again I am in computer telephony hell. In the
hopes of achieving reliable echo cancellation and
full-duplex performance, we acquired a very expensive
Dialogic PCI telephony card and the package of (stupid,
proprietary, gross) Linux software support. Well, it
appears that the software support ... doesn't. And the
documentation ... doesn't.

And as far as I can tell, Dialogic doesn't actually want to
support people who try to use it. Their customer support
e-mail directs you to a web forum or a pay-per-incident
service. After paying $BIGNUM for this piece of crap board
and proprietary crap software, the least they could do is
write some bloody documentation for it that is good for more
than toilet paper and give at least some complimentary
e-mail support. Fuck you, Dialogic.

Surveying the landscape of computer telephony fills me with
a deep feeling of helplessness and despair. All the
software is broken, all the hardware is obscure as hell, it
all costs unbelievable amounts of money, and it never seems
to work correctly. And when it looks like I find a board
that is actually useful for my applications, it turns out
that it only works on Windows NT. (Come on, people, you
could at least support Solaris or some kind of
semi-reasonable proprietary OS.)

It looks like the Quicknet/IXJ stuff, as quirky as it may
be, is the best thing there is for Linux and open source
telephony at the moment. I hereby take back all the bad
things I said about them :-)

Oh yeah! That's right! We released some speech stuff a few
days ago. Please beat on it, etc. I'm mostly working on
higher-level things at the moment (i.e. actual applications
that use these modules) so these should be stable for a
while.

In other news, Sphinx2 0.3 was put out with little fanfare.
So you don't actually need to use the CVS version to
compile Speech::Recognizer::SPX. Which reminds me, I should
probably update the README.

I've already been asked if this stuff works on Windows; I
wonder when someone will ask if it will be rewritten in
Python...

I'm playing at rewriting the silence filtering library.
Handling all the audio buffering and framing safely and
efficiently is unfortunately trickier than it looks. Pointer
arithmetic is HARD, let's go SHOPPING!
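
To show why the buffering and framing part is fiddly, here is a
small sketch in Python, not the actual library. It assumes audio
arrives in chunks of arbitrary length, that analysis wants
fixed-size frames, and that "silence" means mean squared amplitude
below a threshold. The names `Framer` and `is_silent`, the frame
size, and the threshold are all made up for illustration.

```python
class Framer:
    """Cut a stream of arbitrarily sized chunks into fixed-size frames."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.pending = []  # samples left over from earlier chunks

    def push(self, samples):
        """Add a chunk and return every complete frame now available."""
        self.pending.extend(samples)
        frames = []
        while len(self.pending) >= self.frame_size:
            frames.append(self.pending[:self.frame_size])
            del self.pending[:self.frame_size]
        return frames


def is_silent(frame, threshold):
    """Call a frame silent when its mean squared amplitude is low."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy < threshold


framer = Framer(frame_size=4)
kept = []
# Chunk boundaries deliberately do not line up with frame boundaries.
for chunk in ([0, 1, 0], [-1, 0, 900, -800, 700], [-900, 0, 0]):
    for frame in framer.push(chunk):
        if not is_silent(frame, threshold=100.0):
            kept.append(frame)
```

The leftover samples in `pending` are exactly what a C version has
to track with pointers and lengths, which is where the off-by-one
errors tend to creep in.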

Playing with the iPaq. Wrote some nice feature extraction
code for FPU-less machines. Somewhat unsurprisingly, it's
three times slower on my P3 laptop (but 80 times faster on
the iPaq).
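
For readers who haven't seen the trick, a rough sketch of what
FPU-less code looks like, written in Python for brevity. It is not
the code described above. It assumes a pre-emphasis filter
y[n] = x[n] - 0.97 * x[n-1], a common first step in speech front
ends, and stores the coefficient as a Q15 fixed-point integer so
that only integer multiplies and shifts are needed. The function
names and the 0.97 coefficient are illustrative assumptions.

```python
Q = 15                          # number of fractional bits (Q15)
ALPHA = round(0.97 * (1 << Q))  # 0.97 scaled to an integer


def preemphasis_fixed(samples):
    """Pre-emphasis using only integer multiply, shift and subtract."""
    out = []
    prev = 0
    for x in samples:
        # (ALPHA * prev) >> Q approximates 0.97 * prev without an FPU.
        out.append(x - ((ALPHA * prev) >> Q))
        prev = x
    return out


def preemphasis_float(samples):
    """Floating-point reference version of the same filter."""
    out = []
    prev = 0
    for x in samples:
        out.append(x - 0.97 * prev)
        prev = x
    return out


samples = [1000, 2000, -3000, 500]
fixed = preemphasis_fixed(samples)
reference = preemphasis_float(samples)
```

The integer results land within one unit of the floating-point
ones, which is usually acceptable for 16-bit audio. On a machine
with a fast FPU the extra shifts buy nothing, which fits the
laptop result above.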

Discovered a nasty bug in the iPaq audio driver, leading to
a kernel oops. Predictably, select(2) was broken. Sent off
patches. No reply. You'd think that a patch that changes a
grand total of 4 lines, and fixes an oops, would get a bit of
priority in people's mailboxes. You'd think wrong.

Oh, the PLE133 is really lovely. So, to reduce costs, it
seems that VIA decided it would be clever for it to only
support PC133 memory, and not even all PC133 memory (it has
to be CAS3).

I discovered this since I finally broke down and got a
better motherboard for my home box and thought it prudent to
reuse the old parts for a firewall box at work ... several
sticks of SDRAM later and I finally have a working machine
built around this thing which seems to be way too
overpowered to serve as a firewall. <sigh>

What else ... well, various stuff. Having more fun with
audio. I was thinking of presenting on speech and Perl at
YAPC but I might just do audio and Perl as there is more
than enough there to take up 45 minutes.

It really seems as if you cannot win with Linux audio
drivers. If they don't support setting the fragment size,
then you throttle with SNDCTL_DSP_GETOPTR (which is possibly
a better idea anyway). But then you discover that lots of
drivers don't support that either. Does anyone actually use
this stuff for anything besides playing MP3s and Quake?

Made up a summary of all the issues with the telephony
drivers and sent it off. Now I'm waiting for a reply, and
have ended up debugging sound drivers instead. select(2)
(and obviously poll(2)) breaks in interesting and different
ways on different drivers. Somehow I am not surprised.

On a related subject, the VIA PLE133 chipset is really crap,
and I suggest avoiding it. The integrated video won't do
over 1024x768 without massive noise in the image, and the
on-board audio uses one of those lame-o AC97 codecs that
only does 48kHz. Suck.

Of course, if people who wrote sound applications actually
knew that SNDCTL_DSP_SPEED returns a meaningful value in its
argument (in particular the people who wrote the OSS backend
for libao), that would also be helpful.
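
A small sketch of what checking that return value might look
like, in Python via `fcntl.ioctl`. Assumptions to note: the
request number below is SNDCTL_DSP_SPEED as it is typically
encoded on Linux x86 and ARM (other architectures may differ),
and `set_sample_rate`, `needs_resampling` and the fake driver are
hypothetical names used only for illustration. The fake stands in
for a 48kHz-only codec so the example does not need real hardware.

```python
import fcntl
import struct

# _SIOWR('P', 2, int) as encoded on Linux x86/ARM (assumption: the
# encoding is architecture-dependent, so check your own headers).
SNDCTL_DSP_SPEED = 0xC0045002


def set_sample_rate(fd, wanted, ioctl=fcntl.ioctl):
    """Ask the driver for `wanted` Hz and return the rate it really set."""
    reply = ioctl(fd, SNDCTL_DSP_SPEED, struct.pack("i", wanted))
    return struct.unpack("i", reply)[0]


def needs_resampling(wanted, actual, tolerance=0.01):
    """True when the granted rate is too far off to play the audio as-is."""
    return abs(actual - wanted) > tolerance * wanted


def fake_48k_only_ioctl(fd, request, arg):
    """Hypothetical stand-in for a driver whose codec only runs at 48kHz."""
    return struct.pack("i", 48000)


# Ask for CD-rate audio from the fake 48kHz-only device.
actual = set_sample_rate(None, 44100, ioctl=fake_48k_only_ioctl)
if needs_resampling(44100, actual):
    print("driver gave %d Hz, resample before writing" % actual)
```

With a real device, `fd` would be the file descriptor of an opened
OSS device such as /dev/dsp and the default `ioctl` argument would
be used. The only point is that the rate written back by the
driver, not the rate requested, is the one to trust.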

I guess I should probably just install ALSA on that box
since its library will do all the necessary sample
conversion. The kernel's VIA audio driver is really nice
otherwise though, so I kind of fear what might happen.

In general, though, ALSA seems to have evolved to the point
where it does a better job of OSS audio than the actual OSS
modules. This is fairly impressive.