Technological Surrealism / Alec Nerds Out

Gerbil Vest: A Crude Lapping Generator

January 16, 2013

I’ve continued to tinker with automatic song generation since I posted Nickelmatic, as I had a nagging sense that it was shooting fish in a barrel. The project I intended to follow it with suddenly started behaving frighteningly like Scott Walker when I was expecting Gordon Lightfoot (hey, you never know what direction these things are going to take), but it also led me in a new direction that has temporarily taken precedence.

I was trying to solve this problem: how can I take the endless and semi-comprehensible-at-best output from a Markov text generation algorithm, and filter it down to a subset that could plausibly serve as lyrics? I started working with syllables and emphasis, which quickly led me to the Festival speech synthesizer. Festival can produce audible speech from any piece of text, handling all the disambiguation that entails. (It can also sing, but more on that later.)

I’m a dedicated punner, so being able to split arbitrary text into its constituent segments of sound (“constituent” into “k ax n s t ih ch uw ax n t”, for example) got me thinking: this would be a pretty comprehensive way to generate spoonerisms (famously, “the queer old dean” / “the dear old queen”) independent of all the quelling spirks that English has to offer. As long as Festival knows how to say it, we can spoonerize it: just try swapping a few of the initial segments and check to see if the results are words.

For example:

Cunningly Strapped

Verbal Generation

Wickedly Funded

Happily Crude

Word Tease
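
The swap-and-check idea can be sketched in a few lines of Python. This is only an illustration, not the code behind the examples above: the tiny LEXICON is hand-written (the transcriptions are my approximations rather than Festival output), and the names spoonerize, LEXICON, BY_SOUND and max_swap are all made up for the sketch.

```python
from itertools import product

# Hypothetical mini-lexicon: word -> segments. In a real run these would come
# from Festival; the transcriptions here are hand-written approximations.
LEXICON = {
    "queer": ("k", "w", "ih", "r"),
    "dean": ("d", "iy", "n"),
    "dear": ("d", "ih", "r"),
    "queen": ("k", "w", "iy", "n"),
    "old": ("ow", "l", "d"),
}

# Reverse lookup so a segment sequence can be checked against known words.
BY_SOUND = {segments: word for word, segments in LEXICON.items()}


def spoonerize(first, second, max_swap=2):
    """Swap the first i segments of one word with the first j of the other.

    Returns every (new_first, new_second) pair where both results are words.
    """
    a, b = LEXICON[first], LEXICON[second]
    found = []
    for i, j in product(range(1, max_swap + 1), repeat=2):
        # Leave at least one segment behind in each word.
        if i >= len(a) or j >= len(b):
            continue
        new_a = b[:j] + a[i:]
        new_b = a[:i] + b[j:]
        if new_a in BY_SOUND and new_b in BY_SOUND:
            pair = (BY_SOUND[new_a], BY_SOUND[new_b])
            if pair != (first, second):
                found.append(pair)
    return found


print(spoonerize("queer", "dean"))
```

Trying every prefix length up to max_swap, rather than working out where the first vowel sits, keeps the sketch dumb on purpose: the lexicon lookup does the job of rejecting nonsense.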

Technically, this involved making at least a little bit of sense out of Festival. Not trivial, since the project appears to have been dormant for years (not much help when you’re trying to ask questions) and most of it is written in Scheme, a language only a deeply masochistic logician could love. (That’s probably a redundant subclause and might earn me some unhappy words from masochistic logicians.)

The output Festival produces for a phrase lists the words, then the number of syllables with stress information (I’m still trying to figure out how to dump the actual syllables), then finally the segments (“t-eh eh-k k-s s-t t-t t-ax ax-p p-aa aa-r r-s s-g g-ow ow-z z-hh hh-ih ih-r”) and frequency (pitch) information. And more, not included here.

For our purposes, we just strip out what looks like a segment and build up a list of these for each word. That’s the hard work and the rest is just a bit of brute force to determine what combinations work and what’s just nonsense.
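
As a rough sketch of the stripping step, here is one way to pull segment pairs like the ones quoted above out of a line of text and chain them back into a flat list of segments. It assumes the pairs appear as space-separated "left-right" tokens and that each pair's left half repeats the previous pair's right half; the real dump format may differ, and the names PAIR and segments_from_pairs are invented for this example. Splitting the list into words is left out, since the quoted string does not show where the word boundaries fall.

```python
import re

# Matches tokens that look like a segment pair, such as "t-eh" or "ow-z".
# Assumption: segment names are lowercase letters only.
PAIR = re.compile(r"\b([a-z]+)-([a-z]+)\b")


def segments_from_pairs(text):
    """Turn 't-eh eh-k k-s s-t' into ['t', 'eh', 'k', 's', 't']."""
    pairs = PAIR.findall(text)
    if not pairs:
        return []
    # Take the left half of the first pair, then the right half of every pair.
    # This relies on consecutive pairs overlapping by one segment.
    return [pairs[0][0]] + [right for _, right in pairs]


line = "t-eh eh-k k-s s-t t-t t-ax ax-p p-aa aa-r r-s s-g g-ow ow-z z-hh hh-ih ih-r"
print(" ".join(segments_from_pairs(line)))
```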