Java development with an iPhone touch pad for the Atari 2600 from an urban hip-hop perspective

Audio Programming

Yesterday, in my arrogance, I spoke about some old UDP audio streaming logic I found online. I suggested that I could see where the choppiness problems were and fix them. Today I couldn’t help myself. I dove in and ran the code between two Mac computers to hear how it sounded. I was amazed that the code worked at all without any modification! Hi, I’m Cliff. You’re here because you wanna hear how it sounds to break your voice up into tiny little ones and zeroes on one device then reassemble them on another. I’m here because I’m just fortunate enough to be able to do this task. Picking up from yesterday I can proudly announce that I have indeed solved the mystery of choppy audio! It took about 20-30 minutes of copy/pasting the original code and tweaking it to sound natural. My goal is to give you the answer to the mysterious choppy problem but…! Before I do so, let’s try to understand what the code is actually doing. There are two components, a server and a client.

The Server
We’ll look first at the server. It starts off in the main() method and prints information about the audio system’s connected devices to the console.

The next set of API calls creates a special DataLine.Info object instance which describes the format of the audio we wish to capture from the microphone. We call a getAudioFormat() method where the details of the audio format are encoded and returned in a special AudioFormat object instance. Without going into too much detail, understand that audio can come in many shapes and sizes depending on if you want stereo, mono, high fidelity or whatever. The AudioFormat class models things such as sample rate, sample size, number of channels, etc. The format is given to the DataLine.Info constructor which creates an instance that we pass to the AudioSystem API via the getLine() method. At this point we have a connection to the microphone which we can open() and start() to capture audio samples.
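Here’s a rough sketch of that setup using the standard javax.sound.sampled classes. It is not the original server code: the class name MicSetupSketch, the getMicInfo() helper, and the specific format numbers (8 kHz, 16-bit, mono) are my own assumptions for illustration, and the original getAudioFormat() may return something different.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

public class MicSetupSketch {

    // Assumed format: 8 kHz, 16-bit, mono, signed, little-endian.
    // These numbers are illustrative, not taken from the original code.
    static AudioFormat getAudioFormat() {
        float sampleRate = 8000.0f;
        int sampleSizeInBits = 16;
        int channels = 1;
        boolean signed = true;
        boolean bigEndian = false;
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }

    // Hypothetical helper: describes the kind of line we want, which is
    // one we can read captured audio from (a TargetDataLine).
    static DataLine.Info getMicInfo(AudioFormat format) {
        return new DataLine.Info(TargetDataLine.class, format);
    }

    public static void main(String[] args) {
        AudioFormat format = getAudioFormat();
        DataLine.Info info = getMicInfo(format);
        try {
            // Ask the audio system for a capture line matching the description.
            TargetDataLine mic = (TargetDataLine) AudioSystem.getLine(info);
            mic.open(format);
            mic.start();
            System.out.println("Capturing with " + mic.getFormat());
            mic.stop();
            mic.close();
        } catch (LineUnavailableException | IllegalArgumentException e) {
            // No microphone, or none that supports the requested format.
            System.out.println("No usable microphone line: " + e.getMessage());
        }
    }
}
```

The cast to TargetDataLine is needed because getLine() returns the general Line type. On a machine without a microphone the sketch just reports that no line was available.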

How Audio works
For those of you who are very green to programming and/or audio I’ll explain briefly how audio, or sound in general, works in computer systems. This all may seem pretty complicated but audio is one of the simplest concepts you can learn and is actually pretty cool when you understand its basic building blocks. Sound is merely the vibration of matter which our ears detect using little tiny hairs. The speed of the vibration makes the different noises that we hear. The vibrations come in various wave forms with different frequencies. Computers record and create sound using a process called digitization. This is where wave forms, which are analog in nature, are converted to and from digits.

A computer captures digital audio by using a microphone, which is a piece of hardware that measures vibrations, or wave forms, in the air. It takes a series of samples of the intensity of the wave at specific intervals. It creates, or synthesizes, audio by sending recorded samples to a speaker, which includes a paper cone that vibrates based on the size of the digits in the sample. In short, the bigger the digits are in the sample the more intensely the paper cone vibrates. You can think of the digits 16 as causing a small vibration where the digits 128 would cause a much more intense vibration of the paper cone inside the speaker. If a bunch of samples are sent quickly to the cone the vibrations happen quickly; if they are sent slowly then the vibrations occur slowly. The combination of the speed and intensity of the vibrations of the paper creates the noise or sound that we hear. I’m over-simplifying and glossing over a lot but those are the basics of sound. The samples and sample speed are the key to sound!
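To make the idea of samples a little more concrete, here is a small sketch that "records" a pure tone by measuring a sine wave at regular intervals and storing each measurement as a number. Everything in it (the SamplingSketch class, the sampleSine() and peak() helpers, the 440 Hz tone, the 8000 samples per second rate, and the choice of one signed byte per sample) is an assumption picked for illustration, not something from the streaming code.

```java
public class SamplingSketch {

    // Measures a sine wave `sampleRate` times per second and stores each
    // measurement as one signed 8-bit sample in the range -127..127.
    static byte[] sampleSine(double frequencyHz, double amplitude, int sampleRate, int count) {
        byte[] samples = new byte[count];
        for (int i = 0; i < count; i++) {
            double t = (double) i / sampleRate;                     // time of this sample, in seconds
            double wave = Math.sin(2 * Math.PI * frequencyHz * t);  // analog value, -1.0 .. 1.0
            samples[i] = (byte) Math.round(wave * amplitude * 127); // the "digits" for this sample
        }
        return samples;
    }

    // Largest absolute sample value: a rough stand-in for how far the
    // speaker cone would be pushed at its most extreme.
    static int peak(byte[] samples) {
        int max = 0;
        for (byte s : samples) {
            max = Math.max(max, Math.abs(s));
        }
        return max;
    }

    public static void main(String[] args) {
        // One second each of a quiet and a loud version of the same tone.
        byte[] quiet = sampleSine(440, 0.125, 8000, 8000);
        byte[] loud = sampleSine(440, 1.0, 8000, 8000);
        System.out.println("quiet peak = " + peak(quiet));
        System.out.println("loud peak  = " + peak(loud));
    }
}
```

Both arrays describe the same pitch because the wave repeats at the same rate. Only the size of the numbers differs, which is the "intensity" half of the story. Playing the same samples back at a different sample rate would change the speed half.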

Here we see a byte buffer is created (with a strange not-round length of 1000 instead of 1024) and we enter a loop where we continually pass the buffer between two methods, targetDataLine.read() and sendThruUDP(). The first method reads a block of samples from the microphone, which (as described above) measures the vibrations in the air and writes these samples to the buffer. The second method sends the buffer of samples over UDP to the client.
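The shape of that loop can be sketched roughly as follows. This is not the original server code. To keep it runnable on a machine with no microphone I’ve swapped targetDataLine.read() for a plain InputStream (with a real microphone you could wrap the line as new AudioInputStream(targetDataLine)). The class name SenderLoopSketch, the pump() method, the sendThruUDP() signature, and port 9786 are all assumptions of mine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class SenderLoopSketch {

    // Sends the first `length` bytes of `buffer` as a single UDP datagram.
    static void sendThruUDP(DatagramSocket socket, byte[] buffer, int length,
                            InetAddress host, int port) throws IOException {
        socket.send(new DatagramPacket(buffer, length, host, port));
    }

    // Repeatedly fills one buffer from `source` and ships each block as a
    // datagram. Returns how many datagrams were sent before the source ran dry.
    static int pump(InputStream source, DatagramSocket socket,
                    InetAddress host, int port, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int sent = 0;
        int read;
        while ((read = source.read(buffer, 0, buffer.length)) > 0) {
            sendThruUDP(socket, buffer, read, host, port);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for microphone data: 2500 bytes of silence.
        byte[] fakeSamples = new byte[2500];
        try (DatagramSocket socket = new DatagramSocket()) {
            // 9786 is a made-up port; 1000 matches the buffer length in the post.
            int packets = pump(new ByteArrayInputStream(fakeSamples), socket,
                    InetAddress.getLoopbackAddress(), 9786, 1000);
            System.out.println("sent " + packets + " datagrams");
        }
    }
}
```

A live microphone never reaches end of stream, so with a real capture line this loop would run until the program is stopped. UDP gives no delivery guarantee, which is why nothing here waits for an acknowledgement.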

The Client
We’ll now turn our attention over to the client. The client is a RadioReceiver which extends Thread. As it turns out, this is unnecessary complexity as there is no work being done other than playing back the captured audio samples. We can safely ignore the Thread part and pay attention to the code inside of the run method, which is invoked indirectly by the call to r.start() in the main method below.

The run method below declares a byte array which is used in a while loop. Inside the while loop we load the byte array variable with the result from the receiveThruUDP() method. This method attempts to capture sample data sent over the network and return it to the caller. In short the byte array is loaded with the samples captured from the network which were originally captured and sent from the server. We then pass the array of samples to a method called toSpeaker. This method eventually hands the samples to a Java API called SourceDataLine.write(). This Java API will eventually send the samples to the speaker which causes the paper cone to vibrate and recreate the sound on the client. See the run snippet below:
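As a rough stand-in (this is my own sketch, not the original RadioReceiver code), the receive-then-play loop might look something like the following. The method names receiveThruUDP() and toSpeaker() come from the description above, but their signatures here, the ReceiverSketch class, the audio format, and port 9786 are assumptions.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class ReceiverSketch {

    // Blocks until one datagram arrives, then returns only the bytes
    // that were actually received (not the whole scratch buffer).
    static byte[] receiveThruUDP(DatagramSocket socket, int maxLength) throws IOException {
        DatagramPacket packet = new DatagramPacket(new byte[maxLength], maxLength);
        socket.receive(packet);
        return Arrays.copyOf(packet.getData(), packet.getLength());
    }

    // Hands the samples to the playback line. The length must be a whole
    // number of frames (2 bytes each for the 16-bit mono format assumed below).
    static void toSpeaker(SourceDataLine speaker, byte[] samples) {
        speaker.write(samples, 0, samples.length);
    }

    public static void main(String[] args) throws IOException {
        // Assumed format; it has to match whatever the sender captured with.
        AudioFormat format = new AudioFormat(8000.0f, 16, 1, true, false);
        try (DatagramSocket socket = new DatagramSocket(9786)) { // made-up port
            SourceDataLine speaker = AudioSystem.getSourceDataLine(format);
            speaker.open(format);
            speaker.start();
            while (true) {
                toSpeaker(speaker, receiveThruUDP(socket, 1000));
            }
        } catch (LineUnavailableException | IllegalArgumentException e) {
            // No speaker, or none that supports the requested format.
            System.out.println("No usable playback line: " + e.getMessage());
        }
    }
}
```

Note that the whole thing runs on one thread, in line with the observation that extending Thread is not strictly needed for this job.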

Those are the basics of how the whole operation works. There’s not a terribly large amount of code and I haven’t described the UDP networking pieces at all. I’ll continue this series explaining UDP next. In the meantime, keep looking to see if you too can figure out where things are slowing, **ahem**, chopping up. Keep in mind how I explained the key components of audio: sample intensity and the frequency, or speed, of the samples.

I posted a question on StackOverflow a while ago trying to learn the easiest way to get MapQuest to give you turn by turn voice output from Optimus Prime. Yesterday I saw a recent comment inquiring whether or not I had a solution. Today I have a possible solution. I found an open source vocoder written in C (Zerius Vocoder) that I’ve managed to build/run on the Mac. It works pretty good, I must say. Hi, I’m Cliff. You’re here because you obsess over little robot toys that fold into/out of vehicle form. I’m here to report status on what could be a nifty little iPhone app.

Imagine for a moment that you could plug a vocoder into the pipeline of CoreAudio’s queue services API. (Imagine that there were such a pipeline!) That would allow all sorts of interesting voice output! So far I only have a compiled binary with sample files and a rough understanding of what vocoding actually means. Obvious next steps are to finish my Speex tutorial series and refine the hideousness of iPhone audio APIs. I’m really thinking of using pipelines similar to how GStreamer works. If I get distracted reading that link to GStreamer I may end up on a tangent, trying to port GStreamer to CocoaTouch. That’s all for now, party people. I’ll hit you back when I actually have something tangible to share.

So I finally figured out how to install snack on OSX. I went down many different paths and ended up building from source and getting lucky. I’m sorry, what is snack? Apparently it’s some audio tool kit for scripting engines like Tcl, Python, and Ruby. Woah, I’ve probably lost you. Let me back up. Why am I messin’ with Tcl/Tk and snack? I’ve been trying off and on to build a custom voice for MaryTTS. It’s sort of a pain b/c I don’t have time to mess with this in the office. I’d like to experiment at home but my development Linux install is in the office and I haven’t figured out how to get audio over remote desktop on Linux yet. (I’ve gotten close but that’d take a whole ‘nother round of experimenting!) So I’m running OSX on my Mac of course for my MaryTTS tinkering but it’s difficult to get things intended for Linux to work the same way on Mac. I’ve installed MacPorts which dumped a bunch of stuff somewhere on my hard drive. I managed to get sox from it. Wait, let me start over again! Hi, I’m Cliff. You’re here because you were having the same troubles installing snack on your Mac. I’m here because the prior sentence creates an interesting rhyme.

In experimenting with Mary custom voices I’ve learned that you can spend an entire month just getting the required dependencies for using the voice import stuff. The list of required tools is huge and I wish it could be simplified. To add to that, each tool spins its own dependency web. For example, installing ehmm (or was it speech tools?) requires sox which requires Mac heads to go out and install MacPorts or something similar. Running “./configure” for hts just to create the dumb make file requires tcl and snack. God help you if you’re on modern 64-bit hardware like I am. The web site for snack offers a PPC version of snack. It’s best (and painful) to build from source.

So I downloaded the source tar and carefully followed the steps in the README, which are incorrect. According to the README the tcl framework should be located in “/Library/Frameworks/Tcl.framework/Headers”. On my system I found it under “/System/Library/Frameworks/Tcl.framework/Headers”, though this might vary depending on how you install Tcl. And that’s the problem. With all the different paths for frameworks and libraries it’s a wonder you can get anything to work! After getting no satisfaction from running “make install” from the source bundle and many failed attempts to copy the libraries into the different Tcl and Tk framework folders I stumbled across a post somewhere that hinted that Tcl extensions should be found/loaded in “/Library/Tcl”. You would never know this unless you were a Tcl veteran. There wasn’t even a “/Library/Tcl” folder on my system! I created the structure and voila! Where was I? Oh yeah, I was trying to install the hts package. Now I’m struggling with another dumb error that complains it “can’t find swab of SPTK”. What in blue blazes is SPTK and why do I need a swab of it? In conclusion, to build your own voices in Mary you have to be Tcl’ish, have a snack, get a swab of SPTK, oh and make sure you’re wearing sox! That’s only scratching the surface.

Speex is an audio codec specially designed for voice audio. I’m sorry, a codec is a software component used for encoding and decoding something. Oops, I’m terribly sorry! An iPhone is kinda like a phone but it has no wires, y’see… and the letter i is prefixed to it because… Hi, I’m Cliff. You’re probably here because you already know what an iPhone is. Maybe you’ve seen one of the commercials. Maybe your cousin’s best friend’s uncle’s nephew’s best friend’s cousin has one. (In case you missed it, that was a really really verbose way of saying “your cousin”!) Maybe you’ve been around the block once or twice and happen to know about both codecs and Speex. (Maybe it was dem fools throwin’ bones by the liquor store that taught you about Speex. Don’t laugh, cause you’d be surprised by what you learn on the streets these days.) Whatever the case, I’m going to explain how I got a clean compile on the Speex library using XCode. Hold tight because the remainder of this post is designed to be informative… that is all jokes and nonsense to the side.

Download the Speex C source bundle

Unpack it

You might wanna try building it from the cmd line first. Run “./configure;make” from the folder where you unpacked it.

Create a new project in XCode. This can be an iPhone or an OSX project but for consistency’s sake (and because it’s what I put in the title) let’s use an iPhone project.

Create an actual folder, not an XCode group, under the root of your new project and call it “CSource”. This is where we’ll put the Speex source code.

Drag/drop the folder into your XCode project or create a group that points to that folder.

Double click your project icon in the left hand tree in XCode to edit the project settings.

Goto the build tab and type “header” in the search field to filter your choices to the things that deal with headers. (Yes I just blatantly included a “goto” in a modern day technical writeup.)

Look for the “Header Search paths” build setting. It should fall under the search paths category somewhere toward the middle of the screen. If you don’t see it finish typing the term “header search paths” in the search bar above. It does an incremental search as you type.

Double click the Header Search Paths build setting to bring up the edit dialog then double click in the value field and set the value to “$(SRCROOT)/CSource”

Open the folder that you unpacked Speex to in Finder and drag the “libspeex” folder directly into the CSource grouping you created in XCode. Choose yes to copy the files.

Back in Finder, navigate to the include folder under the Speex unpack directory. Drag the “speex” folder out of here and next to the libspeex folder you added to your project in XCode. Choose yes to copy the files.

Back in Finder, drag the “config.h” file out of the root of the speex unpacked folder and into your XCode project.

Back in XCode, hit Cmd+Shift+D and type “arch.h” to find and open the arch.h file we’ve imported into our project.

Add a “#include” to include “config.h” at the top.

Remove the “echo_diagnostic.m” file under libspeex from the project as it will just cause complications.

Hit compile and wait for the errors to roll in!

If you followed the above steps correctly then you should only see a few errors relating to duplicate symbols. If you get thousands of errors then it’s likely related to missing header files. You probably have to make sure you imported the speex folder with all the headers and double check your header search path to make sure that it points to the directory containing the imported “speex” folder. You might find a bunch of errors if you don’t remove the “echo_diagnostic.m” file from compilation. The last order of business would be to import the ogg container source. I downloaded libogg-1.1.4 which appears to work with speex-1.2rc1. Including the “ogg.h”, “os_types.h”, “bitwise.c”, and “framing.c” files allows me to compile code included from the “speexenc.c” and “speexdec.c” examples.

The source of all code that’s fun

The opinions expressed here are my own and are not borrowed, stolen, shared, or necessarily understood by my current employer.