Saturday, January 12, 2013

Playing a Pitch in an Android App with AudioTrack

I've been working on an Android app lately that can be used to tune instruments. I've run into a roadblock with the pitch detection, so I'm starting over and making the code nicer. While I'm at it, I figure I may as well share some of what I've learned!

I would like to share how to play a sine wave of a specific pitch. It isn't too terribly difficult, but most of the examples out there have an annoying "clicking" sound, and there is a bit of a time delay after you tell it to stop making noise. I've solved both these problems, and I will now show you how to do this task.

First, I would like to state a disclaimer. I've only been studying Java and the Android API for about a month, so please forgive any bad Java or Android programming practices. I will be using Eclipse, but I won't be telling you how to create an Android project. There are other resources out there that explain that pretty well.

Let's get started, shall we? First, I'll go into a bit of sound theory, then we'll set up a simple GUI, then we'll start on the cool part of the code!

Sound Theory

What is sound?

When something makes a sound, it creates pressure on the air (or whatever else it's touching). This pressure is applied, then released. Our ears are able to interpret the pressure coming and going as sound.

If you were to graph the pressure coming and going with respect to time, you'd have a squiggly line that goes up and down. In the real world, lots of sounds are blending together, and that makes the graph kind of "spiky" or "noisy". If you focus on just one sound of a specific pitch, though, it will look something like this:

Depending on how quickly the pressure comes and goes, our ears hear a specific pitch. How many times it goes up and down in a second is called the frequency, measured in hertz (Hz). On a piano, the A above Middle C has a frequency of 440 Hz. That means that it goes through a complete up-down cycle 440 times a second! I don't think that I can fit that graph on here, but imagine 440 of the above graphs happening every second.

How do we make sound?

To play a sound through a speaker, you need the speaker to go out and in (creating pressure and lack of pressure). To play the A key above Middle C, the speaker will do this 440 times a second. Easy, right?

We will use the Sine function to figure out the pressure at each instant in time. Why the Sine function when anything that goes up and down might do? Well, a search on Wikipedia does show all sorts of different functions that can be used to play a pitch, but the Sine wave seems to be the most-requested on help forums. Also, I'm not an expert in sound theory, so I'll stick with what I know works.

Set Up the GUI

Now that we know a bit about how the sounds are made, we can start work on the app! We'll start with the GUI, then we'll go back to the sound stuff in a second. If you already have this part done and want to skip right to the audio coding, jump ahead to "Programming the Sound Player".

I won't go into details on how to do this. Instead, let me just show you what my little test app looks like and give you the relevant code.

Programming the Sound Player

Going back to the topic of how sound works, we know that we want to send energy to the speaker that comes and goes. It will look like a Sine wave if we graph the energy with respect to time, and that Sine wave will go up and down really *really* fast!

To do this task, we will use Android's AudioTrack class. It's really easy to do. We just call the constructor with a few bits of information, then we just keep writing data to it. That data tells it how much energy to send through the speakers. Piece of cake! Right? :P

At first, we'll do it the way most other examples out there do it. Then, after you see (or, rather, hear) the issues, we'll go over what's happening and how to fix the problems.

First, let's make a handy PitchPlayer class.

In our other code, we don't want to worry about the messy stuff of writing a sine wave to an AudioTrack. So, we'll make a class that's easy to use! We'll just have a constructor, a setFrequency method, a start method, and a stop method.

The AudioTrack constructor takes several parameters, so let's go through them. streamType can be one of several values. I think that STREAM_MUSIC makes the most sense out of all of them. It's technically music we're making, right?

sampleRateInHz is how many pieces of audio data (samples) it processes per second. This isn't the pitch or frequency of the audio data. This is how fast it reads the data that we write to it.

If this number is too low, we will be very limited in how many ups and downs we can make it do in a second. For example, if we are trying to make the speaker go up and down 10,000 times a second, but we can only send 8,000 pieces of data a second, we can't play the 10,000 Hz pitch!
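To put a number on that limit: as far as I know, you need at least two samples per cycle (one for the "up" and one for the "down"), so the highest pitch you can play sits a bit under half the sample rate. Here's a little sketch of that check. SampleRateLimit and its methods are names I made up for illustration; they aren't part of the Android API.

```java
// Hypothetical helper (not part of Android). Assumes a pitch needs at
// least two samples per cycle to be playable.
final class SampleRateLimit {
    // Upper limit on the frequency for a given sample rate.
    static double maxFrequencyHz(int sampleRate) {
        return sampleRate / 2.0;
    }

    // True if the frequency is positive and below the limit.
    static boolean canPlay(double frequencyHz, int sampleRate) {
        return frequencyHz > 0 && frequencyHz < maxFrequencyHz(sampleRate);
    }
}
```

With 8,000 samples a second, canPlay(10000, 8000) comes back false, while canPlay(440, 8000) comes back true.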

We'll use a constant named sampleRate for this. That way, we can change it around when we need to.

audioFormat specifies the format of the data we'll be writing to it. Do we want eight bits per data sample or sixteen bits per data sample?

The higher the bit count, the more precise we can be about how much energy to exert at a specific point in time. We'll have more values to choose from! However, this comes at a cost of memory usage.

We'll use 8-bit data samples. If the sound quality isn't good enough for you, it isn't too hard to convert it to 16-bit.

bufferSizeInBytes is how much data needs to be stored by the AudioTrack. A larger number means that we can fit more ups and downs in there, but we might not have to do that if we can just loop back over the data that we already have.

For now, we'll use a constant and set it to however much data is required to play a sound for a second. Later, we'll change this to a value that makes more sense (and you'll see why later on).

mode can either be MODE_STATIC or MODE_STREAM. Static means that the data is written just once before playing. Stream means that we have to continuously write data to the AudioTrack.

So that we don't have to get into messy multi-threading, we'll do MODE_STATIC. We won't be changing the data it's playing while it's playing it, so why would we need to continuously update it anyways?

The constructor code is below. Remember, sampleRate and bufferSize are constants. Side note: Is this what they're called in Java? I'm coming from C/C++. Anyways, they're declared "final", if that makes any difference.

To set the frequency on our handy dandy PitchPlayer class, we just call the setFrequency( frequencyInHz ) method! Well, that's pretty easy for the other code that uses this class, but it's a bit more involved on the inside.

The setFrequency method will write a sine wave to the AudioTrack. The sine wave will go up and down some number of times a second. Let's start with a for loop that goes through each sample from 0 to bufferSize. It will write the value for that point in time.

for( int i = 0; i != bufferSize; ++i ) {
    //write value here
}

The i'th sample is at some point in time. Which point? We need to know this! Well, since we are writing sampleRate values a second, each sample is 1/sampleRate seconds apart. So, we multiply i by (1/sampleRate) to get the point in time.
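One Java gotcha with this step: if sampleRate is an int, then 1/sampleRate is integer division and comes out as 0. A small sketch of the time calculation that avoids that (SampleTime and timeOfSample are made-up names for this example):

```java
// Hypothetical helper for illustration.
final class SampleTime {
    // Time in seconds of the i'th sample. The cast matters: with two ints,
    // i / sampleRate would be integer division and round down.
    static double timeOfSample(int i, int sampleRate) {
        return (double) i / sampleRate;
    }
}
```

So at 8,000 samples a second, sample number 4,000 lands at 0.5 seconds.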

Now, how do we get the energy output at a point in time (t)? Well, we're using the Sine function f(t) = sin(t). With this formula, the Sine function completes one up-down cycle every 2*PI seconds. If we want it to go up and down once in a second, we'll multiply the time (t) by 2*PI to make it think that 2*PI seconds have gone by. We end up with f(t) = sin(t*2*PI).

But we don't want it to go up and down once per second. We want it to go up and down the number of times specified by the frequency variable. Following the logic in the above paragraph, we multiply by frequency to get f(t) = sin(t*2*PI*frequency). That, dear readers, is how much energy should be exerted at a specific point in time.
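Written as Java, that formula is a one-liner. This is just a sketch; SineWave and valueAt are names I picked for the example.

```java
// Hypothetical helper for illustration.
final class SineWave {
    // f(t) = sin(t * 2*PI * frequency), a value between -1 and 1.
    static double valueAt(double tSeconds, double frequencyHz) {
        return Math.sin(tSeconds * 2.0 * Math.PI * frequencyHz);
    }
}
```

For a 1 Hz wave, a quarter of a second in is the top of the "up" (about 1), three quarters in is the bottom of the "down" (about -1), and at one second it's back around 0.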

Now that we have the amount of pressure needed for that sample, we need to write it to the AudioTrack. We could do that with the write method... after looking at the specification, though, it looks like it might be most efficient to write a lot of data to it at once. Therefore, let's make our own buffer and call it samples. Our code now looks like this:

The (int)(f * 127) part is just us scaling f so that it fits in a byte. In English, f is between -1 and 1. When we go to a byte, it will have to be between -127 and 127. So, we multiply by 127.
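Here's that conversion on its own, as a sketch with made-up names (SampleConverter, toSignedByte, toOffsetByte). One thing I'm not sure about: as far as I can tell, the AudioFormat documentation describes ENCODING_PCM_8BIT samples as unsigned, with 128 meaning silence. If that's right, the second method may be closer to what the AudioTrack expects. I haven't verified that on a device, so treat it as an assumption.

```java
// Hypothetical helper for illustration.
final class SampleConverter {
    // The conversion described above: f in [-1, 1] becomes -127..127.
    static byte toSignedByte(double f) {
        return (byte) (int) (f * 127);
    }

    // Assumption: for ENCODING_PCM_8BIT, silence sits at 128 rather than 0,
    // so shift the value up by 128. Java bytes are signed, so read the
    // result back with (b & 0xFF) to see the 1..255 range.
    static byte toOffsetByte(double f) {
        return (byte) ((int) (f * 127) + 128);
    }
}
```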

Now, after the for loop, we can write the data to the AudioTrack. The write method takes an array, an offset, and a size. Easy enough! Our final code to setFrequency is:
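In case the listing doesn't show up for you, here's a rough sketch of what the sample-filling part of setFrequency can look like. It's plain Java so it runs without a device. SampleBuffer and fill are names I made up for the sketch, and the AudioTrack write call only appears in a comment.

```java
// Hypothetical stand-in for the inside of setFrequency.
final class SampleBuffer {
    // Fills a buffer of bufferSize samples with a sine wave.
    static byte[] fill(double frequency, int sampleRate, int bufferSize) {
        byte[] samples = new byte[bufferSize];
        for (int i = 0; i != bufferSize; ++i) {
            double t = (double) i / sampleRate;                  // time of sample i
            double f = Math.sin(t * 2.0 * Math.PI * frequency);  // between -1 and 1
            samples[i] = (byte) (int) (f * 127);                 // between -127 and 127
        }
        // In the app, this is where the data goes to the AudioTrack:
        // mAudio.write(samples, 0, samples.length);
        return samples;
    }
}
```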

Now on to the start method.
AudioTrack has a nice little start method. Sadly, this has its limitations. The first limitation is that it stops after the data has been played. The second limitation is that, after the stop method is called, it won't replay the data.

The solution to the first limitation is to use the setLoopPoints method to tell the AudioTrack to just keep looping over the data. Its parameters are the starting position, the ending position, and the number of times to loop (-1 for infinite).

The solution to the second limitation is to call the reloadStaticData method so that it can reuse its data.

It works, but it doesn't work well!

There are two issues. The first and most obvious is the delay between hitting "stop" and the sound actually stopping. The second is a clicking noise that is heard at most frequencies.

There's also a third issue that has to do with the app crashing if you leave the EditText blank, but that's a GUI problem. It's easy to fix, but the GUI is just for testing our PitchPlayer, so I won't fix it. Just don't leave it blank!

To fix the delay between hitting "stop" and the sound stopping:
The AudioTrack specifications state that stop will continue playing until it reaches the end of the current buffer, if it is in streaming mode. In that case, it recommends a pause followed by a flush to stop it. Well, that doesn't work for me, and we're in static mode.

If the AudioTrack will continue playing until it reaches the end of the buffer, why not make the buffer smaller? Let's make it as small as we can possibly make it while still being able to store a complete up-down of a Sine wave.

That's the part of setFrequency that comes up with the up-down values for each sample in the buffer. Let's look at its math and see how small we can make bufferSize so that the sine function goes through a complete cycle. Let's do this by looking at i when it is at bufferSize.

Given: t = i * 1/sampleRate
Given: f = sin( t * 2*PI * frequency )

If we're looking at i when it's at bufferSize, we can just say that i is bufferSize.

t = bufferSize * 1/sampleRate
t = bufferSize / sampleRate

We want f to make a complete up and down over the samples. The Sine function goes up and down when its values are from 0 to 2*PI. The end of Sine's cycle is when the stuff inside of it is 2*PI. So, let's set the stuff inside to 2*PI and see what we get!

(bufferSize/sampleRate) * 2*PI * frequency = 2*PI
(bufferSize/sampleRate) * frequency = 1
bufferSize = sampleRate / frequency

There we have it! Now we just need to decide what minimum frequency we want to be able to play. Then, we can figure out what bufferSize we need! Why the minimum frequency and not the maximum frequency? Well, a higher frequency goes up and down more times in a second, so it takes less time for it to go up and down once. A lower frequency goes up and down fewer times in a second, so it takes more time for it to go up and down once.

We'll have a constant for the minimum frequency (minFrequency), and we'll compute our bufferSize constant from that. I've chosen 200 for the minimum frequency since that's the lowest pitch that my computer can play okay. For comparison, my headphones stop working below 70, and my stereo stops working below 35. I wouldn't set minFrequency below 20.
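As a sketch, the computation could look like the following. BufferSizing and minBufferSize are made-up names, and rounding up is my own choice so that a full cycle still fits when the division isn't even.

```java
// Hypothetical helper for illustration.
final class BufferSizing {
    // Smallest buffer (in samples) that holds one full cycle of the lowest
    // frequency: sampleRate / minFrequency, rounded up.
    static int minBufferSize(int sampleRate, double minFrequency) {
        return (int) Math.ceil(sampleRate / minFrequency);
    }
}
```

Taking 8,000 samples a second purely as an example rate, a minFrequency of 200 gives a buffer of 40 samples. That's 40/8000 of a second, or 5 milliseconds, which is about the longest the sound should keep going after "stop" if the buffer really is what causes the delay.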

Since the rest of our code is based on these constants, everything else should work after the changes.

Now to get rid of that annoying "clicking" noise!

Try playing a few different frequencies if you haven't heard it yet. At some pitches, everything works fine. At others, you hear a clicking noise exactly once every second -- once every fraction of a second if you've already fixed the delay problem.

First, we need to understand what causes that clicking noise.
Let's say that we are playing a sound at 1 Hz, and our buffer holds ten pieces of data. The graph looks like what's below.

Now let's play a sound at 2.5 Hz. The graph of this is below.

Notice the problem? It doesn't make a complete up and down cycle! Since it's looping over the data, the sound it plays over several iterations would look like this:

That break in the line is the clicking noise. Fixing this is a bit more involved, though.

One solution that I've seen online is to curve the ends of the sine wave towards zero at both ends. I don't like this solution because the graph no longer depicts a sine wave.

My solution is to change the buffer size! If the buffer for the 2.5 Hz signal is 8 samples long, everything will work out just fine! Take a look below:
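You can check the same thing with numbers instead of graphs. This sketch assumes the ten-sample buffer in the example covers exactly one second (so 10 samples a second); LoopCheck and its methods are made-up names.

```java
// Hypothetical helper for illustration.
final class LoopCheck {
    // How many sine cycles fit in the given number of samples.
    static double cyclesInBuffer(int sampleCount, double sampleRate, double frequency) {
        return sampleCount * frequency / sampleRate;
    }

    // The leftover fraction of a cycle at the end of the buffer. Anything
    // other than 0 means the loop restarts in the middle of a cycle.
    static double leftoverCycle(int sampleCount, double sampleRate, double frequency) {
        double cycles = cyclesInBuffer(sampleCount, sampleRate, frequency);
        return cycles - Math.floor(cycles);
    }
}
```

At 2.5 Hz, ten samples hold 2.5 cycles, so half a cycle is left dangling when the loop wraps around. Eight samples hold exactly 2 cycles, so nothing is left over.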

Now, how can we put this into code? Well, we need to find a formula for how much to shrink the buffer by. To start, let's go back to our for loop.

bufferSize is how much space we have to work with. When i is equal to that, we have this:

f = sin( bufferSize / sampleRate * 2*PI * frequency )

We want f to make some number of complete cycles. The sine function makes a cycle every time the stuff inside of it goes to a multiple of 2*PI, so let's let x be a scalar and set the stuff inside to x*2*PI.

x * 2*PI = bufferSize / sampleRate * 2*PI * frequency
x = bufferSize / sampleRate * frequency

And now we have how many times f would make a cycle inside of the given buffer! Indeed, if you plug in our minimum frequency, you would get 1!

We're not done yet, though. If our bufferSize is 200, our sampleRate is 8000, and our frequency is 50, we end up with an x equal to 1.25. It's not a whole number. The 0.25 is what was causing the "clicking" noise before, so we need to cut out that part of it. With a computer, this is easy. Just store it in an integer!

After we have found the x and have cut out the bad part of it, we need to recompute our buffer size. All we have to do for this is rearrange the equation.

x = bufferSize / sampleRate * frequency
bufferSize = x / frequency * sampleRate
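Here are those two steps as a sketch, with made-up names (LoopTrim, trimmedSampleCount). The multiplication is done before the division so that the int values don't get rounded down too early.

```java
// Hypothetical helper for illustration.
final class LoopTrim {
    // Returns how many samples to keep so that the buffer holds a whole
    // number of cycles. Returns 0 if not even one full cycle fits.
    static int trimmedSampleCount(int bufferSize, int sampleRate, double frequency) {
        // x = bufferSize / sampleRate * frequency, keeping only whole cycles
        int x = (int) (bufferSize * frequency / sampleRate);
        // bufferSize = x / frequency * sampleRate
        return (int) (x * sampleRate / frequency);
    }
}
```

With the numbers from above (bufferSize 200, sampleRate 8000, frequency 50), x is cut from 1.25 down to 1 and the new size comes out to 160. One caveat: the result is cut down to a whole number of samples too. For example, a 40-sample buffer at 8000 and 440 Hz gives 36 samples where the math wants about 36.36, so I'd expect a small leftover at frequencies that don't divide the sample rate evenly.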

Now that we have the math for this solution done, let's put it into code!

We'll make a new private variable mSampleCount to keep track of what we come up with. Since we'll need to know it in the start method when we specify how much data to play, we need a way of getting it there. It will be set in the setFrequency function, a different place altogether. Thus, a (probably unneeded) explanation of why we're making a private variable named mSampleCount.

We'll put the math from above in the setFrequency function. Right at the top, we'll set it. We'll need to change the for loop, too, so that it doesn't go over the entire buffer. Instead, we want it to just go over the part of the buffer that we need -- 0 to mSampleCount. Here's the new code:
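If the listing is missing for you, here's a sketch of the idea in plain Java. PitchBuffer is a made-up stand-in for the real class so that it can run without a device; the AudioTrack call only appears in a comment, and the two getters exist just for poking at the result.

```java
// Hypothetical stand-in for PitchPlayer, without the AudioTrack.
final class PitchBuffer {
    private final int sampleRate;
    private final byte[] samples;  // sized for the minimum frequency
    private int mSampleCount;      // how much of the buffer is actually used

    PitchBuffer(int sampleRate, int bufferSize) {
        this.sampleRate = sampleRate;
        this.samples = new byte[bufferSize];
    }

    void setFrequency(double frequency) {
        // Keep only whole cycles, per the math above.
        int x = (int) (samples.length * frequency / sampleRate);
        mSampleCount = (int) (x * sampleRate / frequency);

        // Only go over the part of the buffer that will be played.
        for (int i = 0; i != mSampleCount; ++i) {
            double t = (double) i / sampleRate;
            double f = Math.sin(t * 2.0 * Math.PI * frequency);
            samples[i] = (byte) (int) (f * 127);
        }
        // In the app: mAudio.write(samples, 0, mSampleCount);
    }

    int getSampleCount() { return mSampleCount; }

    byte[] getSamples() { return samples; }
}
```

The start method would then use mSampleCount, rather than bufferSize, as the end position when it sets the loop points.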

And that's all there is to it!

Okay, so maybe it's a bit more complicated than just a simple "computer, play me a pitch!" But now you've got some code that can actually be used, and you will never have to worry about this again! Yea... lol.

Anyways, here is the finished product! I've left out the GUI. This is just the PitchPlayer.java file. All you have to do to use it is create a PitchPlayer object, then setFrequency(Hz) and start() and stop().

9 comments:

This is really helpful, but there seems to be a major typo, which is that where you say "I've left out the GUI. This is just the PitchPlayer.java file" you've actually done the opposite and shown us the MainActivity class instead of the PitchPlayer class.

Thanks for writing this up, though. This is something I've struggled with a lot. How do your pitches sound compared to other pitch pipe apps? I've found that the sounds I generate are often very unpleasant to the ear.

The sine waves turned out perfectly in the final product, but a few changes were required to the code in this post. I seem to remember it struggling for extreme frequencies. I'll try to find the code in my backups after the holiday. I ended up scrapping the project when I found a better app to use.

Thank you for this great work! I'm trying to make it work with 16-bit data samples but it doesn't seem to work...it only plays a couple of keys and then it shows an error in setLoopPoints... Do you know how to make it work??

James Belue, I appreciate and did learn much from this post. Especially, thank you for ingenuity in fighting "pops" in AudioTrack output! However, there are a few issues:

1. The biggest thing I learned was that apparently on some Android platforms, only a sampleRate of 48000 works with MODE_STATIC. Using an LG phone running Jellybean 4.1, your setting of 41000 produces no sound, and also (exasperatingly) no diagnostics! I think this also happens under Kitkat. So if someone is seeing no sound and no errors, try 48000.

2. Your "finished product" has a few gotchas:
a. The start() method should have mAudio.pause() before mAudio.reloadStaticData(), and it is configured to play indefinitely (-1), probably not what someone adapting your code would expect or want to use.
b. Calling stop() on a MODE_STATIC instance of AudioTrack requires a call to its write() method before resuming play. So the end user can't call your stop() and then start(), the sequence is stop(), setFrequency(), start().
c. You should include a method of calling release() because your class hides the AudioTrack instance mAudio from callers. Either that or make mAudio public.

Further notes from the same "Anonymous" as 6/24/16 above:
With the Android platforms I tested, either 8 bits is not precise enough to describe sine waves such as 700 Hz, or the implementation is faulty. Frequencies which evenly divide 48000 play, others produce noise and harmonics. Switching from AudioFormat.ENCODING_PCM_8BIT to AudioFormat.ENCODING_PCM_16BIT and using shorts rather than bytes in the code above has fixed this for me. Note that when using 16 bit encoding, bufferSize is expressed in number of bytes, but mSampleCount is expressed in number of shorts, so will be .5 * bufferSize. Also note that setting bufferSize to sampleRate / minFrequency * 2 gives the smallest buffer that accommodates full wavelengths for all frequencies. At a 48000 Hz sample rate (see above), without the *2 multiplier some frequencies such as 700, 1100, and 1300 Hz don't evenly fit the buffer, resulting in clicking when the waveform gets truncated at the end of the sample.
