Audio Units are a little less forgiving than Audio Queues in their setup, they have a few more low-level settings to account for, they are less documented, and their sample code on the developer site (aurioTouch) is less transparent than the Audio Queue example (SpeakHere). All of this has led to the impression that they are ultra-difficult and should be approached with caution, although in practice the code is almost identical to Audio Queue code if you aren’t mixing sounds and have a single callback. At least, I’ve spent as much time being mystified by a non-working Audio Queue as by a non-working Audio Unit on the iPhone. It needs to be said, though, that the main reason Audio Units aren’t much harder than Audio Queues at this point is that a lot of independent developers have put a lot of time into experimenting, asking questions, and publishing their results. A year ago they were much more of a black box.

The decision process on which technology to use is something like:

Q: Are any of the following statements true? “I need the lowest possible latency.” “I need to work with network streams of audio, or audio in memory.” “I need to do signal processing.” “I need to record voice with maximum clarity.”
A: If yes, Audio Units are probably best. If no:
Q: Do you still need to be able to work with sound at the buffer level?
A: If yes, use Audio Queues or Audio Units, whichever is more comfortable. If no, use AVAudioPlayer/AVAudioRecorder.

In my experience there is just one big downside to the Audio Unit on the iPhone: there is no working metering property for it. A metering property appears in the audio unit properties header and in the iPhone Audio Unit docs, but it isn’t actually enabled, and you can lose a lot of time discovering this via experimentation. So, if you’ve chosen to use Audio Units and your implementation is working, you have a render callback function, and that is where you can meter your samples. I have only written and tested this for 16-bit mono PCM data, so if you are using something else, adaptations might be required.

Metering the samples in the render callback requires six steps (a code sketch follows this list).

Step 1: Get an array of your samples that you can loop through. Each sample contains the amplitude.
Step 2: Take the absolute value of each sample’s amplitude.
Step 3: Run each absolute value through a simple low-pass filter.
Step 4: Convert each filtered absolute value into decibels.
Step 5: Add an offset value that normalizes the clipping point of the device to zero.
Step 6: Keep the highest value you find.
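
Here is a minimal sketch of those six steps inside a Remote IO render callback, assuming 16-bit signed mono PCM, my estimated DBOFFSET of -74.0, and a LOWPASSFILTERTIMESLICE smoothing constant (both discussed in the comments below); audioUnitWrapper->audioUnit is my own struct member, so substitute however you reference your Audio Unit.

#include <AudioUnit/AudioUnit.h>
#include <stdlib.h> // abs()
#include <math.h>   // log10()
#include <float.h>  // DBL_MAX
#include <stdio.h>

#define DBOFFSET -74.0 // Estimated offset that places the device clipping point at 0dB.
#define LOWPASSFILTERTIMESLICE .001 // Smaller values smooth the meter more heavily.

// Minimal stand-in for the wrapper struct described in the comments below.
typedef struct {
    AudioUnit audioUnit; // The Remote IO unit being metered.
} AudioUnitWrapper;

static AudioUnitWrapper *audioUnitWrapper; // Assumed to be set up during audio initialization.

static OSStatus renderCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData) {

    // Render the audio so the buffer list is filled with samples we can analyze.
    OSStatus renderStatus = AudioUnitRender(audioUnitWrapper->audioUnit, ioActionFlags,
                                            inTimeStamp, inBusNumber, inNumberFrames, ioData);
    if(renderStatus != noErr) return renderStatus;

    // Step 1: get an array of samples to loop through; each sample contains the amplitude.
    SInt16 *samples = (SInt16 *)(ioData->mBuffers[0].mData); // 16-bit mono PCM assumed.

    static Float32 previousFilteredValueOfSampleAmplitude = 0.0; // Filter memory across callbacks.
    Float32 peakValue = DBOFFSET; // Start at the no-input floor.

    for(UInt32 i = 0; i < inNumberFrames; i++) {

        // Step 2: take the absolute value of the sample's amplitude (rectify the signal).
        Float32 absoluteValueOfSampleAmplitude = abs(samples[i]);

        // Step 3: run the rectified value through a simple one-pole low-pass filter.
        Float32 currentFilteredValueOfSampleAmplitude =
            LOWPASSFILTERTIMESLICE * absoluteValueOfSampleAmplitude +
            (1.0 - LOWPASSFILTERTIMESLICE) * previousFilteredValueOfSampleAmplitude;
        previousFilteredValueOfSampleAmplitude = currentFilteredValueOfSampleAmplitude;

        // Steps 4 and 5: convert to decibels, then add the offset that normalizes clipping to zero.
        Float32 sampleDB = 20.0 * log10(currentFilteredValueOfSampleAmplitude) + DBOFFSET;

        // Throw out NaN results (a NaN is never equal to itself) and other unusable values.
        if((sampleDB == sampleDB) && (sampleDB != -DBL_MAX)) {
            // Step 6: keep the highest value found in this buffer.
            if(sampleDB > peakValue) peakValue = sampleDB;
        }
    }

    printf("Decibel level: %f\n", peakValue);
    return noErr;
}

Because the callback runs on a real-time audio thread, hand the value off to your UI asynchronously rather than updating UIKit elements from inside it (see the threading discussion in the comments below).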

That end value will be more or less the same thing you’d get from the metering property of an Audio Queue or AVAudioRecorder/AVAudioPlayer. If anyone has any corrections or comments, I hope they’ll get in touch.

My starting point for learning this technique was a helpful response email from iWillApps’ Will to a silly question I had, which got me on track analyzing the actual samples; this page, where the math behind displaying dB is broken down pretty thoroughly; and this post on Stack Overflow, which explains that the process needs to be done on a rectified signal and has the low-pass filter code example.


Thanks for pointing that out. When I work with Audio Units on the iPhone I use a C++ struct containing the elements I’m going to need to access, such as the AudioUnit and the CAStreamBasicDescription for the input/output formats. In this case, audioUnitWrapper is the name of the struct and audioUnit is the actual Audio Unit that is being metered, so audioUnitWrapper->audioUnit can just be replaced with the name of your Audio Unit object, however you’ve defined it.
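
As a hedged sketch, such a wrapper might look something like this; the member names are illustrative, and where I use the C++ CAStreamBasicDescription utility class, the plain C AudioStreamBasicDescription shown here serves the same purpose:

#include <AudioUnit/AudioUnit.h>

// Illustrative wrapper struct; your members and names can differ.
typedef struct AudioUnitWrapper {
    AudioUnit audioUnit;                      // The Remote IO unit being metered.
    AudioStreamBasicDescription inputFormat;  // Stream format for the input side.
    AudioStreamBasicDescription outputFormat; // Stream format for the output side.
} AudioUnitWrapper;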

Good question. The low-pass filter smooths the displayed amplitude so that you get a readout that would be similar to a VU meter on a piece of audio equipment, which is the kind of shift of the amplitude value over time that we’re used to seeing in audio metering. You can try leaving it out to see what it does – the values are pretty jittery and the reaction of a UI element displaying the values is going to be fairly erratic.
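
Viewed in isolation, the smoothing is just a one-pole, exponentially-weighted filter; here is a sketch with illustrative names:

#define LOWPASSFILTERTIMESLICE .001 // Smaller values weight history more heavily, smoothing more.

static float filteredAmplitude = 0.0; // Filter memory between samples.

// One-pole low-pass filter: each output blends the newest rectified sample
// with the previous output, which damps the jitter described above.
float lowPassFiltered(float newAbsoluteAmplitude) {
    filteredAmplitude = LOWPASSFILTERTIMESLICE * newAbsoluteAmplitude
                      + (1.0f - LOWPASSFILTERTIMESLICE) * filteredAmplitude;
    return filteredAmplitude;
}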

The negative decibel number is a particular convention for displaying power — it shows decibel values as a value below zero, where zero is the clipping point (point of distortion) for the recording device. You’ll see this approach on professional recording devices, and it is also the way that metering works for iPhone Audio Queues, which this code is intended to emulate.

So, the negative decibel values are not absolute decibel values that show the sound pressure level in the environment; instead they show how many more decibels of sound pressure are possible before the recording device starts to distort (-40dB means that if there were another 40dB, the mic would clip). Unfortunately, I don’t happen to know the absolute clipping point for the iPhone mic so my constant of -74.0 is an estimate.

If you want positive decibel values you could define DBOFFSET as 0.0 instead of -74.0. Let me know if that works for you.

The first question is whether the logging of values in the code example is working for you. Do you see the values logged when you run the code? If so, and everything is therefore working as expected, my guess for why you aren’t seeing a UILabel update as expected is that you are probably running this code on your main thread, which is the UI thread on the iPhone (so the audio unit code might be blocking the UI from updating).

Thanks for your quick response.
I came across the aurioTouch example from Apple, but that code does its processing on the basis of an FFT, and I want to do an RTA on octave and 1/3-octave bands, not an FFT.

Right now I’m working with some code that uses a playbackCallback from the RemoteIO unit and fills the playback buffer (to be sent directly to hardware) from the consecutive packets of an audio file on disk that has been loaded into memory.

The code uses AudioFileReadPackets, and the AudioBuffer’s mData is UInt32 for some reason. If I try changing it, the sound gets all screwed up.

Long story short, how does using a UInt32 for wave file PCM data change the algorithms above? Will it still work?

I tried the code. Do you by any chance know what the UInt32 is doing for me? I’m used to using Float32s, where my samples are all between -1 and 1, a waveform is easy to draw, and decibel metering is just a matter of averaging the absolute values, with 1 scaled to 0dB.

After implementing your code, I’m getting positive values that go from about 40 to 105. Any thoughts? I implemented a simple UIView with a green background and had it draw itself at a width in ratio to the dB value I’m getting, and (though it gets a tiny bit jittery) it seems to hit on peaks. I just don’t know the accuracy, or even understand the point of UInt32 as a sample structure.

If I print out the raw UInt32 numbers, I get numbers as big as 3557007849. What does this mean in relation to the -1, 0, 1 relationship I’m used to seeing?

What is basically going on here is that there is a pointer to an array of 16-bit integers (SInt16 *samples) that is full of the individual 16-bit samples we are analyzing from the buffer. So one SInt16 out of this array is a single sample with signed 16-bit sample data in it.

A sample is itself a measure of amplitude, since an ADC works by measuring the amplitude of an incoming sound wave however many times a second until it has enough data for a smooth rendering of the wave. The data stored in a single one of those samples is the height or depth of the wave at that moment in time. Since the sample has to describe the amplitude of a peak or a trough of a wave, a signed integer puts the midpoint at zero, where a negative value describes a point on a trough and a positive value describes a point on a peak. The values that can be stored in a signed 16-bit integer are −32768 to 32767, so those are the maximum ranges possible above and below zero. I’m guessing that in the case of an unsigned sample, the midpoint would need to be the value that is half the largest value that can be stored in the sample, so if I’m correct, the value range of your unsigned 32-bit sample would be 0 to 4,294,967,295 (the maximum value that can be stored in a UInt32) with a midpoint of 2,147,483,647. Below 2,147,483,647 represents a point on a trough, and above it a point on a peak.

When you ask:

> If I print out the raw UInt32 numbers, I get numbers as big as 3557007849.
> What does this mean in relation to the -1, 0, 1 relationship i’m used to seeing?

I haven’t worked with arrays of Float32 samples, but I’ve heard they are used in some audio formats for more precision. So my guess is that you’re used to seeing arrays of Float32 samples that express wave amplitude on a scale of -1 to 1 with a midpoint of zero (probably with a ton of precision after the decimal point), while the UInt32 expresses the same wave amplitude on a scale of zero to 4,294,967,295 with a midpoint (again, I’m guessing on this one) of 2,147,483,647. My decibel code attempts to express power as decibels on a scale of -n to zero, because this is how Apple does it for Audio Queue Services and I’m trying to make code that can work with the same UI whether it uses the built-in Audio Queue metering or this kind of Remote IO Audio Unit metering. So it is doing something different from describing wave amplitude over and under a midpoint, which is why we need to perform some operations on the sample before we have that information.

On to why the code has unexpected results (sorry, I didn’t really think about the signedness issue before answering previously): the first thing we’re doing is getting an absolute value from the integer on the assumption that the midpoint is zero, which isn’t a helpful thing to do to UInt32s because their values do not range from negative to positive with a midpoint of zero.

What we could try instead is getting the absolute value of (samples[i] - 2147483647) instead of the absolute value of samples[i], by changing the rectification step to something like this untested sketch:
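
// Untested sketch: treat the buffer as UInt32 and shift the assumed
// midpoint of 2147483647 down to zero before rectifying, in place of
// the SInt16 version's abs(samples[i]).
UInt32 *samples = (UInt32 *)(ioData->mBuffers[0].mData);
Float64 absoluteValueOfSampleAmplitude = fabs((Float64)samples[i] - 2147483647.0);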

This should normalize the midpoint to zero before getting the absolute value. I think you might also need to change all or some of the Float32 variables to Float64 too.

There might be easier or more efficient ways to do this and an alternate approach might be evident to you as well so don’t hesitate to pitch in with ideas. I don’t have an audio project around that is easily configurable to produce unsigned 32-bit samples so you’re going to need to test it out and get back to me — I’ll be interested to hear what you discover.

So, after spending my whole Saturday afternoon trying to figure out how to read this UInt32, I finally realized that the reason it’s being passed directly to the RemoteIO unit as 32-bit instead of 16-bit is because it’s interleaving both left and right channels in the low- and high-order bytes.

So I split the UInt32 into two SInt16s, and go figure, your algorithm works beautifully :-) the same as the AVAudioPlayer…

Thanks so much, man. This has been quite a hard road I’ve taken trying to figure out this Core Audio stuff… I’m just glad there are great people like you around to help us out.

Ha, that’s really interesting, I was also wondering about why it was in a UInt32, so it’s nice that you’ve solved the mystery. Glad it’s working for you! Audio Units for iOS is definitely a challenging area of Core Audio, although it’s impressive how it performs on the device once you get it working.
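
For anyone else who runs into this, a hedged sketch of that kind of split, assuming two 16-bit samples interleaved into each UInt32 (which channel sits in which half depends on the stream format, so verify against your own data):

// Split one UInt32 holding two interleaved 16-bit samples into its channels.
UInt32 interleavedSample = samples[i];                            // samples is UInt32 * here.
SInt16 channelOne = (SInt16)(interleavedSample & 0xFFFF);         // Low-order 16 bits.
SInt16 channelTwo = (SInt16)((interleavedSample >> 16) & 0xFFFF); // High-order 16 bits.

Equivalently, you can cast the buffer’s mData to SInt16 * and read every other sample as one channel.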

I am currently working on a project where I’m attempting to retrieve audio file amplitudes. I created an AudioBufferList and then passed it to the samples variable you listed above. However, each time I pass through the buffer data I get a different amplitude value.
I’m new at using Core Audio so I may be doing this all wrong. The data I transfer into the AudioBufferList comes from an AudioBuffer which points to a casted void buffer. Also, the audio coming into the buffer is opened, not streamed…
Do these points make any difference?
Do I need to run the audio through AudioUnitRender prior to it working correctly?

Thanks, it was just poor memory management on my part. Now I face a different issue. I have everything hooked up and working to return dB values. However, I have noticed that the range is quite low, only returning about a 6dB change. When I use your default values for DBOFFSET and LOWPASSFILTERTIMESLICE, I get results from −21 to −15 when testing by keeping silent in the room to test the lows and yelling into the iPhone to test the highs. How can I increase the dynamic range here?

Interesting; when I check the frequency response of the part (and solely that part) that is identified as the low-pass filter in the example, I see the results I would expect. But after the low-pass filter, a peak is selected out of the smoothed results. You can read a lot more about that IIR low-pass filter at the Stack Overflow discussion that is cited in the post as the origin of the filter.

There is also an explanation above of how to show a positive number with my tutorial. I don’t know offhand how to show a positive number with someone else’s tutorial that I haven’t tried, but usually you will just see if you can identify the lowest possible negative number and then add that same number as a positive number to the result you want to convert. -74 + 74 = 0, the quietest possible value. -30 + 74 = 44, a middle value. 0 + 74 = 74, the highest possible value before the mic distorts.

I believe it is a peak value, since the highest value after the low-pass filter is selected. Without knowing how the Audio Queue value is derived, I don’t know for sure that this method results in the same value.

You mention this code gives you about the same results as using metering. Is this due to the low-pass filter code?

Right now I’m using AVAudioRecorder metering in a dumb little free app that computes counts per minute from the clicks of a Geiger counter. It works fairly well with weakly radioactive things.

The problem is the peak level always sticks for 850ms, and though the avg level seems to recover faster, it’s not fast enough; increasing the timer polling from 30 Hz to 300 Hz doesn’t help. With a 1200 CPM uranium ore sample I’m getting ~200 CPM max on an iPad with little CPU use. At the very high end, on YouTube there are videos of CDV-700s getting 30K CPM readings from samples… I looked at the waveforms and yep, they really are 30K CPM, though it is difficult to tell.

Is the above metering code more responsive than Apple’s metering? Do you have any suggestions on filtering that can go up to say, 30K CPM, but not peg the CPU or count a single click twice?

Thanks for sharing this code though, it’s the closest example I’ve found for what I’m trying to do.

Thanks for this code; I have a question about the results I’m seeing though. I’m recording in mono at 8000Hz using signed big-endian integers. The decibel reading I get in the logs always stays around 10, very occasionally dropping to 9 or 8 if I stick a pair of headphones playing some audio in front of the mic of my iPhone. Are these values expected? I seem to get much more responsiveness when using metering with Audio Queues, even taking into account the latency, so I think it might be a problem with my PCM format rather than your code.

It’s the format — reverse the endianness of the sample before reading (I’m pretty sure there are some good bit-shifting examples on Stack Overflow) and verify that it’s 16-bit and SInt16 is the appropriate kind of sample array to use.
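
A sketch of the swap for a single sample; bigEndianSample is a hypothetical big-endian SInt16 read from the buffer, and CFSwapInt16BigToHost from CFByteOrder.h does the same job as the manual shift (it is a no-op on big-endian hosts):

#include <CoreFoundation/CFByteOrder.h>

// Manual bit-shifting version of the byte swap:
SInt16 hostOrderSample = (SInt16)(((UInt16)bigEndianSample >> 8) |
                                  ((UInt16)bigEndianSample << 8));

// Core Foundation version:
SInt16 alsoHostOrder = (SInt16)CFSwapInt16BigToHost((UInt16)bigEndianSample);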

Thanks for getting back to me so quickly, Halle. I’ll try reversing the endianness and see what happens. Also, I’ve read through the docs but can’t see an obvious way of determining what kind of data I should be getting in my AudioBufferList. I specify kLinearPCMFormatFlagIsBigEndian | kLinearPCMFormatFlagIsSignedInteger for the format flags, so I’m guessing that a signed integer is what I should be receiving, and it looks that way in my logging.

I actually saw your question about this on the iPhone SDK list and almost answered it there, but I didn’t have enough time, so I’m happy you found this post regardless. You don’t have to do anything so complicated as combining the code — in fact you probably don’t need to use my code example above at all because you are using Audio Queue Services and they have built-in decibel metering.

I think that what you are seeing is that the decibel levels that your Audio Queue reports are negative numbers, is that correct? And you want to report a positive number that represents the actual power of the signal that is hitting the mic. Am I right so far, or are the values you are getting from the Audio Queue wrong for your purposes in some other way?

For some reason I did not get your reply through email. Anyway, THANKS A BUNCH for answering.

Yes, the values that I am getting from the Audio Queue are negative. Even if I talk very loudly, the value that is reported from the Audio Queue is negative, which I think is not right. That’s why I wanted to see if there is any other way I can get the decibel value.

No problem. The negative decibel number isn’t actually wrong, it’s just a particular convention for displaying power — it shows decibel values as a value below zero, where zero is the clipping point (point of distortion) for the recording device. You’ll see this approach on professional recording devices, and it is also the way that metering works for iPhone Audio Queues.

That means that the negative decibel values are not absolute decibel values that show the sound pressure level in the environment. Instead they show how many more decibels of sound pressure are possible before the recording device starts to distort (-40dB means that if there were another 40dB, the mic would clip). This is a very useful approach for audio engineering purposes because it shows you the headroom available for the signal before it becomes unusable.

For your purposes, if you can:

1) obtain the decibel level when there is absolutely no input (might be something like -80, I don’t know if this is dependent on the device — to emphasize, this is not the value with very quiet input, this is the value with no input at all, for instance with the mic disabled),
2) get the absolute value of that number, and
3) add the absolute value to the negative value,

that will change your scale into zero to positive values.

This doesn’t turn the iPhone into a highly-accurate decibel meter, but I think it would give you the kind of output you are expecting for a just-for-fun project.
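
As a sketch of that arithmetic, where silenceFloorDB is the hypothetical no-input reading from step 1:

#include <math.h>

float silenceFloorDB = -80.0f; // Hypothetical value measured with no input at all.
float meteredDB = -40.0f;      // Example reading from the meter.

// Steps 2 and 3: add the absolute value of the floor to the negative reading.
float positiveDB = meteredDB + fabsf(silenceFloorDB); // -40 + 80 = 40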

Yup, sorry, I thought I was clear that this would just transpose your scale of quietest->clipping into positive numbers instead of negative numbers, without changing the size of the scale or what it measures.

It has a role that is explained in the comments. It’s a quick way of throwing out any NaN values that might come through for unforeseen reasons. Here’s another example for you: http://stackoverflow.com/a/2109282/119717

I’ve read through all the comments, but one thing I just can’t figure out: if I want the meter to only show me what is between 400 and 500Hz, what is the best way to work with this? The purpose is so you can tune instruments with this meter. I hope you still read this topic.

A couple of questions. Firstly, about the if statement that checks for a rational, non-infinite number:

if((sampleDB == sampleDB) && (sampleDB = -DBL_MAX)) {

Shouldn’t that be sampleDB != -DBL_MAX? I seem to be getting values of DBOFFSET.

If I change it to != then I get values out, but the range is really small. I tried changing DBOFFSET to -94, however all that did was change the offset (DUH). In my case it seems to be around -9.7, with only slight variation between -10 and -9.6. Is that the low-pass filter that is having this effect?

I’d guess that you’re getting unexpected results because of a different input besides a signed 16-bit integer using the full range of samples that this assumes. Since you’re getting floats out, am I correct that your input is a float value?

Hi,
Thanks for your resourceful article. However, when I tried this example I always got -74.0 values. Do you know what could be the reason for this?
I used a recordingCallback function instead of the AudioUnitRender callback; could this be the problem?
I appreciate your answer or any suggestion.

-74.0 means no input at all, so you only need to troubleshoot why there is no rendered content in the buffers that are in your callback. All Audio Unit and Audio Session calls have a method of error checking (including the buffer callback), so log the results of that error checking and it should tell you what is happening.
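
A hedged sketch of the kind of error checking meant here, with an illustrative logging helper wrapped around AudioUnitRender in the callback:

#include <AudioUnit/AudioUnit.h>
#include <stdio.h>

// Illustrative helper: log any OSStatus that isn't noErr so the callback
// tells you why the buffers are arriving empty.
static void LogIfError(OSStatus status, const char *operation) {
    if(status != noErr) {
        fprintf(stderr, "%s failed with OSStatus %d\n", operation, (int)status);
    }
}

// Inside the render or recording callback:
// OSStatus renderStatus = AudioUnitRender(audioUnit, ioActionFlags, inTimeStamp,
//                                         inBusNumber, inNumberFrames, ioData);
// LogIfError(renderStatus, "AudioUnitRender");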

