13 January, 2008, 10:35:38 PM

The dynamic range of a selection of music depends on two things: an estimate of the music's time-varying loudness, and the timescale used for loudness evaluation. I propose a numerical method of estimating dynamic range that addresses both, using a modified ITU-R 1770 loudness filter and three moving windows to estimate loudness across three different timescales. The goal is to measure and compare dynamic range more accurately between different music genres, and between different masterings and processing techniques of the same music.

Dynamic range = range between the 50th and 97.7th percentiles, for each timescale

I've been kicking this around for almost a year, but I finally broke down and wrote the thing for real in an afternoon last November (it's been extensively tuned since then). The recent discussions about dynamic range have forced my hand, because so many important things were touched upon, and really, you can think of pfpf as an extremely elaborate reply to that topic.

This is a better way to measure dynamic range, for the following reasons:

It measures dynamic range as a ratio of loudnesses. Peak-to-average cannot claim this (it is fundamentally a comparison of two different units). ReplayGain comparisons cannot claim this.

It uses a real loudness model (flawed though it is) for the basis of loudness estimation. Waveform comparisons (especially for loudness-war-related discussions) are fundamentally flawed for this reason - what you get out of Audacity has a relatively tenuous connection to real perceived loudness.

Dynamic range is estimated across three different timescales - 3000 ms, 200 ms, and 10 ms - and the scales are fully decorrelated from one another. So pfpf can tell the difference between a quiet passage containing a loud transient and a loud passage containing a sudden pause. The timescales are configurable.

It uses a percentile approach on a histogram for estimating dynamic range, instead of min/max/avg. This makes the technique much more resilient to differences in mastering and medium; pops and ticks should not affect results, nor should small bits of digital silence, like in greynol's Tool example. (Yes, greynol, you can distinguish ppp from fff now.) The percentiles are configurable.

Background noise (when no music is playing) can be masked with a fixed threshold, so that silence won't pile up on one side of the histogram distorting the numbers, and the results should be invariant of any extra silence padding before/after music (this should make CD/vinyl comparisons a lot easier). The threshold is configurable.
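The gated, percentile-based range estimate described above can be sketched as follows. This is an illustrative model in Python, not pfpf itself (which is a LabVIEW program); the function name, the nearest-rank percentile rule, and the default parameters are all assumptions.

```python
# Hedged sketch: gate out blocks below a fixed threshold, then take the
# spread between two configurable percentiles of the remaining loudness
# values. Nearest-rank percentiles stand in for whatever interpolation
# pfpf actually uses.

def dynamic_range_db(block_loudness_db, gate_db=-80.0, lo=0.50, hi=0.977):
    """block_loudness_db: per-block loudness estimates in dB."""
    kept = sorted(v for v in block_loudness_db if v > gate_db)
    if not kept:
        return 0.0

    def percentile(p):
        idx = min(int(p * len(kept)), len(kept) - 1)
        return kept[idx]

    return percentile(hi) - percentile(lo)
```

Because isolated pops land above the upper percentile cut-off and digital silence is removed by the gate, neither skews the result.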

Please read the paper, download the app, and try it for yourself. Lemme know what you think.

Quote

This makes the technique much more resilient to differences in mastering and medium; pops and ticks should not affect results, nor should small bits of digital silence, like in greynol's Tool example. (Yes, greynol, you can distinguish ppp from fff now.)

Easy now, killer. There were no small bits of digital silence in the track I presented.

Anyway, I look forward to checking this out.

Great post!

Is 24-bit/192kHz good enough for your lo-fi vinyl, or do you need 32/384?

The numbers are not directly comparable to other metrics. You can't compare them to RG numbers or peak-to-average numbers. You need to evaluate them on their own.

That said, a lot of the constrained range is because of the percentiles I'm choosing. For the long term time scale, I could make a strong case for ignoring the 50th percentile entirely, and defining the range as between, say, the 5th and 95th percentiles. I suspect the same case could be made for the shorter timescales.

Changing from 0.5-0.977 to 0.05-0.95 would substantially widen the results if the histograms are normal (and the medium/short timescale histograms are): for a normal distribution, the 5th-95th percentile span is about 3.29 standard deviations versus 2.0 for the 50th-97.7th, a factor of roughly 1.65.
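For an exactly Gaussian histogram, the widening factor can be checked directly with the Python standard library (a back-of-the-envelope check, not part of pfpf):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal; percentile spans scale with sigma
span_old = nd.inv_cdf(0.977) - nd.inv_cdf(0.50)  # ~2.00 sigma
span_new = nd.inv_cdf(0.95) - nd.inv_cdf(0.05)   # ~3.29 sigma
print(span_new / span_old)                       # ~1.65
```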

The change in long term dynamics is expected. The basic problem is that the loudness computations maintain state over several seconds of music, and at the start of the file, that state must be initialized to something. There are three options for initialization:

1. Set it to zero
2. Initialize it with the music at the very start of the file
3. Initialize it with the music at the very end of the file

Choosing #3 would effectively stop the problem you are seeing with differing dynamics measurements, because you're essentially treating the .wav as a giant loop containing a periodic signal, and repeating the signal will not change the results any. But I would argue that such a situation simply does not exist with real-world music, and it is not as important to tune for it as you think.

#1 means that every analyzed file starts from a long-term volume of zero - and I believe that's wrong for most situations where music is played, when loudness is fairly equalized with what was played beforehand. The same problem exists for #3 - what happens if the music ends at maximum loudness, but starts very quietly? The loudness will incorrectly be initialized to a very high level. #2 avoids this issue, but results in the issue you see, where repeating the signal yields a different result.
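The three options amount to different ways of seeding the moving window before analysis starts. A hypothetical sketch (names and structure are illustrative, not pfpf's internals):

```python
from collections import deque

def init_long_term_window(block_powers, window_len, option=2):
    """Seed the long-term moving window before analysis begins."""
    window = deque(maxlen=window_len)
    if option == 1:
        window.extend([0.0] * window_len)          # start from zero
    elif option == 2:
        window.extend(block_powers[:window_len])   # seed from file start
    else:
        window.extend(block_powers[-window_len:])  # seed from file end
    return window
```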

---

In theory, a gated sine wave should have a dynamic range of zero, because the silence is masked in any listening environment. That is, the dynamic range of a recording is connected to the dynamic range of the listening environment. In reality, the thresholds should probably be raised from -80 dB, because they are grossly generous to the listening environment.

Also, I think I see a bug in the histogram calculations that generate the long term and medium term dynamics calculations. If you look at the histograms and the percentile lines, they are way out in the middle of nowhere; they're interpolating between the high points on the histogram, when they probably ought to be clamped somewhere. I'll look into a nice way of fixing this.

Loudness, as a perceptual quality, is scale-dependent. It can vary across very large timescales (seconds to minutes), and it can vary across very short timescales (milliseconds), and the variation can be unrelated between timescales. This is important information that should be captured numerically, but capturing short term loudness also captures the long term loudness - one needs to isolate that out in order to estimate the short term dynamic range accurately.

Example: Say you have two recordings of two guys in a quiet field. One guy speaks into the microphone at a varying volume from 1 meter away for a bit. The other guy then yells at the microphone from 100 meters away, saying the same things the first guy said, with the same variations. Clearly the overall, or long-term, loudness changes dramatically between the two speakers, while each speaker's own long-term loudness is fairly constant. But at a smaller timescale, they're both delivering the same material. If you remove the large-scale loudness difference, the short-term loudness varies dramatically (alternating between words and silence), and the variation is the same between the two speakers. In other words, the long-term loudness differs greatly between the two speakers, but the long-term dynamic range is very low; the short-term loudness, once equalized for long-term loudness, is the same for both speakers, and the short-term dynamic range is higher.

In comparison, a simple program-wide loudness estimation at a small timescale, like 50 ms, with a percentile measurement (50th for ITU-R 1770, 95th for ReplayGain) would lock onto either the loudness of the closer guy, or average out at some ill-defined region of loudness that doesn't correspond to any actual loudness in the recording. This is correct for a program loudness equalization system, which those systems are designed for, but for estimating dynamic range, estimations of this kind lose meaning.

However, the same kind of problem exists with peak-to-average measurements, which also rely on a program-wide loudness estimation - and those are routinely used to estimate dynamic range.

pfpf solves this by scaling shorter-term loudness by longer-term loudness. RMS power is first calculated over the smallest block size (10 ms); this represents the loudness at the short-term timescale. The program then holds two moving windows of the last several 10 ms blocks - one window covering 200 ms, the other 3000 ms. Computing RMS power over these windows yields the medium-term and long-term loudnesses. Then I divide the 10 ms loudness by the 200 ms loudness, and the 200 ms loudness by the 3000 ms loudness. This is how I claim to decouple the timescales. It's hokey, but it seems to work OK.
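The windowing and division described above can be sketched as follows. This is a simplified model assuming 44.1 kHz audio and the default timescales; it omits the ITU-R 1770 pre-filtering entirely, and all names are illustrative.

```python
import math
from collections import deque

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def decoupled_loudness(samples, block=441):  # 10 ms at 44.1 kHz
    medium = deque(maxlen=20)   # last 200 ms of short-term power
    long_ = deque(maxlen=300)   # last 3000 ms of short-term power
    results = []
    for i in range(0, len(samples) - block + 1, block):
        short = rms(samples[i:i + block])  # 10 ms loudness
        medium.append(short)
        long_.append(short)
        med = rms(medium)                  # 200 ms loudness
        lng = rms(long_)                   # 3000 ms loudness
        # divide each timescale by the next larger one to decouple them
        results.append((short / med, med / lng, lng))
    return results
```

Note that the RMS of the per-block RMS values equals the RMS of the window's raw samples (for equal-sized blocks), so accumulating block powers in the windows is equivalent to recomputing over the whole window.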

---

On a different note: Is Blogger a crappy way to publish this? Should I put this up on a different site, or just throw up my own HTML file, or make a PDF?

Looks like a pretty interesting project. Was going to give it a go but put off due to 90MB download of LabVIEW 8.2.1 Run-Time Engine -- then thought - no big deal -- but then was put off by the grand registration process just to download the runtime environment.

It could be just me being lazy, but I guess I've got used to apps being less of a deal to run.

I wonder if this in part explains the low response to what I would have thought (due to the whole loudness war issue) is a pretty hot topic on HA.

Just a thought.

Oh, yeah - I guess that could be a downer.

Here's a direct link to the small runtime installer - it's designed for web browser integration but I think it has enough to run pfpf. It's 23MB and doesn't require registration.

Otherwise, I could build an installer .exe that has pfpf and the runtime included, but then the download size jumps from 2MB to 64MB (!).

Quote

I'm not familiar with LabView -- do a lot of applications use it?

C.

It's used in a wide variety of scientific and engineering applications, but it's generally used more for institutional use than end-user use. (One notable exception is Lego Mindstorms NXT, albeit in a radically altered form.) I use it because it's the best tool I have available for the job.

Quote

Otherwise, I could build an installer .exe that has pfpf and the runtime included, but then the download size jumps from 2MB to 64MB (!).

64MB is better than the full 90MB + registration.

Also, if you have the space: (1) a 64MB installer .exe could be one option, along with (2) the standalone program (2MB), as well as (3) the alternative 23MB (browser-integration) runtime, which doesn't require registration.

Quote

It's used in a wide variety of scientific and engineering applications, but it's generally used more for institutional use than end-user use. (One notable exception is Lego Mindstorms NXT, albeit in a radically altered form.) I use it because it's the best tool I have available for the job.

(Full disclosure: that's largely because I work for NI.)

Thanks for the info.

As for me -- if it was a 64MB all in one job (runtime + program) I'd download it and give it the test run it surely deserves.

Do you think this program would be helpful in working out audio levels for a release? I.e., if I was attempting to get dB levels right across tracks of varying compression (not in the lossless/lossy sense) - currently I use wavgain and then my ears for fine tuning - can you see your app having a role in this kind of process?

Thank you very much for this great tool.

One minor issue with the UI: I could not adjust it to smaller resolutions like 1024x768.

Also, one feature request:

http://img211.imageshack.us/img211/8289/declipperjd8.png
http://img87.imageshack.us/img87/3199/declipper2cz2.png

It's the declipper from iZotope RX, which features a so-called "histogram of waveform levels" where you can see the sample distribution over the bit range. However, it is very limited, as it just shows values from 0 to -8 dB and does not have a horizontal scale. Looking at an improved version would help estimate the amount of clipping.

Quote

Looks like a pretty interesting project. Was going to give it a go but put off due to 90MB download of LabVIEW 8.2.1 Run-Time Engine -- then thought - no big deal -- but then was put off by the grand registration process just to download the runtime environment.

C.

Yep, that was the killer for me too. Now downloading the smaller runtime; I think one complete package would be better.

Yeah, I really should have replied to y'all sooner. Chromatix's work has convinced me to get off my butt. I just fixed all the links, so everybody can download pfpf again from the usual location.

Quote

Downloaded the small library and pfpf, installed both, and rebooted. I got all sorts of resource-missing errors - cannot load front panel, etc. So maybe look for an all-in-one package.

That's bizarre. Are you running a non-English version of Windows? You might need to download a bigger (or different) runtime in that event. Did you unzip everything before you ran pfpf.exe?

Quote

P.S. Would it be useful on good-quality MP3 files?

In theory, the lossiness of a sample should not impact the measurements, because lossy encoding (with very few exceptions) should not change the loudness or dynamic range of the music.

Quote

Thank you very much for this great tool.

And thank you for taking all the trouble to run all those numbers. They may come in handy for spotting problem samples, where too little or too much dynamic range is estimated.

Quote

One minor issue with the UI: I could not adjust it to smaller resolutions like 1024x768.

I'll see what I can do to reduce the resolution requirements, but I can't guarantee much. I may just punt and say that a 1680x1050 screen is required. I've already split the UI up into several different tabs and I think it's really important to keep all the histogram and loudness plots large and on the same page.

Clipping analysis really isn't what this is all about. There's a lot more meaning in trying to estimate how the ear is actually responding to dynamic range manipulations than simply pointing out the level characteristics of the signal.

That said... it wouldn't be hard to add.

Quote

Do you think this program would be helpful in working out audio levels for a release? i.e. if I was attempting to get db levels right across tracks of varying compression (not in the lossless/lossy sense) -- currently I use wavgain and then my ears for fine tuning -- can you see your app having a role in this kind of process

It could play a role for that, yes, although I would imagine that for pop masterings wavgain would give you great results. I'd love to hear from you as to which of the two tools best matches your perception of the dynamic range. Certainly pfpf is more (over)engineered for that purpose, but it's entirely untested as to whether it performs better.

Hell, you can just download it from the FTP site. I'm mostly just deferring to the Web interface for deciding which installer to use, since my first pick was wrong.

I broke down and uploaded a pfpf zip with an installer, including a runtime. I haven't tested it; I just hit "build". It's 64MB. Don't overdownload it!

Ah yes, that works much better. (Note to others: if you installed the "miniature" runtime, uninstall it first, otherwise it won't get replaced.)

Your default timescales for long and medium are somewhat longer than mine, I think. So my averaging-meter is coming in somewhere between your long and medium term measurements, and my peak-meter somewhere between your medium and short term measurements. With that said, we're getting respectably similar-shaped graphs, I think.

Because I'm measuring things in a different way, I get a kind of DC-offset on my medium-term graph, which I also factor into my measurements. This has the neat side-effect of eliminating the enormous negative spikes I see on your graphs, though I get (smaller?) positive spikes instead. I don't think the human ear is as sensitive to sudden decreases in amplitude as it is to sudden increases, which is why I am comfortable with using a 300ms/99% decay rate on both meters.

Unfortunately, it's very difficult to read anything from your medium-term graph, for several reasons. Probably the biggest difference to usability would be if the X-axes for all three graphs were linked, so that it was easier to zoom in on the detail. It would also be neat to listen to the track while watching meter needles, as an engineer would - perhaps I can write a tool to do that in Linux.

I've been trying to find out what ITU-R 1770 actually is, in detail, but all of the useful free links I can easily find seem to have gone dead. Any pointers here?

Quote

Your default timescales for long and medium are somewhat longer than mine, I think. So my averaging-meter is coming in somewhere between your long and medium term measurements, and my peak-meter somewhere between your medium and short term measurements. With that said, we're getting respectably similar-shaped graphs, I think.

Yeah, I think the main differences are going to reside in how transients are handled, so the overall graphs are going to be really similar.

Quote

Because I'm measuring things in a different way, I get a kind of DC-offset on my medium-term graph, which I also factor into my measurements. This has the neat side-effect of eliminating the enormous negative spikes I see on your graphs, though I get (smaller?) positive spikes instead. I don't think the human ear is as sensitive to sudden decreases in amplitude as it is to sudden increases, which is why I am comfortable with using a 300ms/99% decay rate on both meters.

That is a very good point - you could make a convincing case for this sort of asymmetry based solely on temporal masking. I suppose I could implement that in a windowed fashion by shifting the window forwards in time a bit, but exponential decay is certainly easier (and potentially more accurate).
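An envelope follower with instantaneous attack and exponential release - the asymmetry under discussion - can be sketched like this. The per-sample release coefficient is illustrative only, not taken from either tool:

```python
def follow(samples, release=0.99):
    """Track an envelope: rise instantly, decay exponentially per sample."""
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        if level >= env:
            env = level      # attack: jump immediately to the new peak
        else:
            env *= release   # release: exponential decay toward the signal
        out.append(env)
    return out
```

A sudden increase in amplitude registers immediately, while a sudden drop bleeds away gradually, mirroring the temporal-masking argument above.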

Quote

Unfortunately, it's very difficult to read anything from your medium-term graph, for several reasons. Probably the biggest difference to usability would be if the X-axes for all three graphs were linked, so that it was easier to zoom in on the detail. It would also be neat to listen to the track while watching meter needles, as an engineer would - perhaps I can write a tool to do that in Linux.

Hmm, I thought I added code to link the X-axes together - I'll need to revisit that. Doing live playback is reasonable enough.