Now that kode54 has released his lovely scanner for foobar2000, I've been playing around with the two, trying to come up with some subjective characteristics that RG and R128 handle differently.

The first big thing is sub-bass! RG routinely rates sub-bass-heavy music as quieter than R128. This makes sense, given that RG is driven by equal-loudness contours. I wonder about the validity in the context of the electronic dance music scene, however. Though the equal loudness contours are probably accurate for pure audibility, the reality is that sub-bass is perceptible in more ways than just listening!

http://www.youtube.com/watch?v=iz_IVmxKKdw -- This track, one of my favourites of 2010, has nearly 4dB difference between R128 and RG. I'm pretty confident it's due to the difference in the way the two algorithms perceive bass. The track is driven by a deep sub-bass melody with minimal high-frequency content.

I know that this is a quick and dirty "analysis", but I wanted to open the floor for people doing comparisons between the two. I'm quite excited to see some competition in this field.

In general, the two algorithms seem to be very strongly correlated. Differences of <1dB are pretty much routine on the music I've tested so far.

I recently did a large analysis on a 45,000 track test database. I'll extract the outliers (tracks with very high differences between RG and R128) and do some further subjective analysis. I also have an over-representation of electronic dance music so this should help with your concerns, Canar.
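
A minimal sketch of that outlier extraction, assuming each track already has one gain value from each scanner, both expressed in dB against the same reference level. The function name `find_outliers`, the 3 dB threshold, and the sample tracks are all hypothetical and not taken from the actual database.

```python
def find_outliers(tracks, threshold_db=3.0):
    """Return (name, difference) pairs where |RG - R128| exceeds threshold_db.

    `tracks` is an iterable of (name, rg_gain_db, r128_gain_db) tuples.
    The difference is RG minus R128; results are sorted largest-first
    by absolute difference.
    """
    diffs = [(name, rg - r128) for name, rg, r128 in tracks]
    outliers = [(name, d) for name, d in diffs if abs(d) > threshold_db]
    return sorted(outliers, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical example values, for illustration only.
sample_tracks = [
    ("track_a", -6.0, -6.5),   # scanners agree within 1 dB
    ("track_b", -4.0, -8.0),   # RG applies 4 dB less attenuation
    ("track_c", -10.0, -5.0),  # RG applies 5 dB more attenuation
]

for name, diff in find_outliers(sample_tracks):
    print(f"{name}: RG - R128 = {diff:+.1f} dB")
```

Keeping the sign of the difference makes it easy to see afterwards which scanner rated a given track as louder.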

It'd also be interesting to see how much "better" ReplayGain could be with 66% window overlap and gating (removing silent windows).

I'm not sure I'd use the word "concerns" to describe my emotions. I'm excited that we've got a new tool from a different party to help fight the Loudness War with. I'm interested in learning how they differ subjectively. The objective differences are well-documented.

The couple of studies posted in the other thread suggest that EBU R128 matches human perception slightly better than ReplayGain.

Thanks for the graph lvqcl - that helps to visualise the situation really well.

It would be nice to have a similar graph with human perception on the x axis, and calculated values (two sets: R128 and ReplayGain) on the y axis. I don't know if the data sets are available though - the papers I saw just gave overall conclusions.

I currently have no evidence, and I haven't compared RG and EBU R128 yet, but I have noticed that RG sometimes gets beaten by compression (heavy guitars + compression). It's also not unusual for acoustic albums (voice and acoustic guitar) to come out too loud.

I'm planning to pick out some problematic albums where I noticed a big difference in perceived loudness and let an EBU R128 scanner re-evaluate them.

Has anyone listened to these tracks to try and determine which normalization value better matches subjective loudness?

For very sparse recordings, I can see where RG might be fooled. If less than 5% of the program is at "foreground" level, RG's histogram behavior will cause it to normalize to the background level. The gate behavior in R128 will cause it to normalize to the foreground level regardless of how sparse it is.
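
The contrast between the two statistics can be sketched with a toy model. This is not either scanner's actual implementation: it assumes per-block levels in dB are already available, ignores the weighting filters and window sizes entirely, and the gate values (-70 dB absolute, -10 dB relative) and function names are illustrative assumptions.

```python
import math


def percentile_level(block_db, percent=95):
    """Histogram-style statistic: the level at the given percentile of blocks."""
    ordered = sorted(block_db)
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]


def energy_mean_db(levels_db):
    """Average the blocks in the energy domain and convert back to dB."""
    energies = [10.0 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(energies) / len(energies))


def gated_mean_level(block_db, absolute_gate_db=-70.0, relative_gate_db=-10.0):
    """Gated statistic: drop near-silent blocks, then drop blocks that fall
    more than |relative_gate_db| below the mean of what is left."""
    audible = [level for level in block_db if level > absolute_gate_db]
    if not audible:
        return float("-inf")
    threshold = energy_mean_db(audible) + relative_gate_db
    kept = [level for level in audible if level > threshold]
    return energy_mean_db(kept)


# Sparse program: 3% of blocks at "foreground" level, the rest background.
sparse = [-40.0] * 97 + [-10.0] * 3
# Denser program: 20% of blocks at foreground level.
dense = [-40.0] * 80 + [-10.0] * 20

print(percentile_level(sparse), gated_mean_level(sparse))
print(percentile_level(dense), gated_mean_level(dense))
```

In this toy model the percentile statistic lands on the background level for the sparse program, because fewer than 5% of the blocks are loud, while the gated mean discards the background blocks and reports the foreground level. For the denser program the two agree.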

As for the doom genre, both RG and R128 use a high pass filter in their weighting. R128 listens to more of the bass than RG. RG will normalize bass-heavy material like this to higher levels than R128.

Interesting! So a new audio analysis utility is available, eh? All right, let me ask you this: if I want to use this instead of RG and then later convert audio tracks with R128 applied, how exactly can I do that? Do they use the same tags?

Let me search the forum for R128 for you and write a one-page executive summary, boss. It will be on your desk by 4pm.

ReplayGain makes Noto sample way too loud. Those high frequency noises shatter ears and most likely brain too. R128 scanner makes it a bit quieter than other tracks but closer to proper loudness (for my ears). Ryoji sample sounds closer to other tracks with ReplayGain when wearing headphones but with speakers + subwoofer it doesn't sound loud at all - it just makes everything shake in the house. I can't really compare the loudness with my equipment.

Hm, maybe both could stay? I uploaded one more track sample from Kerne (track 3), for easier comparison of track gains within the release.

Attenuation is also noticeable in the other genres I posted about above. As the OP noticed, "sub-bass-heavy music" is much quieter with R128, and whole textures/layers become hardly noticeable where they should be audible. This happens in some dark ambient (and similar genre) releases, where I prefer RG.

I should note that I don't have high-end speakers, and for particularly demanding music or comparisons I use headphones (also not very high-end: MDR7506).

I doubt romor is comparing 2 versions of the same song (Song A-RG vs. Song A-R128); rather he's saying R128 overly attenuates that genre so it doesn't fit so well with other songs in his collection that R128 has processed.

After all, that's the purpose of equal perceptual loudness (R128 and ReplayGain). romor's statement seems a fair one to me.

I've listened to your samples. Thanks for posting. In addition to the two issues I mentioned above, I note that there is some extreme high-frequency content in a couple of these samples. RG's weighting filter rolls off high frequencies. R128 does not. RG is going to normalize to the non-high-frequency content under the assumption that most people can't hear very well above 10 kHz. R128 uses a simpler filter. The fact that it is sensitive to extreme high frequencies is balanced by the fact that "normal" program material doesn't include much in the way of extreme high frequencies.

In general, since these pieces are alien to most people, you probably will not get a consistent subjective determination as to their loudnesses. The difference in modeled loudness between RG and R128 is therefore to be expected. These results actually give me increased confidence in both models: the two models give similar results on the "normal" material for which they were designed. They diverge when you get out here in the fringes.

Similarly, the contribution of low frequencies to perceived loudness is an area of active research. There is not likely to be a single right answer. BS.1770 does not currently even include the LFE in assessment of loudness of surround sound. Perception of loudness at low frequencies is highly dependent on listening environment (you can't hear anything down there in a car) and reproduction equipment. The fact that there is divergence in the models in this area is not at all surprising.

I did not mean to be flip with my comment. Perhaps I was a bit terse. Romor said he preferred the RG normalized (louder) version. It doesn't matter if this is in the context of comparison to the R128 normalized (quieter) version or in the context of a playlist of Rolling Stones hits. Verifying a model is about matching levels, not about preference. A reasonable way to assess these models is described in the Swedish Radio study. You switch between a test sample and a reference sample and adjust a fader until the two have the same apparent loudness.