The Ogg Opus draft specification currently with the IETF, which expires in January 2013 and is still open to amendment and comments, states that the Comment Header introduces R128_TRACK_GAIN, based on the EBU R128 specification, and that this tag specifies the gain relative to the ID header's mandatory Output Gain field.

It is sensible to use Output Gain to match an intended level, which amounts to the same thing as Album Gain, again based on R128 techniques. These should be a little more accurate than ReplayGain's original algorithm and, thanks to gated measurement, should deal better with sparse audio such as dialogue or Audio Description for the visually impaired.

Output Gain SHOULD be implemented by virtually all players (a strong recommendation), but MAY be ignored, and R128_TRACK_GAIN may be implemented in addition depending on the mode required (e.g. some players may default to applying TRACK_GAIN when in shuffle mode). If Output Gain is modified, the R128_TRACK_GAIN value MUST be modified accordingly if it is present (as it is applied after Output Gain and must still be correct).
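
To make that last rule concrete, here is a minimal Python sketch of the bookkeeping. It assumes, from my reading of the draft, that both Output Gain and R128_TRACK_GAIN are Q7.8 fixed-point dB values (256 units per dB, range -32768 to 32767), with the tag stored as a decimal ASCII integer. The function name change_output_gain and the plain dict of tags are hypothetical and not part of any real tagging library.

```python
Q78_MIN, Q78_MAX = -32768, 32767  # assumed signed 16-bit Q7.8 range


def check_q78(value):
    """Raise if a value does not fit the assumed Q7.8 range."""
    if not Q78_MIN <= value <= Q78_MAX:
        raise ValueError(f"gain {value} does not fit in Q7.8")
    return value


def change_output_gain(output_gain, tags, delta):
    """Add `delta` (Q7.8 units) to Output Gain and compensate the track gain.

    R128_TRACK_GAIN is applied after Output Gain, so it must move by the
    opposite amount for the final track level to stay the same.
    Returns (new_output_gain, new_tags) without modifying the inputs.
    """
    new_output_gain = check_q78(output_gain + delta)
    new_tags = dict(tags)
    if "R128_TRACK_GAIN" in new_tags:
        track_gain = int(new_tags["R128_TRACK_GAIN"])
        new_tags["R128_TRACK_GAIN"] = str(check_q78(track_gain - delta))
    return new_output_gain, new_tags


# Lower Output Gain by 1 dB (256 units); the tag rises by 1 dB to compensate.
gain, tags = change_output_gain(0, {"R128_TRACK_GAIN": "-573"}, -256)
print(gain, tags)
```

The sum of the two fields is the same before and after, which is the property the MUST is protecting.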

The Comment Header "OpusTags" section says that, to avoid confusion, normalization schemes other than R128 SHOULD NOT be used (they can be, but this is rather strongly discouraged). This does not mean that the intersample-compatible peak measurements of R128 cannot be used, but it seems a shame that standardised comment tag names for R128_TRACK_PEAK and R128_ALBUM_PEAK have not been included, so as to prevent incompatible implementations (particularly in their number format) from being produced in different software.

I'd imagine that both PEAK values in the normalised domain should be calculated according to the R128 recommendation (with the same degree of oversampling to catch intersample overs) as if the audio's Output Gain were ignored (set to zero) and Track Gain were unused. The gain values could then be used to convert these into effective peaks after whatever processing occurs (including volume controls etc.).
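
As a rough illustration of that conversion, here is a Python sketch. Since no number format has been standardised, it assumes the stored peak is a linear value where 1.0 is digital full scale, and that all gains are plain dB figures. The name effective_peak is made up for the example.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def effective_peak(stored_peak, *gains_db):
    """Scale a stored peak (measured with all gains at zero) by the gains
    the player actually applies, e.g. Output Gain, Track Gain, volume."""
    return stored_peak * db_to_linear(sum(gains_db))


# A track measured at a peak of 1.2 (an intersample over), played with
# -3 dB of Output Gain and a further -2 dB on the volume control.
peak = effective_peak(1.2, -3.0, -2.0)
print(f"effective peak {peak:.3f}, clips: {peak > 1.0}")
```

A player could run this check once per track and only engage any clipping protection when the result exceeds 1.0.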

Personally, I can manage without PEAK values as it's sensible to me to match Album Gain as my first priority and use something like fb2k's Advanced Limiter to manage any excessive peaks that remain and thus prevent hard clipping with negligible audible effect. I often also use a negative PreAmp gain or Volume control setting, meaning that Advanced Limiter less often needs to do anything.

I believe many of us on these forums are rather afraid of anything that adds distortion from time to time, possibly excessively afraid, given that what triggers the Advanced Limiter is usually transient, noise-like peaks, which are likely to mask the distortion (and would be hard to ABX). Anyone who really cares about distortion should play back the digital signal at reduced levels with more headroom. About 10 years ago, I did some quick ABXing of pure clipping distortion (not even a soft limiter) on a Rachmaninov piano concerto via LAME (with a peak of around 1.2 at the old equivalent of -V2) versus lossless and couldn't tell them apart, though that's clearly not conclusive, and in fact the lossless might still have had intersample overs (all peaks were only a few samples long).

I believe many many more people use Prevent Clipping According To Peak than do what I do, and they let the volume become incorrect instead, so it seems there will be demand for this feature and it seems very sensible to acknowledge actual user behaviour (what is surely going to happen, as evidenced by questions in these forums, rather than what ought to be necessary and sensible). That means specifying officially how the OpusTags Comment Header SHOULD be filled if implementers choose to implement R128_TRACK_PEAK and R128_ALBUM_PEAK, and how the values are calculated (i.e. refer to EBU R128), and doing so before mutually-incompatible implementations spring up, while clearly indicating that support is not mandatory. Interoperability is important, and the format already specifies gapless support and R128 gain, so it's surely worth reporting peaks. Specifying intersample peaks and promoting them as part of the normal tagging of music might encourage high-end audio hardware designers to provide sufficient linear headroom beyond digital full scale and to sell it as a feature.

Being the result of oversampled measurement, the precise PEAK values produced by different tools will vary by small amounts, but they ought to provide enough engineering headroom to allow clipping distortion to be prevented in the digital or analogue domains by competent design (and in the analogue domain, even ReplayGain hasn't achieved that, given that intersample overs still cause distortion in many high-end audio devices*).

*source: Thomas Lund talk on Loudness Wars and distortion for broadcast audio pros, which later in the talk sounds dangerously non-TOS#8 for a while when listening to SIDE channel only of Mid-Side lossy encodes, but ends up advising against lossy in the production chain - transcoding - which seems sensible, while saying it's fine for distribution, without really quashing the myths.

I believe many many more people use Prevent Clipping According To Peak than do what I do, and they let the volume become incorrect instead, so it seems there will be demand for this feature and it seems very sensible to acknowledge actual user behaviour

I haven't seen any evidence of this, and normalizing lossy audio by peaks is demonstrably a bad idea: the peak value is a highly noisy metric, and lossy compression (especially in Opus, because it is able to get away with very few bits in some parts of the signal) can cause swings of several dB in the peak value, which can make consecutive tracks wildly different in apparent volume.

Also, any processing of the audio changes the peak. This means that setting the peak on an Opus file requires separately decoding it to get the peak and it may make the peak meaningless if the playing device does any filtering or equalization.

Moreover, Opus (like most lossy music formats) does not have a bit-exact specified decoder, and differences between decoders can change the peak value. Even in the reference implementation you can get back different peak values depending on whether you use a fixed-point or floating-point build of the decoder, and especially on whether you use the float or the 16-bit API (which is clamped). Opus itself can also decode at a different sampling rate than the original if the caller requests it, and this obviously changes the peak values (as well as the EBU loudness values, though unless you go all the way down to 8kHz output it usually doesn't change them much).
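
The clamping effect is easy to see with a toy example in Python. This does not call the real decoder; the sample values are invented, and the float-to-16-bit conversion (scale by 32768, round, clamp) is an assumption made for illustration.

```python
def to_int16(sample):
    """Convert a float sample to a clamped 16-bit integer (assumed scaling)."""
    return max(-32768, min(32767, round(sample * 32768)))


def peak_float(samples):
    """Peak as measured on float output, where overs are preserved."""
    return max(abs(s) for s in samples)


def peak_int16(samples):
    """Peak as measured on 16-bit output, where overs have been clamped."""
    return max(abs(to_int16(s)) for s in samples) / 32768


decoded = [0.5, -1.2, 0.9]  # hypothetical float decoder output with one over
print(peak_float(decoded), peak_int16(decoded))
```

The same stream yields a peak of 1.2 through the float path and 1.0 through the clamped path, so a tag written by one tool would not match what another tool measures.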

Your points about intersample clipping are quite important and they highlight why 'almost right' is simply _not enough_ for clipping prevention.

To prevent clipping, what a player should be doing is (1) making use of normalization (either at the track or album level) that sets the output level to a point which has ample headroom from clipping, and (2) placing a limiter with up-sampling based detection at the end of the processing chain. Sane normalization makes sure that the limiter will be fairly inactive and transparent, and the limiter prevents clipping of the actual signal. If a player does not do these things, then even if it uses the PEAK levels it is not guaranteed to avoid clipping, and if it does these things, the PEAK levels are unnecessary.
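
Here is a small Python sketch of why the detection in step (2) needs up-sampling. It estimates the intersample peak by 4x interpolation with a Hann-windowed sinc kernel, which is a simple stand-in chosen for this example and not the filter from any standard. The final trim_gain_db is only a static whole-buffer gain calculation, not a real look-ahead limiter. All names and parameters are made up for illustration.

```python
import math


def hann_sinc(d, half_width):
    """Hann-windowed sinc kernel, zero for |d| >= half_width."""
    if abs(d) >= half_width:
        return 0.0
    sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
    return sinc * 0.5 * (1.0 + math.cos(math.pi * d / half_width))


def sample_peak(samples):
    """Largest absolute sample value, ignoring what happens between samples."""
    return max(abs(s) for s in samples)


def oversampled_peak(samples, factor=4, half_width=16):
    """Estimate the peak of the reconstructed waveform at `factor` points
    per sample. Only the interior is scanned, where the whole kernel fits,
    so the input needs more than 2 * half_width samples."""
    peak = 0.0
    last_index = (len(samples) - 1 - half_width) * factor
    for i in range(half_width * factor, last_index + 1):
        t = i / factor
        base = math.floor(t)
        value = sum(samples[k] * hann_sinc(t - k, half_width)
                    for k in range(base - half_width + 1, base + half_width + 1))
        peak = max(peak, abs(value))
    return peak


def trim_gain_db(peak, ceiling=1.0):
    """Static gain (dB, never positive) that brings `peak` under `ceiling`."""
    return 0.0 if peak <= ceiling else 20.0 * math.log10(ceiling / peak)


# A tone at a quarter of the sample rate, phased so that every sample lands
# at about 0.707 of the true amplitude of 1.3.
tone = [1.3 * math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(64)]
print(f"sample peak      {sample_peak(tone):.3f}")
print(f"oversampled peak {oversampled_peak(tone):.3f}")
print(f"trim needed      {trim_gain_db(oversampled_peak(tone)):.2f} dB")
```

Every sample of this tone is below full scale, so a sample-based check reports no problem, while the interpolated waveform goes well over and needs a couple of dB taken off.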