Compute the root-mean-square (RMS) value for each frame, either from the
audio samples y or from a spectrogram S.

Computing the RMS value from audio samples is faster, as it doesn’t require
an STFT calculation. However, using a spectrogram gives a more accurate
representation of energy over time because its frames can be windowed.
Prefer S if it’s already available.
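
To make the sample-based path concrete, here is a minimal pure-Python sketch of frame-wise RMS. It is an illustration only, not this library's implementation: the helper name frame_rms, its default frame_length and hop_length values, and the choice to drop incomplete trailing frames (no centering or padding) are all assumptions made for the example.

```python
import math


def frame_rms(y, frame_length=2048, hop_length=512):
    """Return the RMS value of each complete frame of the sample sequence y.

    Hypothetical helper for illustration; no centering or padding is applied.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")

    rms = []
    # Slide a window of frame_length samples forward by hop_length each step.
    for start in range(0, len(y) - frame_length + 1, hop_length):
        frame = y[start:start + frame_length]
        mean_square = sum(v * v for v in frame) / frame_length
        rms.append(math.sqrt(mean_square))
    return rms


# One second of a unit-amplitude 100 Hz sine sampled at 8 kHz.
# Each 800-sample frame holds exactly 10 full cycles.
sr = 8000
y = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
values = frame_rms(y, frame_length=800, hop_length=400)
```

Because every frame here covers a whole number of sine cycles, each entry of values should be close to 1/sqrt(2), about 0.707. No window is applied in this sketch, which is the limitation of the sample-based path noted above.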