I understand that compression works by exploiting certain psychoacoustic characteristics of our auditory perception, but I'm still confused. Ultimately, the waveform must be recorded regardless of what has been done to it, and I was under the impression that all waveforms were born equal. Does removing a band of frequencies really reduce the number of bits needed to store that information?

4 Answers

Random data is hard to compress because it is random; data with a pattern is easy to store. Audio data appears fairly random at first, but there is actually a pattern to it. Additionally, when you allow certain details to be altered in ways that listeners will barely notice, you can make it fit a pattern that is far easier (and smaller) to store.

Audio compression works by making alterations where the loss of detail won't be noticed in order to conform the waveform to a pattern that can be more easily compressed.
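To make the "random versus patterned" point concrete, here is a minimal sketch using Python's standard `zlib` module. The 440 Hz tone, the 8 kHz sample rate and the 8-bit sample format are assumptions chosen only for illustration; zlib is a general-purpose lossless compressor, not an audio codec.

```python
import math
import os
import zlib

# One second of a 440 Hz sine at 8 kHz, stored as 8-bit unsigned samples.
SAMPLE_RATE = 8000
tone = bytes(
    int(127.5 + 127.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
)

# The same number of bytes, but with no pattern at all.
noise = os.urandom(SAMPLE_RATE)

tone_size = len(zlib.compress(tone, 9))
noise_size = len(zlib.compress(noise, 9))

print(f"tone:  {len(tone)} -> {tone_size} bytes")
print(f"noise: {len(noise)} -> {noise_size} bytes")
```

The tone shrinks to a small fraction of its size because it repeats, while the random bytes do not shrink at all.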

that, while true in general, does not apply to psychoacoustic compression. the best lossless data compression algorithms only achieve up to 50% reduction in size (in very rare cases), after optimising the order in which they walk the data (rar -mc6a+ anyone?) - mp3 easily does 90% or more and includes traditional compression as its final stage before storing the data.
– georgi Apr 14 '14 at 21:27

@georgi - that is what I am talking about when I'm referring to altering certain details a little bit to make it fit a pattern better. Psychoacoustics alter the data so that it can be much more heavily compressed while not reducing the meaningful detail. You can choose to think of it as "saving the important parts" but it still comes down to making a simpler way to represent the data which always comes down to patterns. You just alter how close of a match is good enough. That's the basic foundation of all lossy compression.
– AJ Henderson♦ Apr 14 '14 at 21:31

"psychoacoustics" do not alter the data for the purpose of better compressing it afterwards. they transform the original sample sequence into data that can be utilised to reconstruct the signal by feeding it back into complex math. there is much more discarding of data rather than reshuffling or reshaping the original. the compression achieved afterwards is because altogether different data is being compressed.
– georgi Apr 14 '14 at 21:50

@georgi - and what is the goal of that complex math? To make the sound smaller by finding patterns in it, while disregarding certain details that don't matter in order to make it fit better. I'm simplifying it to an extreme, but it is still effectively what MP3 and JPEG and h.264 all do. They are just more careful about where they remove detail. The exact details of the simplification vary, but I repeat: all compression is fundamentally finding patterns and altering signals so that they can fit patterns. It is impossible to compress without a pattern because random data is incompressible.
– AJ Henderson♦ Apr 14 '14 at 22:26

And yes, a very large amount of data is "thrown out" because it can nearly be represented by nearby information, to the extent that it matters. In IPB frame video it is by working on nearby frames, in psycho-acoustics it is by discarding information less necessary to human perception of sound, but it is still fundamentally forming a waveform at the end of the processing and that waveform has to be compressible because it is still able to be mapped to an uncompressed waveform. Thus, the entire thing is discarding information that makes it fit a (very complex) pattern.
– AJ Henderson♦ Apr 14 '14 at 22:31

The gist is: since any signal can be represented as a sum of sine waves (Fourier's work), and since you can decompose any signal back into those sine waves (FFT, DCT math), you can encode just the data that's required to reconstruct an approximation of the signal, rather than linearly storing every single sample of it. In the process, you can throw away a significant portion of the data and trust the brain to reconstruct the reality. The signal is split into temporal frames, and the math is done per frame, so it's not the same math all the time. Depending on the compression method, the frames can vary in duration (an MP3 frame, for example, holds 1152 samples).
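As a rough illustration of that idea, the sketch below transforms one frame with a naive DCT, keeps only the two strongest components and rebuilds an approximation from them. The 64-sample frame, the test tones and the helper names (`dct`, `idct`, `keep_largest`) are assumptions made up for this example. The tones are deliberately chosen to line up with the DCT basis, so real audio would not separate this cleanly, and real codecs decide what to drop with a perceptual model rather than by raw size.

```python
import math

def dct(frame):
    """Naive DCT-II: express the frame as weights of cosine waves."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (2 / n) * (
            coeffs[0] / 2
            + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        )
        for i in range(n)
    ]

def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

N = 64  # frame length, an arbitrary choice for this sketch
# Two strong tones plus one very faint one, all aligned with the DCT basis.
frame = [
    math.cos(math.pi * (i + 0.5) * 3 / N)
    + 0.5 * math.cos(math.pi * (i + 0.5) * 10 / N)
    + 0.01 * math.cos(math.pi * (i + 0.5) * 25 / N)
    for i in range(N)
]

coeffs = dct(frame)
stored = keep_largest(coeffs, 2)   # 2 numbers kept instead of 64 samples
approx = idct(stored)

round_trip_error = max(abs(a - b) for a, b in zip(frame, idct(coeffs)))
max_error = max(abs(a - b) for a, b in zip(frame, approx))
print(f"lossless round trip error: {round_trip_error:.2e}")
print(f"error after keeping 2 of {N} coefficients: {max_error:.4f}")
```

Keeping all coefficients reproduces the frame up to rounding error; keeping two of them loses only the faint third tone.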

For a halfway point between no compression and perceptual coding, check out ADPCM, which stores only the difference between consecutive samples, at a much lower bit resolution. 4-bit IMA ADPCM has beautiful noise artefacts, almost vinyl-like.
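The difference-coding idea can be sketched as below. This is plain DPCM with a fixed step size, not actual IMA ADPCM, which adapts its step size from sample to sample; the function names and the step of 16 are assumptions for illustration only.

```python
import math

def dpcm_encode(samples, step=16):
    """Store each sample as a 4-bit code (-8..7): the quantised
    difference between the sample and the decoder's previous output."""
    codes = []
    predicted = 0
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step  # track what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step=16):
    """Rebuild the samples by accumulating the coded differences."""
    out = []
    value = 0
    for code in codes:
        value += code * step
        out.append(value)
    return out

# A slow sine with amplitude 100, as integer samples.
samples = [round(100 * math.sin(2 * math.pi * n / 50)) for n in range(200)]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)

worst = max(abs(a - b) for a, b in zip(samples, decoded))
print(f"4 bits per sample, worst-case error: {worst}")
```

Each sample costs 4 bits, and the price is a small quantisation error on every sample, which is where the characteristic noise comes from.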

Suppose you want to save the following sequence of numbers:
0123456789 0123456789

You can store those numbers individually, which would be equivalent to how uncompressed .wav or .bmp works.
But, if you agree on certain rules and define those in a certain format (like mp3), you can also save these numbers without explicitly writing down the value of each number:
Say we agree that r(0,9,1) means a linear ramp from 0 to 9 in steps of one. Suddenly I only need to save 4 characters to represent 10 values: r091 gets me exactly the ramp, and r091 r091 gets me the sequence I had before, using 8 characters instead of 20.

So far, that's lossless encoding: the exact same data can be retrieved after decoding.
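The ramp rule can be written as a tiny encoder and decoder. This is a sketch under the assumption that only ramps with a step of one are detected; `encode_ramps` and `decode_ramps` are hypothetical names, not part of any real format.

```python
def encode_ramps(values):
    """Collapse runs of values that rise by exactly one into
    (start, end, step) triples, like r(0,9,1) in the text."""
    ramps = []
    i = 0
    while i < len(values):
        j = i
        # Extend the ramp while each value is one more than the previous.
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        ramps.append((values[i], values[j], 1))
        i = j + 1
    return ramps

def decode_ramps(ramps):
    """Expand every (start, end, step) triple back into its values."""
    out = []
    for start, end, step in ramps:
        out.extend(range(start, end + 1, step))
    return out

digits = list(range(10)) + list(range(10))   # 0123456789 0123456789
ramps = encode_ramps(digits)
print(ramps)                                  # [(0, 9, 1), (0, 9, 1)]
print(decode_ramps(ramps) == digits)          # True: nothing was lost
```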

Now, suppose we know it's sufficient to build a ramp up to 7, because the values 8 and 9 hardly influence the way the ramp works. We can then save the values of the ramp as 3-bit values (2^3 = 8 possible values, just enough for the range 0 to 7), where before we needed 4-bit slots (16 possible values) to cover the range 0 to 9. We just saved one bit for each value that is stored.
While this might be OK because the values 8 and 9 are not that important in this case, the original sequence can never be reconstructed again as soon as this information is deleted: lossy encoding.
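The bit counting works out as in the following sketch. Clamping 8 and 9 down to 7 is one assumed way of "building a ramp up to 7"; the text does not say exactly how the top values are dropped.

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold every value in 0..max_value."""
    return max(1, max_value.bit_length())

ramp = list(range(10))                  # 0..9
lossy = [min(v, 7) for v in ramp]       # assumption: clamp 8 and 9 to 7

bits_before = bits_needed(max(ramp)) * len(ramp)    # 4 bits x 10 values
bits_after = bits_needed(max(lossy)) * len(lossy)   # 3 bits x 10 values

print(bits_before, bits_after)          # 40 30
print(lossy)                            # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
```

Once the list has been clamped, nothing in the stored data says which of the 7s used to be an 8 or a 9, so the original cannot be recovered.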

Psychoacoustic compression (MP3, for example) works by removing the parts that you don't hear. For instance, right after a loud signal you won't hear a quiet signal that follows shortly afterwards, so for that stretch of time you do not need to save the quiet signal's information.

The same applies in the frequency domain. If one frequency is very loud, you won't hear a quiet frequency close to it, so you have fewer frequencies to save.
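A toy version of frequency masking might look like this. The function name `drop_masked`, the 200 Hz neighbourhood and the 30 dB gap are invented for illustration; real codecs use masking curves that depend on frequency and level, not a single fixed threshold.

```python
def drop_masked(components, max_distance_hz=200.0, min_gap_db=30.0):
    """components is a list of (frequency_hz, level_db) pairs.
    Drop every component that lies within max_distance_hz of another
    component that is at least min_gap_db louder."""
    kept = []
    for freq, level in components:
        masked = any(
            abs(freq - other_freq) <= max_distance_hz
            and other_level - level >= min_gap_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

spectrum = [
    (1000.0, 80.0),   # loud tone
    (1100.0, 40.0),   # quiet neighbour, hidden by the loud tone
    (5000.0, 40.0),   # equally quiet, but far away, so still audible
]
audible = drop_masked(spectrum)
print(audible)        # [(1000.0, 80.0), (5000.0, 40.0)]
```

Only two of the three components need to be stored, even though the dropped one is just as loud as one of the survivors.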