Is not some wavelet/filterbank transform more relevant than the FFT for comparing with human hearing? If the time-frequency space is sparse, would it not be possible to guess at time/frequency speculatively (incorporating knowledge about the waveform), and produce better results than a general approach could ever do?

I attended a talk by Richard Lyon on models of human hearing. He was very critical of our tendency to approximate it using linear systems, when non-linear components seem to be integral to the whole thing.

Is not some wavelet/filterbank transform more relevant than the FFT for comparing with human hearing?

Yes, there's no reason to limit yourself to the FFT. The most advanced psymodels don't use it for exactly that reason; they use QMF filterbanks or similar. (This already implies that what's in that article isn't as shocking as you'd think.)

FFTs, MDCTs, QMFs and other filter banks are all fundamentally bound by the uncertainty principle: the product of the frequency resolution and the time resolution cannot be smaller than a fixed constant on the order of 1 (the exact value depends on how you define resolution). This is the case for any non-parametric model/transform, i.e. when you don't make any particular assumptions about your signal. There are, however, parametric models one can use. The best example is a model where you directly fit sinusoids of arbitrary frequencies (as opposed to Fourier, which uses sinusoids of predetermined frequencies). With such a model, the resolution is only limited by practical concerns like noise, other sinusoids, and modulation effects. As a trivial example, if you give me three consecutive samples and promise that they represent only a single sinusoid below the Nyquist frequency (no noise or modulation), then I can calculate the exact frequency of that sinusoid. So in theory, sinusoidal modeling solves all the time-frequency issues of the FFT. The only problem is that it's damn hard to use, especially when it comes to getting a good enough analysis. And that's why we don't have any high-quality sinusoidal-based audio codecs.
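
To make the three-sample claim concrete, here's a minimal Python sketch. It relies on the identity x[n-1] + x[n+1] = 2*cos(w)*x[n], which holds for any pure sinusoid A*sin(w*n + phi). The function name, the 48 kHz sample rate and the test tone are made up for illustration, and it assumes exactly the idealized case above: one sinusoid, no noise, no modulation, and a nonzero middle sample.

```python
import math


def freq_from_three_samples(x0, x1, x2, sample_rate):
    """Frequency in Hz of a single noiseless sinusoid from three consecutive samples.

    Hypothetical helper for illustration. Uses x0 + x2 = 2*cos(w)*x1,
    where w is the angular frequency in radians per sample.
    """
    if x1 == 0:
        raise ValueError("middle sample is zero; shift the window by one sample")
    c = (x0 + x2) / (2.0 * x1)
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    w = math.acos(c)  # radians per sample, in [0, pi]
    return w * sample_rate / (2.0 * math.pi)


# Assumed example values: a 1234.567 Hz tone sampled at 48 kHz,
# with arbitrary amplitude and phase.
sample_rate = 48000.0
true_freq = 1234.567
samples = [
    0.8 * math.sin(2.0 * math.pi * true_freq * n / sample_rate + 0.3)
    for n in range(3)
]
estimate = freq_from_three_samples(*samples, sample_rate)
print(estimate)
```

For comparison, a 3-point DFT of the same samples has bins spaced 16 kHz apart. The catch is the one already mentioned: add a little noise or a second sinusoid and this estimate falls apart quickly, which is why the analysis side is the hard part.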