The interesting thing about this is that it has a very flat tail, much flatter than you might
expect. For example, if the probability of a given coefficient being zero or nonzero were an independent
random event, the distribution would be binomial; a binomial has a similar peak but then falls off to zero much faster :

What this tells us is that the probability of a given coefficient being zero is very much *not* independent. They
are strongly correlated - the more values that are on, the more likely it is that the next will be on. In
fact we see that if there are 6 values on, it's almost equally likely there are 7, and so on; that is :
P(n)/P(n-1) goes toward 1.0 as n gets larger.
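To make the measurement concrete, here's a little sketch (my own illustration, not code from the post) of how you'd tally P(n) - the probability that exactly n of the 63 AC coefficients are on - and the ratios P(n)/P(n-1). The synthetic "activity level" data here is just a stand-in for real quantized DCT output; it bakes in the within-block correlation described above:

```python
import numpy as np

# Hypothetical sketch: tally the distribution P(n) of "n coefficients nonzero"
# per 8x8 block, and the ratios P(n)/P(n-1). Synthetic data stands in for a
# real image's quantized DCT coefficients.
rng = np.random.default_rng(0)
num_blocks = 10000

# Each block draws its own activity level, so nonzero events within a block
# are correlated rather than independent - mimicking the observed behavior.
activity = rng.uniform(0.0, 0.5, size=(num_blocks, 1))
nonzero = rng.uniform(size=(num_blocks, 63)) < activity

counts = nonzero.sum(axis=1)                    # nonzero count per block
hist = np.bincount(counts, minlength=64).astype(float)
p = hist / hist.sum()                           # P(n)

with np.errstate(divide="ignore", invalid="ignore"):
    ratios = p[1:] / p[:-1]                     # P(n)/P(n-1)

print(p[:8])
print(ratios[:8])
```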

Also, amusingly, the first two ratios P(1)/P(0) and P(2)/P(1) are both very close to 0.5 in every image I've
tried (generally in the 0.4 to 0.6 range). What this means is it wouldn't be too awful to just code the # of values on
in unary, at least for the first bit (you could use something like an Elias Gamma code, which uses unary at
first and then adds more raw bits).
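For reference, a minimal Elias Gamma encoder looks like this (my own sketch; since the count of values on can be 0 and gamma codes start at 1, I code count+1):

```python
def elias_gamma(n: int) -> str:
    # Elias gamma code for n >= 1: (b-1) zero bits, then the b-bit binary
    # representation of n, where b = n.bit_length(). Small values get codes
    # close to unary; larger values add raw bits.
    assert n >= 1
    b = n.bit_length()
    return "0" * (b - 1) + format(n, "b")

# The count of nonzero coefficients can be 0, so code count+1:
for count in range(5):
    print(count, elias_gamma(count + 1))
```

Note that the first bit of the gamma code behaves exactly like the unary decision described above: "is the count 0, or more?"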

Now for pretty pictures. Everyone has seen graphics like this, showing the L2 energy of each coefficient
in the DCT : (none of these pictures include the DC because it's weird and different)

This shows the percentage of the time the value is exactly zero :

Now for some more interesting stuff. This shows the percentage correlation to the block above the
current one (to the north) :

Note in particular the strong correlation of the first row.

The next one is the correlation to the block to the left (west) :

Finally the fun one. This one shows the correlation of each coefficient to the other 63 coefficients
in the same block :

The self-correlation is 100%, which makes it a white pixel obviously. Black means 0% correlation.
This is absolute-value correlation in all cases (no signs). There are a lot of patterns that
should be pretty obvious to the eye. Beware of over-modeling on these patterns, because
they do change a bit from image to image, though the general trend stays the same.
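A sketch of how a map like this could be computed (again my own illustration, with synthetic data standing in for real DCT output): gather the absolute values of each coefficient across all blocks, and compute the full 64x64 correlation matrix; row i, reshaped to 8x8, is the picture for coefficient i:

```python
import numpy as np

# Hypothetical sketch: absolute-value correlation of each coefficient against
# every other coefficient, across all blocks of an image.
rng = np.random.default_rng(1)
blocks = rng.normal(size=(5000, 64))        # one row per 8x8 block, flattened

a = np.abs(blocks)                          # drop signs, as in the pictures
a -= a.mean(axis=0)                         # center each coefficient
cov = a.T @ a
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)             # 64x64 correlation matrix

# corr[i].reshape(8, 8) is the 8x8 picture for coefficient i; the diagonal
# entry corr[i, i] == 1 is the white self-correlation pixel.
print(corr[9].reshape(8, 8))
```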

And another one from a different image :

This one's from Lena. A few things I think are particularly interesting - in the upper left area, which
is where most of the important energy is, the correlation is strongest diagonally. That is, you see
these "X" shape patterns where the center pixel is correlated mainly to its diagonal neighbors, not the
ones directly adjacent to it.

I assume the grayscale values are in linear scale, i.e. 50% gray = a correlation with magnitude of 0.5?

In any case, this is really interesting. The DCT is quite obviously far from the optimal decorrelating transform for these images. It'd be really interesting to see how these images look for prediction residuals when doing H264-style coding.

"aren't things like the number of zeroes going to depend heavily on the choice of quantizations? which are you doing here?"

I'm always doing uniform quantization here BTW.

Changing quantizer absolutely changes the # of zeros, but it shouldn't affect whether a value being zero is an independent event or not.
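To illustrate the first half of that (a sketch under my own assumptions, with a Laplacian stand-in for real AC coefficients): coarser uniform quantization pushes more values to zero, but it doesn't change whether the zero/nonzero events are correlated:

```python
import numpy as np

def quantize_uniform(coeffs, Q):
    # Plain uniform quantization: bucket width Q, round to nearest.
    return np.round(coeffs / Q).astype(int)

rng = np.random.default_rng(2)
c = rng.laplace(scale=4.0, size=100000)     # stand-in for AC coefficients

zero_frac = {}
for Q in (2.0, 8.0, 32.0):
    zero_frac[Q] = np.mean(quantize_uniform(c, Q) == 0)
    print(Q, zero_frac[Q])                  # coarser Q -> more zeros
```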

My comment about the typical cases is based on my belief that this type of coder should only be used in the 0.25 - 1.0 bpp range.

"also, on the PCA blog post, the same issue crossed my mind: if you found the optimal transformation to some arbitrary basis, how would you quantize in that basis?"

Well you're giving up any attempts at perceptual quantization at that point. Uniform quantization in any orthonormal basis is equivalent in an R-D sense.
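The R-D equivalence follows from Parseval: an orthonormal transform preserves L2 norms, so squared quantization error measured on the coefficients equals squared error measured on the pixels. A quick numerical check (my own sketch, using SciPy's orthonormal 2D DCT):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Parseval check: for an orthonormal transform, the L2 quantization error in
# the coefficient domain equals the L2 error in the pixel domain.
rng = np.random.default_rng(3)
block = rng.normal(size=(8, 8))             # stand-in for an image block

C = dctn(block, norm="ortho")               # orthonormal 2D DCT
Cq = np.round(C / 8.0) * 8.0                # uniform quantization, step 8

err_coeff = np.sum((C - Cq) ** 2)
err_pixel = np.sum((block - idctn(Cq, norm="ortho")) ** 2)
print(err_coeff, err_pixel)                 # equal up to float rounding
```

The same holds for any orthonormal basis, which is why the choice of basis doesn't matter for plain L2 distortion - only for perceptual weighting.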

There is the issue that when you quantize you wind up seeing the quantization shapes. How that turns out exactly in human perceptual error is hard to say (though personally I find the claims that the DCT is good in this sense quite dubious).