Both Canon and Nikon have introduced finer quantization of the sensor signal
in digitizing and recording the raw data, moving from 12-bit tonal depth in
older models to 14-bit tonal depth in newer models. A priori, one might expect this
transition to bring an improvement in image quality -- after all, doesn't 14-bit data
have four times as many levels (16384) as 12-bit data (4096)? It would seem
obvious that 14-bit tonal depth would allow for smoother tonal transitions, and
perhaps less possibility of posterization. Well, those expectations go unmet,
and the culprit is noise.

In the absence of noise, the quantization of an analog signal introduces
an error, as analog values are rounded off to a nearby digitized value
in the ADC. In images, this quantization error
can result in so-called posterization
as nearby pixel values are all rounded to the same digitized value.
An example is shown in Figure 16;
here, an 8-bit grayscale image of a smooth tonal gradient
was generated in Photoshop using the gradient tool. The bit depth was then truncated
to five bits (32 tonal levels instead of 256)
using two successive levels adjustment layers: the first one mapping the
input range 0-255 onto the output range 0-31,
the one above it mapping the input range 0-31 back onto the output range 0-255.
Mouse over the gradient to see the posterization that results from bit depth truncation.

Fig. 16 - A gradient from 0-255 in 8-bit grayscale.
Mouseover to see the posterization that results from truncating the bit depth
to five bits.
In the absence of noise, having 8 times fewer available levels
is readily noticeable.

These two successive levels adjustments first compress the tonal range, truncating the
lowest three bits; then the second levels adjustment
layer restores the range of the gradient from
black to white on your monitor. The histograms of the 8-bit and 5-bit gradients are shown
as an inset in the figure. Clearly, in the absence of noise one wants the
highest possible bit depth in order to have the smoothest possible tonal gradients.
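The two stacked levels layers can be mimicked numerically. The following is a minimal Python sketch, assuming each levels layer behaves like a linear map followed by rounding to an integer (Photoshop's exact rounding may differ); `levels` and `truncate_to_5_bits` are hypothetical helper names.

```python
# Sketch: two stacked Levels layers modeled as linear maps with integer rounding.
# Assumption: Photoshop's internal rounding may differ slightly from Python's round().

def levels(value, in_max, out_max):
    """Map [0, in_max] linearly onto [0, out_max], rounding to an integer level."""
    return round(value * out_max / in_max)

def truncate_to_5_bits(value):
    compressed = levels(value, 255, 31)   # lower layer: input 0-255 -> output 0-31
    return levels(compressed, 31, 255)    # upper layer: input 0-31 -> output 0-255

ramp = list(range(256))                   # noiseless 8-bit gradient
posterized = [truncate_to_5_bits(v) for v in ramp]

print(len(set(ramp)), len(set(posterized)))  # 256 32

# Size of each jump between neighbouring posterized tones, in 8-bit levels.
steps = {b - a for a, b in zip(posterized, posterized[1:]) if b != a}
print(sorted(steps))
```

Each surviving tone is separated from its neighbours by a jump of 8 or 9 levels on the 0-255 scale, which is the staircase visible in the mouseover of Figure 16.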

Now consider the effect of noise on the tonal transitions. Noise amounts to random jumps
of tonality between neighboring pixels; the bigger the noise, the
larger the random jumps are. If the jumps are larger than the interval
between successive quantized values of tonality, posterization effects will be
rendered imperceptible by the noise -- the stepwise transitions of posterization
(of magnitude 8 levels in this example) can't be discerned if the level is
on average randomly hopping around between any pair of adjacent pixels
by more than the spacing between levels
(in our example, the random jumps are by 12 levels on average).
The smoothing of transitions through the effect
of noise, sacrificing spatial resolution for smooth but noisy transitions,
is known as dithering.
Figure 17 shows what happens when noise of standard deviation 12 levels is added to
the smooth tonal gradient before the bit truncation to 5-bit tonal depth
(where the smallest allowed jump in tonality is 8 levels on the 0-255 scale).
The noise is more than sufficient to eliminate any trace of posterization.
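A rough numerical check of dithering, under stated assumptions: Gaussian noise of standard deviation 12 levels is added at each ramp position before the same 5-bit truncation, and the eye is modeled crudely as an average over many pixels of the same tone. The seed, sample count and helper names are illustrative choices, not part of the original Photoshop experiment.

```python
import random

def levels(value, in_max, out_max):
    """Linear Levels map with integer rounding (a simplification of Photoshop)."""
    return round(value * out_max / in_max)

def truncate_to_5_bits(value):
    v = min(255, max(0, round(value)))              # noisy value back into 8-bit range
    return levels(levels(v, 255, 31), 31, 255)      # the two stacked Levels layers

rng = random.Random(1)
SIGMA = 12        # noise standard deviation, in 8-bit levels
SAMPLES = 2000    # pixels averaged per tone, standing in for the eye's averaging

worst_clean = worst_dithered = 0.0
for v in range(40, 216):                            # stay clear of clipping at 0 and 255
    clean_error = abs(truncate_to_5_bits(v) - v)
    average = sum(truncate_to_5_bits(v + rng.gauss(0, SIGMA))
                  for _ in range(SAMPLES)) / SAMPLES
    worst_clean = max(worst_clean, clean_error)
    worst_dithered = max(worst_dithered, abs(average - v))

print(f"worst tonal error, no noise:        {worst_clean:.1f} levels")
print(f"worst error of local average, noise: {worst_dithered:.1f} levels")
```

Without noise the truncated ramp is off by up to about half a quantization step; with noise added first, the local average tracks the original ramp to within sampling error, which is the dithering described above.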

Fig. 17 - A gradient from 0-255 in 8-bit grayscale,
with added noise of width 12 levels.
Mouseover to see whether posterization results
from truncating the bit depth
to five bits. Can you tell that one image has eight times fewer levels than the other?

Posterization can occur in the presence of noise, but only when the quantization step
is substantially larger than the noise. Figure 18 shows the effect of decreasing
bit depth on the noisy gradient, until eventually the noise is substantially smaller
than the step size. Posterization becomes apparent when the quantization step
sufficiently exceeds the width of the noise; the random jumps in tonality due to
the noise are then no longer able to dither the discrete jumps due to quantization.
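The trend in Figure 18 can be quantified without simulation. For a simple rounding quantizer with step size `step` and Gaussian noise, the expected output at each input tone follows exactly from the normal distribution. The sketch below uses that simplified quantizer (not the Photoshop levels layers) and reports the worst deviation of the local average from the true tone as the step grows relative to the noise; all names are hypothetical.

```python
import math

def normal_cdf(x, sigma):
    """P(n <= x) for zero-mean Gaussian noise with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def expected_quantized(v, step, sigma):
    """Exact mean of step * round((v + n) / step) for Gaussian noise n."""
    k_lo = math.floor((v - 8 * sigma) / step)   # levels beyond +/- 8 sigma are negligible
    k_hi = math.ceil((v + 8 * sigma) / step)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        lower = (k - 0.5) * step - v            # noise values that land on level k
        upper = (k + 0.5) * step - v
        total += k * step * (normal_cdf(upper, sigma) - normal_cdf(lower, sigma))
    return total

def worst_bias(step, sigma, points=200):
    """Largest gap between locally averaged tone and true tone.
    The error is periodic in v with period `step`, so one step suffices."""
    return max(abs(expected_quantized(step * i / points, step, sigma) - step * i / points)
               for i in range(points))

SIGMA = 12.0   # noise standard deviation in 8-bit levels, as in Figure 17
for step in (8, 16, 32, 48, 64):
    print(f"step {step:2d} = {step / SIGMA:.2f} x noise: "
          f"worst bias {worst_bias(step, SIGMA):5.2f} levels (noiseless: {step / 2:.0f})")
```

In this model the residual banding stays a small fraction of a level while the step is up to roughly twice the noise standard deviation, and grows quickly once the step is several times the noise; without noise the worst error is always half a step.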

It is important that the noise is present before quantization of levels,
as it is in the processing of the signal from the camera sensor, where the
noise is contributed by the sensor electronics and by photon statistics, before
the signal reaches the analog-to-digital converter. Attempting to dither
tonal transitions through the addition of noise after quantization
is much, much less effective, as one may readily verify by altering the
order of bit truncation and noise addition in the above example.
If the reader would like to tinker with the effects of noise and bit depth
in creating smoother tonal transitions, the layered Photoshop .psd file used
to generate Figures 16-18 may be found
here.
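The importance of the order of operations can be checked the same way. This sketch (again a simplified rounding quantizer with hypothetical parameter names, not the .psd file) applies the identical noise sample either before or after quantizing each column of a smooth ramp, then averages down each column.

```python
import random

STEP = 8      # quantization step in 8-bit levels (i.e. 5-bit depth)
SIGMA = 12    # noise standard deviation in 8-bit levels
ROWS = 2000   # pixels averaged per column, standing in for the eye's local averaging

def quantize(x):
    """Round to the nearest multiple of STEP (a simplified bit truncation)."""
    return STEP * round(x / STEP)

rng = random.Random(0)
worst_before = worst_after = 0.0
for v in range(64, 192):                                   # columns of a smooth ramp
    noise = [rng.gauss(0, SIGMA) for _ in range(ROWS)]
    before = sum(quantize(v + n) for n in noise) / ROWS    # add noise, then quantize
    after = sum(quantize(v) + n for n in noise) / ROWS     # quantize, then add noise
    worst_before = max(worst_before, abs(before - v))
    worst_after = max(worst_after, abs(after - v))

print(f"noise before quantization: worst column error {worst_before:.2f} levels")
print(f"noise after quantization:  worst column error {worst_after:.2f} levels")
```

Noise added after quantization averages to zero and leaves every step of the staircase where it was, so the local average still shows the full half-step error; noise added beforehand removes it.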

Quantizing the signal from the sensor in steps much finer than the level of
the noise is thus superfluous and wasteful; quantizing the signal in steps much
coarser than the level of the noise risks posterization. As long as the noise
exceeds the quantization step, the difference
between the coarser and finer quantization is imperceptible.
As long as noise continues to exceed the quantization step in post-processing,
it doesn't matter how one edits the image after the fact, since
any squeezing/stretching of the levels also does the same to the noise,
which will always be larger than the level spacing no matter how it is
squeezed or stretched.
On the other hand, quantizing the signal in steps coarser than the noise
can lead to posterization.
Ideally, the noise should slightly exceed the quantization step, in order
that roundoff errors introduced by quantization are negligible, and that
no bits are wasted in digitizing the noise.

Raw data is never posterized. That does not mean, however, that
posterization cannot arise through raw conversion and post-processing.
The condition for the absence of posterization is that the noise
exceed the quantization step; we have seen that compression of levels
and bit truncation can reintroduce posterization (for instance, Figure 18
was generated by compressing levels more and more), and levels/curves
adjustments as well as gamma correction involve compression of levels
in some part of the histogram, effectively doing a levels truncation
in some exposure zones. Resampling involves
averaging over neighboring pixels, which can reduce the level of noise
below the quantization step; noise reduction has a similar effect.
Finally, 12- or 14-bit image data are displayed or output on 8-bit
devices, implementing a substantial bit truncation. All these effects
can introduce posterization where it didn't exist previously.
When posterization does arise, one must reconsider the processing
chain that led to it and try to find an alternative route
that avoids it. The main point of emphasis here is that
the bit depth of the raw data is never the culprit.

As an extreme example of how post-processing can undo the dithering effect
of noise, consider applying a radius 5 median filter to the noisy, bit-truncated (5-bit)
tonal ramp of Figure 17:

Fig. 17a - A median filter of radius 5 filters the noise and posterizes
the tonal gradient of Figure 17.

The median filter is almost designed to introduce posterization.
In an ordered list of numbers, the
median is the middle number
of the list (or the average of the two middle numbers if the list is of even length).
For instance, the median of the list {4,8,8,12,16} is 8.
The median filter in image processing replaces a given pixel's value with the median
of all the pixels within the chosen radius of that pixel.
In the bit-truncated gradient plus noise (the mouseover of Figure 17),
noise fluctuations trick the eye into seeing in-between tonal values
by virtue of the percentage of pixels having the nearby quantized values;
our perception averages over fluctuating pixel values {4,8,8,12,16} and sees the average
(4+8+8+12+16)/5=9.6. But the median filter takes the middle number of the list, 8,
thus assigning the nearby quantized value to the region instead of the average.
Running the median filter over the entire image
restores the posterization by restoring the same quantized value over a sizeable region.
Median filtering is a simple and common filter for noise,
and is used in some raw conversion algorithms (for instance, the freeware raw converter dcraw)
because it removes the effects of hot pixels and extreme outliers in the noise distribution,
which are objectionably noticeable to the eye (if the top number in the above list
had been 64 instead of 16, the average would be raised quite a bit;
but the median would still be 8).
So the median filter indeed filters the noise,
but brings back posterization at the same time.
More sophisticated algorithms can filter the noise without introducing posterization.
Perhaps a lesson to take away here is that, while bit depth beyond S/N ratio doesn't convey
more image information, it may be more robust against deficiencies in raw conversion algorithms.
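The arithmetic above, and the reason a median filter re-posterizes a dithered ramp, can be checked with Python's standard `statistics` module. The second half is a sketch on a synthetic one-dimensional row (seed, ramp slope and the `sliding` helper are hypothetical), not a reproduction of Figure 17a.

```python
import random
from statistics import mean, median

# The worked example from the text.
pixels = [4, 8, 8, 12, 16]
print(mean(pixels), median(pixels))   # 9.6 8
hot = [4, 8, 8, 12, 64]               # top value replaced by a hot pixel
print(mean(hot), median(hot))         # 19.2 8

# A dithered 5-bit row: smooth ramp plus noise (sigma 12), rounded to steps of 8.
STEP, SIGMA, RADIUS = 8, 12, 5
rng = random.Random(3)
row = [STEP * round((x / 4 + rng.gauss(0, SIGMA)) / STEP) for x in range(400)]

def sliding(values, radius, stat):
    """Apply stat over a window of 2*radius+1 pixels centred on each interior pixel."""
    return [stat(values[i - radius:i + radius + 1])
            for i in range(radius, len(values) - radius)]

means = sliding(row, RADIUS, mean)
medians = sliding(row, RADIUS, median)
# With an odd window the median is always one of the quantized inputs,
# so the filtered row can only take the posterized levels again.
print(all(m % STEP == 0 for m in medians), len(set(means)) > len(set(medians)))  # True True
```

A local mean can land between quantized levels, which is what the eye does; a median over an odd number of pixels cannot, so it hands every region one of the original quantized values.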

One might be concerned that while smooth gradients do not suffer under bit truncation
in the presence of noise, details do. Again this is not a problem.
Figure 19 shows the text+gradient introduced in the discussion of dynamic range.
Recall that the noise was four raw levels, so truncating the last two bits (down to 6-bit
tonal depth) should have essentially no impact on the legibility of the text.
To see whether it does, mouse over the image to reveal the original 8-bit image,
mouse off to return to the 6-bit version. Throwing away the noisy bits has little
or no effect on the ability to extract the detail in the image.

Fig. 19 - The gradient plus detail introduced above,
truncated to 6-bit tonal depth. Mouseover to compare to the 8-bit version.
The 6-bit image is very slightly darker in places, since the bit truncation can
result in slight shifts of the average tonality.
Can you tell that one image has four times fewer levels than the other?

The above example images were manufactured rather than photographed.
This was done so that the effects being illustrated could be carefully
controlled. For instance, one cannot have a tonal gradient smoother
than a uniform, linear ramp of brightness, so this is the ideal
testing ground. One of the major
claimed improvements of 14-bit tonal depth was to allow smoother
tonal transitions, but the increased bit depth does not help
in the presence of noise. Nevertheless, the reader may have a nagging
suspicion that somehow, real images might show improvement while
these manufactured examples don't. To dispel that notion, consider Figure 20,
a crop of a 1D3 image which was deliberately taken six stops underexposed.
The purpose of the underexposure was to move the histogram down six stops,
which means the highest six bits of the 14-bit raw data will be unused; the
lowest eight bits are where all the image data reside,
which can be accurately displayed on our 8-bit computer monitors.

One of the two green channels of the raw data was extracted, and the tonal range
1020-1275 was mapped with a levels adjustment to 0-255 on the 8-bit scale
(recall the bias offset of the 1D3 is 1024). The image looks a bit dark
because no gamma correction has been done. The lowest two bits of the
image data have been truncated -- effectively the 13th and 14th bits,
the two least significant
bits of the original 14-bit image data, have been removed.
This bit-truncated image is what one would have obtained had the 1D3 been
a 12-bit camera. Mouse over to compare this effectively 12-bit 1D3 image
with the 14-bit original. Again it seems there is little to choose between the two.
The reader is again invited to experiment with the full image and its bit truncation;
feel free to download the
6-bit and 8-bit
versions of the cityscape.
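What the 12-bit emulation does to a shadow tone can be sketched with synthetic data. The bias offset of 1024 and the 1020-1275 display window come from the text; the noise of 5 ADU and the tone 37.3 ADU above black are hypothetical values, and the Gaussian noise model is an assumption. Zeroing the two least significant bits stands in for what a 12-bit readout would record, expressed on the 14-bit scale.

```python
import random
from statistics import mean, pstdev

BIAS = 1024           # 1D3 bias offset quoted in the text
SIGMA = 5.0           # hypothetical noise in 14-bit ADU ("more than four levels")
TONE = BIAS + 37.3    # hypothetical deep-shadow tone, in 14-bit ADU

rng = random.Random(7)
raw14 = [round(rng.gauss(TONE, SIGMA)) for _ in range(20000)]
raw12 = [(v >> 2) << 2 for v in raw14]        # zero the two least significant bits

def to_display(v):
    """Levels adjustment from the text: raw 1020-1275 -> display 0-255."""
    return min(255, max(0, v - 1020))

disp14 = [to_display(v) for v in raw14]
disp12 = [to_display(v) for v in raw12]

print(f"mean shift from truncation: {mean(disp12) - mean(disp14):+.2f} display levels")
print(f"noise: {pstdev(disp14):.2f} -> {pstdev(disp12):.2f} display levels")
print(f"distinct display levels: {len(set(disp14))} -> {len(set(disp12))}")
```

Only every fourth display level is populated after truncation, yet the noise barely changes; the truncation also lowers the average by about 1.5 levels, the "slightly darker" shift mentioned in the caption of Figure 19.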

Fig. 20 - Green channel of a
6 stops underexposed 1D3 raw image, showing the lowest 8 bits = 256 levels
of the 14-bit data
after the last two bits have been truncated.
The 1D3's noise exceeds four levels, so this should cause no image
degradation;
mouse over to compare to the image made
from the lowest 8 bits without the truncation.
The histograms are inset at the lower right.

Curiously, most
14-bit cameras on the market (as of this writing) do not
merit 14-bit recording. The noise is more than four levels
in 14-bit units on the Nikon D3/D300, Canon 1D3/1Ds3 and 40D.
The additional two bits are randomly fluctuating,
since the levels are randomly fluctuating by +/- four levels or more.
Twelve bits are perfectly adequate to record the image data without any loss of
image quality, for any of these cameras
(though the D3 comes quite close to warranting
a 13th bit). A somewhat different technology is employed in Fuji cameras,
whereby there are two sets of pixels of differing sensitivity.
Each type of pixel has less than 12 bits of
dynamic range, but the total range spanned from the top end of the less sensitive
pixel to the bottom end of the more sensitive pixel is more than 13 stops, and so
14-bit recording is warranted.

A qualification is in order here -- the Nikon D3 and D300 are both capable of recording
in both 12-bit and 14-bit modes. The method of recording 14-bit files on the D300
is substantively different from that for recording 12-bit files; in particular, the
frame rate slows by a factor of 3-4. Reading out the sensor more slowly allows it
to be read more accurately, and so there may indeed be a perceptible improvement
in D300 14-bit files over D300 12-bit files (specifically, less read noise, including
pattern noise). That does not, however, mean that the data need be recorded
at 14-bit tonal depth -- the improvement in image quality comes from the slower readout,
and because the noise is still more than four 14-bit levels, the image could
still be recorded in 12-bit tonal depth and be indistinguishable from the
14-bit data it was derived from.

Another point to keep in mind is that, while the raw data is not posterized,
post-processing can make it so, as was seen in the noisy gradient example above.
This is again true for the 1D3 cityscape image. Applying a radius 5 median filter
brings out posterization in the 6-bit version, as one can see
here; the 8-bit version
is more robust against the median filter, as one can see
here. However, something as simple
as appending two bits with random values to the 6-bit version substantially eliminates
the posterization, as one can see
here.
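Appending two random bits is easy to express in code. In this sketch `append_random_bits` and `median_filter` are hypothetical helpers, and the 6-bit row is synthetic (a gentle ramp with noise of about one 6-bit level), not the cityscape data. It measures how often a radius 5 median filter lands exactly on the original 6-bit grid (multiples of 4 on the 8-bit scale).

```python
import random
from statistics import median

def append_random_bits(values, nbits=2, seed=0):
    """Shift each value left by nbits and fill the new low-order bits at random
    (e.g. 6-bit data -> 8-bit data)."""
    rng = random.Random(seed)
    return [(v << nbits) | rng.randrange(1 << nbits) for v in values]

def median_filter(values, radius=5):
    """One-dimensional median filter over windows of 2*radius+1 pixels."""
    return [median(values[i - radius:i + radius + 1])
            for i in range(radius, len(values) - radius)]

def fraction_on_grid(values, spacing=4):
    """Fraction of values sitting exactly on the coarse (6-bit) level grid."""
    return sum(v % spacing == 0 for v in values) / len(values)

rng = random.Random(5)
six_bit = [max(0, min(63, round(20 + x / 40 + rng.gauss(0, 1.0)))) for x in range(400)]
plain = [v << 2 for v in six_bit]       # 8-bit scale with the low bits left at zero
padded = append_random_bits(six_bit)    # 8-bit scale with random low bits

print(fraction_on_grid(median_filter(plain)), fraction_on_grid(median_filter(padded)))
```

With zero low bits every median falls back on the 6-bit grid; with random low bits only about a quarter do, so the filtered result is no longer confined to the coarse levels.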

S/N and Exposure Decisions:

A common maxim in digital photography is that image quality is maximized by
"exposing to the right" (ETTR) -- that is, raising the exposure as much as possible
without clipping highlights. It is often stated that in doing so, one
makes the best use of the "number of available levels" in the raw data.
This explanation can be found, for instance, in a much-quoted
tutorial on Luminous-Landscape.com. The thinking is that, because raw
is a linear capture medium, each higher stop in exposure accesses the next higher bit
in the digital data, and twice as many raw levels are used in encoding the raw capture.
For instance, in a 12-bit file, the highest stop of exposure has 2048 levels,
the next highest stop 1024 levels, the one below that 512 levels, and so on.
Naively it would seem obvious that the highest quality image data would arise
from concentrating the image histogram in the higher exposure zones, where
the abundance of levels allows finer tonal transitions.
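The level counts quoted above follow from linear encoding: the top stop spans the upper half of the raw range, the next stop the upper half of what remains, and so on. A short sketch (`levels_per_stop` is a hypothetical helper name):

```python
def levels_per_stop(bits, stops):
    """Raw levels in each stop below saturation, for linearly encoded raw data."""
    return [2 ** (bits - 1 - s) for s in range(stops)]

print(levels_per_stop(12, 6))   # [2048, 1024, 512, 256, 128, 64]
print(levels_per_stop(14, 3))   # [8192, 4096, 2048]
```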

However,
the issue is not the number of raw levels in any given segment of the raw data
(as measured e.g. in stops down from raw saturation point).
Rather, the point is that by exposing to the right,
one achieves a higher signal to noise ratio in the raw data.
The number of available raw levels has little to do with the proper
reason to expose right, since as we have seen
the noise rises with signal and in fact the many raw levels available in higher
exposure zones are largely wasted in digitizing photon shot noise
(there will be more to say about this in a moment,
when we consider NEF compression).

Consider for instance the 1D3, exposed to the right at ISO 3200. Then consider taking
the same image, with the same shutter speed and aperture, at ISO 1600. The latter image
will be one stop underexposed according to the ETTR ideology. In particular,
the idea that the benefit of ETTR comes from the "number of available levels"
suggests that the image quality would be one stop worse for the ISO 1600 image
than it is for the ISO 3200 image, since by being one stop down from the right
edge of the histogram, fully half the available levels are not being used.
However, as we have seen, noise is much more than two levels in all
exposure zones at these ISO's, so the extra levels used in the ISO 3200 image
simply go into digitizing the noise, and are thus of no benefit
in improving image quality. In fact, the quality of the two images will
be very nearly the same (rather than one stop different):

The photon shot noise will be the same in photon counts for the same exposure
(aperture and shutter speed).
The photon noise is already present in the light
before it hits the sensor, and couldn't care less what ISO you happened to choose.
In raw levels, the photon noise will be half as much for the ISO 1600
shot because the gain (photons/raw level) is twice as much;
a given number of photon counts of noise translates into half the noise in ADU at half
the ISO.

The read noise for ISO 1600 (13.4 raw levels) is roughly half that
of ISO 3200 (26.2 raw levels). Thus both read noise and photon shot noise
will be half as much in ADU for the ISO 1600 image
as they are for the ISO 3200 image.

Each feature of the raw histogram of the ISO 1600 image will occur
at half the raw value it occurs at for the ISO 3200 image.

Thus the raw histogram, and the noise associated to each portion of it,
will be half as much in raw levels for the "underexposed" ISO 1600 shot as it is
for the ETTR ISO 3200 shot with the same aperture/shutter speed. Simply doubling
all the pixel raw values of the ISO 1600 shot will yield a nearly identical image
to the ISO 3200 shot; the fact that only the even raw levels are populated after this
doubling of raw values is of no consequence, since as we have seen the dithering
provided by the inherent noise of the image is more than sufficient to wash out
the effect of the finer quantization of levels in the ISO 3200 shot.
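The three observations above can be combined into a small calculation. The read noise values (13.4 and 26.2 ADU) are from the text; the conversion gain of 0.35 electrons/ADU at ISO 3200 is a hypothetical figure chosen only for illustration, and shot noise and read noise are assumed to add in quadrature.

```python
import math

GAIN_3200 = 0.35                          # hypothetical electrons per ADU at ISO 3200
READ_NOISE = {3200: 26.2, 1600: 13.4}     # read noise in ADU, from the text

def raw_stats(electrons, iso):
    """Mean raw signal and total noise (shot + read, in quadrature), both in ADU."""
    gain = GAIN_3200 * 3200 / iso                 # halving ISO doubles electrons per ADU
    signal = electrons / gain
    noise = math.sqrt(electrons / gain ** 2 + READ_NOISE[iso] ** 2)
    return signal, noise

for electrons in (10, 100, 1000, 10000):
    s32, n32 = raw_stats(electrons, 3200)
    s16, n16 = raw_stats(electrons, 1600)
    # "Exposure compensation": double the ISO 1600 raw values (signal and noise alike).
    print(f"{electrons:6d} e-: ISO 3200 S/N {s32 / n32:6.2f}   "
          f"ISO 1600 doubled S/N {2 * s16 / (2 * n16):6.2f}")
```

Doubling scales signal and noise together, so it cannot change S/N; the two shots differ only through the slightly unequal read noises, which matters at most by a few percent in the deepest shadows.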

The proper reason to expose to the right comes from
Figure 12
on page 2, showing the rise in signal-to-noise ratio with increasing exposure.
By increasing the number of photons captured, the S/N ratio improves, and the
image quality improves directly in proportion to that improved S/N ratio.
For instance, the ISO 1600 shot above has one stop more highlight headroom
than the corresponding ISO 3200 shot, and (assuming the shooting conditions allow)
opening up the aperture by
one stop or doubling the exposure time will improve the S/N
ratio while pushing the histogram one stop to the right.
Shooting at lower and lower ISO continues to provide more highlight
headroom, and thus higher and higher absolute exposure is possible,
allowing higher and higher S/N ratio. The end result is that
exposing to the right at the lowest possible ISO provides the highest image
quality, but not for the reason usually given.

Now, the preceding discussion might leave the impression that,
for a fixed choice of the shutter speed and aperture,
it doesn't matter whether one
has underexposed at lower ISO or exposed to the right at higher ISO.
In fact it typically does matter, but
the example above was specifically chosen such that the noise profiles
of the 1D3 at ISO 1600 and ISO 3200 are very nearly the same when the difference
in gain is accounted for, and so in that case it didn't matter which ISO was chosen
as far as noise at fixed exposure was concerned.
This is typically true at the highest ISO's; however, at lower ISO
it does pay to choose the highest ISO for which clipping is avoided.

Somewhat counter-intuitively, for fixed aperture/shutter speed,
it is best to use the highest possible ISO (without clipping highlights);
this result is consistent with the ETTR philosophy,
since using higher ISO pushes the histogram to the right
if one thinks about things in terms of raw levels (ADU).
However, the benefit from the use of higher ISO comes in the shadows,
not in the highlights where "there are more levels"; to demonstrate that
will require some more detailed analysis of the noise and S/N graphs
on page 2.

The read noise vs. ISO graphs on page 2
exhibit noise (as measured in raw levels) increasing with ISO.
The observed values were accurately fit by the model presented
there; in its basic form, the model gives the read noise R as

$$R^2 = (G R_0)^2 + R_1^2$$

in terms of the noise $R_0$ coming
from circuit components upstream of the ISO amplifier, whose ISO gain
is set to $G$; and the noise $R_1$ arising from circuit components
downstream of the ISO amplifier.
Increasing the ISO gain $G$ emphasizes the contribution
of the upstream noise component, because it gets multiplied by a bigger
and bigger number the higher the ISO gain, to the point that at high ISO
the read noise is almost entirely due to upstream noises, with a
negligible contribution from downstream noises.
The noise measured in ADU is so-called output referred noise, the noise
in the output raw data after all the amplification etc that goes on in
the signal processing chain. It is sometimes useful to refer the noise
back to its input equivalent, in photoelectrons at the photosite, in order
to compare it to the photon noise or the signal, both of which are measured
in photoelectrons. This is accomplished by multiplying
by the gain g=U/G , the conversion factor between electrons and ADU; here,
U is a constant (the so-called universal gain) and G is the ISO setting.
When this is done, the opposite trend is observed --
read noise goes down with ISO in electron equivalents:

$$(R\ \text{in electrons})^2 = \left[\,g \cdot (R\ \text{in ADU})\,\right]^2 = U^2\left(R_0^2 + \frac{R_1^2}{G^2}\right)$$

The reason for this decrease is the decreasing influence of
downstream noise $R_1$ with amplification; in referring the noise back
to input equivalents, the downstream noise gets divided
by the ISO gain, and hence becomes smaller and smaller at high ISO,
while the upstream noise doesn't know about any amplification
and remains constant in input-referred units.
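The two formulas above can be tabulated directly. In this sketch the values of R0, R1 and U are made up for illustration (they are not the fitted 1D3 parameters), and G is taken as the ISO relative to base ISO 100, which is an assumption about units.

```python
import math

# Hypothetical parameters for illustration only (not the fitted 1D3 values):
R0 = 1.0   # upstream noise, in ADU at base ISO gain
R1 = 6.0   # downstream noise, in ADU
U = 4.0    # electrons per ADU at base ISO ("universal gain" in these units)

def read_noise_adu(G):
    """Output-referred read noise, R^2 = (G R0)^2 + R1^2."""
    return math.hypot(G * R0, R1)

def read_noise_electrons(G):
    """Input-referred read noise: multiply by the conversion gain g = U/G."""
    return (U / G) * read_noise_adu(G)

for iso in (100, 200, 400, 800, 1600, 3200):
    G = iso / 100   # ISO gain relative to base ISO 100 (assumption)
    print(f"ISO {iso:4d}: {read_noise_adu(G):6.2f} ADU = {read_noise_electrons(G):6.2f} e-")
```

With these made-up numbers the output-referred noise rises steadily with ISO, while the input-referred noise falls from about 24 e- at ISO 100 toward the plateau U*R0 = 4 e-, the same shape as the curves in Figure 15a.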

Bottom line: Read noise at high ISO is much smaller than read noise
at low ISO, in terms of the error in photon counting that it represents.
Thus, better image quality is obtained by using the highest ISO for which
the signal is not clipped.

One sees this trend clearly when plotting the photon equivalent of read noise
as a function of ISO:

Fig. 15a - Read noise vs. ISO for the Canon 1D3 in photoelectron
equivalents.
Points are data,
courtesy of Peter Ruevski; the curves are a fit to the read noise model.

The effect is quite
dramatic -- $R_1$ is
more than five times larger than $R_0$,
so when its contribution drops away,
the residual noise is more than five times smaller.
The read noise in electrons plateaus at high ISO because it stops
receiving contributions from the downstream electronics.
On the other hand, the downstream contribution is the largest one
at low ISO.

For the purpose of making exposure decisions, a variant of the S/N plots of
Figure 12 is useful. In that set of
graphs, the S/N ratio was plotted as a function of raw level. But raw levels mean different
things for different ISO, so for the purpose of deciding which ISO to use
it is more helpful to plot the S/N curves all normalized to the same exposure:

Fig. 12a - Signal-to-noise ratio of the Canon 1D3
for various ISO, as a function of normalized exposure value.

There are several ways to interpret this graph that are useful
to keep in mind for making exposure choices. If one has the
option to lower the ISO and the shutter speed (or widen the aperture),
the highest S/N for the image is obtained by increasing the exposure,
pushing the right end of the histogram right up to the upper edge of the
range of exposure on the horizontal axis. This is the usual ETTR
philosophy. Lowering the ISO one stop pushes the upper end of the
dynamic range one stop to the right in absolute exposure, and
pushing the histogram to the right climbs the rising S/N curve
to better overall image quality.

If on the other hand, one is limited by the subject matter
(freezing motion, depth of field requirements, etc) to a given maximum EV,
then it makes sense to raise the ISO to pull the top end of the camera's
dynamic range down to the top end of the histogram; this has little
benefit at that upper end, since all the curves are on top of one another
in that regime. Nevertheless it improves image quality by raising
the S/N ratio on the shadow end of the curves.

Note that the expansion on the bottom end of the range yields less and less
as ISO is increased more and more -- going from ISO 100 to ISO 200 yields
a big improvement at the lower end of exposure (at the cost of some
latitude at the upper end); on the other hand,
going from ISO 800 to 1600 doesn't make much difference at all in shadow S/N,
and in addition one loses an entire stop of raw headroom. Above ISO 1600
there is no expansion of the shadow range whatsoever, just more
and more lost from the top end. The ISO 3200 curve of the 1D3 isn't even
plotted above, because it lies almost exactly on top of the ISO 1600 curve,
apart from ending one stop earlier at the top end.
This is why it makes no sense
to use absurdly high ISO's like 6400 if one uses raw capture --
it just throws away highlight headroom without getting
anything back at the shadow end; it's better to underexpose
by a stop or two at ISO 1600 if the shutter speed is needed,
than to use higher ISO.
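The diminishing shadow benefit and the steady loss of headroom can be illustrated with the read-noise model. Everything numeric here is hypothetical: the same made-up R0, R1, U as a toy version of the model in the text, a 14-bit full scale with the bias ignored, and a deep-shadow exposure of 20 photoelectrons. Shot noise and read noise are assumed to add in quadrature.

```python
import math

# Hypothetical model parameters, for illustration only (not fitted 1D3 values):
R0, R1, U = 1.0, 6.0, 4.0   # upstream noise (ADU), downstream noise (ADU), e-/ADU at base ISO
FULL_SCALE_ADU = 16383      # 14-bit raw data; bias offset ignored for simplicity
SHADOW_E = 20.0             # hypothetical deep-shadow exposure, in photoelectrons

def read_noise_e(G):
    """Input-referred read noise: U * sqrt(R0^2 + R1^2 / G^2)."""
    return U * math.sqrt(R0 ** 2 + (R1 / G) ** 2)

def snr(electrons, G):
    """S/N at a fixed exposure, combining photon shot noise and read noise (electrons)."""
    return electrons / math.sqrt(electrons + read_noise_e(G) ** 2)

def iso_table(isos=(100, 200, 400, 800, 1600, 3200)):
    rows = []
    for iso in isos:
        G = iso / 100                       # ISO gain relative to base ISO 100
        clip_e = FULL_SCALE_ADU * U / G     # exposure (electrons) at which raw data clip
        rows.append((iso, clip_e, snr(SHADOW_E, G)))
    return rows

rows = iso_table()
for (iso, clip_e, s), prev in zip(rows, [None] + rows[:-1]):
    improvement = "" if prev is None else f"  x{s / prev[2]:.2f} vs one stop lower"
    print(f"ISO {iso:4d}: clips at {clip_e:7.0f} e-, shadow S/N {s:4.2f}{improvement}")
```

Every doubling of ISO halves the clipping exposure, but the shadow S/N gain per stop shrinks toward nothing once the downstream noise has been swamped, which is the pattern described above for the 1D3.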

It ceases to make sense to raise the ISO beyond 1600 on the 1D3 in raw capture,
because there is no improvement in read noise measured in electrons; one loses
highlight headroom with no compensating gain at the shadow end. (The bit depth
is far more than is needed given the noise level in ADU.) It is better
to underexpose at ISO 1600, since one obtains the same shadow detail
with more highlight headroom, provided the raw converter can accurately
apply the needed exposure compensation.
For the same reason, there is little point with Canon DSLR's in using the "intermediate" ISO's
500, 640, 1000, 1250 (and only minimal benefit to ISO 125, 160, 250, and 320), since
their read noises are nearly the same as the next lowest "main" ISO of
100, 200, 400, or 800. Again, underexposure at that next lowest "main" ISO
has almost identical shadow detail in the raw data, but delivers more highlight headroom.
Nikon uses a different means of obtaining intermediate ISO's, such that the read noise
drops more uniformly with increasing 1/3 stop ISO settings.

Bottom line:
High exposure zones and/or high ISO, where photon noise
and pre-amplification read noise
dominate the noise, are rather insensitive to what ISO is chosen
once a choice of exposure is selected and care is taken not to clip highlights.
Underexposing by a stop, and doubling the raw values
in post-processing (that is, applying exposure compensation),
yields the same image quality as 'proper'
exposure under these conditions.
On the other hand,
in lower exposure zones at low ISO, where post-amplification
read noise becomes important, the read noise
goes down by a bit less than a factor of two (in electrons) when the ISO doubles.
In this situation, underexposing by a stop and doubling
the raw values in post-processing yields more noise
than proper exposure, particularly in shadows.

By the way, underexposing at lower ISO is precisely what Canon cameras do
in the raw data when Highlight Tone Priority (HTP) is enabled; and what
Nikon cameras do when Active D-Lighting (ADL) is enabled.
Instead of using the ISO gain set by the user, the camera uses
a lower ISO (but exposes with the indicated aperture and shutter speed),
effectively underexposing the image; this provides more highlight
headroom. In post-processing, the image data can be brought back up
while preserving the highlights with a modified tone curve
in higher exposure zones. The place where image quality suffers
is in shadows at lower ISO, precisely as the above quantitative
model predicts.

An aside on "lossy" NEF compression:

The fact that level quantization much finer than the noise is superfluous,
since noise erases the perceptual impact of abrupt tonal transitions, means that many
of the levels used to record higher exposure zones are unneeded and indeed wasteful --
photon shot noise is much larger than the level spacing in midtones and highlights
on a typical DSLR.

That is, if the average photon count is 10000, there will be fluctuations
from pixel to pixel of typically sqrt[10000]=100 photons (one standard deviation) in the sample.
Suppose each increase by one in the raw level corresponds to counting ten more photons
(i.e. the gain g=10 photons/ADU);
then noise for this signal is 100/10=10 raw levels. The linear encoding
of the raw signal wastes most of the raw levels, since the noise
is much larger than the quantization step.

In shadows, it's a different story. Suppose our average signal is 100 photons;
then the photon fluctuations are sqrt[100]=10 photons, which translates
to +/- one raw level for the supposed gain g=10.
At low signal level, none of the raw levels are "wasted"
in digitizing the noise.
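The two cases above reduce to one line of arithmetic: shot noise is the square root of the photon count, divided by the gain to convert to raw levels. A sketch using the text's example gain of 10 photons/ADU (`shot_noise_adu` is a hypothetical helper name):

```python
import math

def shot_noise_adu(mean_photons, gain):
    """Photon shot noise in raw levels: sqrt(N) photons divided by gain (photons/ADU)."""
    return math.sqrt(mean_photons) / gain

GAIN = 10.0   # photons per ADU, the example value used in the text
for photons in (10000, 100):
    print(f"{photons} photons: signal {photons / GAIN:.0f} ADU, "
          f"noise {shot_noise_adu(photons, GAIN):.0f} ADU")
# 10000 photons: signal 1000 ADU, noise 10 ADU
# 100 photons: signal 10 ADU, noise 1 ADU
```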

Ideally, what one would want is an algorithm for thinning the level spacing
at high signal, while keeping it intact for low signal, all the while keeping
the level spacing below the noise level for any given signal (to avoid posterization).
Nikon's NEF "lossy" compression uses a lookup table to do just that, mapping raw levels 0-4095
(for 12-bit raw) into compressed values in such a way that there is no compression
in shadows, but increasing thinning of levels for highlights,
according to the square root relation between photon noise and signal.
The uncompressed digitized raw value is rounded to the nearest
value in the lookup table. The amount of compression increases as the
uncompressed raw value increases above some threshold, in the same way
that photon noise increases with illumination level.
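A table with this structure can be built directly from the criterion that each step stay below the local shot noise. The sketch below is not Nikon's table: it is a hypothetical construction in which the step at raw level S is the largest whole number not exceeding sqrt(S/g), with g = 18 borrowed from the fit quoted below, and compression rounds to the nearest table entry as described above.

```python
import bisect
import math

def build_table(max_raw=4095, gain=18.0):
    """Hypothetical compression table: each step is at most the shot noise sqrt(S/gain)."""
    table = [0]
    while table[-1] < max_raw:
        s = table[-1]
        step = max(1, math.floor(math.sqrt(s / gain)))   # step 1 (no compression) in shadows
        table.append(min(s + step, max_raw))
    return table

def compress(raw, table):
    """Index of the table entry nearest to raw (ties go to the lower entry)."""
    i = bisect.bisect_left(table, raw)
    if i == len(table):
        return len(table) - 1
    if i > 0 and raw - table[i - 1] <= table[i] - raw:
        return i - 1
    return i

def decompress(code, table):
    return table[code]

TABLE = build_table()
print(len(TABLE), "entries instead of 4096")
for raw in (37, 500, 3000):
    code = compress(raw, TABLE)
    print(raw, "->", code, "->", decompress(code, TABLE))
```

The table is the identity in the deep shadows and thins progressively above; the rounding error it introduces never exceeds half the local step, and so stays below half the shot noise at that level.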

Fig. 21 - Plot of a lookup table for NEF compression
(the particular lookup table is model-dependent, but all have
this basic structure).
The horizontal axis is the NEF compressed raw value; the vertical axis is the
uncompressed 12-bit raw value that it corresponds to.
Raw values that do not
correspond to a compressed value are rounded off to the nearest compressed value.
The blue curve is a plot
of the compression table, the red curve is the best fit
parabola to the nonlinear part of the compression table, as suggested by
the growth of photon noise masking level transitions.

The idea behind "lossy" NEF compression is thus to thin the levels in highlights
where levels are too finely distributed relative to the inherent noisiness of light,
but keep most or all of the levels in shadows where the magnitude
of the noise fluctuations is smaller. This is why the NEF compression table
is linear at the low end, and quadratic further up. The quadratic part
is a reflection of the quadratic relationship between noise and signal of light
(the noise squared is proportional to signal) --
the higher the signal the more levels can be thinned out. The linear part
of the compression table kicks in when the signal is low enough that
the noise fluctuations are less than or of order one raw level --
then no compression is possible and the output value must equal the input value.

Quantitatively, the appropriate criterion is that a jump by one in the compressed
value C should correspond to a span dS of raw values near raw level S
that is less than the noise there (which is at least as big as the shot noise
sqrt[S/g]), so that the noise continues to dither the larger
jumps of the compressed image data where levels have been eliminated.
In other words,

dC > dS/sqrt[S/g]

A bit of calculus then leads to the relation between compressed values
and raw values

S < [(C-C0)2]/4g

where C0 is a constant offset equal to half the length of the linear
part of the table, where compression is not possible.
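Spelled out, the "bit of calculus" divides the criterion by dS and integrates the resulting differential inequality over raw values, with C_0 playing the role of the integration constant (treating it as the compressed value at the start of the integration is a simplification of this sketch):

$$\frac{dC}{dS} > \sqrt{\frac{g}{S}} \quad\Longrightarrow\quad C - C_0 > \int_0^{S} \sqrt{\frac{g}{S'}}\,dS' = 2\sqrt{gS} \quad\Longrightarrow\quad S < \frac{(C-C_0)^2}{4g}$$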
The best fit parabola to the nonlinear part of the lookup table,
which is the red curve in Figure 21,
has C0=134 and g=18. The criterion that the jumps
in compressed values not exceed the photon noise is that this value of
g should be more than the actual gain of the camera at its
base ISO. In fact, the typical base ISO gain for the Nikon models that use
this particular lookup table is about 6-8, so the
condition is met with a good margin of safety.
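As a rough numerical check of this criterion, the sketch below treats the fitted parabola (C0 = 134, g = 18, the values quoted above) as if it were the whole nonlinear part of the table, and compares the raw-value span of one compressed step with the photon shot noise sqrt(S/g) for an assumed actual gain in the 6-8 range. Only shot noise is included, and the function names are illustrative, not taken from any Nikon documentation.

```python
import math

# Parameters of the best-fit parabola quoted in the text (12-bit NEF table).
C0 = 134.0
G_FIT = 18.0

def raw_from_compressed(c, c0=C0, g_fit=G_FIT):
    """Parabolic model of the nonlinear part of the table: S = (C - C0)^2 / (4 g)."""
    return (c - c0) ** 2 / (4.0 * g_fit)

def step_to_noise_ratio(c, g_actual, c0=C0, g_fit=G_FIT):
    """Raw-value span of one compressed step (dS/dC of the parabola) divided by
    the photon shot noise sqrt(S / g_actual) in raw units at that level (c > c0)."""
    step = (c - c0) / (2.0 * g_fit)
    shot_noise = math.sqrt(raw_from_compressed(c, c0, g_fit) / g_actual)
    return step / shot_noise

for g_actual in (6.0, 8.0):   # typical base-ISO gain range quoted in the text
    ratios = [step_to_noise_ratio(c, g_actual) for c in (300, 450, 600, 689)]
    print(g_actual, [round(r, 3) for r in ratios])
```

Algebraically the ratio reduces to sqrt(g_actual/g_fit), independent of C, so with these numbers every compressed step stays at roughly 0.58 to 0.67 of the shot noise; that is why g_fit exceeding the actual gain is the relevant condition.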

Bottom line:
The "lossy" form of NEF compression is a clever use of information theory
to save space by eliminating redundant raw levels. The noise which
is unavoidably present in light effectively dithers tonal transitions
so that the compression is lossless in that the image is still encoded
without loss of visual information. In this sense, "lossy" compression is
perhaps an inappropriate appellation.

Amusingly, Nikon engineers seem to have forgotten the logic behind
the thinning of raw levels when upgrading to 14-bit tonal depth --
the NEF compression table has roughly four times as many entries (2753) for
the 14-bit table as it does (689) for the 12-bit table in the
D3 and D300, even though there is no purpose to the extra values
given the relation between noise
and quantization step in efficient data encoding.
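A back-of-envelope version of this argument, assuming the thinnest table allowed by the criterion S = (C - C0)^2/(4g): the nonlinear part then needs about 2*sqrt(g*S_max) compressed values to reach raw saturation S_max. Going from 12-bit to 14-bit ADU multiplies S_max by four and divides the gain in electrons per ADU by four, so the product g*S_max, and with it the number of levels needed, is unchanged. Carrying the 12-bit fit value g = 18 over to 14-bit units is an illustrative assumption; the actual 14-bit table is not modeled here.

```python
import math

def nonlinear_levels_needed(g, s_max):
    """Compressed values needed to span raw values 0..s_max when every step
    is held at the shot-noise limit, i.e. C - C0 = 2*sqrt(g*S)."""
    return 2.0 * math.sqrt(g * s_max)

g12 = 18.0         # fitted gain-like parameter, electrons per 12-bit ADU
g14 = g12 / 4.0    # same electrons expressed per 14-bit ADU (1 12-bit ADU = 4 14-bit ADU)

levels_12 = nonlinear_levels_needed(g12, 4095)    # 12-bit raw saturation
levels_14 = nonlinear_levels_needed(g14, 16383)   # 14-bit raw saturation
print(round(levels_12), round(levels_14))
```

Both come out near 543; adding the offset C0 = 134 gives about 677 for the 12-bit case, close to the 689 entries of the actual 12-bit table.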

BIG PIXELS vs. small pixels:

-- How to compare noise and dynamic range --

The noise and dynamic range statistics
typically reported for cameras (and in particular those reported above)
are based on measurements for individual pixels.
The question then arises, how does one compare
noise statistics among cameras with different pixel sizes, different numbers
of pixels, and different formats (sensor sizes)?
The issue arises because, in general, smaller pixels are noisier.
Why is this so? Consider combining a 2x2 block
of pixels into a "superpixel" of twice the linear size. The raw value is proportional
to the number of photons collected; combining pixels, the photon counts add
and so the raw values should be added together:

$$S = S_1 + S_2 + S_3 + S_4$$

The noises combine in quadrature:

$$N = \sqrt{N_1^2 + N_2^2 + N_3^2 + N_4^2}$$

If the raw values are approximately equal, and the noise contributions are
approximately equal, the aggregate signal is four times that of the individual
pixel; on the other hand, the aggregate noise is about twice the noise of the individual
pixel. Thus the "superpixel" of four pixels binned together has twice
the S/N ratio of any of its component pixels. The effect
of combining pixels becomes somewhat more complicated in regions
where the image has texture or detail, since the noise fluctuations
$N_i$ in pixel values are being averaged together with tonal variations
in the signal values $S_i$. The local variation of the measured
pixel values is part noise and part signal, so the
reduction in the variation of pixel values does not directly give
the amount of noise eliminated, and the decrease in
the standard deviation of pixel values relative to the average signal
will in general fall short of a factor of two.
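The flat-field case can be checked with a small simulation. This is a sketch under simplifying assumptions: a uniform patch, independent Gaussian noise on every pixel standing in for the combined photon and read noise, and a fixed random seed; the function names are illustrative.

```python
import random
import statistics

def snr(values):
    """Signal-to-noise ratio: mean divided by the (population) standard deviation."""
    return statistics.fmean(values) / statistics.pstdev(values)

def binning_demo(signal=100.0, noise=10.0, n_superpixels=40_000, seed=1):
    """Simulate a flat patch with independent per-pixel noise, then sum each
    2x2 block into a superpixel (S = S1 + S2 + S3 + S4)."""
    rng = random.Random(seed)
    pixels, superpixels = [], []
    for _ in range(n_superpixels):
        block = [rng.gauss(signal, noise) for _ in range(4)]
        pixels.extend(block)
        superpixels.append(sum(block))   # signals add; noises add in quadrature
    return snr(pixels), snr(superpixels)

snr_pixel, snr_super = binning_demo()
print(f"pixel S/N {snr_pixel:.2f}, 2x2 superpixel S/N {snr_super:.2f}")
```

The superpixel S/N comes out close to twice the single-pixel value; adding real texture to the simulated signal would pull the measured ratio below two, as described above.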

Bottom line: At the cost of having half the
linear resolution, the superpixel
made by binning together a 2x2 block of pixels
has up to twice the signal-to-noise ratio.

But wasn't this S/N ratio already there before we started binning pixels
together? Of course it was. All we did by combining pixels was to
throw away resolution. The point is that noise, and signal-to-noise ratio,
are dependent on the spatial scale at which they are measured. Typically,
S/N will be twice as large at twice the spatial scale, as the binning
example demonstrates.

Another simple demonstration shows the scale dependence of noise:

1. Open a blank canvas in Photoshop. Fill with a uniform middle gray tone
(Image>Adjustments>Levels and set the output level to 128).

2. Add some Gaussian noise (Filter>Noise>Add Noise...
and choose some amount, say 10%). Look at the histogram
and note how its width has broadened from a sharp spike
at 128 into a bell curve by the addition of the noise.
The width of this bell curve measures the amount of noise.

3. Open the Gaussian blur filter (Filter>Blur>Gaussian Blur...).
Watch what happens to the histogram as the radius of the blur is varied
(you may need to click on the little caution triangle
in the histogram window to get the histogram to update).
Note how the width of the noise histogram decreases
in inverse proportion to the radius of the blur. Since the width
is a quantitative measure of the amount of noise,
the noise decreases as the blur is increased.

The blur in step 3 decreases resolution by averaging over pixels.
Noise decreases in proportion to the amount of resolution decrease.
If one downsamples an image properly, one decreases the resolution,
and noise decreases in proportion to the linear change in image size
in regions of approximately uniform tonality. (Again, the effect
of downsampling becomes somewhat more complicated in regions
where the image has texture or detail, since the noise fluctuations
in pixel values are being averaged together with tonal variation
in the scene being imaged.)

There is little difference in this regard between shrinking
the size of the image by downsampling --
which reduces the resolution by the percentage of shrinkage --
and keeping the image size fixed and applying a blur,
which reduces the resolution by the ratio of the blur diameter
to the pixel spacing. Either way one is averaging over pixels
of the original image, and that reduces noise.
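The exercise can be mimicked in a few lines. This sketch substitutes a simple k x k box average for Photoshop's Gaussian blur (a different kernel, but the same averaging principle) and compares it with downsampling by averaging non-overlapping k x k blocks; the image size, noise level and seed are arbitrary choices for illustration.

```python
import random
import statistics

def noisy_gray(size=128, level=128.0, sigma=10.0, seed=2):
    """Uniform mid-gray canvas plus independent Gaussian noise per pixel."""
    rng = random.Random(seed)
    return [[rng.gauss(level, sigma) for _ in range(size)] for _ in range(size)]

def downsample(img, k):
    """Shrink by a factor k: each output pixel is the mean of a k x k block."""
    n = len(img) // k
    return [[sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k)) / k**2
             for j in range(n)] for i in range(n)]

def box_blur(img, k):
    """Keep the pixel grid (minus a border): each output pixel is the mean
    of the k x k neighborhood starting at that pixel."""
    n = len(img) - k + 1
    return [[sum(img[i + a][j + b] for a in range(k) for b in range(k)) / k**2
             for j in range(n)] for i in range(n)]

def noise(img):
    """Histogram width: standard deviation of all pixel values."""
    return statistics.pstdev([v for row in img for v in row])

img = noisy_gray()
for k in (1, 2, 4):
    print(k, round(noise(downsample(img, k)), 2), round(noise(box_blur(img, k)), 2))
```

Both columns fall roughly as 1/k (about 10, 5 and 2.5 here): whether the image is shrunk or blurred in place, averaging a k x k neighborhood reduces the noise by about the linear factor k.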

The conclusion is that noise is resolution dependent.
If one compares two cameras having the same sensor size
and technologies but different resolutions (pixel densities),
the one with the lower resolution (lower pixel density)
will have less noise at the pixel level, as the above exercise again demonstrates.

The same reasoning holds true when comparing two different camera models
with different size pixels. All other things being equal -- the percentage
of pixel area active in light collection, the efficiency of the pixels, etc --
the number of photons collected will be proportional to the pixel area
(the square of the pixel spacing), while the
photon shot noise will scale as the pixel spacing.
So for example, the Canon 40D has 5.7 micron pixels, and its gain (the number of
photons collected per raw level) is 3.1 e-/12-bit ADU at ISO 400 (equivalently, 3.1 e-/14-bit ADU at ISO 100); the number of electrons
collected per raw level per square micron is 3.1/5.7^2=.095. The Canon 1D3,
on the other hand, has 7.2 micron pixels and a gain of 5.0 e-/12-bit ADU at that ISO; the
light collecting efficiency per unit area is .097, essentially the same
(the difference of the measured values is of order the
variation from copy to copy of the same model). Using the same technology,
Canon achieved the same light collection efficiency
per unit sensor area for both cameras. As a consequence, the photon noise
at fixed spatial scale is the same for both cameras.
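The per-area comparison is simple arithmetic; the sketch below uses the gain figures quoted above (3.1 and 5.0 electrons per raw level, both at the same ADU depth and ISO, so the units cancel in the comparison). The shot-noise helper assumes both cameras record the same raw level for the same exposure at the same ISO; the function names are illustrative.

```python
import math

def efficiency_per_area(gain_e_per_adu, pixel_pitch_um):
    """Electrons per raw level per square micron of sensor: gain / pixel area."""
    return gain_e_per_adu / pixel_pitch_um ** 2

def photon_snr_fixed_area(efficiency, raw_level, area_um2):
    """Shot-noise-limited S/N of a patch of fixed area recorded at a given raw level."""
    electrons = efficiency * raw_level * area_um2
    return math.sqrt(electrons)

eff_40d = efficiency_per_area(3.1, 5.7)   # 40D: 5.7 micron pitch
eff_1d3 = efficiency_per_area(5.0, 7.2)   # 1D3: 7.2 micron pitch
ratio = photon_snr_fixed_area(eff_1d3, 1000, 100) / photon_snr_fixed_area(eff_40d, 1000, 100)
print(round(eff_40d, 4), round(eff_1d3, 4), round(ratio, 3))
```

With these rounded inputs both efficiencies land between .095 and .097, and the shot-noise S/N over the same patch of sensor differs by well under one percent.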

Light collecting efficiency translates into photons, photon counts determine
photon shot noise, and photon noise is the dominant noise source in highlights.
The bigger pixels of the 1D3 collect more photons than those of the 40D, and this
is reflected in the S/N ratio plots above for the 40D and the 1D3;
at a saturation signal of about 2^14,
the S/N ratio of the 1D3 at ISO 200 is somewhat over 7.6 stops, the 40D about 7.3 stops.
However, an ideal resampling of the image of one sensor to the pixel density
of the other would yield the same photon noise, because the sensors
capture the same light per unit area. Collection efficiencies of various sensors
on a per area basis are tabulated in Table 1 (data from various sources --
Roger Clark,
Bill Claff,
Christian Buil,
Bart van der Wolf,
Peter Facey,
and the author).

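| Camera | Gain per unit area (e-/12-bit ADU per square micron, ISO 400) |
|--------|------|
| 10D | .042 |
| 1D2 | .049 |
| D200 | .054 |
| 5D | .061 |
| 350D | .064 |
| 1Ds2 | .064 |
| 20D | .076 |
| 400D | .083 |
| D300 | .089 |
| 40D | .095 |
| 1D3 | .098 |
| 1Ds3 | .098 |
| 50D | .100 |
| D3 | .113 |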

Table 1: Light collecting ability per unit sensor area
of various camera models: 12-bit
gain at ISO 400 divided by pixel area in square microns.

(Note: Canon ISO calibration on earlier (12-bit) models such as the
1D2, 5D, 20D, 350D, etc., was understated by 1/3 stop; consequently, to
compare the above numbers fairly, the efficiency per unit area
should be multiplied by 1.25 for these cameras.)
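Applying the note's correction is a one-line scaling. The sketch includes only the four models the note names explicitly (the "etc." may cover others); 1.25 is the rounded value of 2^(1/3), the factor for 1/3 stop.

```python
# Table 1 values for the models named in the note (e-/12-bit ADU per square micron).
TABLE_1 = {"1D2": 0.049, "5D": 0.061, "20D": 0.076, "350D": 0.064}
ISO_CORRECTION = 1.25   # ~2**(1/3) = 1.26: ISO understated by 1/3 stop

corrected = {model: value * ISO_CORRECTION for model, value in TABLE_1.items()}
for model, value in corrected.items():
    print(model, round(value, 3))
```

After correction the 20D, for instance, moves to about .095, level with the 40D.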

The figures for light collection performance per unit area in the above table
refer the sensor efficiency to a fixed spatial scale rather than
considering it at the pixel level, and so account for varying pixel size among cameras.
Fixed spatial scale is the appropriate metric for comparing
different pixel densities if one takes the same image from two cameras
with different pixel densities, at the same focal length.
Taking equivalent crops from each camera,
and resampling the image to the same print dimensions,
the photon noise will be the same.
A different metric will arise if one compares not to fixed absolute size on the sensor,
but as a percentage of frame size; that will be considered below.

Thus photon noise at fixed spatial scale is independent of pixel size
for a given per area collection efficiency.
What about read noise?
In order for read noise at fixed spatial scale to be independent of pixel size,
read noise per pixel must decrease in proportion to pixel spacing. If the pixel spacing
is halved, there are four pixels in the area formerly occupied by one, with an
aggregate read noise in that area twice that of the individual half-size pixels
(again, since noises combine in quadrature). Therefore, the half-size pixels must
have half the read noise in order that the total read noise at the scale
of the original pixels is the same.

The situation is somewhat subtle, however. The read noise varies with ISO,
as the data and model presented
on page 2
indicate. So at the very least one should examine the issue for each ISO,
since the conclusions could be different for low ISO versus high ISO.

Consider the 40D and 1D3.
At ISO 100, the 1D3 read noise is 4.8 ADU,
while the 40D read noise is 5.5 ADU;
at ISO 1600, the read noise is 13.4 ADU for the 1D3 and 22.6 ADU for the 40D.
A similar disparity
of read noises holds for the Nikon D3 and D300; both the larger pixel D3
and the D300 have about 5 14-bit ADU of read noise at lowest ISO,
but the D3 has much lower read noise at high ISO.

However, noise figures in ADU mean different things for different cameras
due to their differing efficiencies, and as discussed above different size
pixels entail different levels of resolution which also affects the amount
of noise observed in an individual pixel. So the question arises as to how these numbers
should be compared. One way to normalize them is to compare relative to photon noises,
and to the signal in photons,
by converting the read noise values in ADU to their photoelectron equivalents.
To convert read noise in ADU to photoelectron equivalents, one multiplies by
the gain in electrons/ADU.
For instance, the gain of the 40D is 3.1 electrons/14-bit ADU at ISO 100,
while the 1D3 gain is 5.0 electrons/14-bit ADU at that ISO
(for ISO 1600, divide these numbers by 16,
since the gain value is inversely proportional to ISO).
Then the ISO 100 read noises are 17 electrons for the 40D, and 24 electrons for the 1D3;
the ISO 1600 read noises are 4.4 electrons for the 40D, and 4.2 electrons for the 1D3.

A second correction should be made for the differing pixel sizes.
To normalize the noise of different size pixels one should rescale by the
pixel spacing to refer them to the same spatial scale. When measuring noises
in electrons, this normalization divides the read noise by the pixel spacing
(bigger pixels mean fewer pixels per unit area, so fewer noise contributions to combine).
Thus at ISO 100, one compares 17/5.7=3.0 for the 40D, to 24/7.2=3.3
for the 1D3; at ISO 1600, one has 4.4/5.7=.77 for the 40D, and 4.2/7.2=.58 for the 1D3.
On a per area basis, the smaller pixels do better than the larger ones at low ISO,
while the larger pixels do better than the smaller ones at high ISO.
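The two normalization steps (ADU to electrons via the gain, then division by pixel spacing) can be sketched as follows, using only the figures quoted above; the dictionary layout and function names are illustrative.

```python
# Figures quoted in the text: pixel pitch (microns), gain at ISO 100
# (electrons per 14-bit ADU) and read noise (14-bit ADU) at ISO 100 and 1600.
CAMERAS = {
    "40D": {"pitch_um": 5.7, "gain_iso100": 3.1, "read_adu": {100: 5.5, 1600: 22.6}},
    "1D3": {"pitch_um": 7.2, "gain_iso100": 5.0, "read_adu": {100: 4.8, 1600: 13.4}},
}

def gain_at_iso(gain_iso100, iso):
    """Gain in electrons/ADU is inversely proportional to ISO."""
    return gain_iso100 * 100.0 / iso

def read_noise_electrons(cam, iso):
    """Convert read noise from ADU to photoelectron equivalents."""
    return cam["read_adu"][iso] * gain_at_iso(cam["gain_iso100"], iso)

def read_noise_per_area(cam, iso):
    """Refer read noise to a fixed spatial scale by dividing by the pixel pitch."""
    return read_noise_electrons(cam, iso) / cam["pitch_um"]

for iso in (100, 1600):
    row = {name: round(read_noise_per_area(cam, iso), 2) for name, cam in CAMERAS.items()}
    print(iso, row)
```

The output reproduces, within rounding, the 3.0 vs 3.3 comparison at ISO 100 and the .77 vs .58 comparison at ISO 1600.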

Naively, the above results would seem to suggest a slight dynamic range advantage
for small pixels at low ISO, compared to a slight low light advantage to
big pixels at high ISO. Further evidence for this trend will be found below
when much smaller digicam pixels are compared to those of the 1D3.
This reversal of fortune between low and high ISO
is nicely explained by the read noise model developed
on page 2.
For DSLR's, read noise in electrons decreases from low to high ISO, as is seen in Figure 15a above.
On the other hand, for the tiny pixels in digicams, the read noise
in electrons is practically independent of ISO. Furthermore, light-collecting
efficiency turns out to be largely independent of pixel size
over a very wide range of pixel sizes, from the biggest DSLR pixels down to
the smallest digicam pixels.
Thus the dynamic range per area, which is the maximum light collection ability per area
divided by the read noise per area, is largely determined by the read noise per area;
the comparison is therefore least favorable to DSLR's at low ISO and most favorable at high ISO.

The factor that limits the low ISO performance of DSLR's turns out to be
the limited dynamic range of the ISO amplifier and ADC components
downstream from the sensor, and not the sensor itself;
the analysis leading to this conclusion also leads to a simple proposal
for recovering an extra two stops of dynamic range from current DSLR's.
However, rather than devote space to developing that explanation and proposal here,
since it would take us somewhat off the main topic,
that discussion is deferred to a
supplemental page to this article.

For comparing pixels on a patch of sensor of a given area, the normalizations
given above are the appropriate ones. However, one is often interested in
comparing the overall noise of the image coming from different sizes of sensors,
so noise per area is less relevant than noise normalized to the image frame.
Thus another metric for comparison is
to consider the entire frame, and if comparing two different sensor formats
(such as the 1D3 and 40D considered above), shoot with different focal lengths
(say 130mm on the 40D and 160mm on the 1D3) so that the field of view is kept fixed.
In the case of fixed framing, we should refer the noise characteristics not to a fixed spatial
scale in microns, but rather to a fixed percentage of the frame height.
An appropriate figure of merit is to
divide the per-pixel noise values by the frame height in pixels,
again because, as in the binning example above, S/N grows in proportion to the linear size (in pixels) of the block being combined.
The clear rule of thumb that emerges from such an exercise (not surprisingly)
is that larger size sensor formats are less noisy than smaller size sensor formats.
A simple example is the 40D relative to the 1D3; both are 10MP cameras, with 2592 pixels
in the frame height, and so in this particular example
the frame-referred noise levels are the same as the
pixel-level noises, and these are lower (as measured in ADU)
at all exposure levels and ISO's for the 1D3. The read noises are similar in
photo-electron units, but the 1D3's bigger photosites
capture more photons, hence have higher S/N ratio.
Bigger sensors have higher S/N ratios, because bigger sensors collect
more photons.
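The focal lengths in the fixed-framing example follow from the crop factors. The sketch assumes the nominal crop factors of 1.6x for the 40D and 1.3x for the 1D3, which are not stated in the text.

```python
def equivalent_focal_length(focal_mm, crop_factor):
    """Full-frame-equivalent focal length, i.e. the focal length giving the same field of view."""
    return focal_mm * crop_factor

# Assumed nominal crop factors: 1.6x (40D), 1.3x (1D3).
fov_40d = equivalent_focal_length(130, 1.6)
fov_1d3 = equivalent_focal_length(160, 1.3)
print(round(fov_40d, 1), round(fov_1d3, 1))
```

Both come out at a full-frame equivalent of about 208mm, so the two frames cover the same scene; since both are also 2592 pixels high, dividing by frame height leaves the pixel-level ranking unchanged.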

-- Extrapolating to tiny pixel sizes --

An amusing extrapolation of the analysis of the effect of pixel size
results from consideration of digicam raw data. The Panasonic FZ50
is a 10MP superzoom digicam with RAW capability. An analysis by John Sheehy,
as well as the author's rough measurements on
an FZ50 raw file using the noise vs. exposure graphical method outlined
on page 2, yields
a gain g of roughly .29 photons/12-bit ADU at ISO 400.
Dividing by the square of the
2 micron pixel size yields .072 photons per ADU per square micron,
comfortably in the middle of the efficiency table above. However, just as
the Canon ISO calibration was off for its earlier models, digicam ISO calibrations
differ from those of DSLR's. Typical DSLR's leave about 3.5 stops between metered
middle gray and raw saturation; digicams put middle gray about 0.5-1.0 stop
closer to raw saturation, due to their lower dynamic range. This means that their
ISO calibration is about 0.5-1.0 stops understated in relation to DSLR ISO
calibration, and so the FZ50 efficiency figure is actually higher than .072
by a factor 1.4-2.0, making the FZ50 sensor among the most efficient per unit area
in capturing photons of any digital camera sensor currently available!
Of course the sensor is only about 5.5mm x 7.3mm in size, so the photon noise
referred to the frame size is rather poor, as it is for any digicam; but
the photon noise at fixed spatial scale in microns or millimeters
rivals or betters that of the 1D3, 1Ds3 and D3.
If the FZ50 sensor could be scaled up to the size of full frame, it would
indeed rival these cameras for photon shot noise performance; and the resolution --
the 2 micron pixels translate into a 216MP (!) full frame camera (of course, for
many applications the actual resolution will be limited by diffraction
and lens aberrations).
It is currently unclear whether this performance could be maintained as the sensor
is scaled up by a factor of nearly five in linear dimension -- practicalities
of supporting electronics for the pixels, speed in readout and processing, etc,
may make it difficult to maintain the FZ50's performance in a scaled-up version.
Nevertheless, what should be clear from the preceding analysis is that
there is virtually no difference in photon collecting efficiency over a very
wide range of pixel sizes, from 2 microns to over 8 microns.
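The FZ50 numbers above can be reproduced with a few lines of arithmetic. The inputs are the rough values quoted in the text (gain .29 e-/12-bit ADU at ISO 400, 2 micron pitch, a sensor about 7.3mm wide, and a 0.5-1.0 stop ISO calibration offset); the full-frame size of 36 x 24 mm is the standard format.

```python
FZ50_GAIN_ISO400 = 0.29   # electrons per 12-bit ADU at ISO 400 (rough value quoted in the text)
FZ50_PITCH_UM = 2.0       # pixel pitch in microns

eff = FZ50_GAIN_ISO400 / FZ50_PITCH_UM ** 2           # per-area efficiency, as in Table 1
eff_low, eff_high = eff * 2 ** 0.5, eff * 2 ** 1.0    # ISO calibration 0.5 to 1.0 stop understated

# Full-frame (36 x 24 mm = 36000 x 24000 microns) sensor populated with 2-micron pixels.
ff_pixels = (36_000 / FZ50_PITCH_UM) * (24_000 / FZ50_PITCH_UM)
linear_scale = 36.0 / 7.3                             # full-frame width / FZ50 sensor width

print(f"{eff:.4f}  {eff_low:.3f}-{eff_high:.3f}  {ff_pixels / 1e6:.0f} MP  x{linear_scale:.1f}")
```

The corrected efficiency range of roughly .10 to .145 brackets the .113 of the D3, the top entry of Table 1, and the scaled-up sensor would be a 216MP full frame camera, about five times larger in linear dimension.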

So a digicam sensor can rival or even exceed the performance of the best
current DSLR's in photon capture efficiency, and thus shot noise performance.
What about read noise? Naively one might expect that the read noise
is a fixed cost per pixel, so more pixels per area mean more read noise.
John Sheehy reports a read noise for the FZ50 of 2.7 ADU ~ 3.3 electrons at ISO 100; my
rough measurements on the FZ50 raw file mentioned above yielded
4.5 ADU ~ 5.4 electrons of read noise.
Let's compare this to the Canon 1D3 by referring to fixed scale.
This means dividing the read noise per pixel in electrons by the pixel spacing.
The 1D3 read noise is 1.2 12-bit ADU = 24 electrons at ISO 100,
and the pixel spacing is 7.2 microns, so its read noise figure of merit is
24/7.2 = 3.3; the FZ50 read noise figure of merit is somewhere between
3.3/2.0 = 1.7 (Sheehy's measurement) and 5.4/2.0 = 2.7 (my measurement).
So again the FZ50 is as good as or better
than the best DSLR's in read noise (fairly compared by referring
to fixed spatial scale), at the lowest ISO's.
The comparison gets even better if we again factor in the relative
calibration of ISO between the two cameras.
At high ISO, things are
again reversed -- the 1D3 read noise at ISO 1600 is 3.4 12-bit ADU = 4.2 electrons,
while the FZ50's is still about the same in electrons according to Sheehy.
Now when dividing by the pixel spacing to refer to fixed spatial scale,
the comparison substantially favors the big-pixel DSLR, even when the relative ISO
calibration is accounted for.
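The read noise figures of merit can be tabulated the same way. The electron values are those quoted above; for the FZ50 at ISO 1600 the sketch assumes Sheehy's ISO 100 figure of 3.3 electrons, since the text only says it stays "about the same". The ISO calibration correction is not applied here.

```python
def read_noise_figure_of_merit(read_noise_e, pitch_um):
    """Read noise referred to fixed spatial scale: electrons per pixel / pixel pitch."""
    return read_noise_e / pitch_um

cases = {
    "1D3 ISO 100": (24.0, 7.2),
    "FZ50 ISO 100 (Sheehy)": (3.3, 2.0),
    "FZ50 ISO 100 (author)": (5.4, 2.0),
    "1D3 ISO 1600": (4.2, 7.2),
    "FZ50 ISO 1600 (assumed 3.3 e-)": (3.3, 2.0),
}
for label, (noise_e, pitch) in cases.items():
    print(f"{label:32s} {read_noise_figure_of_merit(noise_e, pitch):.2f}")
```

At ISO 100 the FZ50's 1.65-2.7 beats the 1D3's 3.3, while at ISO 1600 the 1D3's .58 is well below the FZ50's roughly 1.65, which is the reversal described above.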

The above DSLR/digicam comparison outlines the extremes of what may be possible
with current or near-term technology, if digicam pixel densities were used
to populate full-frame sensors. The fact that a digicam's performance is in the
same ballpark as the best DSLR's when referred to fixed spatial scale, suggests that
the problems with noise in digicams are not due to their ever smaller pixels,
but rather to their continued use of small sensors.

Bottom line: Among the important measures of image quality are signal-to-noise
ratio of the capture process, and resolution. It was shown that for fixed sensor format,
the light collection efficiency per unit area
is essentially independent of pixel size,
over a huge range of pixel sizes from 2 microns to over 8 microns,
and is therefore independent of the number of megapixels. Noise performance
per unit area was seen to be only weakly dependent on pixel size.
The S/N ratio per unit area
is much the same over a wide range of pixel sizes.
There is an advantage to big pixels in low light (high ISO) applications,
where read noise is an important detractor from image quality, and
big pixels currently have lower read noise than aggregations of small pixels
of equal area. For low ISO applications, the situation is reversed in current
implementations -- if anything, smaller pixels perform somewhat better
in terms of S/N ratio (while offering more resolution).
A further exploration of these issues can be found on the
supplemental page.
Rather than having strong dependence on the pixel size,
the noise performance instead depends quite strongly on sensor size --
bigger sensors yield higher quality images, by capturing more signal (photons).

The other main measure of image quality
is the resolution in line pairs/picture height; it is by definition
independent of the sensor size, and depends only on the megapixel count.
The more megapixels, the more resolution, up to the limits imposed by the
system's optics.
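As an illustration of why resolution in line pairs per picture height depends only on the pixel count, the sketch below computes the sampling (Nyquist) limit, two pixel rows per line pair, for a frame of given megapixels and aspect ratio. It assumes a 3:2 frame and ignores lens, diffraction and anti-aliasing filter losses, so it is an upper bound rather than measured resolution.

```python
import math

def nyquist_lp_per_ph(megapixels, aspect=1.5):
    """Sampling limit in line pairs per picture height: one line pair needs two
    pixel rows, and rows = sqrt(pixels / aspect) for a width:height = aspect frame."""
    rows = math.sqrt(megapixels * 1e6 / aspect)
    return rows / 2.0

for mp in (10, 20, 40):
    print(mp, round(nyquist_lp_per_ph(mp)))
```

No sensor dimension enters the calculation; for example, the 2592-row frames of both the 40D and the 1D3 have the same 1296 lp/ph sampling limit despite their different sensor sizes, and quadrupling the megapixel count doubles the limit.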