Most mid- and high-end DSLRs offer two or three sizes for RAW capture. When the camera is generating the medium- or small-sized RAW files, how does it make them smaller? Does it capture less information onto the sensor? Does it capture the full amount of information and then apply some sort of in-camera compression? Does it do something else that I'm not describing?

4 Answers

Douglas Kerr gives a masterful and largely non-mathematical summary at The Canon sRaw and mRaw Output Formats. The situation is complicated and not perfectly understood, but much has been deduced by reverse engineering. Evidently sRaw is a 2 x 2 aggregation but with some chrominance subsampling; mRaw is likely a bona fide resampling (involving local interpolation), with heavier chrominance subsampling. One might indeed characterize each as a form of "in-camera compression" performed in a sophisticated way to optimize the appearance of detail to the human eye for a given output file size.

In a nutshell: smaller "raw" files aggregate the sensor values within blocks of pixels.

For instance, Canon's RAW format conveys information about individual "sensels." Each sensel (or "photosite") responds to a restricted range of frequencies (termed red, green, and blue). Each one of these, when later "developed," will be located at a single pixel site in the final image.

Canon's sRAW format, however, conveys summary information about 2 x 2 blocks of sensels. It reports brightness (luminance) data for each block, but "decimates" (skips over in a regular way) some of the color information. As such, several important things happen:

The individual sensel data are no longer available. (The sRAW data are indeed "processed.")

The resolution of the image is reduced (it is halved, implying there are a quarter as many pixels).

The file size of the data is reduced approximately by two-thirds.

The sRAW data are not a "subset" of the RAW data. They are a different encoding of the raw data, with less information. No sensels are "ignored."
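To make the idea of aggregating 2 x 2 blocks concrete, here is a minimal sketch. It is not Canon's actual sRAW pipeline, which is not publicly documented. The RGGB layout, the BT.601-style luminance weights, the "keep chroma for every other block" rule, and the function names are all assumptions chosen only for illustration.

```python
# Illustrative only: NOT Canon's actual sRAW encoding.
# Assumptions: RGGB Bayer layout, BT.601-style luma weights,
# chroma kept for every other block in a row.

def aggregate_2x2(mosaic):
    """Collapse each 2x2 RGGB block into one (Y, Cb, Cr) triple."""
    rows = []
    for r in range(0, len(mosaic), 2):
        row = []
        for c in range(0, len(mosaic[r]), 2):
            red = mosaic[r][c]
            # Two green sensels per block; average them.
            green = (mosaic[r][c + 1] + mosaic[r + 1][c]) / 2
            blue = mosaic[r + 1][c + 1]
            y = 0.299 * red + 0.587 * green + 0.114 * blue
            row.append((y, blue - y, red - y))
        rows.append(row)
    return rows


def decimate_chroma(blocks):
    """Keep luminance for every block, chroma only for even-indexed blocks."""
    return [
        [(y, cb, cr) if i % 2 == 0 else (y, None, None)
         for i, (y, cb, cr) in enumerate(row)]
        for row in blocks
    ]


# A made-up 4x4 patch of 14-bit-style sensel values.
mosaic = [
    [100, 120, 104, 124],
    [118,  60, 122,  64],
    [ 98, 119, 101, 121],
    [117,  58, 120,  61],
]
blocks = decimate_chroma(aggregate_2x2(mosaic))
```

Sixteen sensels go in and four output pixels come out, so each linear dimension is halved and every sensel contributes to the result, but the individual sensel values can no longer be recovered.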

(Normally, halving the linear resolution of an image decreases its size on disk to one quarter of the original. Here, though, the original sensels deliver about 14 bits of information each, amounting to 56 bits per 2 x 2 block in the RAW format. In sRAW, each 2 x 2 block is encoded as three 8-bit pieces, or 24 bits. The resulting data stream is therefore only 24/56 = 3/7, a little under 1/2, the size of the original, and is reduced by another 1/3 by the decimation of the chrominance data, for a net reduction of roughly 2/3. Lossless compression is applied in sRAW, so the ratio may differ slightly.)
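The arithmetic in that parenthetical can be checked in a few lines. This sketch uses only the figures quoted above (14 bits per sensel, three 8-bit values per block). Reading "reduced by another 1/3" as "multiplied by 2/3" is my interpretation, and the variable names are just for illustration.

```python
from fractions import Fraction

BITS_PER_SENSEL = 14        # "about 14 bits", as quoted in the answer
SENSELS_PER_BLOCK = 4       # one 2x2 block
SRAW_BITS_PER_BLOCK = 3 * 8 # three 8-bit pieces per block

raw_bits = BITS_PER_SENSEL * SENSELS_PER_BLOCK             # 56 bits
encoding_ratio = Fraction(SRAW_BITS_PER_BLOCK, raw_bits)   # 24/56 = 3/7

# Assumption: chroma decimation removes a further third of what is left.
after_decimation = encoding_ratio * Fraction(2, 3)         # 2/7
net_reduction = 1 - after_decimation                       # 5/7

print(f"encoding ratio:   {float(encoding_ratio):.3f}")    # 0.429
print(f"after decimation: {float(after_decimation):.3f}")  # 0.286
print(f"net reduction:    {float(net_reduction):.3f}")     # 0.714
```

So the net reduction works out to about 71%, which is consistent with the "approximately two-thirds" figure before lossless compression is taken into account.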

Simply put, the camera resizes the photo so there are fewer pixels in an sRaw file, and thus the file size is smaller. For example, the Canon 50D sRaw1 is 7.1 megapixels, compared to the full 15.1 megapixels of the sensor.

I believe this happens after colours have been interpolated from the Bayer array, so the full sensor data is used. An sRaw file also contains full colour information at each pixel. Lossless compression is also applied to reduce the size further.

Some manufacturers offer lossy raw compression at the full image resolution; however, these files won't be as small as the reduced-resolution raw files in most circumstances. Lossless compression is also used on full-resolution raw files. It has no effect on image quality, but it takes more time to read and write the files (because they must be compressed and decompressed), so it is sometimes not used.

I'm not sure about lossless compression. That would probably be very brand and model specific, and likely not a feature that every camera model uses. Most RAW files are truly raw, untainted data from the sensor, without any compression of any kind.
– jrista♦ Sep 21 '10 at 16:33

I think the major manufacturers have been using some form of lossless raw compression for a while now. Looking back through my Canon raw files from my old 30D, they are all quite different file sizes; if it were raw, untainted data, they would all be the same size!
– Matt Grum Sep 21 '10 at 17:01


@jrista The smallest file size from my 30D was 5.6MB and the largest was 9.6MB; that difference is too large to be the JPEG thumbnail. Seeing as the small raws have large areas of pure white, I'm pretty sure Canon use run-length encoding to provide lossless raw compression.
– Matt Grum Sep 21 '10 at 20:29

It is true that raw files contain all the data captured by the sensor, but there can still be redundancies in the data that allow the camera to save space without losing any of the original information!