Compressed, uncompressed and raw: the differences

Having concluded in a previous piece that compression was a disaster, it's somewhat comforting to realise that an increasing number of cameras now allow us to record pictures which are uncompressed, nearly uncompressed, raw, or some combination of those things.

The distinction between those things has been blurred by the forces of marketing to the point where it's easy to misunderstand. Our purpose today is to discuss the low- and no-compression options, so that we can make reasonable choices about which formats to shoot, and what the storage requirements are going to be on set and in post.

Uncompressed

Most people understand what uncompressed data is. The simplest such workflows are based on what may still be the oldest approach: a directory full of DPX files. Each file consists of a list of numbers, and each number represents the red, green or blue value of a pixel. DPX is broadly the same thing as a BMP file. Similarly, we can store completely uncompressed data in AVI or Quicktime files using the R10K or R210 codecs commonly used alongside things like Blackmagic and AJA SDI capture boards. These not-really-codecs simply standardise a way of listing red, green and blue values. When people say “uncompressed,” this is generally what's being discussed.
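It's worth having a feel for what "uncompressed" means in gigabytes. A quick back-of-the-envelope sum, assuming 10-bit RGB samples packed into a 32-bit word per pixel in the style of R10K/R210 (three 10-bit values per 32-bit word), looks like this:

```python
# Back-of-the-envelope storage maths for uncompressed RGB video.
# Assumes 10-bit samples packed three to a 32-bit word per pixel,
# as in R10K/R210-style packing; exact figures vary by format.

def uncompressed_rate(width, height, fps, bytes_per_pixel=4):
    """Return (MB per frame, GB per minute) for uncompressed RGB."""
    frame_bytes = width * height * bytes_per_pixel
    mb_per_frame = frame_bytes / 1e6
    gb_per_minute = frame_bytes * fps * 60 / 1e9
    return mb_per_frame, gb_per_minute

mb, gb = uncompressed_rate(1920, 1080, 24)
print(f"1080p24: {mb:.1f} MB/frame, {gb:.1f} GB/min")

mb, gb = uncompressed_rate(4096, 2160, 24)
print(f"4K24:    {mb:.1f} MB/frame, {gb:.1f} GB/min")
```

That works out to roughly 8 MB per 1080p frame and over 35 MB per 4K frame, which is why uncompressed workflows eat storage so quickly.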

(Slightly beyond the scope of this article is the issue of the component (slightly inaccurately, YUV) images in codecs like V210. If we consider colour subsampling to be compression, which it really is, then no component file is truly uncompressed — but that's a story for another day.)
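To see why subsampling counts as compression, a purely illustrative sum: 4:2:2 component video, as carried in V210-style files, stores the two colour-difference channels at half the horizontal resolution of luma.

```python
# Why component (Y'CbCr) video is arguably already compressed: 4:2:2
# subsampling halves the horizontal resolution of the two colour
# channels. Sample counts only; bit depth and packing are ignored.

width, height = 1920, 1080

samples_444 = width * height * 3                          # full R, G, B
samples_422 = width * height + 2 * (width // 2) * height  # Y + Cb + Cr

print(f"4:4:4 samples per frame: {samples_444}")
print(f"4:2:2 samples per frame: {samples_422}")
print(f"4:2:2 keeps {samples_422 / samples_444:.0%} of the samples")
```

Two-thirds of the data survives, and the third that's thrown away never comes back — which is exactly what lossy compression means.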

Lossless compression

Either way, imaging specialists would point out that photographic images stored this way contain a lot of "redundant information". Exploiting that redundancy aggressively usually means accepting some small changes to the image in exchange for smaller files, which is where lossy formats like ProRes come in. Today, though, we're talking about compression that doesn't cost us any image quality. There is a lossless version of JPEG, although it's only commonly applied to moving images as part of CinemaDNG (of which more below).

The HuffYUV and Ut Video codecs are available on Windows and Mac and can be used to compress data losslessly in Quicktime or AVI files. These approaches achieve modest compression ratios below 3:1, and usually around 2:1. That's lossless, or mathematically lossless, compression: the results are identical to uncompressed workflows, although the demands on CPU time and the risks to compatibility are greater.
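The principle is easy to demonstrate — not with HuffYUV itself, but with any general-purpose lossless compressor. The sketch below uses Python's built-in DEFLATE (zlib) on two synthetic 64 KB buffers: a smooth gradient, like a defocused background, and pure noise, like high-ISO grain. How well lossless compression does depends entirely on how redundant the data is.

```python
import random
import zlib

# Illustrating the principle of lossless compression, not any real
# video codec: DEFLATE on synthetic "image-like" data. Smooth data
# compresses well; noisy data barely compresses at all.

random.seed(1)

# A smooth gradient: highly redundant, like a defocused background.
smooth = bytes((x // 16) % 256 for x in range(65536))

# Pure noise: almost incompressible, like high-ISO sensor grain.
noisy = bytes(random.randrange(256) for _ in range(65536))

for name, data in [("smooth", smooth), ("noisy", noisy)]:
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes, "
          f"ratio {len(data) / len(packed):.1f}:1")
    # Lossless means we get back exactly what we put in.
    assert zlib.decompress(packed) == data
```

This is also why quoted ratios for lossless video codecs are always approximate: a clean, well-lit shot compresses much better than a grainy one, and the noise floor of the camera sets a hard limit.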

Raw

At these ratios, HD or 4K material can still be hard work for spinning disks, and flash is often too small for serious online storage. The thing is, most modern cameras use a single, Bayer-filtered sensor (which we discussed here). If a (say) 4K camera has a 4K sensor, the unprocessed sensor data comprises one 4096 by 2160 monochrome picture. If we convert that to 4K RGB data, we now have one 4096 by 2160 layer for each of red, green and blue. Two-thirds of that RGB data is interpolated (er, made up). If we simply store the raw sensor data, we might end up storing less data than even lossless compressed RGB, at no image-quality penalty.
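The arithmetic is worth spelling out. Assuming, purely for illustration, 16 bits (2 bytes) per sample:

```python
# Comparing raw Bayer data against demosaiced RGB for a 4K frame,
# assuming 2 bytes per sample purely for illustration.

width, height = 4096, 2160
bytes_per_sample = 2

# Raw: one monochrome sample per photosite.
raw_bytes = width * height * bytes_per_sample

# RGB: three full-resolution channels, two-thirds of them interpolated.
rgb_bytes = width * height * 3 * bytes_per_sample

print(f"raw Bayer:           {raw_bytes / 1e6:.1f} MB/frame")
print(f"uncompressed RGB:    {rgb_bytes / 1e6:.1f} MB/frame")
print(f"lossless RGB at 2:1: {rgb_bytes / 2 / 1e6:.1f} MB/frame")
```

The raw mosaic is a third the size of the full RGB frame — so even after a typical 2:1 lossless squeeze, the RGB version is still bigger than completely uncompressed raw.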

In its original sense, the word “raw” meant files from digital stills cameras which worked exactly like that. They mainly store simple lists of pixel values, although DSLR manufacturers have managed to make things complicated, sometimes on purpose, by wrapping those lists of numbers up in proprietary file formats involved enough to make them hard to read. Solving that problem has been both the burden and the joy of the open-source program dcraw, which supports hundreds of formats from different cameras and has been a significant tool for the archiving and preservation of old data. In general, though, it's just a matter of working out where in the file the numbers are, then processing the Bayer mosaic data into a viewable image. So, that's raw, and in camera terms the results are generally identical to an uncompressed workflow.
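That second step, turning the mosaic into a viewable picture, is the demosaic. The toy sketch below (my own illustration, nothing to do with how dcraw actually does it) uses crude nearest-neighbour interpolation over 2x2 RGGB tiles; real converters use far more sophisticated algorithms, which is precisely where the "made up" two-thirds of the RGB data comes from.

```python
# A toy demosaic: turning one monochrome Bayer mosaic into three RGB
# channels using nearest-neighbour interpolation over 2x2 RGGB tiles.
# Purely illustrative; real converters are far more sophisticated.

def demosaic_rggb(mosaic):
    """mosaic: 2D list of sensor values in an RGGB pattern, with even
    dimensions. Returns (r, g, b) channels at full resolution."""
    h, w = len(mosaic), len(mosaic[0])
    r = [[0] * w for _ in range(h)]
    g = [[0] * w for _ in range(h)]
    b = [[0] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            # Each 2x2 tile holds one R, two G and one B sample.
            rv = mosaic[y][x]
            gv = (mosaic[y][x + 1] + mosaic[y + 1][x]) / 2
            bv = mosaic[y + 1][x + 1]
            for dy in (0, 1):
                for dx in (0, 1):
                    r[y + dy][x + dx] = rv
                    g[y + dy][x + dx] = gv
                    b[y + dy][x + dx] = bv
    return r, g, b

# One 2x2 tile: a red, two greens and a blue become four RGB pixels.
r, g, b = demosaic_rggb([[100, 50], [60, 200]])
print(r[0][0], g[0][0], b[0][0])
```

Each photosite contributes one real value; the other two channels at that pixel are estimated from its neighbours.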

Compressed raw

It is perfectly possible to compress raw data. It's still a series of photographic images, and the compression can be either lossy or non-lossy. The first motion-picture camera to use lossy compression on raw sensor data was probably Silicon Imaging's SI2K, and the Cineform codec associated with it was thus among the first widely deployed ways of compressing raw data for later processing. The same approach would later be used by Red, which also applied a wavelet codec. Both Cineform and R3D files represent a complete motion clip in a single file, even though neither of them uses the similarities between frames to compress data. The result is broadly similar to Adobe's freely available CinemaDNG format, which stores a single frame per DNG file, having been designed as a generic format for DSLRs. Assuming the DNG file uses lossy compression, that's lossy compressed raw. DNG can use non-lossy compression on raw, too.

Conclusion

So, we've considered uncompressed RGB images, lossless compressed RGB images, and raw sensor data stored with or without compression, which can itself be lossy or non-lossy. About the only thing we've overlooked is the OpenEXR format, which is mainly used in visual effects. It can use various types of compression, including some designed to work well on noise-free, computer-rendered images. There are only so many ways to store images on disks, in theory. Of course, in practice, we've managed to come up with dozens of mutually incompatible reimplementations of broadly the same idea, which at least keeps things interesting.