Video Compression Explained

Note: This is a rather in-depth article about video compression. If you are new to video, you’ll want to read I’ve Shot Video, Now What? first.

When we started carrying the nanoFlash, Roger, our video-challenged owner, bought it because he was told it was great. When he read about it he became completely overwhelmed by the 762 different formats it would record to and from, and demanded an explanation for just what the heck all that meant. Hence this article (which is written for video novices, like Roger). A photographer like Roger is used to working in only two formats: RAW, which has the maximum information the image contains, and JPG, which is compressed to a more convenient file size. As many photographers learning to shoot SLR video have found out, video is different: everything is compressed to one degree or another, but in an apparently bewildering array of methods and formats, each of which comes with its own cryptic codes and abbreviations.

What video compression is not

Compression is not the resolution of the video, but video resolution has a lot to do with how much information will need to be compressed. Common video formats are 720p, 1080i, or 1080p. The 720/1080 part is pretty straightforward: it simply refers to the number of pixels on the vertical scale of the image. 720 is a 1280×720 pixel image and 1080 is a 1920×1080 pixel image for each frame of video. That’s a big difference in the amount of information: a 720 frame has 921,600 pixels, while a 1080 frame has just over 2 million pixels.

The i and p parts refer to whether the frame is interlaced or progressively scanned. To be (very, very) brief, progressively scanned is better, especially when there is a lot of motion in the image, but doesn’t make as much difference when objects in the image are fairly static. 1080p is the best of both worlds, but takes a lot of bandwidth to do (it’s overkill for web video or most television, for example). 720p and 1080i actually both take up roughly the same amount of bandwidth and are what are used for HD television as an example (some networks use 720p, some 1080i). 1080p generates significantly more data than either of the other two formats. That’s why many lower end cameras, storage devices, etc. can handle either 720p or 1080i, but not 1080p.

The other variable that is not compression, but that does influence how much data must be compressed, is the frame rate, or FPS (frames per second), at which the camera records. Film cameras shoot at 24 FPS. Many, but not all, video cameras can shoot at 24 FPS for a ‘film-like’ look, but standard video is usually shot at 30 FPS. (Note: These standards refer to the US and a few other nations. There are other standards worldwide.) Some lower end cameras shoot at lower frame rates than these, and some high end cameras can shoot at much higher frame rates.

Obviously a 1080p image shot at 60 frames per second is going to generate a lot more data than a 720p image shot at 24 FPS. The bottom line, though, is video is generating a lot of information: 1 to 2 million pixels recorded 24 to 30 times a second, at 8 to 16 bits per pixel, plus audio is a lot of data to record. And, practically speaking, it has to be compressed somehow to make the file size manageable.
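To put rough numbers on that comparison, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the 8 bits per pixel is an assumption taken from the low end of the range above. Audio is ignored.

```python
def raw_video_bits_per_second(width, height, fps, bits_per_pixel):
    """Uncompressed video data rate in bits per second (audio ignored)."""
    return width * height * fps * bits_per_pixel

# 720p at 24 FPS versus 1080p at 60 FPS, both assuming 8 bits per pixel.
low = raw_video_bits_per_second(1280, 720, 24, 8)
high = raw_video_bits_per_second(1920, 1080, 60, 8)

print(f"720p24:  {low / 1e6:.1f} Mb/s before compression")
print(f"1080p60: {high / 1e6:.1f} Mb/s before compression")
```

Even the smaller of the two works out to well over 100 Mb/s before any compression, which is why nothing gets recorded that way in practice.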

Compression and Bit rate

Simply put, bit rate (usually expressed as megabits per second, or Mb/s) is the amount of data recorded each second. After the camera (or computer) has done its compression thing, the file size will equal bit rate × seconds of video. If you dig around, you can often find the maximum bit rate that a camera, storage device, or processor handles. In theory, higher bit rate means more data is stored, which (assuming everything else is equal) means higher quality compression. But there are a lot of other variables.
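The file size relationship above can be sketched in a few lines. The function name is hypothetical, and the only extra step is dividing by 8 to turn megabits into megabytes. Container overhead and audio are ignored.

```python
def file_size_megabytes(bit_rate_mbps, seconds):
    """File size in megabytes for a constant bit rate given in Mb/s."""
    megabits = bit_rate_mbps * seconds
    return megabits / 8  # 8 bits per byte

# Ten minutes of footage recorded at 35 Mb/s.
ten_minutes = file_size_megabytes(35, 10 * 60)
print(f"{ten_minutes:.0f} MB")
```

So a ten minute clip at 35 Mb/s needs a bit over 2.5 GB of storage.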

Different cameras use different codecs (COmpression-DECompression algorithms) to compress the data. Your camera choice sets your choice of compression algorithm (more on this later), since different manufacturers have chosen different codecs. In general, compression algorithms are sorted into two categories (this applies to audio and other data too, not just video): lossy and lossless. Lossless means that after decompression each pixel is exactly the same as the original; no data is lost. There are no video cameras (other than a few amazingly expensive professional cameras) that record lossless.

Lossy compression isn’t an exact pixel-for-pixel match when uncompressed, but it offers much higher compression ratios than lossless compression. (The compression ratio is the size of the original video compared to the compressed video. Uncompressed video is 1:1. Lossy ratios can get very high: 1:200 compression isn’t unheard of for some heavily compressed video formats. The codecs used in video cameras offer better quality, but less dramatic compression ratios, more on the order of 1:50.) There are lots of different codecs in use today. The better codecs are usually newer and offer a higher compression ratio at similar image quality. For example, MPEG-4 gives a higher quality image than MPEG-2 at the same bit rate. Some high-end cameras, though, use less aggressive codecs with less compression to maintain the best possible image quality. In exchange for that, they require significantly higher bit rates to record their data.
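As a rough illustration of how a compression ratio falls out of the numbers, the sketch below compares one second of uncompressed video with one second at a recorded bit rate. The 16 bits per pixel and the 35 Mb/s figure are assumptions picked for the example, not measurements from any particular camera, and the ratio you get depends heavily on them.

```python
def compression_ratio(original_bits, compressed_bits):
    """How many times smaller the compressed data is than the original."""
    return original_bits / compressed_bits

# One second of 1080p at 30 FPS, assuming 16 bits per pixel,
# compared with one second recorded at an assumed 35 Mb/s.
raw_bits = 1920 * 1080 * 30 * 16
recorded_bits = 35_000_000

ratio = compression_ratio(raw_bits, recorded_bits)
print(f"about 1:{ratio:.0f}")
```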

How video compression works

There are two ways to compress the data in a video clip: intraframe and interframe. Intraframe compression takes each frame of the video and compresses it just like you would use JPEG to compress a still image (in fact one format, Motion JPEG, does exactly that). With intraframe compression every frame of the image is complete, although slightly compressed. This can be important if your video has lost a frame: since the frames before and after the lost frame are complete, not much damage is done. It’s also important when you cut and paste video clips, because the video editing software needs a complete frame at the beginning and end of each transition. Intraframe compression, though, doesn’t really make the file size all that much smaller. Compression ratios of about 1:20 are about as good as it can do.

To get more significant compression, video codecs also use interframe compression. The basic idea is simple. Video consists of multiple still frames (typically anywhere from 24 to 60 per second). Interframe compression looks at each frame, compares it to the previous one, then stores only the data that has changed. Usually it doesn’t look at individual pixels, but rather at square blocks of pixels (less time consuming and resource intensive). The result is that each frame in an interframe compressed video contains only the changed parts of the image.
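The block comparison idea can be sketched with tiny grayscale frames stored as lists of numbers. This is a toy illustration only: the function names and the 2×2 block size are made up for the example, and real codecs do far more (motion estimation, transforms, entropy coding) than a straight block-by-block comparison.

```python
def changed_blocks(prev, curr, block=2):
    """Return {(top, left): block_rows} for every block that differs from prev."""
    changes = {}
    height, width = len(curr), len(curr[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            old = [row[left:left + block] for row in prev[top:top + block]]
            new = [row[left:left + block] for row in curr[top:top + block]]
            if old != new:
                changes[(top, left)] = new
    return changes

def apply_changes(prev, changes):
    """Rebuild the current frame from the previous frame plus the stored blocks."""
    frame = [row[:] for row in prev]
    for (top, left), rows in changes.items():
        for dy, row in enumerate(rows):
            frame[top + dy][left:left + len(row)] = row
    return frame

prev_frame = [[0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
curr_frame = [[0, 0, 0, 0],
              [0, 0, 0, 9],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]

delta = changed_blocks(prev_frame, curr_frame)
rebuilt = apply_changes(prev_frame, delta)
print(len(delta), "of 4 blocks stored")
```

Only the one block containing the changed pixel has to be stored, and the decoder can still rebuild the full frame as long as it has the previous one.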

But interframe compression brings a new problem: What happens when you’re sending this video to wherever (or importing it), and it skips a frame? If each frame is referencing the previous frame, you’re in trouble until the entire picture has changed. If you have a 3 minute clip of the same scenery, there would be a problem. And the same problem would occur, if you wanted to cut the scene halfway through that 3 minute clip: the frame at your transition wouldn’t be a complete frame, just a compression of the changes that occurred from the previous frame. And so on. The solution all interframe compression formats use is the key frame.

Key frames and long-GOP compression

Interframe compression codecs record a key frame every so often: a frame that contains the entire image data set, whether the scene has changed or not. The key frame is recorded every x number of frames (usually 15) and that frame contains a complete image. The next group of frames (until the next key frame) is heavily compressed, containing only the changes from the previous key frame. Using this method, if you skip a frame, you only lose (at most) 15 frames before you’re good to go again (or until your next editing point). It’s still a relatively long time, but allows for a much smaller file size than intraframe compression alone. This key picture followed by several compressed pictures until the next key frame is abbreviated GOP (group of pictures). Since there is a fairly long group of images associated with each key frame, this is often referred to as long-GOP compression.
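Here is a small sketch of the key frame bookkeeping described above. It assumes a key frame at every 15th frame (frame 0, 15, 30 and so on) and assumes that a skipped frame makes every later frame in the same group unusable. The function names are hypothetical, and real codecs have more complicated frame dependencies than this.

```python
GOP_LENGTH = 15  # frames per group of pictures, the typical value quoted above

def is_key_frame(index, gop_length=GOP_LENGTH):
    """True if this frame number holds a complete image."""
    return index % gop_length == 0

def previous_key_frame(index, gop_length=GOP_LENGTH):
    """Nearest key frame at or before this frame, e.g. a usable cut point."""
    return index - index % gop_length

def frames_unusable_after_drop(dropped_index, gop_length=GOP_LENGTH):
    """Frames lost when one frame is skipped: that frame and every frame
    after it, up to (but not including) the next key frame."""
    return gop_length - dropped_index % gop_length

print(previous_key_frame(22))
print(frames_unusable_after_drop(1))
print(frames_unusable_after_drop(14))
```

Under these assumptions the worst case is losing a key frame itself, which costs the whole group of 15 frames.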

A final note about frame skipping: it’s rare. In fact, it almost never happens when using quality equipment. Because of this, long-GOP encoding is usable and safe. Intraframe-only compression does protect against frame skips, but requires a lot more disk space (and a higher bit rate for the same quality image). Since video editing software can only cut at a key frame, some high quality video recording devices (like the nanoFlash that started this discussion) will record video with only intraframe compression (a half-second until the next key frame can be an eternity to a video editor), but the resulting files can get very, very large.

Luminance and Color Compression

Since the days when video was analog, luma (the black and white values) and chroma (the color) have been stored separately. Y’CbCr is how video is stored today, typically using a process known as chroma subsampling. Y’ (sometimes simplified to just Y) is the luma (grayscale) information. Cb and Cr each store a portion of the color information (like LAB color space in Photoshop, for you photographers).

We are less sensitive to color and very sensitive to the grayscale value of an image, so video cameras today discard some of the color information to further compress the video data. The proportions are usually shown as a three-part ratio beginning with 4, with 4:4:4 indicating no chroma compression. Recording video at 4:4:4 would be ideal, but it takes up an enormous amount of space and isn’t feasible in most situations with today’s equipment. Top quality video formats, like XDCAM422 and DVCPRO HD, keep twice as much luma data as either color (Cb and Cr) data, in a ratio of 4:2:2. This reduces bit rate by 1/3 with very little image compromise. Other video formats such as HDV, AVCHD, MJPEG, and MPEG-2 (DVD quality) use even less chroma data, storing video at a ratio of 4:2:0. This may sound extreme, but DVD quality video is recorded at 4:2:0, so it is intentionally missing 3/4 of the color information originally present. Don’t we all think DVD is pretty high quality? Even Blu-ray stores video with 4:2:0 chroma subsampling.
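The fractions quoted above can be checked with a short sketch. It uses the usual reading of the J:a:b notation, where the numbers describe a block J pixels wide and 2 pixels tall: a is the number of chroma samples in the first row and b the number in the second row. The function name is made up for illustration.

```python
from fractions import Fraction

def subsampling_fractions(j, a, b):
    """For a J:a:b scheme, return (chroma kept, total data kept)
    relative to storing every sample (4:4:4 when J is 4)."""
    luma_samples = 2 * j            # every pixel keeps its luma value
    chroma_samples = a + b          # per color channel (Cb or Cr)
    chroma_kept = Fraction(chroma_samples, 2 * j)
    total_kept = Fraction(luma_samples + 2 * chroma_samples, 3 * 2 * j)
    return chroma_kept, total_kept

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    chroma, total = subsampling_fractions(*scheme)
    print(scheme, "chroma kept:", chroma, "total data kept:", total)
```

This agrees with the figures in the text: 4:2:2 keeps 2/3 of the total data (a reduction of 1/3), and 4:2:0 keeps only 1/4 of the color information.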

Many professional video cameras use a 4:2:0 ratio to keep the bit rate manageable when recording in camera. When absolute image quality is critical, however, some cameras that record internally at 4:2:0 (the Sony EX1 and EX3, for example) have an HD-SDI output that delivers a higher quality 4:2:2 signal, but recording it requires an external recording device (like the nanoFlash that started this discussion).

Recorded Bitrates

Interframe compression algorithms record a certain number of bits per second. Using a set bit rate stores the same amount of data for every second of video, regardless of how the frames change over that second. With a set bit rate you know exactly how large a 5 or 10 minute video file will be, since the bit rate is fixed. When we recorded to tape (MiniDV, for example), the bit rate had to be constant, because the tape moved at a constant rate. DV footage, although digital, still records to tape at a fixed rate of 25Mbps. HDV, a descendant of DV that uses MPEG-2 compression, records at a fixed rate of up to 25Mbps.

Most cameras and codecs today, however, record using variable bit rates because it is more efficient. This changes the recording bit rate based on the amount of information change frame-to-frame. If it sees an almost identical previous frame, very little data will be encoded. However, if a large part of the frame is changing, there is much more data, and a higher bit rate will be recorded. The takeaway message, though, is that every recording device, whether in-camera or external, has a maximum bit rate it can handle. The various compression codecs have to provide final data at a bit rate that is acceptable to the recording device or bad things will happen: missed frames, jumping, etc.
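A minimal sketch of that takeaway: given the bit rate a variable bit rate recording reaches in each second, check whether the peak stays under what the recording device can handle. The per-second numbers and the 24 Mb/s limit are invented for the example, and the function name is hypothetical.

```python
def check_recording(per_second_mbps, device_max_mbps):
    """Summarize a variable bit rate recording against a device limit."""
    peak = max(per_second_mbps)
    total_megabits = sum(per_second_mbps)
    return {
        "peak": peak,
        "average": total_megabits / len(per_second_mbps),
        "total_megabits": total_megabits,
        "fits": peak <= device_max_mbps,
    }

# Five seconds of footage: a quiet scene, then a burst of motion.
rates = [8, 10, 24, 30, 12]
summary = check_recording(rates, device_max_mbps=24)
print(summary)
```

The average here is comfortably below the limit, but the one busy second is not, and that is the second where frames would get dropped.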

HDV, XDCAM, AVCHD and every other codec

The terminology involved in the various codecs is beyond chaotic. To a video outsider it’s incomprehensible, but we can try to clarify things a bit. Like most simplifications, what follows is a bit generalized in the interest of keeping it easy to follow. To keep it readable, we’ve left out and ignored some arguable points that would easily lead to 4 more pages of clarification. As a general overview, however, this is a pretty reasonable summary. First, we need to separate containers (sometimes called formats) from the underlying codecs. A container is a format that can use (or be used by) many different (but not all) codecs. AVI, QuickTime, RealMedia, DivX, and many other containers exist, but they are (with a few exceptions) not actual codecs.

There are several codecs in common use today, each following a set of standards developed by the Moving Picture Experts Group (MPEG), the ITU-T Video Coding Experts Group (VCEG), or the Joint Video Team (JVT) from both groups. These standards provide a lot of customizable options to the various camcorder and software manufacturers. Some cameras let you choose between two codecs, but most only offer one. The reason behind this? Different codecs require different processing algorithms to encode video. The processor in the camera (yes, cameras have processors very similar to computers) is designed so that it can handle the encoding for that specific codec. And the memory used to store the video is designed to handle the bit rate needed for decent quality video from that codec. And so on.

The most current families of standard codecs from MPEG and VCEG are combined as the H.264/AVC/MPEG-4 standards. H.264/MPEG-4 (also referred to as MP4 at times) allows for a much lower bit rate than previous codecs while still achieving excellent quality. It is used not only for video compression during recording, but also for compression after editing. YouTube, Blu-ray, and the iTunes Store all use H.264 for encoding video. AVCHD codecs are H.264 based codecs used in newer high end Sony and Panasonic cameras, but many other newer camcorders use H.264 based codecs.

Several other codecs remain in common use. Motion JPEG is used on many point-and-shoot video cameras and Nikon video SLRs. It doesn’t compress nearly as much as H.264 codecs, but requires a lot less processing power and is particularly suitable for non-HD video and lower resolutions. The HDV family of codecs uses MPEG-2 compression, as does the XDCAM codec (DVCAM, by contrast, uses DV compression). These files aren’t usually as tightly compressed as H.264 codecs, although MPEG-2 Long-GOP comes close. These codecs are often found on high-end digital video cameras. The reason why they don’t use H.264? Depending on the source of information you read, this is because the files are easier to edit, or because the manufacturer had lots of chips made for these codecs and was going to use them. I suspect both reasons are true.

So basically each manufacturer chooses which of the standard codecs to implement in their camcorder. Well and good. However, they then modify it a bit, build the chip they’ll install in the camera to use their version, and identify it with a cryptic set of initials in an apparent attempt to prevent anyone from understanding that their codec has anything in common with anyone else’s codec. Let’s look at one example. Sony and Panasonic jointly developed AVCHD (Advanced Video Coding High Definition) for their consumer camcorders; it is also used by Canon. AVCHD is MPEG-4 AVC/H.264 compliant so it can also get tagged with those initials. Panasonic tweaks AVCHD with some higher bit rates and markets this codec as AVCCAM in their professional cameras, or downgrades it to 720p recording only and calls that version AVCHD Lite. Sony calls their version NXCAM in their newest professional cameras (as opposed to XDCAM, a different codec used in many of their current high end cameras). Canon and Panasonic use a High-Profile level 4.1 modification of the AVCHD codec in some cameras which allows a maximum bit rate capture of 24 Mbits/sec, while most camcorders using AVCHD capture a maximum bit rate of 17 Mbits/sec. On the editing side, Adobe Premiere required a third party plug-in to convert certain versions of AVCHD, but did fine with others; Final Cut Pro converted this format to Apple Intermediate Codec before editing was possible; and Vegas had no problems with the format at all.

Pretty confusing, huh? The takeaway message, with a lot of caveats, is that most codecs in higher level cameras are MPEG-4/H.264 compliant and fairly similar in how effectively they compress video while maintaining quality. They may differ in offering 1080p (some don’t), in how high a bit rate they can record (which, given similar codecs, is a fair estimate of image quality), in how often they record a key frame (which may be user adjustable in-camera), and in how easily your editing program can convert the footage into an editable format. There are a few common codecs that you’ll run across regularly, and they fall into several groups:

DV/DVC/DVCPRO/DVCAM – largely legacy technology, but many HD/HDV systems are backward compatible with DV/DVC, and it is used in some high-end video and video broadcast cameras.

HD/HDV – Used by Sony, Canon, JVC, and Sharp, originally designed for recording to tape. It uses MPEG-2 compression, 4:2:0 chroma sub-sampling, and writes with a constant bit rate. Used in many tape-based camcorders, but also some digital recorders.

XDCAM – Designed by Sony, but also used by JVC, originally designed for recording to disc. (In some ways a container {see above} rather than just a codec as most cameras using XDCAM can also record in DVCAM or MPEG-2 variants.) Uses an MPEG-2 or MPEG-2 Long-GOP codec, 4:2:0 chroma subsampling, and writes in a variable bit rate to 35 Mbits/sec. However, the XDCAM HD422 version uses a 4:2:2 chroma subsampling profile and writes a maximum 50 Mbps rate.

Motion JPG – intraframe only compression, usually used in point-and-shoot video cameras, but also the Nikon D90 and Pentax K7. It is less efficient than other codecs, so usually image size or frame rate is limited.

Or if you’d rather see what some common camcorders and videoSLR cameras use:

| Camera | Algorithm | Maximum Bitrate |
| --- | --- | --- |
| Panasonic HVX200 | DVCPRO | 100Mbps |
| Sony EX1, EX3, JVC HM100 | XDCAM | 35Mbps |
| Sony Z7U | HDV | 25Mbps |
| Canon HV30, HV40 | HDV | 25Mbps |
| HG21 | AVCHD | 24Mbps |
| Canon 5D MkII, 7D | H.264 | 40Mbps |
| Nikon D90/D300s/D3s | motionJPG | bitrate unknown |
| Panasonic GH-1 | AVCHD | 40Mbps |

Compare those bit rates to what an external recorder like the nanoFlash can record: 230Mbps.

Conclusion

What does all of this mean? In general, you want the highest bit rate, using the most efficient compression algorithm possible. MPEG-4/H.264 codecs probably produce the best quality/compression ratio. However, top-end professional editing may require a less lossy format, such as an MPEG-2 based codec, with resulting larger file sizes, to get the absolute best image quality. Some high end cameras will allow you to take the video feed directly out to an external device and record it at an even higher bit rate with less compression for critical footage. Hence an external recorder like the nanoFlash provides higher bit rates (180Mbits/sec) and less compression than is possible in-camera. (A note of sensibility: the 230Mbps bit rate of the nanoFlash is excessive for use with your $300 handycam or even the Canon HV40. Your image isn’t going to improve beyond the quality of the camera.)

What you intend to do with the footage after recording is also important. Some of the higher compression codecs can be difficult to work with in a non-linear editor and require upcoding to an intermediate format (read: lots of processor power and hard drive space) for editing. Some of the simplest formats, like Motion JPEG, can be drag-and-drop edited in even the simplest programs. And less compressed, but larger files (or even uncompressed files in certain high-end devices) can be a dream to edit and provide the absolute best quality after processing.
