The JPEG Family Circus

The discussion in the comments of my recent article about HD Photo (a.k.a. JPEG-XR) got me thinking about all of the different beasts that go by the name “JPEG.”

JPEG: What most of us consider to be “JPEG” is just one of many processes for image encoding and decoding defined within the same specification. The process that makes up 99.99999% of all of the JPEGs ever created is “JPEG Baseline (Process 1)” for 8-bit lossy compression. (That’s just my estimate, which is probably low. It’s probably better to say “almost 100% of all JPEGs.”)

This process divides an image into a bunch of 8×8 blocks, uses the discrete cosine transform (DCT) to move the data into the frequency domain, and compresses the data by (among other things) removing some of the high frequency data that the human visual system usually can’t detect. You can think about it as abridging a novel by taking out a few sentences per paragraph. Unfortunately, if the quality settings are too low, it’s really easy to notice that something has gone missing; or if a scene has a lot of information, such as one with lots of fine detail, there will be blocky artifacts where the detail should be.
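To make the block-transform idea concrete, here is a minimal sketch in Python of one 8×8 block going through a DCT, a quantizer, and back. The `step` function is a made-up quantizer chosen only for illustration; real JPEG encoders use quantization tables, and the rest of the pipeline (color conversion, zig-zag ordering, entropy coding) is left out entirely.

```python
import math

N = 8  # baseline JPEG operates on 8x8 blocks

def _scale(k):
    # Orthonormal DCT-II scaling factor
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(i, k):
    # Cosine basis function: sample position i, frequency index k
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT: pixels -> frequency coefficients."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT: frequency coefficients -> pixels."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def step(u, v, strength=2):
    # Hypothetical quantizer (NOT a table from the standard):
    # the step size grows with frequency, so fine detail is stored coarsely.
    return 1 + strength * (u + v)

def quantize(coeffs):
    # This rounding is where information is actually thrown away.
    return [[round(coeffs[u][v] / step(u, v)) for v in range(N)]
            for u in range(N)]

def dequantize(q):
    return [[q[u][v] * step(u, v) for v in range(N)] for u in range(N)]

# A smooth horizontal ramp: every row is 0, 16, 32, ..., 112
ramp = [[16 * y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(ramp))
restored = idct_2d(dequantize(q))

# How many of the 64 coefficients survived, and how far off is the result?
nonzero = sum(1 for row in q for c in row if c != 0)
max_error = max(abs(restored[x][y] - ramp[x][y])
                for x in range(N) for y in range(N))
```

For a smooth block like this ramp, only a handful of the 64 quantized coefficients are nonzero, which is what makes the data compress well. Raising `strength` zeroes out more coefficients and increases `max_error`, which is the quality-versus-size trade-off described above.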

While the removal of high-frequency detail is inherently lossy, even with a maximum quality setting, the original JPEG standard specified a separate lossless mode not based on the DCT. Images compressed this way can be completely retrieved from the compressed data. This is important when you need to preserve all of the data within an image or when adding artifacts can have devastating consequences. “Is that a nodule in the patient’s chest X-ray or a JPEG compression artifact? I guess we’d better do a biopsy just in case….” In fact, the lossless modes for JPEG are really only used within DICOM files, the format used for digital imaging and communications in medicine.
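The lossless mode works by prediction rather than by a transform: each sample is predicted from neighbors that have already been decoded, and only the prediction error is entropy coded. Here is a rough sketch of that idea in Python, using the simplest possible predictor (the sample to the left). The function names and the starting prediction of 0 are my own assumptions for illustration, and the entropy-coding stage is omitted.

```python
def encode_row(pixels):
    """Replace each sample with its difference from the left neighbor."""
    residuals = []
    previous = 0  # assumed starting prediction, for illustration only
    for p in pixels:
        residuals.append(p - previous)
        previous = p
    return residuals

def decode_row(residuals):
    """Undo encode_row exactly by accumulating the differences."""
    pixels = []
    previous = 0
    for r in residuals:
        previous += r
        pixels.append(previous)
    return pixels

row = [100, 102, 101, 101, 105, 200]
residuals = encode_row(row)
assert decode_row(residuals) == row  # nothing is lost
```

Because neighboring pixels tend to be similar, most residuals are small numbers that an entropy coder can store in few bits, and since no rounding happens anywhere, the decoder gets back exactly what went in.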

Old school JPEG also supports 12 and 16 bits of data in each channel of a pixel. For color images, this is the difference between about 17 million colors for an 8-bit image, 68 billion colors for a 12-bit image, and 281 trillion colors when using 16 bits. Once again, only those medical imaging people use the extra bit depths, and they just use the gray colors.
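Those color counts are just powers of two: the number of distinct values is 2 raised to the total number of bits per pixel. A quick Python check, where `color_count` is a helper name I made up for this post:

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct pixel values for the given bit depth."""
    return 2 ** (bits_per_channel * channels)

for bits in (8, 12, 16):
    # Three channels for color, one channel for grayscale
    print(f"{bits}-bit: {color_count(bits):,} colors, "
          f"{color_count(bits, channels=1):,} gray levels")
```

The single-channel numbers are the ones that matter for the grayscale medical imagery mentioned above.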

JPEG-LS was supposed to be a better lossless format but never really got going. The promises of JPEG 2000 probably had a lot to do with this.

JPEG 2000 is (1) a wavelet-based compression method, (2) a scheme for encoding wavelet-compressed images into randomly accessible “codestreams”, and (3) a file format for encapsulating compressed codestreams. Because it uses a discrete wavelet transform (DWT), the results are generally better than the older JPEG format when comparing images with the same compression ratio.
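To give a feel for what a wavelet transform does, here is a minimal Python sketch of one level of an integer Haar transform on a 1-D signal. To be clear, this is a stand-in: JPEG 2000 uses different, longer wavelet filters, and Haar is used here only because it is the simplest wavelet that shows the idea. The function names are my own.

```python
def haar_forward(signal):
    """One level of integer Haar lifting on an even-length list.

    Returns (averages, details): a half-resolution version of the
    signal plus the differences needed to rebuild the original.
    """
    averages, details = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        d = b - a
        s = a + (d >> 1)  # floor of the pair's average
        averages.append(s)
        details.append(d)
    return averages, details

def haar_inverse(averages, details):
    """Exactly undo haar_forward."""
    signal = []
    for s, d in zip(averages, details):
        a = s - (d >> 1)
        signal.extend([a, a + d])
    return signal

signal = [10, 12, 14, 200, 202, 204, 50, 48]
averages, details = haar_forward(signal)
assert haar_inverse(averages, details) == signal
```

The `averages` list is a half-size preview of the signal, and applying the transform again to it gives a quarter-size preview, and so on. That built-in pyramid of resolutions is the property a wavelet codec can exploit to serve different resolutions from one compressed stream.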

Images in JPEG 2000 can have an arbitrary bit depth (1 – 32 bps), and different planes can have different bit depths. (For example, the luminance channel of a YCbCr image can have a high bit depth to support HDR imagery.) Certain portions of an image can have higher spatial resolution or be encoded at a different compression level. JPEG 2000 has both lossy and lossless components as part of the baseline. Several colorspaces are supported, including bi-level, grayscale, sRGB, YCbCr, and indexed imagery. Hyperspectral and n-sample images are supported using a somewhat convoluted “multi-component” schema. Images can also include alpha channels for transparency. A really amazing thing about JPEG 2000 is that it’s possible to reorder the parts of the codestream to change how the data is accessed (e.g. access regions faster vs. access different resolutions faster) without decompressing and recompressing the data, which can be expensive.

The JPEG 2000 file format uses about 20 hierarchical “boxes” to nest metadata about the compressed codestreams. While the file format is technically unnecessary to read and process a JPEG 2000 image, the extra formatting facilitates random data access, long-term cataloguing and IP management, and efficient transmission. JPEG 2000 files can also contain a limited subset of ICC color profiles. EXIF metadata support is not part of the JPEG 2000 standard, although it can appear as a private metadata field.

JPEG 2000 was touted as the format to replace the 1991 JPEG standard, but this didn’t happen for several reasons. Perhaps most important, the algorithms at the heart of JPEG 2000 require a lot of processing power, making it slower for desktop computers than rendering old-school JPEG and prohibitive for many embedded devices. As of 2007, few Web browsers have built-in support for it, and consumer-level digital cameras don’t produce imagery in the format. In 2007, Adobe Photoshop CS3 stopped including the JPEG 2000 export module in a typical installation.

But because of the smaller file size, flexibility, and more pleasing artifact appearance, the medical and remote sensing communities have adopted it. Both NITF and DICOM have incorporated JPEG 2000 data into their files. NITF is the friendly format used for “national imagery.” I will let you Google that so the NSA can start tracking you.

JPEG-XR is the name that Microsoft’s HD Photo format might have if it’s standardized, which I sincerely hope it will be. JPEG-XR uses a photo core transform (PCT), which I know absolutely nothing about but which promises equivalent performance to JPEG 2000 with lower computational complexity (which means you can put it on a consumer device more easily) and much better size-versus-quality performance compared to the original JPEG format. It also supports more bit depths, high dynamic range imagery, lossy and lossless encoding/decoding using the same algorithm, and wide gamut color; uses a linear light gamma making it possibly suitable to replace RAW formats or enable post-CRT workflows; and can store bucketloads of metadata including EXIF and XMP.

JPEG-Plus. And then there’s JPEG+, which you might reasonably call JPEG – 20% because it’s essentially the same as the original DCT-based JPEG with a modest file-size performance improvement and some claims about better visual appearance. I’m not holding my breath for it; but given the 29+ processes that made up the original JPEG standard, what’s an extra one that no one will implement?

Update: For posterity, the PCT stands for “Photo Core Transform” not “Principal Component Transform”. Thomas Richter said this about it on sci.image.processing:

The transform is an overlapped 4×4 block transform that is related to a traditional DCT scheme, or at least approximates it closely. The encoding is a simple adaptive huffman with a move-to-front list defining the scanning order, and an inter-block prediction for the DC and the lowest-frequency AC path of the transformation.

Some parts are really close to H264 I-frame compression, i.e. the idea to use a pyramidal transformation scheme and transform low-passes again (here with the same, in H264 with a simpler transformation).

The good part is that lossy and lossless use the same transformation. The bad part is that the quantizer is the same for all frequencies, meaning there is no CSF adaption, and the entropy coder back-end is not state of the art.

Update 3-February-2009: JPEG-XR has advanced to “draft standard balloting,” which means it’s very likely it will become a standard (unless everyone hates it, of course). More info…

While the market dynamics will not repeat the past in this case, recall that 35mm still photography came into being as a side effect of the film industry. In any case this means that in the near future, the nearest J2K codec might just be in your TV.

Somewhat creepily, JPEG 2000 is also gaining quite a presence in surveillance systems, and is, I believe, the image format for the new US biometric passports.

Just to be clear, I think that JPEG-2000 has a future which will include both industrial and consumer applications, just not the same ones that were proposed by its earliest proponents.

I don’t think that JPEG-2000 is going to be the replacement for JPEG on the web or in digital still camera workflows. But I do think it will (eventually) replace a bunch of proprietary lossless and wavelet-based compression formats. We’ve been seeing evidence of this already in the digital cinema and TV realms.

Consider this July 2007 press release: “The success of JPEG 2000 in Digital Cinema continues to grow as the Digital Cinema Initiatives (www.dcimovies.com) has recently approved the JPEG 2000 based DCI specification 1.1 for distribution of digital movies to theatres/cinemas worldwide. The strength of Digital Cinema is apparent with nearly 4,000 JPEG 2000 compliant servers deployed and nearly 5,000 systems expected by the end of 2007. The most recent JPEG 2000 digital cinema releases include Transformers from Paramount Pictures and Harry Potter and the Order of the Phoenix from Warner Bros.”

One of the things I do in my job is keep track of all of the various formats out there for imagery as well as the compression methods that they use.

There are literally hundreds of formats out there, so each new format faces an enormous challenge making any kind of dent. Standardization does help, but usually you need a good raison d’être to get people to join on your standardization committee. Being Microsoft or Adobe really helps, too.

Why make a new format or compression mode? Most people get the answer wrong. The answer is not that you have something that provides a marginal improvement over existing solutions. Worse yet is the answer that the format works well in one particular, application-specific context. (I’m not saying PGF does either of these. I haven’t looked at it.)

Instead you should think seriously about whether it enables new solutions. Right now these solutions — high dynamic range imaging, nondestructive image editing, video and imagery from mobile devices, etc. — might need:

Support for bit-depths that are in use in consumer electronics (or soon will be)

Wider color gamuts

Richer metadata support, such as geolocation

Lower power consumption by the codec

Once you’ve thought about whether you have something to add, step back and think again. Many of these problems already have solutions — such as XMP metadata, color appearance modeling, etc. — and formats that are capable of plugging in these solutions. For example, you can add just about anything to TIFF, and HD-Photo looks like a winner going forward.

I’m not saying you shouldn’t innovate or make improvements on existing technologies. But you should seriously question whether your innovations and improvements will be spectacular enough to get many adherents.

And if you say, “Well, it’s a proprietary format that people will only use with the software that we create or license,” then you really need to stop. People’s data is not something to be owned by a software company. It’s the height of hubris to think that your software is the way that people should be accessing their data forever or that you in some way own their use of it.

Finally, there’s the “28 Days Later” phenomenon. For those who haven’t seen the film, it’s your much better-than-average zombie movie where a virus gets loose from a government lab and destroys humanity as we know it. The analogy with new formats is a bit doomsday, I will grant you, but the cause is the same: Someone thinks that they’re making a special purpose format that’s never meant to see the light of day because it’s (you know) “internal use only” and the next thing you know, people are asking the makers of third-party software to support the (usually) undefined file format. Tell your grad student to implement a lightweight interface to libTIFF or HDF5 if you “need a new file format.”

Once again . . . I’m not saying that the creators of PGF have any of these faults, but I’ve seen it over and over. Here ends my public service announcement.

Great article, thanks!
I found this website – http://www.xdepth.com – where they claim they’re able to “inject” most (all?) of the new features required in tomorrow’s digital imaging in whatever “base format”.
They’ve released some plugins to show High Dynamic Range “into” Jpeg-1…and it actually works very well. (it’s got HDR while keeping compatibility with the jpeg-1 format)
If they come out with a “48 bit/pixel” version (RAW, or sort of…) as well, wouldn’t this be a killer format, being already compatible with jpeg-1 ?

Pixpush, I’m not sure if you’re involved with Xdepth somehow — it’s getting harder for me to tell the difference these days between folks trying to pimp stuff in the comments and those who just really like something.

It’s an interesting approach that XDepth takes, stuffing HDR and wide-gamut support into (I assume) the application markers for JPEG. Others have tried this before with inconsistent results. EXIF took off, which makes sense, since you can use it or ignore it with impunity. But Greg Ward’s JPEG-HDR encoding hasn’t gone anywhere. I think the reason for that was it involved reinterpreting the RGB data, albeit in a “backward compatible” way. (It’s worth noting that JPEG-HDR is not a fundamentally different kind of JPEG encoding unlike JPEG-XR.)

Personally I’m really doubtful of the prospects for proprietary JPEG extensions, as compared to open ones like EXIF. The computing world is moving away from closed formats. And any extension to JPEG that changes the fundamental meaning of the data — which it seems is what XDepth does — seems like an uphill battle.

Best of luck to XDepth. It would be interesting to know how it works. Perhaps it actually uses the ideas in JPEG-HDR. I guess we’ll only know if they tell us. (Although I will reverse engineer formats for fun and profit.)

Oh well, I’m not involved with them, just genuinely interested.
As you say, they should “open” it…so it gets implemented more easily.
But I personally do prefer something that – apparently – shows a higher quality than a new “wannabe” standard like JpegXR, and that is immediately compatible with “everything jpeg”.
I mean, JpegXR may well never establish itself as a replacement for jpeg-1…(jpeg2000 never did) and for the sake of it, probably no format will ever do.
Unless…it’s already “jpeg”! I think that’s the real beauty of it.
I’m really curious to know the underlying algorithm as well…so if you “get to know it”, just post on your blog!

Oh, and how about the 48bit/pixel ?
HDR is still a pretty “niche” technology, but think about RAW formats (digital cameras, medical images etc.).
Now it’s a total mess…and camera makers just cannot rely on a “not widely spread” image format (like HD Photo or Jpeg2000).
A “jpeg-compatible” RAW format…I think it would be killer. (again, given they open it)
