Fail-Safe Amazon Images

Amazon Web Services (AWS) lets anyone with some coding skills create applications using Amazon's data. It's fairly easy to transform an AWS response into HTML and show a list of products and images on a remote site. Many ISPs put checks in place to stop "image leeching," which means referencing an image URL from a remote server directly in an <img> tag. Amazon doesn't mind if you leech its images, though; in fact, the company encourages it. But relying on someone else's data on someone else's servers introduces some challenges, and when you're putting together a dependent, distributed application, you need to prepare for the worst while you're planning for the best. In this article, I'll show you how to properly display Amazon product images in your apps.

When you request information about a product through Amazon's Web Services, you get an impressive amount of data about that item. Each XML response also includes three URLs that point to small, medium, and large images of the product. (The XML tags that hold the image URLs are self-describing: ImageUrlSmall, ImageUrlMedium, and ImageUrlLarge.) With the URLs for the product images in hand, you have a couple of choices about how to use them.

To Host or Not to Host the Images?

If you want to display Amazon product images on your web site, you can save the image to your server and use a local path in the source attribute of your <img> tags, or simply use the Amazon URL in your <img> tags and let Amazon's servers do the work of displaying images.

At first glance, it seems that letting Amazon serve the product image is the best way to go: there's no extra coding to cache the image, and you save bandwidth. In practice, though, Amazon's image server can be unreachable for a variety of reasons. Keeping product images local means that you will have an image to display, whether Amazon's servers are responding or not. In my experience working with Amazon, the image server is rarely unreachable, but when it's not responding—even for just five or ten minutes at a time—a page full of broken images looks pretty bad. Another reason to consider caching product images locally is that some products don't have images. That sounds counterintuitive at first, too, but the process of caching the image gives you a chance to see if the product actually has an image.

When There Are No Images

Amazon has an incomprehensible number of products, and it's not possible for every one of them to have an image. The problem is that Amazon's API doesn't tell you which products have images and which don't. Amazon always returns the image URLs, whether the product actually has an image or not. If the product doesn't have an image associated with it, every image URL returned by the Amazon API will point to a single-pixel GIF. The GIFs are transparent, and with some designs, that works fine. But if your product images have a border, or your layout relies on the image being there for spacing, the single-pixel GIF can wreak havoc. There are a few ways to detect which products have images and which don't, and which method to use depends on whether you or Amazon is hosting the images.

A Server-Side Solution

If you've decided to cache images locally, it makes sense to do the "image detection" at your server. In Amazon Hacks, Hack #84 provides code for this by checking the resulting image file's byte size. This method works well, but it requires your script to download the entire file, which can cause delays if you're working with the large image size. There's a quicker way to determine whether or not the image is there for you to display.

The transparent GIF that's returned still has the JPG extension. By all appearances, it's a valid file. But the HTTP headers don't lie. By examining the headers for a given image URL, you can find out whether it's really a GIF or JPEG you're about to download. The headers give all sorts of information about the response, but the only values we're really interested in are the Content-Type (GIF or JPEG, in this case) and the Content-Length.

For example, the book Amazon Hacks has an image, and the URL returned by the Amazon API for its medium image is:

http://images.amazon.com/images/P/0596005423.01.MZZZZZZZ.jpg

We can use a script to see what the relevant HTTP headers for the request are:

Content-Length: 5148
Content-Type: image/jpeg

By contrast, the book Using Email Effectively doesn't have an image. The URL returned by the Amazon API for the medium image is:

http://images.amazon.com/images/P/1565921038.01.MZZZZZZZ.jpg

And its relevant headers:

Content-Length: 807
Content-Type: image/gif

As you can see, the image type is completely different and the content length is much smaller. Using this difference as a criterion for whether or not the image "exists," you can write routines in any scripting language to do this check for you.
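
As a rough illustration of the header check described above, here is a minimal sketch in server-side JavaScript. It is not the routine from Amazon Hacks or the article's own code; the function names hasProductImage and checkAmazonImage are hypothetical, and the sketch assumes a runtime with a global fetch function (such as Node 18 or later) and that Amazon still answers these URLs with the headers shown above.

```javascript
// Pure check: per the headers shown above, a real product image is served
// as image/jpeg, while the placeholder is served as image/gif.
// (hasProductImage is a hypothetical name.)
function hasProductImage(contentType) {
  if (typeof contentType !== "string") {
    return false; // no Content-Type header at all: treat as "no image"
  }
  // Drop any parameters after ";" and normalize case before comparing.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "image/jpeg";
}

// Network wrapper (hypothetical name): sends a HEAD request so that only
// the headers are transferred, never the image body itself.
// Assumes a global fetch is available in the runtime.
async function checkAmazonImage(url) {
  try {
    const response = await fetch(url, { method: "HEAD" });
    return response.ok && hasProductImage(response.headers.get("content-type"));
  } catch (err) {
    return false; // server unreachable: there are no headers to inspect
  }
}
```

Keeping the header test separate from the network call means the decision logic can be exercised with the two Content-Type values shown earlier without contacting Amazon at all.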

In ASP

This ASP function uses the Microsoft XML parser to request the image headers. If the image's Content-Type is image/jpeg, the function returns True; otherwise, it returns False.

With these functions, you can check to see if an image exists and take the appropriate action: display it if it's there, replace it with a generic "no image available" graphic if not. Using these methods makes the most sense if you're caching images locally. You wouldn't want to examine the HTTP headers for every image you want to display every time someone requests a page on your server; all of this "pre-processing" would slow down your application. Also keep in mind that you can't cache images indefinitely — Amazon's terms of service require you to refresh any cached images every 24 hours. To get a jumpstart on caching Amazon images, refer to Hack #93 in Amazon Hacks, "Cache Amazon Images Locally."

And this still doesn't solve the problem of image availability. If the Amazon image server isn't up, there aren't any HTTP headers to look at, anyway. But there is another way to work around products without images — even if you're letting Amazon do the work of serving them.

A Client-Side Solution

By examining the resulting HTML document after it's loaded, you can tell which products have images and which don't. JavaScript has access to web pages and all of their elements, so you can create a script that resides in the page to verify that all of the images are valid. Instead of examining HTTP headers to find the single-pixel GIFs, you'll look at image height and width. Yep, you guessed it: single-pixel GIFs are only one pixel high and one pixel wide. Once you find those, you can replace each single-pixel GIF with a suitable "no product image" image.

Using the document.images collection, you can loop through every image on a web page and get its attributes. The attributes we're most interested in are width, height, and source. The source is important because you only want to process images being served by Amazon. If the page has local single-pixel GIFs for design purposes, you wouldn't want to replace those with a "no product image" image. Here's a bit of JavaScript that loops through all of the images on a page and gets the height and width of those being served by Amazon:
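
A minimal sketch of such a loop follows. The function name replaceMissingAmazonImages and the placeholder path /images/no-product-image.gif are assumptions made for illustration, and the host test assumes product images are served from images.amazon.com, as in the example URLs earlier. The function takes the image collection as an argument so that it can also be tried outside a browser.

```javascript
// Hypothetical placeholder path; point it at your own graphic.
const NO_IMAGE_SRC = "/images/no-product-image.gif";

// Loops over an image collection (such as document.images) and swaps the
// source of any Amazon-served image that is one pixel wide or high.
// Returns the number of images that were replaced.
function replaceMissingAmazonImages(images, placeholderSrc) {
  let replaced = 0;
  for (let i = 0; i < images.length; i++) {
    const img = images[i];
    // Only process images served by Amazon; leave local spacer GIFs alone.
    if (img.src.indexOf("images.amazon.com") === -1) {
      continue;
    }
    // A width or height of 1 means Amazon returned its single-pixel GIF.
    if (img.width === 1 || img.height === 1) {
      img.src = placeholderSrc;
      replaced++;
    }
  }
  return replaced;
}

// In a browser, run the check once the page and its images have loaded.
if (typeof window !== "undefined") {
  window.addEventListener("load", function () {
    replaceMissingAmazonImages(document.images, NO_IMAGE_SRC);
  });
}
```

Note that this relies on the <img> tags not carrying fixed width and height attributes; if they do, the reported dimensions reflect those attributes rather than the file Amazon returned.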

Take a look at the last if inside the loop. If the image width or height equals 1, you know you have a product without an image, and the image source is replaced with your generic "no product image" graphic.

While we're at it, why not tackle the problem of a non-responsive Amazon image server? Another image attribute JavaScript has access to is complete. If the complete attribute is true, that means the image has successfully loaded and your visitor can see it on the page. If complete is false (or doesn't exist), we can assume our visitor will be looking at a broken image soon. By adding a line to handle this, you can replace non-responsive images with your "no product image" graphic as well:
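
One way this extra check might look is sketched below. The names needsPlaceholder and replaceUnavailableAmazonImages and the placeholder path are hypothetical. The sketch also adds one assumption beyond the text above: current browsers report complete as true even when a load has failed, so it additionally treats a naturalWidth of 0 as a sign of a broken image.

```javascript
// Decides whether a single image should be swapped for the placeholder.
// (needsPlaceholder is a hypothetical name.)
function needsPlaceholder(img) {
  // complete is false (or missing) when the image has not finished loading.
  if (!img.complete) {
    return true;
  }
  // Assumption beyond the article: a failed load in current browsers leaves
  // complete true but reports an intrinsic width of 0.
  if (img.naturalWidth === 0) {
    return true;
  }
  // Amazon's single-pixel GIF stands in for a product without an image.
  return img.width === 1 || img.height === 1;
}

// Applies the check to every Amazon-served image in the collection and
// returns the number of images that were replaced.
function replaceUnavailableAmazonImages(images, placeholderSrc) {
  let replaced = 0;
  for (let i = 0; i < images.length; i++) {
    const img = images[i];
    // Skip anything that is not served by Amazon.
    if (img.src.indexOf("images.amazon.com") === -1) {
      continue;
    }
    if (needsPlaceholder(img)) {
      img.src = placeholderSrc;
      replaced++;
    }
  }
  return replaced;
}

// In a browser, run once the load event fires; the path is hypothetical.
if (typeof window !== "undefined") {
  window.addEventListener("load", function () {
    replaceUnavailableAmazonImages(document.images, "/images/no-product-image.gif");
  });
}
```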