Checking whether an image is broken with HttpClient

Let’s say you need to check whether a certain image on your page is broken. When an image is broken, your browser does not render it; instead, it shows a placeholder icon, such as an X or something similar (depending on the browser), indicating that the image could not be loaded.

Although intuition might suggest using Selenium’s ‘isDisplayed’ method to check the image, this is the wrong approach. In Selenium, you interact with an HTML element by first defining the WebElement that represents it. So if you want to interact with an image that has, let’s say, an ID, you would define something like:

@FindBy(how = How.ID, using = "theID")
public WebElement theImage;

If you call ‘isDisplayed’ on this element, as in ‘theImage.isDisplayed()’, you only check that the element is present on the page and visible. In other words, if an element with the ID ‘theID’ exists in the HTML and is not hidden, this call returns true, which is typically still the case when the image itself failed to load. It does not confirm that the content the element is supposed to show actually ‘looks’ good. With a broken image, the WebElement is on the page, but the resource it tries to show the user is not where it is supposed to be.

In the page’s HTML, an image is an <img> tag. One of its attributes is the source from which it is loaded (the ‘src’ attribute). Therefore, to check that the image displays properly, you need to check whether it loads properly from its defined source.

So checking that an image is displayed properly requires two steps:

1. The Selenium part:

Using Selenium, check that the image is configured to load from the correct location. This means checking that the ‘src’ attribute of the <img> tag points to where you know the image exists.

assertEquals(theImage.getAttribute("src"),
urlForCorrectImage);

2. The HttpClient part:

Using HttpClient, make sure the call to that location returns an HTTP status of 200, as described below.

This means sending a GET request to the URL from which the image is supposed to be loaded. If the request returns an HTTP status of 200, the server delivered content for that URL, so the image should load (with the caveats covered under the exceptions below). If any other status is returned, the image is not reachable for some reason: it may not exist at the URL provided, or you may not have access to it. Either way, the URL gives you a response that is not the image.

The check works as follows: a GET request is sent to the URL where you expect the image to be, and the status in the response is compared against the HttpStatus.SC_OK value, which is 200. If the status is anything other than 200, the actual status code is printed to the console, for reference and to help understand what happened with the GET call, and the test fails right there.
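A minimal sketch of this check follows. It swaps in the JDK’s built-in java.net.HttpURLConnection for HttpClient so that it runs without extra dependencies; HttpURLConnection.HTTP_OK is the JDK’s constant for 200, playing the role of HttpStatus.SC_OK. The class and method names (ImageLinkCheck, fetchStatus, assertImageReachable) and the 5-second timeouts are illustrative assumptions, not part of any library.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: checks that an image URL answers a GET request with HTTP 200.
final class ImageLinkCheck {
    private ImageLinkCheck() {}

    /** Sends a GET request to imageUrl and returns the HTTP status code of the response. */
    static int fetchStatus(String imageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5_000); // assumed timeouts, in milliseconds
        conn.setReadTimeout(5_000);
        try {
            // Same-protocol redirects are followed by default, so this is the final status.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Prints the actual status and fails unless the image URL answers 200. */
    static void assertImageReachable(String imageUrl) throws IOException {
        int status = fetchStatus(imageUrl);
        if (status != HttpURLConnection.HTTP_OK) {
            System.out.println("GET " + imageUrl + " returned HTTP status " + status);
            throw new AssertionError("Expected HTTP 200 for " + imageUrl + " but got " + status);
        }
    }
}
```

In a test, the URL passed in would typically be the value of theImage.getAttribute("src"), so that the Selenium step and the HTTP step check the same location.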

Some exceptions

One thing worth mentioning: checking the HTTP response status works well if the images your application uses come from an image storage service or application you own. So if your images are served by your own application, or by a dedicated image server whose URL you know, this approach is fine. As long as the URL configured on the image points to your image server (it has the same domain), you are good.
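One way to express the “same domain” condition in code is to compare the host part of the src URL with the host of your image server. The sketch below uses java.net.URI; the class name ImageHostCheck and the host images.example.com are hypothetical. It requires an exact host match, so a subdomain such as cdn.images.example.com would not count.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: checks that an image URL points at the expected image server.
final class ImageHostCheck {
    private ImageHostCheck() {}

    /** True when imageUrl is absolute and its host equals expectedHost, ignoring case. */
    static boolean isServedFrom(String imageUrl, String expectedHost) {
        String host = URI.create(imageUrl).getHost(); // null for relative URLs
        return host != null
                && host.toLowerCase(Locale.ROOT).equals(expectedHost.toLowerCase(Locale.ROOT));
    }
}
```

For example, isServedFrom("https://images.example.com/logo.png", "images.example.com") returns true, while a URL on another host, or a relative URL with no host at all, returns false.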

Some exceptions would be:

When the image’s “src” attribute points to a domain that does not exist, an exception is thrown when you try to connect to it: java.net.UnknownHostException. For example, if someone mistakenly configured the URL as http://www.zmgoogle.com, which does not exist, you will see this exception instead of an invalid status code. This is because you cannot perform a GET request against a domain that cannot be resolved, which is exactly what the exception reports.

Another exception is an external image server that fails gracefully when an image does not exist. One example is Flickr, which may serve a “pretty” placeholder image instead of letting your page display a broken image container. In this case, the GET call returns a 200 status because, from the server’s point of view, it is sending you an image in response to your request; it’s just not the one you wanted.
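Both exceptions can be surfaced explicitly in a test. The sketch below uses the JDK’s java.net.HttpURLConnection instead of HttpClient so that it needs no extra dependencies. It reports an unresolvable domain as its own outcome rather than a generic failure, and, when the server answers 200, compares the SHA-256 digest of the body with the digest of a known placeholder image. The class name ImageProbe, the result labels, and the assumption that you have downloaded the provider’s placeholder once and recorded its digest are all hypothetical. The placeholder comparison only catches byte-identical copies, so each size or format the provider serves would need its own digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.UnknownHostException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: classifies why an image URL does or does not deliver the expected image.
final class ImageProbe {
    private ImageProbe() {}

    /**
     * Returns "OK", "UNKNOWN_HOST", "PLACEHOLDER", "HTTP <status>" or "IO_ERROR".
     * placeholderSha256Hex is the assumed, pre-recorded digest of the provider's
     * placeholder image; pass null to skip the placeholder comparison.
     */
    static String classify(String imageUrl, String placeholderSha256Hex) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
            conn.setConnectTimeout(5_000); // assumed timeouts, in milliseconds
            conn.setReadTimeout(5_000);
            try {
                int status = conn.getResponseCode();
                if (status != HttpURLConnection.HTTP_OK) {
                    return "HTTP " + status;
                }
                // A 200 is not enough: compare the body with the known placeholder.
                try (InputStream in = conn.getInputStream()) {
                    String digest = sha256Hex(in.readAllBytes());
                    return digest.equalsIgnoreCase(placeholderSha256Hex) ? "PLACEHOLDER" : "OK";
                }
            } finally {
                conn.disconnect();
            }
        } catch (UnknownHostException e) {
            // The domain in the src attribute does not resolve, so no request was sent at all.
            return "UNKNOWN_HOST";
        } catch (IOException e) {
            return "IO_ERROR";
        }
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, ImageProbe.classify("http://no-such-host.invalid/logo.png", null) should return "UNKNOWN_HOST", because the .invalid top-level domain is reserved and never resolves.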