Updating the XHTML Sanitizer

Imagine you are a junior developer working on a large-scale Web Enterprise Application (WebEA) written using Java Servlets. As a junior developer, you have no design responsibilities; you’ve only been given responsibility for a small existing component: the XHTML Sanitizer.

The XHTML Sanitizer

In this WebEA, certain end users can submit content such as comments, forum postings, tutorial material or class slides. However, this content needs to be sanitized before output, or the system would be vulnerable to Cross-Site Scripting (XSS) attacks. The XHTML Sanitizer removes any content it deems unsafe.

The component was written by another junior developer who has since left the company.

Since times have changed and HTML5 will soon be the standard, the business analysts have decided the component should now allow users to leverage new HTML5 elements.

Test Driven Development and Unit Testing

Since your company practices Test Driven Development, and is big on unit tests, you will have to write unit tests for any and all new functionality you intend to introduce before you make any change to the component.

You are also required to run all old unit tests to ensure backwards compatibility. If the new code is not backwards compatible, you will be fired.

If you find any bugs which the old unit tests did not catch, you are required to write new Unit Tests for them. If the functionality is un-testable, you may make minor changes to the code, documenting and justifying them.

The Changes

Note that the point of this activity is not to re-develop the component from the ground up; you have to keep your changes to a minimum. You should be able to do this exercise – in its entirety – by changing only the value of an instance variable and modifying at most one line of code in the XHtmlSanitizer.java file.

HTML5

You have to perform maintenance on the XHTML Sanitizer, and change it so it accommodates the following new HTML5 items: <section>, <article>, <aside>, <hgroup>, <mark>, <meter>, <time>, <wbr>, and the <ol> element’s new reversed attribute.

Case-Insensitive

Additionally, some of the users have been using Microsoft Word to create their XHTML markup. As a consequence, it is sometimes a mix of UpPeRcAsE AnD lOwErCaSe. You will have to make the sanitizer case insensitive.
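One common way to make a lookup like this case-insensitive is to normalize the tag name before comparing it against the allowed names. The sketch below is only an illustration of that idea: the class name, the ALLOWED set and the isAllowed method are hypothetical and are not taken from XHtmlSanitizer.java, whose own matching logic may work differently (for example, through a regular expression flag).

```java
import java.util.Locale;
import java.util.Set;

public class CaseInsensitiveLookup {
    // Hypothetical allow-list, stored in lower case only.
    // The real sanitizer keeps its rules in its own configuration.
    private static final Set<String> ALLOWED = Set.of("p", "ol", "li", "strong");

    // Normalizes the tag name before the lookup, so "OL", "Ol" and "ol"
    // are all treated the same way. Locale.ROOT avoids locale-specific
    // surprises (such as the Turkish dotless i) when lower-casing.
    public static boolean isAllowed(String tagName) {
        return ALLOWED.contains(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("StRoNg")); // prints "true"
        System.out.println(isAllowed("SCRIPT")); // prints "false"
    }
}
```

Whatever form the real fix takes, the old unit tests still have to pass: lower-case markup that was accepted or rejected before must be treated exactly the same way afterwards.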

The Activity

Step 1 – Get The Source Code

Create a new Java Project in Eclipse, and then Import the files you downloaded as a File System.

Step 2 – Run the Unit Tests

The previous developer wrote a lot of Unit Tests. Run TestXHtmlSanitizer and TestXHtmlBadFormatting just to familiarize yourself with JUnit a bit.

There are many more Test Cases than just TestXHtmlSanitizer and TestXHtmlBadFormatting though, and since good unit tests are automatic, you’ll want to write a Test Suite. Be sure to add all Test Cases currently in the project to the Test Suite, and then run it.

Create a new class and give it only one method: public static Test suite()

It should be noted that when practicing true TDD you should only write one test method, and then proceed with implementation necessary to pass that test, before writing another test method. For the purposes of this tutorial, however, it is easier to explain a slightly more waterfall-ish approach.

Run your unit tests; they should fail, since you haven’t yet changed the XHTML Sanitizer to accommodate the new features.

Step 4 – Make Your Changes to the Code

Since you have tests which are failing, it means you have to make changes to the system.

Looking at the XHTML Sanitizer, you may have noticed an array String[] whiteList which looks like it is used for configuration; any <!ELEMENT > items in this array allow a new element (tag) to be in the content and <!ATTLIST > allows an attribute. Add items to this to accommodate the following new HTML5 tags and attributes:

The <section> element

The <article> element

The <aside> element

The <hgroup> element

The <mark> element

The <meter> element

The <time> element

The <wbr> element

The ordered list <ol> element’s new reversed attribute
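As a rough sketch, the additions could look something like the entries below. Only the <!ELEMENT > and <!ATTLIST > keywords come from the description above; the class name, the array name HTML5_ADDITIONS, and the content models (ANY, EMPTY) and attribute type are assumptions written in ordinary DTD syntax. Copy the style of the entries already present in whiteList rather than relying on this exact form.

```java
public class Html5WhiteListSketch {
    // Hypothetical entries in DTD style. The content models and the
    // attribute declaration are placeholders; the real whiteList may
    // expect a different level of detail.
    public static final String[] HTML5_ADDITIONS = {
        "<!ELEMENT section ANY>",
        "<!ELEMENT article ANY>",
        "<!ELEMENT aside ANY>",
        "<!ELEMENT hgroup ANY>",
        "<!ELEMENT mark ANY>",
        "<!ELEMENT meter ANY>",
        "<!ELEMENT time ANY>",
        "<!ELEMENT wbr EMPTY>",                       // wbr has no content
        "<!ATTLIST ol reversed (reversed) #IMPLIED>"  // optional boolean-style attribute
    };

    public static void main(String[] args) {
        // Print the entries so they can be compared with the existing ones.
        for (String entry : HTML5_ADDITIONS) {
            System.out.println(entry);
        }
    }
}
```

Eight element declarations plus one attribute declaration cover the nine items in the list above.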

Step 5 – Run All Tests Again.

This time, things should be a little different.

If the original tests fail but some of your new HTML5 tests pass, don’t panic; you’re doing fine. The unit tests have just helped you uncover a software fault, which has resulted in errors and, in turn, failures. (For an explanation of the difference between a "fault", an "error" and a "failure", see "What is a software fault in testing?" on Stack Overflow.)

Since you have to retain all old functionality while introducing new functionality, you will have to use the Eclipse debugging tools to determine why the tests are failing and fix any defects causing the problem.