At the Forge - Checking Your HTML

Integrate HTML validation into your test suite for better HTML from the get-go.

W3C Validator

One of the best tools for checking the validity of a
page's markup is the World Wide Web Consortium's validator, available
at validator.w3.org. I use the validator
almost exclusively from within Firefox, into which I have installed
the Web Developer extension. This extension lets you validate the HTML of
any page simply by selecting Validate HTML from the extension's menu. The
browser submits the page's URL to the W3C validator, which then gives
a line-by-line indication of what problems (if any) the page
contains.

The W3C validator has at least two drawbacks, however.
First, it requires that you submit each page to the validator
program one at a time, which means a great deal of time and effort
just to check your pages. The second drawback is more practical:
when you validate by URL, the validator can check only pages that are
accessible via the Internet without password protection. If your site
is being developed on your local computer, and if you have a firewall
protecting your business from the outside world, you
probably will be unable to use the public validator via the Web.

One solution to this problem is to install the W3C validator on your
local computer. You can get its source code, a Perl program, from
validator.w3.org/source. On modern Debian and Ubuntu machines, you
instead can install the w3c-markup-validator package, which makes the
validator available via your local Web server, ready to be invoked.

If you end up installing the validator manually, it requires a number
of Perl modules, which you might need to download from CPAN
(the Comprehensive Perl Archive Network), a large network of mirrors
containing open-source Perl modules. It might take some trial and
error to figure out which modules are necessary, although if you are an
experienced user of the CPAN.pm installer, this shouldn't be too much
trouble. Note that the SGML::Parser::OpenSP module requires the OpenSP
parser, which you can get from SourceForge at openjade.sf.net.

A number of these modules are required to handle alternative
character encodings, particularly those used for Asian languages. Even
if you aren't planning to handle such languages, the modules are
mandatory and must be installed.

The validator program, called check, should be put in a directory
for CGI programs or in a directory handled by mod_perl, the Apache
module that, among other things, lets you run Perl programs at higher
speed. You also will need to install a configuration file, typically
placed in the directory /etc/w3c, although you can relocate it by
setting the W3C_VALIDATOR_CFG environment variable.
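Once check is running on your own server, you also can query it from a script rather than from a browser. The Ruby sketch below assumes the validator answers at http://localhost/w3c-validator/check (the path the Debian/Ubuntu package is assumed to use; adjust it for your setup) and reads the X-W3C-Validator-Status and X-W3C-Validator-Errors headers that the validator sends along with its HTML report. The method names are hypothetical.

```ruby
require "net/http"
require "uri"

# ASSUMPTION: where the locally installed validator answers. The
# Debian/Ubuntu package is assumed to use this path; adjust as needed.
LOCAL_VALIDATOR = "http://localhost/w3c-validator/check"

# Build a check URL asking the validator to fetch and validate page_url.
def validator_url(page_url, validator: LOCAL_VALIDATOR)
  uri = URI(validator)
  uri.query = URI.encode_www_form(uri: page_url)
  uri
end

# Summarize the verdict the validator reports in its response headers.
# `headers` may be a Net::HTTPResponse (whose [] lookup ignores case) or
# a plain Hash with lowercase keys, which keeps this easy to test.
def parse_verdict(headers)
  status = headers["x-w3c-validator-status"]       # "Valid", "Invalid" or "Abort"
  errors = headers["x-w3c-validator-errors"].to_i  # nil (missing) becomes 0
  { valid: status == "Valid", status: status, errors: errors }
end

# Validate one page; requires the local validator to be reachable.
def check_page(page_url)
  parse_verdict(Net::HTTP.get_response(validator_url(page_url)))
end
```

Calling check_page("http://localhost:3000/faq") returns a hash whose :valid key says whether the page passed. Because only your own validator needs to reach the page, this works for URLs that aren't visible from the outside world.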

Validating Rails Templates

Now that you have the W3C validator installed on your own server, you
can feed it URLs that aren't open to the public. But if you are
developing an application in Ruby on Rails, you can go one step
further by integrating the W3C validator into your automated testing.

In order to do this, you need to install the html_test plugin for
Rails. Go into your Rails application's root directory, and type:

With this plugin in place, you now can use three new assertions in your
functional and integration tests: assert_w3c passes if the W3C
validator approves of your HTML; assert_tidy performs a similar check
using the HTML Tidy library, described below; and assert_validates
calls both of these.

So, if you have a FAQ page you want to check with an integration test,
you can write something like this:

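require File.dirname(__FILE__) + '/../test_helper'

class FaqTest < ActionController::IntegrationTest  # illustrative name
  def test_faq
    get '/faq'                # request the FAQ page
    assert_response :success  # it should render without errors
    assert_w3c                # html_test: validate the HTML with the W3C validator
  end
end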

If the HTML for this page is approved by the W3C validator,
everything is fine. If this page is not valid, you will get
quite a bit of output, which you should redirect to a file. This file
will contain not only the results of your tests, but also the same
HTML output that you would have gotten from the public, Web-based W3C
validator. This means you'll get a complete and easy-to-read
description of what you did wrong.

You'll often discover that a large number of validation errors can be
fixed with a small number of corrections. For example, when I ran
this test against a sloppy FAQ page, I got six validation errors. I
was able to fix all of them by indicating the appropriate namespace in
my <html> tag and removing an extraneous </p> from the end of the
file.
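As a rough illustration of the first fix, the following Ruby method reports whether a template's opening <html> tag declares the XHTML namespace, http://www.w3.org/1999/xhtml. It assumes your pages use an XHTML doctype, relies on a regular expression rather than a real parser, and has a hypothetical name; it catches only this one mistake and is no substitute for the validator.

```ruby
# ASSUMPTION: the page is XHTML, so this is the namespace it should declare.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical helper: true if the first <html ...> tag in markup
# declares the XHTML namespace. A regex approximation, not a parser.
def declares_xhtml_namespace?(markup)
  html_tag = markup[/<html\b[^>]*>/i]  # the opening tag, or nil if absent
  return false if html_tag.nil?

  # Accept single or double quotes around the namespace URI.
  html_tag.match?(/\bxmlns\s*=\s*["']#{Regexp.escape(XHTML_NS)}["']/)
end
```

Running it over your layout files before the test suite would flag this particular omission early; everything else still needs the validator.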

Checking HTML validity in this way is nice and easy. (Invoking the
validator on every single page can be time-consuming, however; I think
the trade-off is worthwhile, but you might disagree.) If you always
want to check HTML validity, you can change your test environment's
configuration so that validation happens automatically, without your
having to invoke assert_w3c each time.

To do this, you need to modify test_helper.rb, which sits at the top
level of the test directory and is loaded by every test file. All you
have to do is add:

With these four lines in your test_helper.rb, you can run your
integration tests once again. If any of the validation tests fail, you
can look at /tmp/w3c_last_response.html, which will contain the
complete validator output for the most recent failure. Because each
failure overwrites that file, however, it doesn't help very much if you
have multiple failures.
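One way to keep every report, sketched here under assumptions, is to copy /tmp/w3c_last_response.html to a per-test file whenever a test fails. The helper name and the /tmp/w3c_reports directory are hypothetical; only the source filename comes from the plugin's behavior described above.

```ruby
require "fileutils"

# File the html_test plugin writes after a failed validation.
LAST_W3C_REPORT = "/tmp/w3c_last_response.html"

# Hypothetical helper: copy the most recent validator report into
# dir/<test_name>.html so that later failures don't overwrite it.
# Returns the destination path, or nil if there is no report to copy.
def archive_w3c_report(test_name, source: LAST_W3C_REPORT, dir: "/tmp/w3c_reports")
  return nil unless File.exist?(source)

  FileUtils.mkdir_p(dir)
  safe_name = test_name.to_s.gsub(/[^\w-]/, "_")  # keep filenames tame
  dest = File.join(dir, "#{safe_name}.html")
  FileUtils.cp(source, dest)
  dest
end
```

You might call this from a test class's teardown method, passing the test's name, but only when the test has failed (both Test::Unit and Minitest provide a passed? method); otherwise a passing test would archive a stale report left over from an earlier failure.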

If you have designed your templates using the DRY (don't repeat
yourself) principle, fixing HTML markup problems shouldn't be
too bad. In many cases, you will need to change only one tag in the
layout to fix everything.