use Groovy (potentially in conjunction with a specialist HTML parser) to parse HTML pages as if they were XML

use Groovy to simplify the code required to drive a Java API browser simulator, e.g. HtmlUnit or HttpUnit

use Groovy to simplify the code required to drive a Java API for manually driving a real browser, e.g. IE or Firefox

use Groovy to interact with a higher-level testing library built on one of the two preceding approaches, e.g. Watij (the Java equivalent of Ruby's Watir) or WebTest (which opens up the possibility of testing more than just web applications)

We examine a few approaches below.

Groovy with CyberNeko HTML Parser

NekoHTML is a library that parses HTML documents, including ones that are not well-formed, and lets you treat them as XML documents (i.e. XHTML). Much as browsers do, NekoHTML cleans up the HTML where required: it closes elements whose end tags were omitted, adds missing parent elements, and copes with mismatched inline tags. The result is then available to normal XML parsing techniques.
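To make the clean-up behaviour concrete, here is a minimal sketch that feeds NekoHTML a deliberately malformed, in-memory HTML string rather than a live page. The input string, the variable names, the Grape coordinates and the version number are assumptions made for illustration. The sketch also relies on NekoHTML's default of reporting element names in upper case and attribute names in lower case.

```groovy
// Assumption: NekoHTML is fetched via Grape; the version shown is illustrative.
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.22')
import org.cyberneko.html.parsers.SAXParser
// Groovy 3+ location of XmlParser; on older versions drop this import,
// since groovy.util.XmlParser is available by default there.
import groovy.xml.XmlParser

// Hypothetical malformed input: neither <p> is closed and </html> is missing.
def html = '<html><body><p>First<p>Second <a href="a.html">page</a> <a href="b.pdf">doc</a></body>'

// Let NekoHTML do the tolerant parsing and hand XmlParser the SAX events.
def parser = new SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def page = new XmlParser(parser).parseText(html)

// Walk the repaired tree; element names are upper case by default.
def elements = page.depthFirst().findAll { it instanceof Node }
def paragraphs = elements.findAll { it.name() == 'P' }
def links = elements.findAll { it.name() == 'A' }
                    .collect { it.attribute('href') }
                    .findAll { it?.endsWith('.html') }

println "paragraphs found: ${paragraphs.size()}"
println "html links: ${links}"
```

Although the input never closes its paragraphs, the parsed tree can be queried like any other XML document, which is the property the rest of this section relies on.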

Here is an example of using NekoHTML with XmlParser to find '.html' hyperlinks on the Groovy homepage: