Background

HtmlUnit is a “GUI-Less browser for Java programs”. It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc… just like you do in your “normal” browser. (http://htmlunit.sourceforge.net/)

Problem

We want to use a headless browser to scrape a webpage for every <a> element and verify that each one has a non-empty title attribute. This will serve as an accessibility test.

Environment

I am using OS X, Eclipse for Java, and JUnit, but everything I cover can be applied to whatever environment you develop in. My environment is the one set up in a previous post: http://timothycope.com/?p=274

Solution: Using HtmlUnit to Scrape Webpages

We’ll need to import HtmlUnit’s .jar file into the Eclipse project. After that’s done, we can create a WebClient, the HtmlUnit class that acts as the browser.
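As a rough illustration of where this is heading, here is a minimal sketch of creating a WebClient and checking a page's anchors. It assumes an HtmlUnit 2.x release recent enough that WebClient can be used in a try-with-resources block and that still uses the com.gargoylesoftware.htmlunit package (newer 3.x releases use org.htmlunit instead). The class name AnchorTitleCheck, the helper anchorsMissingTitle, and the example.com URL are placeholders made up for this sketch, not part of HtmlUnit.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.util.ArrayList;
import java.util.List;

public class AnchorTitleCheck {

    // Hypothetical helper: collects every <a> element whose title
    // attribute is absent or contains only whitespace.
    static List<HtmlAnchor> anchorsMissingTitle(HtmlPage page) {
        List<HtmlAnchor> missing = new ArrayList<>();
        for (HtmlAnchor anchor : page.getAnchors()) {
            // getAttribute returns an empty string when the attribute is not defined.
            if (anchor.getAttribute("title").trim().isEmpty()) {
                missing.add(anchor);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL (an assumption); pass the page under test as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";

        // The WebClient is the headless browser; closing it releases its windows.
        try (WebClient webClient = new WebClient()) {
            // JavaScript and CSS are not needed to inspect static markup.
            webClient.getOptions().setJavaScriptEnabled(false);
            webClient.getOptions().setCssEnabled(false);

            HtmlPage page = webClient.getPage(url);
            for (HtmlAnchor anchor : anchorsMissingTitle(page)) {
                System.out.println("Missing title: " + anchor.asXml().trim());
            }
        }
    }
}
```

Keeping the check in its own method means a JUnit test could call anchorsMissingTitle directly and assert that the returned list is empty.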

Background

Selenium automates browsers. That’s it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well. (http://docs.seleniumhq.org/)

JUnit is a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks. (http://junit.org/)

Eclipse is a platform that has been designed from the ground up for building integrated web and application development tooling. By design, the platform does not provide a great deal of end user functionality by itself. The value of the platform is what it encourages: rapid development of integrated features based on a plug-in model. (https://www.eclipse.org/)