Tuesday, 18 May 2010

perl modules for web testing

Some modules and its links to do web testing and web automation. There has been a long time since I was doing a lot of web scrapping in the past(parsing webs with locus specific genetic mutations), and now I need it to do it again. So I am trying to find out which are the latest modules for web scrapping and refresh my memory.

This is my first list of modules to explore. I will post later my progress in this issue.

Selenium Remote Control (SRC) is a test tool that allows you to write automated web application UI tests in any programming language against any HTTP website using any mainstream JavaScript-enabled browser. SRC provides a Selenium Server, which can automatically start/stop/control any supported browser. It works by using Selenium Core, a pure-HTML+JS library that performs automated tasks in JavaScript; the Selenium Server communicates directly with the browser using AJAX (XmlHttpRequest).http://www.openqa.org/selenium-rc/
This module sends commands directly to the Server using simple HTTP GET/POST requests. Using this module together with the Selenium Server, you can automatically control any supported browser.
To use this module, you need to have already downloaded and started the Selenium Server. (The Selenium Server is a Java application.)

The HTML::Query module is an add-on for the HTML::Tree module set. It provides a simple way to select one or more elements from a tree using a query syntax inspired by jQuery. This selector syntax will be reassuringly familiar to anyone who has ever written a CSS selector.HTML::Query is not an attempt to provide a complete (or even near-complete) implementation of jQuery in Perl (see Ingy's pQuery module for a more ambitious attempt at that). Rather, it borrows some of the tried and tested selector syntax from jQuery (and CSS) that can easily be mapped onto the look_down() method provided by the HTML::Element module.

One of the modules that I wrote for MUTRES was doing exactly this. downloading the page, if the page was changed (md5 value) I was extracting the tables that I wanted and calculating the md5 for the text (not the html) and stored in a db both md5. Every week a cron process launched the script and tested if the page or the wanted contend had changed and was sending emails accordingly.