Office automation with Selenium WebDriver

Selenium (seleniumhq.org) is a browser controller, but I’d like to imagine that it’s a nice, big robot using a web browser with its somewhat clumsy robot paws. And it can be quite good for automating web-based office processes.

Web automation is not a very new thing to techies, especially DevOps. Even a 0.6 programmer like me managed to rig up an automated load test for CLAS with JMeter a year ago, after a weekend learning the thing. Well, while JMeter is very good at jackhammering at a web app with what looks like an army of users logging in at the same time, I don’t like it for office automation. JMeter works at a HTTP protocol/request level, so simulating an army doing simple login + page jumps is easy, coming up with the kind of long, precise, and timing-heavy sequence of actions that mimics a real person is difficult, error prone… But most of all not easily repeatable. What you record in JMeter is the communication between the client and server, not actually what you did on the browser, so in order to fix bugs, you need to reverse engineer how the site works. This is why I have hesitated to reassure my boss that I can automate the tedious parts of his job, because I can’t tell for sure how long that would take.

The difference between Selenium and JMeter is WYSIWYG. What you record in Selenium are browser actions, which you can confidently debug by looking at what the site shows you. A Selenium script starts up a browser, search for DOM elements on a page, and send events to that elements. It operates at a closer level to the actual human user, compared to the HTTP protocol level. At the user interface level, you can be more certain that what your automated script does is equivalent to what a human being does, because you are, for example, entering usernames and passwords on actual input fields instead of trying to match session cookies on requests. Selenium is slower and heavier, because it invokes and drives the browser, but it can be very precise and human-like.

Moreover, its final runnable test suite can be easily wrapped inside a shell script, a Windows batch script, or an Apple .app created with Apple’s Automator. Wrapping the actual test runner behind an executable breaks the dependency between the developer and the automation end user with regards to scheduling. Now whoever needs to use the automation script can schedule the script to run via an iCal alert or the windows task scheduler, both far more friendly than editing crontabs. If the action sequence needs to be changed, then the sequence file (Selenium test suite) can be changed without affecting the final script, or its scheduling.

Here’s a quick way to get started with Selenium:

Download the Selenium IDE, a Firefox plug-in, and try recording some browser action sequences, save that as an .html file.

Run the whole test suite you just recorded. You will notice that it fails quickly because the pages cannot keep up with the script. You will have to add “pause (target milliseconds)” commands between every actions. I recommend using a time-based pause rather than an element-detection command like “waitFor”, because element ids and order of appearance change a lot quicker in a website than whether an element representing a core site functionality is there or not.

Refine the timing some more. Some pauses may need to be many minutes long, if a button triggers an AJAX event that fetches massive amount of data, for example.

Debug mysterious “ineffectual clicks”, where an actual click with a mouse on an element does something, but the click event on the DOM does nothing. Chances are, some elements in certain web sites are keyed to related events like mouseUp and mouseDown instead of a click. Watch out for these types of events that are similar in meaning.

Refine element search logic. The Selenium recorder is only smart enough to know that you click on “the third LI inside the second DIV” not “the LI containing a label that contains the text ‘Faculty of Arts’.” You need find the target of that entries, and replace positional identifiers like “//div[@id=’sectionForm:faculty_panel’]/div/ul/li[3]” with semantic identifiers (in XPATH syntax), like “//div[@id=’sectionForm:faculty_panel’]/div/ul/li[contains(text(), ‘ARTS’)]” or “//div[@id=’sectionForm:subjects_panel’]/div[2]/ul/li[label[contains(text(), ‘COGS’)]]”

Note: you may also want to replace any command target that searches elements by ids, because ids like ‘id_12345’ are often randomly generated to prevent just the kind of automation that you are trying to do (since spammers do it too).

Download the Selenium java WebDriver, and write a shell script to run your test suite without having to open up the browser and press buttons on the IDE test runner.

Some Caveats: you may need Apple’s own java on yosemite. Oracle’s one sometimes installs correctly but the “java” command cannot be found on the terminal. Your user needs to schedule the task to run when the computer is not sleeping, so probably during the work day, or if after, then the computer must be set not to sleep at that time. This limitation doesn’t have anything to do with Selenium, of course, just a limitation of task schedulers in general.