Selenium for data integration into a big data repository

Ashutosh Bijoor

Selenium is a software testing framework for web applications. It is an open-source platform, released under the Apache 2.0 license.

Selenium is used to automate functional testing of web interfaces within web browsers. However, Selenium, being a single-threaded Java application, may not be suitable for load testing. Beyond testing, it can also automate web-based administration tasks. In our case, a media company needed to automatically extract data from multiple web applications, each with separate login credentials.

The Selenium standalone server is a JAR file. A test script can either be created with Selenium's record/playback tool or written against a Selenium WebDriver binding; a popular binding for PHP is PHP-web-driver. The script instantiates a browser with a given URL and then fires user events at it. On servers, the browser can also be opened as a headless instance.
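As a rough illustration of a script that opens a browser on a URL and reads something back, here is a minimal sketch. It uses Selenium's Python bindings and Firefox's own headless mode rather than the PHP binding described in this article, so it should be read as an assumption-laden example, not as the scripts we deployed. The function names make_headless_firefox and fetch_title are hypothetical, and the default factory needs the selenium package and a working Firefox driver.

```python
def make_headless_firefox():
    """Create a headless Firefox driver (assumes selenium and a Firefox driver are installed)."""
    # Imported lazily so this sketch can be loaded without Selenium present.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run Firefox without a visible window
    return webdriver.Firefox(options=options)


def fetch_title(url, driver_factory=make_headless_firefox):
    """Open url in a browser, return the page title, and always close the browser."""
    driver = driver_factory()
    try:
        driver.get(url)       # navigate the browser to the given URL
        return driver.title   # read a result back from the browser
    finally:
        driver.quit()         # release the browser even if navigation fails
```

Passing the driver factory as a parameter is a design choice made for this sketch: it lets the same function be exercised with a stand-in driver on a machine that has no browser.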

Selenium has two components: Selenium IDE and Selenium WebDriver.

Selenium IDE is a complete integrated development environment (IDE) for Selenium tests. It is implemented as a Firefox extension, and allows recording, editing, and debugging tests.

With Selenium IDE, scripts can be recorded automatically and edited manually. The IDE provides autocompletion support and the ability to move commands around quickly.

It provides a record/playback tool for authoring tests without learning a test scripting language.

Selenium WebDriver is implemented through a browser-specific driver, which sends commands to a browser and retrieves results.

Most browser drivers actually launch and access a browser application (such as Firefox or Internet Explorer).

For our applications, we used the Selenium Standalone server with Selenium WebDriver and built our scripts using the PHP-web-driver.

Selenium runs on Ubuntu and Red Hat servers. The browser is opened as a headless instance using the X virtual framebuffer (Xvfb). We used Selenium to automate login to third-party web applications. The front-end takes the user credentials and the third-party identification and calls the Selenium script, which holds the login configuration for the different applications. The script identifies the target application and enters the user credentials. On successful login, the front-end proceeds with subsequent functionality or parses the web content on the site or application and passes it to a backend web application, which stores it in a big data repository.
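The configuration-driven login described above could look roughly like the following sketch. It is written against Selenium's Python bindings rather than PHP-web-driver, and every name in it (LoginConfig, LOGIN_CONFIGS, login, the "example-app" identifier, the URL and the CSS selectors) is a hypothetical placeholder, not taken from our actual scripts. It also assumes that a failed login shows up as a visible error element on the page, which will not hold for every application.

```python
from dataclasses import dataclass

# Selenium's By.CSS_SELECTOR is the string "css selector"; using the literal
# keeps this sketch loadable without Selenium installed.
CSS = "css selector"


@dataclass(frozen=True)
class LoginConfig:
    """Per-application login settings (all values below are hypothetical)."""
    login_url: str
    username_field: str
    password_field: str
    submit_button: str
    error_banner: str


# Hypothetical registry keyed by the third-party application identifier.
LOGIN_CONFIGS = {
    "example-app": LoginConfig(
        login_url="https://example.com/login",
        username_field="#username",
        password_field="#password",
        submit_button="button[type=submit]",
        error_banner=".login-error",
    ),
}


def login(driver, app_id, username, password):
    """Log in to the application named by app_id.

    Returns (True, None) on success or (False, message) on failure.
    driver is any object exposing the WebDriver methods get(),
    find_element() and find_elements().
    """
    config = LOGIN_CONFIGS.get(app_id)
    if config is None:
        return False, f"No login configuration for '{app_id}'"

    driver.get(config.login_url)
    driver.find_element(CSS, config.username_field).send_keys(username)
    driver.find_element(CSS, config.password_field).send_keys(password)
    driver.find_element(CSS, config.submit_button).click()

    # find_elements returns an empty list when nothing matches, so a missing
    # error banner is treated as a successful login in this sketch.
    errors = driver.find_elements(CSS, config.error_banner)
    if errors:
        return False, errors[0].text
    return True, None
```

Keeping the selectors in a registry means that supporting another application is a matter of adding an entry, while the login routine itself stays unchanged. A real script would likely also wait for the page to finish loading before checking for errors.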

If an error is found, it is returned to the front-end, which prompts the user to enter the credentials again. The errors are either generated by the third-party application, such as invalid user credentials or a password length requirement, or they are custom, configurable error messages.
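One way to combine third-party errors with custom configurable messages is sketched below. The CUSTOM_MESSAGES table, its substring keys and the function name message_for_front_end are all hypothetical, and substring matching is only an assumption about how such a mapping might be configured.

```python
# Hypothetical custom messages, keyed by a lowercase substring of the
# third-party error text.
CUSTOM_MESSAGES = {
    "invalid": "The username or password is incorrect. Please try again.",
    "at least": "The password does not meet the length requirement.",
}


def message_for_front_end(raw_error, custom_messages=CUSTOM_MESSAGES):
    """Return the message the front-end should show for a failed login."""
    lowered = raw_error.lower()
    for needle, custom in custom_messages.items():
        if needle in lowered:
            return custom
    # No custom message configured: pass the third-party text through.
    return raw_error.strip()
```

For example, message_for_front_end("Invalid user credentials") returns the first custom message, while an error with no configured match is handed to the front-end unchanged apart from trimmed whitespace.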