Crawling a Website that loads content using JavaScript with Selenium WebDriver in Python

Selenium is a browser automation tool used primarily for testing web applications: it lets you simulate real user actions and interactions with your web application. Selenium supports all the major browsers and operating systems. There are official bindings for the most popular programming languages, including Java, Python, C#, Ruby and JavaScript. Selenium is not restricted to testing your web apps, though. It can also be used to crawl or scrape websites, in particular those that don't provide an API and load content lazily using JavaScript.

Today, we will crawl the online merchant website www.jabong.com with Selenium, using its Python bindings. Jabong loads more products as you scroll down the page. We will use Selenium to simulate this scrolling, and then retrieve all the product titles along with the links to the corresponding product detail pages.
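
The scrolling step can be sketched as a loop that keeps jumping to the bottom of the page until the page height stops growing. This is a minimal sketch, not the post's original code. It assumes "driver" is a Selenium WebDriver that has already loaded the product listing page. The function name scroll_to_bottom, the fixed pause, and the max_scrolls safety cap are illustrative choices.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_scrolls=50):
    """Scroll until the page height stops growing or max_scrolls is reached.

    `driver` is assumed to be a Selenium WebDriver (any object with an
    execute_script method works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        # Jump to the bottom of the page; this is what triggers lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        # Give the page's JavaScript time to fetch and render the next batch.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so assume we reached the end
        last_height = new_height
    return scrolls
```

Comparing document.body.scrollHeight before and after each scroll is a common heuristic for detecting the end of an infinitely scrolling list. If the site responds slowly, a larger pause may be needed, because a short one can end the loop before the next batch arrives.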

We use pyvirtualdisplay, a wrapper around Xvfb, which lets you run Firefox headlessly.
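
A minimal sketch of that headless setup follows. It assumes Xvfb, the pyvirtualdisplay package, Firefox and Selenium are installed. Depending on the Selenium version, geckodriver may also need to be on the PATH. The helper name fetch_page_source_headless and the display size are hypothetical. The imports sit inside the function only so the sketch can be imported on machines where those dependencies are missing.

```python
def fetch_page_source_headless(url, width=1024, height=768):
    """Load `url` in Firefox inside a virtual X display and return its HTML.

    Hypothetical helper; requires Xvfb, pyvirtualdisplay, Firefox and Selenium.
    """
    # Deferred imports: nothing below runs until the function is called.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    # An invisible Xvfb-backed display that Firefox will render into.
    display = Display(visible=False, size=(width, height))
    display.start()
    try:
        driver = webdriver.Firefox()
        try:
            driver.get(url)
            return driver.page_source
        finally:
            driver.quit()  # always close the browser
    finally:
        display.stop()  # always tear down the virtual display
```

The nested try/finally blocks ensure that both the browser and the virtual display are shut down even if loading the page fails.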

Page to Crawl: [screenshot of the Jabong shoe listing page]

A quick "Inspect Element" on one of the shoes in the listing shows that each product is wrapped in a "div" element with class "hover-box". The title and link are embedded in an "a" element inside that "div".
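
Given that markup, collecting the titles and links might look like the sketch below. It assumes the "hover-box" structure described above still holds, although the site's markup may well have changed. It also assumes "driver" has already scrolled through the whole page. The name extract_products is a hypothetical helper. The string "css selector" is the value of Selenium's By.CSS_SELECTOR constant, passed directly to find_elements.

```python
def extract_products(driver):
    """Return (title, url) pairs for every product on the loaded page.

    Assumes each product sits in a <div class="hover-box"> that contains an
    <a> element holding the title text and the link to the detail page.
    """
    products = []
    seen = set()
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    for link in driver.find_elements("css selector", "div.hover-box a"):
        title = link.text.strip()
        url = link.get_attribute("href")
        # Skip anchors without visible text and URLs we have already recorded.
        if title and url and url not in seen:
            seen.add(url)
            products.append((title, url))
    return products
```

Anchors with no visible text inside a product box, such as an image link, are skipped. Duplicate URLs are dropped, so each product appears once.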