Best Scraping Tools in 2018 - updated

This is an updated version of our article from 2017. If you have experience with any other scraping tools that you think should be on the list, do let us know.

Data scraping is a computer technique to extract data from human-readable output coming from another program. Extracting data from websites is called web scraping. Sometimes it is referred to as web harvesting or web data extraction.

Here at Kurzor we have completed numerous projects using various web scraping techniques. One of them is DOM parsing with a Selenium driver. This approach requires a programmer to define the whole sequence of steps and actions needed to extract data from a web page. It takes expert skills in a programming language, HTML, DOM structure and various selector types (XPath, CSS, jQuery).
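To give a feel for what such a hand-written extraction step involves, here is a minimal sketch in Python. It uses the standard-library `html.parser` module instead of a Selenium driver so that it runs without a browser. The HTML fragment, the class name `product-title` and the `TitleExtractor` class are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real script would fetch or render the page first.
SAMPLE_HTML = """
<ul>
  <li><span class="product-title">Red mug</span></li>
  <li><span class="product-title">Blue mug</span></li>
  <li><span class="note">Out of stock</span></li>
</ul>
"""


class TitleExtractor(HTMLParser):
    """Collects the text of every <span class="product-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "product-title") in attrs:
            self._inside_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside_title = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching span.
        if self._inside_title and data.strip():
            self.titles.append(data.strip())


extractor = TitleExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.titles)
```

Even in this tiny case the programmer has to know the page structure in advance and track parser state by hand, which is the effort the tools below try to remove.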

A more flexible but less powerful approach is to use a service that does not require you to be an IT guru. So if you need a script to grab all products from an e-shop, collect articles from a blog or gather some images, it is easy to try one of the following tools.

Apify is a web scraping and automation platform that extracts structured data from pages and turns any website into an API.

Apify doesn’t have a user interface where you select the data you want to extract by clicking with your mouse. Instead, you tell your crawler what to extract using JavaScript, so it’s perfect for scraping websites that don’t have a regular structure.

Pricing:

Developer: Free, 5k monthly pages, 1 parallel request (maximum number of web pages that can be requested at a time by all your crawlers) and 7 days of data retention.

Web Scraper is a company specializing in data extraction from web pages. It offers two options for its users: the free Web Scraper Extension for Google Chrome and the cloud-based Cloud Web Scraper.

Web Scraper Extension (Free!)

Using the extension, you can create a plan (sitemap) of how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper will navigate the site accordingly and extract all the data. The scraped data can later be exported as CSV.
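The idea of a plan that drives the export can be illustrated with a small Python sketch. The structure below is a simplified, hypothetical plan, not Web Scraper's actual sitemap format; the field names, the URL and the sample rows are invented, and the extraction step itself is left out. It only shows how a declared list of fields determines the columns of the final CSV.

```python
import csv
import io

# Hypothetical, simplified plan: where to start and which fields to collect.
plan = {
    "start_url": "https://example.com/products",
    "fields": ["name", "price"],
}

# Rows as a scraper might return them after following the plan (invented data).
rows = [
    {"name": "Red mug", "price": "4.90"},
    {"name": "Blue mug", "price": "5.20"},
]


def to_csv(field_names, records):
    """Serialize the records to CSV text, one column per planned field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(plan["fields"], rows)
print(csv_text)
```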

Cloud Web Scraper

Cloud Web Scraper runs your sitemaps in the cloud. This option allows you to extract large amounts of data, run multiple scraping jobs at once, and even run them on a set schedule.

Image: Webscraper.io Chrome extension in action

Pricing: The Chrome extension is free of charge. Prices for the cloud service start at $50 for 100,000 pages and go up to $250 for a credit of 2,000,000 pages.
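To compare the two cloud tiers it helps to convert them to a price per 1,000 pages. The short Python sketch below does only that arithmetic, using the two figures quoted above; the variable names are ours.

```python
# Cloud prices quoted above: (price in USD, page credits).
tiers = [(50, 100_000), (250, 2_000_000)]

# Price per 1,000 pages for each tier.
per_thousand_prices = [price * 1000 / pages for price, pages in tiers]

for (price, pages), per_thousand in zip(tiers, per_thousand_prices):
    print(f"${price} for {pages:,} pages = ${per_thousand:.3f} per 1,000 pages")
```

The larger credit works out to a quarter of the per-page price of the smallest one.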

Pros:

You develop scripts quickly for pages with a regular structure.

You can play the script and see the behavior directly in Chrome browser.

You can define most elements on a page by just clicking on them.

An API that calls your webhooks when a scraping job finishes is coming soon.

You don't need any programming knowledge to prepare the scripts.

The data never expire.

Cons:

Sometimes a script that works in the Chrome extension produces different output when run in the cloud service.

Some advanced selectors need to be defined by the user as XPath or jQuery selectors.

Import.io is a web-based platform for extracting data from websites without writing any code.

Users enter a URL and the app extracts the data that it thinks they need. If the data obtained is not what you needed, there is an interface where you click to select the specific data you want to extract. The data collected by users is stored on Import.io's cloud servers and can be downloaded as CSV, Excel, Google Sheets or JSON, or accessed via an API.

ParseHub is web scraping software that supports complicated data extraction from sites that use AJAX, JavaScript, redirects and cookies. It is equipped with machine learning technology that can read and analyse documents on the web to deliver relevant data. ParseHub is available as a desktop client for Windows, macOS and Linux, and there is also a web app that you can use within the browser. The free plan allows up to 5 crawl projects.

Desktop Application

Lightning-fast, self-service data extraction software for Windows, designed to extract data from websites using CSS selectors or regular expressions in a few minutes.
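As an illustration of regex-based extraction in general (not of this tool's own interface), the following Python sketch pulls prices out of an HTML fragment. The fragment and the pattern are invented for the example, and a pattern like this only holds up while the markup stays as regular as shown.

```python
import re

# Invented HTML fragment standing in for a downloaded page.
html = '<div class="price">$19.99</div><div class="price">$5.00</div>'

# Capture the numeric part of every price inside a div with class "price".
price_pattern = re.compile(r'<div class="price">\$(\d+\.\d{2})</div>')

prices = [float(match) for match in price_pattern.findall(html)]
print(prices)
```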

Advanced Web Scraper (Chrome extension)

A simple yet advanced data scraping extension by Agenty that extracts data from websites using point-and-click CSS selectors, with a real-time preview of the extracted data and quick export to JSON, CSV or TSV.

Octoparse is a cloud-based web crawler that helps you extract web data in real time without coding. It simulates human operation to interact with web pages. You can use the point-and-click UI to bulk extract data from web pages (including those using Ajax, JavaScript, etc.), and you can export the results as CSV, Excel, HTML or TXT, or to a database (MySQL, SQL Server or Oracle).

Octoparse’s cloud service (available in paid editions) can extract and store large amounts of data to meet large-scale extraction needs.

Pricing:

Basic: Free to use. You can run 10 scripts and extract an unlimited number of web pages.

Standard Plan: $75 per month when billed annually or $89 when billed monthly. You can run 100 scripts, use 6 cloud servers and extract an unlimited number of web pages.

Professional Plan: $158 per month when billed annually or $189 when billed monthly. You can run 200 scripts, use 14 cloud servers and extract an unlimited number of web pages.

Professional Data Service: starting from $299. Contact the company and they will do the work for you.

Dexi.io is a web scraping tool for IT professionals, marketed as the most powerful web extraction tool available. With its web data extraction and robotic process automation (RPA) tooling, you can extract and transform data from any source.

Image: Dexi interface in action

Pricing:

Free trial.

Standard: $119 per month (or $105 per month if you pay annually); you can only run one script at a time.

Professional: $399 per month (or $355 per month if you pay annually); you can run three scripts at a time.

Corporate: $699 per month (or $625 per month if you pay annually); you can run six scripts at a time.

Competitive pricing, but costs go up with capacity, that is, with the number of robots you want to run simultaneously.
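To see how the plans relate to capacity, the Python sketch below divides each monthly price by the number of scripts that can run at once and also works out the yearly saving from annual billing. It uses only the figures listed above; the variable names are ours.

```python
# Plans quoted above: (monthly price, monthly price when paid annually, concurrent scripts).
plans = {
    "Standard": (119, 105, 1),
    "Professional": (399, 355, 3),
    "Corporate": (699, 625, 6),
}

per_script = {}
yearly_saving = {}
for name, (monthly, annual_rate, scripts) in plans.items():
    # Monthly price spread over the scripts that may run simultaneously.
    per_script[name] = monthly / scripts
    # Difference between monthly and annual billing over twelve months.
    yearly_saving[name] = (monthly - annual_rate) * 12
    print(f"{name}: ${per_script[name]:.2f} per concurrent script per month, "
          f"${yearly_saving[name]} saved per year with annual billing")
```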

Conclusion

Each of the services offers a slightly different approach and pricing. Some of them will suit your project more, some less. The main goal is to select the best scraping service for your project. 2017 can definitely be seen as the year in which data extraction from web pages gained its place in Kurzor’s company portfolio.

Disclaimer: Prices in the article are from October 2017. We are not paid or otherwise advantaged by any of the services mentioned for promoting them.