How does Web Scraping work?

Search engine spiders illustrate the challenge associated with bots: not all bot activity is bad, and some of it is vital for business success. Good bot activity includes content aggregation for display on aggregation sites and content scraping by affiliates to help them market your products and services.

Malicious web scraping, on the other hand, can cause a business severe financial losses when data is extracted without consent. Two common forms of malicious web scraping are price scraping and content theft.

Content theft – Bots gather content from your site, such as a piece of journalism or paid-for data, to be used elsewhere without your consent.

Price scraping – Scraper bots target the pricing information of competing businesses so that their operators can undercut rivals and increase their own sales.

How to detect Web Scraping

The complexity and range of web scrapers hitting every website mean that we need to look at more than just the surface behaviour that suggests a visitor is scraping, such as the frequency of requests or whether the visitor identifies itself as Googlebot.
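
To make the two surface signals mentioned above concrete, here is a minimal Python sketch of a sliding-window request counter and a User-Agent check. It is illustrative only and is not how the Intent Analytics engine works. The names RequestRateMonitor and claims_googlebot, and the threshold values, are assumptions made for this example.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags clients that exceed a request threshold within a sliding window.

    Hypothetical helper: the default threshold and window are assumptions.
    """

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


def claims_googlebot(user_agent):
    """Return True if the User-Agent header claims to be Googlebot.

    The header is supplied by the client, so this is a claim, not proof.
    """
    return "googlebot" in user_agent.lower()
```

Both checks are easy to evade: a scraper can slow down or spread requests across many addresses, and any client can send a Googlebot User-Agent string. That is why these signals alone are not enough.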

Our Intent Analytics engine uses advanced machine learning techniques to detect scrapers and categorise them based on the scraping activity. For instance, what information are they collecting and what patterns are emerging in the collection methods?

How to prevent Web Scraping

Once the activity has been identified, we prevent further web scraping by combining information about the specific attack with data from a wide range of industry sources.

This adds a further layer of insight to the activity categorisation and allows us to establish appropriate bot management policies. With custom whitelisting available, we can ensure that known affiliates are not blocked.
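
As a rough illustration of how a whitelist can sit in front of category-based rules, the Python sketch below checks the whitelist first and only then looks up an action for the scraping category. The BotPolicy class, the category labels and the action names are hypothetical and chosen for this example; they do not describe the product's actual policy model.

```python
from dataclasses import dataclass, field


@dataclass
class BotPolicy:
    """Hypothetical policy: whitelisted clients always pass, others are
    handled according to the category of activity they were assigned."""

    whitelisted_clients: frozenset = frozenset()
    # Assumed category labels and actions, for illustration only.
    category_actions: dict = field(
        default_factory=lambda: {
            "price_scraping": "block",
            "content_theft": "block",
        }
    )
    default_action: str = "allow"

    def decide(self, client_id, category):
        """Return the action to take for a client in a given category."""
        # The whitelist is checked first so known affiliates are never stopped.
        if client_id in self.whitelisted_clients:
            return "allow"
        return self.category_actions.get(category, self.default_action)


policy = BotPolicy(whitelisted_clients=frozenset({"affiliate-42"}))
print(policy.decide("affiliate-42", "price_scraping"))
print(policy.decide("unknown-bot", "price_scraping"))
```

Checking the whitelist before the category rules means a trusted affiliate is allowed even when its traffic resembles price scraping.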