Callback vs. Real-Time: Best Data Delivery Methods

Gabija Fatenaite

May 28, 2019 | 7 min read

Data is the new king of this era, and many businesses are well aware of the new monarch. To grow or to stay among the top players in the market, many companies have come to rely on data collection and analysis.

For this reason, businesses turn to building proxy infrastructures to gather the data they need. However, maintaining a proxy infrastructure is quite expensive, so a more cost-efficient solution is often sought.

Luckily, such solutions exist. Heavy-duty, all-in-one scrapers and crawlers let businesses extract data from targeted websites without having to implement proxies themselves. Oxylabs’ Real-Time Crawler and Web Scraper are exactly such solutions.

What is Real-Time Crawler?

Real-Time Crawler is a data collection tool built specifically for extracting data from search engines and e-commerce websites. It is a customized scraper designed for heavy-duty data retrieval operations.

What is Web Scraper?

Web Scraper is a swift and easy way to scrape any target of your choice, and it is somewhat simpler to use than Real-Time Crawler. It is popular when many targets need to be scraped, with the added benefit of not having to manage a proxy infrastructure yourself.

How does Web Scraper work?

Web Scraper is very similar to Real-Time Crawler, just a bit easier to manage. All you need to do is give us a URL, and the Web Scraper gives back the data in HTML format.

The differences between data crawling and data scraping

When it comes to defining web scraping and web crawling, there are three main differences to look out for:

| Web scraping | Web crawling |
| --- | --- |
| Only “scrapes” the data (takes the selected data and downloads it). | Only “crawls” the data (goes through the chosen targets). |
| Can be done manually, by hand. | Can be done only with a crawling agent (a spider bot). |
| Deduplication is not always necessary, as scraping can be done manually and therefore on a smaller scale. | A lot of online content is duplicated, so a crawler filters out repeated data to avoid gathering excess information. |
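
The deduplication point can be made concrete with a minimal sketch. It assumes that pages count as duplicates when their text is identical after whitespace normalization, which is a deliberate simplification; the `deduplicate` function and the example URLs are hypothetical and do not describe how any particular crawler works.

```python
import hashlib


def deduplicate(pages):
    """Yield (url, body) pairs, skipping bodies that were already seen."""
    seen = set()
    for url, body in pages:
        # Collapse runs of whitespace so trivially different copies match.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content was already collected under another URL
        seen.add(digest)
        yield url, body


pages = [
    ("https://example.com/a", "Blue widget, $10"),
    ("https://example.com/b", "Blue  widget,   $10"),  # duplicate of /a
    ("https://example.com/c", "Red widget, $12"),
]
unique = list(deduplicate(pages))
print([url for url, _ in unique])
# → ['https://example.com/a', 'https://example.com/c']
```

Storing a hash rather than the full page keeps memory use modest when a crawl covers many pages.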

Delivery methods

Knowing the solutions, let’s learn the methods. Real-Time Crawler has two delivery methods: callback data method and real-time scraping. The differences between real-time scraping and callback data methods are as follows:

Real-Time data delivery method

With the real-time data delivery method, the required data is retrieved on the same connection.

This means that you submit your request and get your data back on the same open HTTPS connection, so you get real-time web scraping.
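
As a rough illustration of what "same connection" means in practice, the sketch below sends one blocking POST request and reads the result from its response. The endpoint URL and the `url` payload field are placeholders invented for this example, not the documented Real-Time Crawler API, and authentication is left out; the product documentation is the place to look for the actual values.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
REALTIME_ENDPOINT = "https://realtime.example.com/v1/queries"


def build_realtime_request(target_url, endpoint=REALTIME_ENDPOINT):
    """Build a POST request asking the service to scrape target_url."""
    payload = json.dumps({"url": target_url}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_realtime(target_url, timeout=120):
    """Send the request and wait for the data on the same connection."""
    request = build_realtime_request(target_url)
    # urlopen blocks until the server answers, so the scraped data arrives
    # as the response to this very request; nothing needs to be polled.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `fetch_realtime("https://example.com/product/1")` would keep the connection open until the result arrives, which is why the timeout in this sketch is generous.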

Callback data delivery method

With the callback data delivery method, you don’t have to keep an open connection or check your task status. Instead, Real-Time Crawler sends a notification when the required data is ready.

Keep in mind that in order to use the callback data delivery method, you have to set up a callback server. Then, you simply create a job request and send it to Real-Time Crawler. Real-Time Crawler returns job info and starts collecting the required data.

Once the data is ready, Real-Time Crawler lets you know about it by sending a POST request to your machine and providing a URL to download the results in HTML or JSON format.
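
The receiving side of this flow can be sketched with Python's standard library alone. This is a minimal example under stated assumptions: the notification is taken to be a JSON body, and the `results_url` field name is hypothetical, since the actual payload format is not described here. The `send_test_notification` helper only simulates the incoming POST locally so the server can be tried without a real job.

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Notifications received so far; a worker can consume them later.
notifications = queue.Queue()


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body of the notification.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        notifications.put(body)
        # Acknowledge receipt so the sender knows the callback arrived.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging in this demo


def start_callback_server(host="127.0.0.1", port=0):
    """Start the server in a background thread (port 0 picks a free port)."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_notification(port, payload):
    """Simulate the crawler's POST locally, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    server = start_callback_server()
    send_test_notification(
        server.server_address[1],
        {"results_url": "https://example.com/results/123"},  # hypothetical field
    )
    print(notifications.get(timeout=5)["results_url"])
    server.shutdown()
```

In a real deployment the server would need to be reachable from the internet, and the handler would go on to download the results from the URL it receives.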

Using Real-Time Crawler for e-commerce websites

Real-Time Crawler was built with e-commerce sites in mind. It’s currently customized to support data extraction from the most popular retail marketplaces. However, our team can always offer a custom solution for you.

With Real-Time Crawler, you can extract data from product pages, product offer listing pages, reviews, questions & answers, search results, or from any URL in general. All localized domains and pagination are supported. Historical pricing data is stored as well.

Using Real-Time Crawler for search engines

As with e-commerce websites, Real-Time Crawler is currently customized to support the most popular search engines. You can retrieve paid and organic SERP data and extract ranking data for any keyword as raw HTML or formatted JSON.

Real-Time Crawler for search engines allows you to discover the most profitable keywords and track their performance. It supports any number of requests done for any location and keyword.

What to choose: Real-Time Crawler or Web Scraper?

When deciding between Real-Time Crawler and Web Scraper, it all comes down to what targets you intend to scrape. As specified above, Real-Time Crawler is built specifically for search engines and e-commerce websites. So, if heavy-duty real-time scraping or callback data retrieval is required, it’s best to work with Real-Time Crawler.

However, if you require a quick and easy data extraction solution for any target of your choice, Web Scraper is the way to go. Just like Real-Time Crawler, with Web Scraper there is no need to build a scraper tool as it’s all set up for you, so no coding required.

How our clients use Real-Time Crawler

Based on our quarterly data, it is safe to state that web scraping continues to be an effective method for market research and for gaining valuable insights into consumer preferences, needs, and other fundamental factors.

A data analysis conducted by our Research Department has found that when comparing Q1 2019 to Q4 2018, the average traffic volume increased by 4.74%, and total requests grew by 7.02%.

Requests

Typically, after the busy festive period, the e-commerce industry stagnates in January, as consumers’ spending power diminishes due to decreased discretionary income.

The Q1 statistics are no exception: the number of requests in the first month of the year was relatively low and increased steadily over the following months.

Overall, requests fluctuated throughout Q1. This can be explained by targeted websites changing their structure and altering or removing specific parameters, which directly affects request volume. The inconsistency is most noticeable from the start of February to the closing stages of Q1.

Traffic

In February 2019, traffic volume increased significantly, mainly due to economic stimulation in the e-commerce industry. Two noticeable spikes were recorded, on the 6th and 13th of February.

These spikes are related to market research and pricing intelligence operations carried out right before Valentine’s Day. Our clients were gathering data to respond in time to their direct competitors’ pricing changes, stay competitive, and drive sales volume.

Wrapping up

When choosing between the real-time and callback methods, it’s essential to know which will work best for you. If you wish to collect data on the same connection and get an immediate response from Real-Time Crawler, the real-time method is the way to go.

However, with the callback method, you don’t have to keep an open connection or check your task status, and Real-Time Crawler will send you a notification when the data is ready. Just keep in mind that you’ll need to set up a callback server for this method.

Our team is always here for you!

If all of this seems just a little bit confusing or you wish to learn more about the most effective ways of extracting data from the web, don’t hesitate to get in touch with us via [email protected]. Our amazing sales and account managers will get you sorted in no time!

About Gabija Fatenaite

Gabija Fatenaite is a Content Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
