Developing a Facebook API Crawler for a Subscription-Based WordPress Site

In 2010, Pete Warden built a web application, essentially a web scraper, that used scraping bots to collect data from Facebook profiles and show how different countries and regions interacted with each other.

When Facebook spotted the web app, they immediately contacted Pete and asked him to take it down. He was let off with a warning because he had a good connection on the Facebook team.

If we wanted to create an application (say, a WordPress plugin) that scraped data from Facebook on a weekly or monthly basis, we would not be allowed to do so, at least not without their written permission. Today, robots.txt rules are also far more restrictive, and they are of little use for scraping a post-login page, especially on a site like Facebook.

So we will need alternative methods to extract data from Facebook. But that raises a question: is there a need to extract or scrape data from Facebook at all?

Why scrape data from Facebook?

Here we are not talking about individual profile data, but about data from business pages on Facebook. Data from such pages can be used to keep an eye on the competition. For instance, if you run a Facebook page for your business, you might want to watch five other similar pages that have about the same number of likes. This data can also be extremely useful for anyone looking at lead generation through Facebook. To make this possible, it is important to collect data from different business pages and group it into categories. This simple idea could itself turn into a new business site that offers free and paid services for searching the business data available on Facebook.

A Subscription-Based WordPress Site

The idea is to create a WordPress-based website that lets users search for Facebook pages matching their interests. Roughly, the user would be able to perform the following tasks:

Search for Facebook pages/groups that match certain criteria, then sort and filter the results.

All of this can be programmed using conditional statements. In addition to the search mechanism, the website can implement subscription-based access. This way the owner can monetize the site by offering a few features for free and reserving advanced features for premium members. Payments can be handled through PayPal, which means integrating WordPress with PayPal using one of the existing PayPal plugins.
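To make the "conditional statements" idea concrete, here is a minimal Python sketch of the search, filter, and sort step with a simple free/premium gate. The `Page` record, the `search_pages` function, and the specific rules (free members sort by likes only and see a capped number of results) are all assumptions made for illustration; a real WordPress plugin would implement the equivalent logic in PHP.

```python
from dataclasses import dataclass

# Hypothetical record for one stored Facebook page (not a Facebook type).
@dataclass
class Page:
    name: str
    category: str
    likes: int
    talking_about: int

SORTABLE_FIELDS = {"likes", "talking_about"}

def search_pages(pages, keyword="", category=None, min_likes=0,
                 sort_by="likes", is_premium=False, free_limit=10):
    """Filter pages by keyword, category and minimum likes, then sort them."""
    if sort_by not in SORTABLE_FIELDS:
        raise ValueError(f"cannot sort by {sort_by!r}")
    keyword = keyword.lower()
    matches = [
        p for p in pages
        if keyword in p.name.lower()
        and (category is None or p.category == category)
        and p.likes >= min_likes
    ]
    # Assumed business rule: free members can only sort by likes.
    if not is_premium:
        sort_by = "likes"
    matches.sort(key=lambda p: getattr(p, sort_by), reverse=True)
    # Assumed business rule: free members see a capped number of results.
    return matches if is_premium else matches[:free_limit]
```

Calling `search_pages(pages, keyword="coffee", min_likes=500)` would return the matching pages for a free member, while passing `is_premium=True` unlocks the extra sort field and the full result list.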

Using the Facebook API for data extraction

We cannot rely on Facebook's robots.txt to crawl their data, since doing so is not legally permitted. This extraction route has also probably been closed off on their side, so it is no longer possible to use it. We therefore need to turn to the Facebook API for a solution, and use it to retrieve useful data from pages. Check out this link for different Facebook Graph API examples. Since we are interested in pages and groups, we can take advantage of these two Facebook API graphs:

As you can see, we are able to retrieve some useful data, as listed below:

Page Name

Page Likes

People Talking about the Page

Group Owner, etc.
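As a rough illustration of how such fields might be requested, the Python sketch below builds a Graph API URL for a page and maps the JSON response to the values listed above. The API version, the `fan_count` and `talking_about_count` field names, and the helper names are assumptions based on the Graph API Page reference as generally documented; check them against the current documentation, and note that a valid access token with the appropriate permissions is required.

```python
import json
import urllib.parse
import urllib.request

GRAPH_BASE = "https://graph.facebook.com"
API_VERSION = "v19.0"  # assumption: use whichever version your app targets
# Assumed Page fields: name, like count, "people talking about this" count.
PAGE_FIELDS = ("name", "fan_count", "talking_about_count")

def build_page_url(page_id, access_token, fields=PAGE_FIELDS):
    """Return the Graph API URL that requests the given fields for a page."""
    query = urllib.parse.urlencode(
        {"fields": ",".join(fields), "access_token": access_token}
    )
    return f"{GRAPH_BASE}/{API_VERSION}/{urllib.parse.quote(page_id)}?{query}"

def parse_page(payload):
    """Map a decoded Graph API response onto the names used in this article."""
    return {
        "name": payload.get("name"),
        "likes": payload.get("fan_count"),
        "talking_about": payload.get("talking_about_count"),
    }

def fetch_page(page_id, access_token):
    """Perform the HTTP request (needs network access and a valid token)."""
    url = build_page_url(page_id, access_token)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_page(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes both easy to test without hitting Facebook's servers.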

We can run cron jobs on a weekly basis to collect the data and organize it into proper groups, and then serve it to users through an interface that lets them filter and search for the information they need. A cron job is a task scheduled with cron, the Linux job scheduler, so that a script on your server runs automatically at set intervals to complete repetitive work. For example, most Linux web hosting plans expose a Cron Jobs option in cPanel.
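The weekly collect-and-group step could look something like the following Python sketch. The crontab line in the comment, the script path, the dictionary keys, and the choice of bucketing pages by category and like count are all hypothetical; they simply show one way to group pages with "about the same number of likes".

```python
from collections import defaultdict

# Hypothetical crontab entry (paths are assumptions) that would run a
# collection script every Monday at 03:00:
#   0 3 * * 1 /usr/bin/python3 /home/site/collect_pages.py

def likes_bucket(likes, step=1000):
    """Return a label such as '1000-1999' for the range a like count falls in."""
    low = (likes // step) * step
    return f"{low}-{low + step - 1}"

def group_pages(pages):
    """Group page names by (category, likes bucket).

    Each page is a dict with 'name', 'category' and 'likes' keys,
    for example as stored after a weekly API fetch.
    """
    groups = defaultdict(list)
    for page in pages:
        key = (page["category"], likes_bucket(page["likes"]))
        groups[key].append(page["name"])
    return dict(groups)
```

A user who runs a cafe page with around 1,500 likes could then be shown the other names stored under the `("Cafe", "1000-1999")` key.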

Limitations on data to be extracted

There are certain limitations on data extraction from Facebook. For instance, if you need to list all of the people who like a Facebook page, that is practically not possible using the Facebook API. We might, however, be able to list a few of them using a technique described on Stack Overflow.

In the age of marketing automation and personalization, brands leave no stone unturned to market themselves on social media. A WordPress-based application with this kind of detailed API integration could benefit brand marketers as well as businesses looking for targeted social lead generation.