Simple SERP Tracker PHP class

Recently I had some thoughts about how searching for a site's position should work, and since I couldn’t stop thinking about it, I decided to come up with what I believe is a good and accurate solution. I wrote most of this class on the plane on my way to Denver, CO, using my trusty tablet with an app called WebMaster’s HTML Editor and another called View Web Source to view the source of the results. Overall, writing a PHP class on an Android tablet works, but I found that it takes a lot more time than it should because of the lack of a keyboard and a good copy/paste solution.

Long story short, I created a Simple SERP Tracker class, and after cleaning up the code I wrote on the plane, I decided to share my experience with you. Before we move on, let me start with some theory:

What is SERP?

A search engine results page (SERP) is the listing of web pages returned by a search engine in response to a keyword query. The results normally include a list of web pages with titles, a link to each page, and a short description showing where the keywords matched content within the page. A SERP may refer to a single page of links returned, or to the set of all links returned for a search query.

What is SERP Tracker?

The tracker crawls through every page of the search results for a specific keyword and looks for the first appearance of your site, replacing the need to do this by hand.

OK, now that we know this, let’s start creating the class itself. Every class like this needs to perform at least three basic functions: parse, crawl and find. Here is a description of each:

Parse

Receives from crawl() the array of URLs built for the keywords being searched, fetches them and passes the resulting HTML back to crawl().

Crawl

Gets the HTML from parse(), sends it to the find() method and waits for the result. Based on that result, it decides whether to pass another set of URLs to parse().

Find

Looks into the provided HTML for a specific string (a website URL in our case) and gives the result back to crawl(), which continues searching or stops depending on the result. This method processes the given HTML differently for each search engine, but it always returns the same kind of result: the position of the site (if found) or FALSE. Because of this, it is an abstract method in the parent class.

These functions are generic and are needed for every search engine, so they go into an abstract parent class, which is later extended for each specific search engine. Here is what each method of the parent class does:

__construct()

Sets the basic parameters: the keywords, the URL of the site we are searching for, the limit of results to search in, and the start time of the execution. At the end, it runs initial_check().

initial_check()

Makes sure that the URL template supplied by the child class through the abstract method set_baseurl() contains the required placeholders “keyword” and “position”. This template is used to generate the actual URLs for the crawl() method. If the requirements are not met, it stops the execution.
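If you want to see what such a check can look like outside the class, here is a minimal sketch. The function name check_template() is hypothetical, and throwing an InvalidArgumentException is my own choice for "stopping the execution"; the original class may simply exit instead.

```php
<?php
// Hypothetical stand-alone version of initial_check(): the template returned by
// set_baseurl() must contain both the "keyword" and "position" placeholders.
function check_template($template)
{
    foreach (array('keyword', 'position') as $placeholder) {
        if (strpos($template, $placeholder) === FALSE) {
            // Assumption: stop by throwing, so the caller can decide how to react
            throw new InvalidArgumentException("Base URL is missing the \"$placeholder\" placeholder");
        }
    }
    return TRUE;
}
```

Throwing instead of calling exit() keeps the check testable and lets a script that tracks several search engines skip a misconfigured one.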

use_proxy($file = FALSE)

Making a lot of requests to a search engine raises a red flag, so eventually you will get a 302 redirect (Google redirects the user to a page with a CAPTCHA to make sure the user is not a bot). One of the most effective ways to combat this is to use proxies. If you call this method from within the child class and supply a text file with proxy IPs, the class will pick a random line from it before it makes the request to the search engine.
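A minimal sketch of the "random line from a text file" idea follows. The function name random_proxy() and the one "host:port" entry per line format are assumptions for illustration, not the class's actual code.

```php
<?php
// Hypothetical helper: pick a random proxy ("host:port", one per line) from a text file.
// Returns NULL when the file cannot be read or contains no usable lines.
function random_proxy($file)
{
    $lines = @file($file, FILE_IGNORE_NEW_LINES);
    if ($lines === FALSE) {
        return NULL;
    }
    // Drop blank lines and stray whitespace (e.g. Windows "\r" line endings)
    $lines = array_values(array_filter(array_map('trim', $lines), 'strlen'));
    if (count($lines) === 0) {
        return NULL;
    }
    return $lines[array_rand($lines)];
}
```

The returned value can then be handed to cURL with curl_setopt($ch, CURLOPT_PROXY, $proxy) on each handle before the request is sent.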

parse(array $single_url = NULL)

One of the most important methods in the class: it initializes a new cURL multi handle, which lets us perform multiple requests to the search engine in parallel. It uses $this->baseurl, which contains an array of pre-built URLs, one for every keyword supplied in the constructor, and returns an array with the HTML of every requested results page. If $single_url is supplied as an argument, it is used instead of $this->baseurl.
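For readers who have not used cURL multi handles before, here is a minimal sketch of fetching an array of URLs in parallel and returning the HTML under the same keys. The name fetch_all() and the optional $proxy argument are hypothetical; the class's own parse() may differ in details such as timeouts and headers.

```php
<?php
// Hypothetical sketch of parse(): fetch every URL concurrently with curl_multi
// and return the response bodies keyed like the input array.
function fetch_all(array $urls, $proxy = NULL)
{
    if (count($urls) === 0) {
        return array();
    }
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        if ($proxy !== NULL) {
            curl_setopt($ch, CURLOPT_PROXY, $proxy);
        }
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Drive all transfers until none are still running
    $running = NULL;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status == CURLM_OK);
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Redirects are deliberately not followed here, so a 302 to a CAPTCHA page shows up as a short, result-less body rather than silently looking like a normal results page.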

crawl()

Another important method, mentioned earlier: it takes the resulting array of the parse() method and passes every HTML string from it to the find() method in the child class. Based on the result, it either ends the search for that keyword and removes it from the $this->baseurl array, or grabs the next HTML string from the parse() result and feeds it to find(). It keeps calling itself, moving to the next results page each time, until it finds all the keywords or hits the results limit set in the constructor as $limit.
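The control flow of crawl() can be sketched without any network code. In this hypothetical version, crawl_keywords() takes a $fetch callable (keyword and offset in, HTML out) and a $find callable (HTML in, position or FALSE out). It loops instead of recursing and fetches pages one at a time, which keeps the logic easy to follow. The 10-results-per-page default is an assumption.

```php
<?php
// Hypothetical sketch of the crawl() loop: search page by page until every keyword
// is found or $limit results have been checked. Also records per-keyword timing.
function crawl_keywords(array $keywords, callable $fetch, callable $find, $limit, $per_page = 10)
{
    $results = array();
    $debug = array();
    $start = microtime(TRUE);
    $pending = $keywords;
    for ($offset = 0; $offset < $limit && count($pending) > 0; $offset += $per_page) {
        // foreach iterates over a copy, so unset() inside the loop is safe
        foreach ($pending as $i => $keyword) {
            $place = $find($fetch($keyword, $offset));
            if ($place !== FALSE) {
                // Overall position = results on earlier pages + position on this page
                $results[$keyword] = $offset + $place;
                $debug[$keyword] = microtime(TRUE) - $start;
                unset($pending[$i]);
            }
        }
    }
    // Keywords still pending were not found within the limit
    foreach ($pending as $keyword) {
        $results[$keyword] = FALSE;
    }
    return array($results, $debug);
}
```

Passing the fetcher and the finder in as callables lets you test the loop with saved HTML pages instead of hitting the search engine every time.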

setup()

This method takes the keywords and builds the current array of URLs. The initial array is built from the keywords passed to the constructor; on every later run of crawl(), it is rebuilt using only the keywords that have not been found yet.
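Here is a minimal sketch of the placeholder substitution, assuming the template format from set_baseurl(). The name build_urls() is hypothetical. It uses strtr() with an array, which replaces all placeholders in a single pass, so a search term that itself contains the word "position" is not replaced a second time (two chained str_replace() calls would corrupt it).

```php
<?php
// Hypothetical sketch of setup(): one URL per keyword for a given results offset.
function build_urls($template, array $keywords, $offset)
{
    $urls = array();
    foreach ($keywords as $keyword) {
        // strtr() never re-scans text it has already substituted
        $urls[$keyword] = strtr($template, array(
            'keyword'  => urlencode($keyword),
            'position' => (string) $offset,
        ));
    }
    return $urls;
}
```

For the Google template, build_urls($template, array('git'), 10) gives http://www.google.com/search?q=git&start=10, the second page of results.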

run()

Simply starts the crawl() process. It is one of the few public methods in the class.

get_results()

Returns the array with the results from the search.

get_debug_info()

Returns an array with some debug info: in this case, the time it took for each keyword to be found.

We also have two abstract methods:

set_baseurl()

The URL of every search engine is different, and so is the syntax of the search terms. To keep the class generic, this method should return a string containing two placeholders, “keyword” and “position”, which are later replaced with the actual values in setup() for every specific URL.

find($html)

Every search engine returns its results differently, so this method takes a generic HTML string and looks for the specific URL of the site. Sometimes it is better to use regular expressions, sometimes it is better to traverse the DOM.
Whatever the case, there are only two possible outcomes: the site is found on the page or it is not. This method should therefore return FALSE if nothing is found, or the position of the result on the current page.
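To illustrate the regular expression route, here is a minimal sketch of a find() for a hypothetical engine whose organic results are rendered as &lt;h3&gt;&lt;a href="URL"&gt;title&lt;/a&gt;&lt;/h3&gt;. That markup, the function name find_by_regex() and the www-stripping rule are all assumptions; real search engine markup differs and changes often.

```php
<?php
// Hypothetical regex-based find(): returns the 1-based position of $site among the
// result links on this page, or FALSE if it does not appear.
function find_by_regex($html, $site)
{
    if (!preg_match_all('/<h3[^>]*>\s*<a[^>]+href="([^"]+)"/i', $html, $matches)) {
        return FALSE;
    }
    foreach ($matches[1] as $i => $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // Compare bare domains so "www.example.com" matches "example.com"
        if (is_string($host) && preg_replace('/^www\./i', '', $host) === $site) {
            return $i + 1;
        }
    }
    return FALSE;
}
```

Comparing hosts via parse_url() avoids false matches when your domain appears elsewhere in a result URL, for example in a query string.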

So let’s say that I want to create a SERP Tracker for Google: all I need to do is extend the abstract class and implement the two abstract methods, set_baseurl() and find(). The class will do the rest:

```php
<?php
class GoogleTracker extends Tracker
{
    // Use "keyword" and "position" to mark where the variables go in the URL;
    // setup() replaces them with the search term and the results offset
    function set_baseurl()
    {
        $baseurl = "http://www.google.com/search?q=keyword&start=position";
        return $baseurl;
    }

    // Return the position of $this->site on the current results page, or FALSE
    function find($html)
    {
        $dom = new DOMDocument();
        // Suppress the warnings that real-world (malformed) HTML produces
        @$dom->loadHTML($html);
        // Google shows the displayed URL of each result in a <cite> element
        $nodes = $dom->getElementsByTagName('cite');
        // Start counting from the first result on the page
        $current = 1;
        foreach ($nodes as $node)
        {
            $text = trim($node->nodeValue);
            // Look for links like "cmsreport.com › Blogs › Bryan's blog"
            // or "cmsreport.com/blogs/bryan" and keep only the domain part
            if (preg_match('/\s/', $text))
            {
                $site = explode(' ', $text);
            }
            else
            {
                $site = explode('/', $text);
            }
            if ($site[0] == $this->site)
            {
                // Stop at the first appearance of the site
                return $current;
            }
            $current++;
        }
        return FALSE;
    }
}
```

This will look for ‘git’ in the first 50 results in Google and report the position and the time it took to find it. I hope this article helps you. If you can think of more ways to improve it, please leave a comment, or if you’d like to lend a hand, simply fork my repository below, hack away and contact me when you’d like to merge something.


12 comments for “Simple SERP Tracker PHP class”

Nice script. I made a similar script back in December. It works great; however, it gets confused by geographic location. Say your server is in Orlando, Florida and you run the script to check your SERP in Nevada: you’re going to come up with totally different results. I’m trying to figure out how to manipulate Google’s geolocation to send the results based on IP.

In a way the script is almost pointless unless you have local clients or national exposure for the specific keyword you’re checking.

You have a point about the results, even though I think the “localized” search results are based on the IP. So if you want to check your SERP in FL, you can use a proxy which is based there, I guess… Let me know what you think.


It seems that now it returns an empty array.
However Google still uses and it does not seem to be a parsing error.
Any help would be greatly appreciated 🙂

Andrey Voev

June 28, 2018 at 10:48

Hello Emil,
Thanks for your comment. I will take a look to see if this is something easy to fix, but considering that this script was created in 2011, I expect a lot has changed since then.