
Determine Link Relevance and Unique Class C IP using Yahoo Links API

The Yahoo Site Explorer Inbound Links API comes in handy for a variety of purposes. You can get all sorts of data from it using PHP, from counting your backlinks to analyzing where they’re coming from. This article will show you how to build a backlink checker tool that can do this and more.

This article continues a series of tutorials on the Yahoo Site Explorer Links API in PHP. If you are new to the Yahoo Links API, you should read and understand the earlier tutorials in the series first.

Those five articles do not discuss how to make a backlink checker tool that also analyzes inbound link relevance and checks whether the unique linking domains belong to unique Class C IP addresses.

The original script cannot sort out unique Class C IP addresses. However, since it can extract the domain name, you can get the IP address of the website's server using the gethostbyname() function in PHP.

Using string manipulation functions, you can then arrive at the Class C IP address, that is, the first three octets of the IP address. The script for this is added to the original code.
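As a minimal sketch of this step, and not the code from the original script, the Class C portion can be cut from an IPv4 address with strrpos() and substr(). The function names classCBlock() and classCBlockForDomain() are hypothetical, and the IP address shown is a documentation example rather than real data.

```php
<?php
// Hypothetical helper (not part of the original script): reduce a dotted
// IPv4 address to its first three octets, e.g. "192.0.2.44" -> "192.0.2".
function classCBlock(string $ip): ?string
{
    // Reject anything that is not a valid IPv4 address.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // strrpos() finds the last dot; substr() keeps everything before it.
    return substr($ip, 0, strrpos($ip, '.'));
}

// Hypothetical wrapper: resolve a domain first, then cut the address down.
// gethostbyname() returns the unmodified host name when the lookup fails,
// so classCBlock() returns null in that case.
function classCBlockForDomain(string $domain): ?string
{
    return classCBlock(gethostbyname($domain));
}

echo classCBlock('192.0.2.44'); // prints 192.0.2
```

Validating the address before cutting it means a failed DNS lookup is never mistaken for a Class C block.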

IMPORTANT NOTE: Do not worry about the complete and final source code of this tutorial; it will be provided at the end, along with a link.

Determine the Quality and Relevance of Inbound Links

This is trickier to do, but the strategy is simple. The Yahoo Links API provides the title tag of each inbound link page. Since a title tag basically tells us what the page is about, you can analyze the keywords contained in the title tags and then return statistics to the user showing whether the inbound links to the website are related to it. The code to be added to the original script is as follows:

//Extract the title tag from the Yahoo Links API array
//This is placed below the previous code to extract the URL which is:
//$myurl= $display1[$x]['Url'];

$mytitletag = $display1[$x]['Title'];

//You also need to assign the title tag to an array.
//This is placed in the last section, just below this line:
//$domainarray[]=$domain; and before $x++
//$titlearray[] contains the title tags of all inbound link
//pages to the website

$titlearray[] = $mytitletag;

//Since some inbound link pages come from the same domain and
//therefore share the same title tag, you need to filter the
//unique title tags from the array. This code is placed just outside
//the while loop:
//while ($x < $count) {
//}
//and just before:
//$uniquedomains=array_unique($domainarray);

$uniquetitletags = array_unique($titlearray);

//Start of link relevance analysis

//Reset array

reset($uniquedomains);
echo '---------------------------------------------------';

echo '<br />';

echo '|THIS IS THE LINK RELEVANCE REPORT FOR '.$domainurl;

echo '<br />';

echo '---------------------------------------------------';
echo '<br />';

echo '<br />';

//A short explanation about the link relevance analysis strategy

echo 'The relevance of your backlinks is computed based on the title tags of your backlinking pages. These title tags are important because they tell us what each backlinking page is about. The keywords from the title tags are then extracted and analyzed.';

echo '<br />';

echo 'If this keyword list, sorted by percentage, MATCHES your domain, website topic, or niche, then congratulations; your backlinks are relevant to your website.<br />';

echo '<br />';

//Combine all words inside the unique title tags array into
//one string for the analysis

$sentenceforanalysis= implode(" ",$uniquetitletags);

//Compute the keyword occurrence percentage

//Parts of the code are taken from http://bit.ly/5egil, authored by Tom
//str_word_count($str,1) – returns an array containing all the words
//found inside the string

$words = str_word_count(strtolower($sentenceforanalysis),1);

//Count the number of words

$numWords = count($words);

//array_count_values() returns an array using the values of the input
//array as keys and their frequency in input as values.

$word_count = (array_count_values($words));

//sort the results

arsort($word_count);

//stopwords PHP array by Armand Brahaj found here:
//http://bit.ly/4vNhpu
//It is important to exclude the stop words from the analysis because
//they are not vital for relevance computations
//the script for stopwordslist.php can be found here:
//http://bit.ly/910X2Q
//you need to change the path of the PHP include to reflect your own
//file path

include '/opt/lampp/htdocs/backlinkcount/stopwordslist.php';

//now that the stopwords array is in place
//you need to check that the keywords in the title tags are not stopwords
//first define the $stopwordarray which will contain the stopwords
//found in the title tag keywords

$stopwordarray= array();

//next is to loop through the array

foreach ($word_count as $key=>$val) {

//to count the number of stop words, first you need to gather all the
//stop words found in the title tag keywords according to
//stopwordslist.php
//this is done using a PHP array and assigning the stop words to
//$stopwordarray

//Also check if the word consists entirely of English letters; this
//will filter out non-words in the title which are not important for
//the analysis

//Also excluded from the analysis are words of fewer than 3
//characters, which are not important either.
//if all of the above conditions are true, the keywords are assigned to
//the $stopwordarray[]
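The listing breaks off inside the foreach loop at this point. As a minimal sketch of the filtering that the comments describe, and not the author's final code, the loop might be completed along the following lines. The sample title tags, the short $stopwords list (standing in for the array defined in stopwordslist.php), and the $keywordarray name are all assumptions made for illustration. So is the choice to collect every excluded word in $stopwordarray.

```php
<?php
// Hypothetical stand-in for the array defined in stopwordslist.php.
$stopwords = array('the', 'and', 'for', 'with');

// Made-up title tags, used only to exercise the loop.
$uniquetitletags = array('Linux tools for SEO', 'SEO and the Linux desktop');

// Same preparation steps as in the article's code.
$words = str_word_count(strtolower(implode(' ', $uniquetitletags)), 1);
$word_count = array_count_values($words);
arsort($word_count);

$stopwordarray = array(); // words excluded from the analysis
$keywordarray  = array(); // hypothetical name: words kept, with their counts

foreach ($word_count as $key => $val) {
    $key = (string) $key;
    $isStopword   = in_array($key, $stopwords, true);
    $isAlphabetic = ctype_alpha($key);  // letters only, no digits or symbols
    $isTooShort   = strlen($key) < 3;   // fewer than 3 characters

    if ($isStopword || !$isAlphabetic || $isTooShort) {
        $stopwordarray[] = $key;        // assumption: set aside excluded words
    } else {
        $keywordarray[$key] = $val;
    }
}

// Express each remaining keyword as a percentage of all remaining keywords.
$total = array_sum($keywordarray);
foreach ($keywordarray as $key => $val) {
    $percentage = round(($val / $total) * 100, 2);
    echo $key . ': ' . $percentage . '%<br />';
}
```

With these sample titles, "linux" and "seo" each account for two of the six remaining keywords, while "for", "and" and "the" are set aside as stop words.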

Suppose you are interested in getting all the backlinks pointing to all of the pages of http://www.devshed.com, not just the home page.

So you need to enter http://www.devshed.com under “Enter root domain URL.” Do not place a trailing slash “/” at the end of the website’s URL.

Since you are interested in getting all of the backlinks pointing to all of the pages, select the option “Entire website.”

Enter the captcha in the form and click “Submit.” The results should appear within two minutes.

The first thing you should see is this line:
ESTIMATED TOTAL BACKLINKS FROM UNIQUE DOMAINS POINTING TO ENTIRE SITE: 285006

This means that 285006 backlinks are from unique domains, but these domains have NOT yet been checked to see whether they belong to unique Class C IP addresses.

The second bolded summary is this: ESTIMATED TOTAL BACKLINKS FROM UNIQUE DOMAINS IN UNIQUE CLASS IP POINTING TO ENTIRE SITE: 157044

So 157044 is an estimate of all domains linking to the pages of www.devshed.com and coming from unique class C IP addresses.
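To see why the second total is smaller than the first, here is a small worked sketch under stated assumptions. The domain and IP pairs are made up (documentation addresses standing in for gethostbyname() results), and the variable names are hypothetical rather than taken from the tool's source.

```php
<?php
// Made-up domain => IP pairs standing in for gethostbyname() results.
$resolved = array(
    'example.com' => '192.0.2.10',
    'example.net' => '192.0.2.77',   // same Class C block as example.com
    'example.org' => '198.51.100.5',
    'example.edu' => '203.0.113.9',
);

$classcarray = array();
foreach ($resolved as $domain => $ip) {
    // Keep only the first three octets of each address.
    $classcarray[] = implode('.', array_slice(explode('.', $ip), 0, 3));
}

$uniquedomaincount = count($resolved);                  // 4 unique domains
$uniqueclassccount = count(array_unique($classcarray)); // 3 unique blocks

echo 'Unique domains: ' . $uniquedomaincount . '<br />';
echo 'Unique Class C blocks: ' . $uniqueclassccount . '<br />';
```

Two of the four sample domains share the 192.0.2 block, so they count only once in the Class C total.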

Finally, one of the most important results is the “Link Relevance Report.”

The screen shot of the report above tells us that the pages linking back to www.devshed.com pages have a high percentage of “search,” “tools,” “fedora,” “seo” and “linux” used in their title tags, which are also related to the topics discussed on Dev Shed.