Query for Google PageRank

Karthik Nadig

If you have used the Google Toolbar, you might have used the PageRank tool. It is a nifty tool that shows the PageRank of the page or web site you are currently viewing. I was trying to figure out how it works. After googling for it and piecing together the scattered bits of information, this is what I got.

Suppose you want to find the PageRank of some web site, let’s just say Microsoft (www.Microsoft.com). You can either visit the web site and get the PageRank from the Google Toolbar, or use web sites designed for this purpose, like www.prchecker.info. In either case they generate a query which will look similar to this:

Go ahead and click on the above link; you’ll get some text in the browser window that looks like this:

Rank_1:1:9

The last number in the above string is the PageRank.
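As a small illustration, the reply can be split on the colons and the last field taken as the PageRank. This is only a sketch: the function name parse_rank is made up for this example, and it assumes the reply always has the "Rank_1:1:9" shape shown above.

```python
def parse_rank(response: str) -> int:
    """Return the last colon-separated field of a 'Rank_1:1:9'-style reply.

    Assumption: the reply always ends with the numeric rank, as in the
    sample above. The meaning of the other fields is not covered here.
    """
    fields = response.strip().split(":")
    return int(fields[-1])


print(parse_rank("Rank_1:1:9"))  # prints 9
```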

Most parts of the above query remain constant; the values that change are those of ch and q. The value of q is very easy to generate, but the value of ch needs a bit of work. Let me explain the easy one first. Let’s trim the fat and concentrate on the value of q.

info:http%3A%2F%2Fwww.microsoft.com

There are two parts to this value: the first is the "info:" part, which is constant for any query, and the second is "http%3A%2F%2Fwww.microsoft.com", which is an escaped version of the original query that you requested (i.e., www.microsoft.com). "http://" has to be prefixed to the original query if it was not already included in the original string.

Any string can be easily converted to an escaped string in almost all scripting languages. In Perl it’s uri_escape(), in .NET it’s Uri.EscapeDataString(); other scripting languages will have a similar method to get the escaped string.

These are the steps to generate the value of q:

Check if http:// is prefixed in the input string; if not, prepend it (original_uri).

Get the escaped version of original_uri (escaped_uri).

Value of q = "info:" + escaped_uri.
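The three steps above could be sketched in Python roughly as follows. The function name build_q is hypothetical. Python's urllib.parse.quote with safe="" is used here as the escaping routine, on the assumption that it matches what the Perl and .NET functions produce for this kind of input.

```python
from urllib.parse import quote


def build_q(uri: str) -> str:
    """Build the value of q from a site name or URI."""
    # Step 1: prefix http:// if it is missing (original_uri).
    # Note: this follows the steps literally, so only "http://" is checked.
    if not uri.startswith("http://"):
        uri = "http://" + uri
    # Step 2: percent-escape everything, including ':' and '/' (escaped_uri).
    escaped_uri = quote(uri, safe="")
    # Step 3: q = "info:" + escaped_uri
    return "info:" + escaped_uri


print(build_q("www.microsoft.com"))  # info:http%3A%2F%2Fwww.microsoft.com
```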

Now that the easy part is done, let’s move to the other one, ch. The value of ch is generated by hashing the input URI. So, the input to the hash function will have the following format: "http://….". This is a pretty simple algorithm; I’ll try to make it as generic as possible. I have broken the algorithm into 3 parts: Hash1, Hash2 and MainHash.

Hash1:

Input: Unsigned Integers a, b, c (32-bit; subtraction and shifts wrap around at 32 bits).

Output: Unsigned Integers a, b, c.

Algorithm:

1: a = a - b

2: a = a - c

3: a = a xor (c right_shift 13)

4: b = b - c

5: b = b - a

6: b = b xor (a left_shift 8 )

7: c = c - a

8: c = c - b

9: c = c xor (b right_shift 13)

10: a = a - b

11: a = a - c

12: a = a xor (c right_shift 12)

13: b = b - c

14: b = b - a

15: b = b xor (a left_shift 16)

16: c = c - a

17: c = c - b

18: c = c xor (b right_shift 5)

19: a = a - b

20: a = a - c

21: a = a xor (c right_shift 3)

22: b = b - c

23: b = b - a

24: b = b xor (a left_shift 10)

25: c = c - a

26: c = c - b

27: c = c xor (b right_shift 15)

Output: values of a, b, c
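For languages without fixed-width unsigned integers, the wrap-around has to be done by hand. The following is a minimal Python sketch of Hash1 that assumes 32-bit unsigned arithmetic and masks after every subtraction and left shift. The names hash1 and MASK are chosen for this example.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def hash1(a: int, b: int, c: int):
    """Mix three 32-bit unsigned integers, following steps 1 to 27."""
    a &= MASK
    b &= MASK
    c &= MASK
    # Steps 1-9
    a = (a - b - c) & MASK
    a ^= c >> 13
    b = (b - c - a) & MASK
    b ^= (a << 8) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 13
    # Steps 10-18
    a = (a - b - c) & MASK
    a ^= c >> 12
    b = (b - c - a) & MASK
    b ^= (a << 16) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 5
    # Steps 19-27
    a = (a - b - c) & MASK
    a ^= c >> 3
    b = (b - c - a) & MASK
    b ^= (a << 10) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 15
    return a, b, c
```

Each pair of subtractions (for example steps 1 and 2) is folded into one expression, which gives the same result modulo 2^32.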

Hash2:

Input: String (character indexing should be allowed)

Output: Unsigned Integer

Algorithm:

1: Unsigned Integer a = 0x9e3779b9

2: Unsigned Integer b = 0x9e3779b9

3: Unsigned Integer c = 0xe6359a60

4: k = 0

5: len = Length of input_string

6: Unsigned Integers va,vb,vc

7:

8: Repeat while len >= 12

9:

10: va = input_string[k + 0]

11: va = va | (input_string[k + 1] left_shift 8 )

12: va = va | (input_string[k + 2] left_shift 16)

13: va = va | (input_string[k + 3] left_shift 24)

14: a = a + va

15:

16: vb = input_string[k + 4]

17: vb = vb | (input_string[k + 5] left_shift 8 )

18: vb = vb | (input_string[k + 6] left_shift 16)

19: vb = vb | (input_string[k + 7] left_shift 24)

20: b = b + vb

21:

22: vc = input_string[k + 8]

23: vc = vc | (input_string[k + 9] left_shift 8 )

24: vc = vc | (input_string[k + 10] left_shift 16)

25: vc = vc | (input_string[k + 11] left_shift 24)

26: c = c + vc

27:

28: a, b, c = Hash1 a, b, c

29: k = k + 12

30: len = len - 12

31:

32: end of loop

33:

34: c = c + Length of input_string

35:

36: if (len > 10) c = c + (input_string[k + 10] left_shift 24)

37: if (len > 9) c = c + (input_string[k + 9] left_shift 16)

38: if (len > 8 ) c = c + (input_string[k + 8] left_shift 8 )

39: if (len > 7) b = b + (input_string[k + 7] left_shift 24)

40: if (len > 6) b = b + (input_string[k + 6] left_shift 16)

41: if (len > 5) b = b + (input_string[k + 5] left_shift 8 )

42: if (len > 4) b = b + (input_string[k + 4])

43: if (len > 3) a = a + (input_string[k + 3] left_shift 24)

44: if (len > 2) a = a + (input_string[k + 2] left_shift 16)

45: if (len > 1) a = a + (input_string[k + 1] left_shift 8 )

46: if (len > 0) a = a + (input_string[k])

47:

48: a, b, c = Hash1 a, b, c

Output: Unsigned Integer c.
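Here is a rough Python sketch of Hash2. It reads step 8 as "repeat while len >= 12", which is the only reading under which steps 10 to 25 never index past the end of the string. It also assumes the input is passed as bytes, one byte per character (for example "http://www.microsoft.com".encode("latin-1")), and repeats a compact hash1 so the block runs on its own. The function names are chosen for this example.

```python
MASK = 0xFFFFFFFF  # 32-bit wrap-around


def hash1(a, b, c):
    """Compact form of Hash1: three rounds with different shift amounts."""
    for s1, s2, s3 in ((13, 8, 13), (12, 16, 5), (3, 10, 15)):
        a = ((a - b - c) & MASK) ^ (c >> s1)
        b = ((b - c - a) & MASK) ^ ((a << s2) & MASK)
        c = ((c - a - b) & MASK) ^ (b >> s3)
    return a, b, c


def hash2(data: bytes) -> int:
    a = b = 0x9E3779B9
    c = 0xE6359A60
    k = 0
    length = len(data)
    # Steps 8-32: consume 12 bytes at a time as three little-endian words.
    while length >= 12:
        a = (a + int.from_bytes(data[k:k + 4], "little")) & MASK
        b = (b + int.from_bytes(data[k + 4:k + 8], "little")) & MASK
        c = (c + int.from_bytes(data[k + 8:k + 12], "little")) & MASK
        a, b, c = hash1(a, b, c)
        k += 12
        length -= 12
    # Step 34: add the full length of the input.
    c = (c + len(data)) & MASK
    # Steps 36-46: the remaining 0 to 11 bytes. Bytes 0-3 go to a, 4-7 to b,
    # and 8-10 to c shifted up by one byte. An empty slice contributes 0.
    tail = data[k:]
    a = (a + int.from_bytes(tail[0:4], "little")) & MASK
    b = (b + int.from_bytes(tail[4:8], "little")) & MASK
    c = (c + (int.from_bytes(tail[8:11], "little") << 8)) & MASK
    # Step 48: final mix, then return c only.
    a, b, c = hash1(a, b, c)
    return c


print(hex(hash2("http://www.microsoft.com".encode("latin-1"))))
```

The chain of "if (len > n)" steps is replaced by slicing: the shifted bytes never overlap, so adding them one at a time is the same as adding the little-endian value of the slice.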

MainHash:

Input: String (character indexing should be allowed)

Output: Unsigned Integer

Here the algorithm gets a little bit complicated, so I’ll explain it in a bit more detail. The first two steps are direct; the others need explanation, so here they are:

Note: This is a sample table; you need to generate this table for each query.

To generate the table, the first step is to apply the value of "x" in the function f(x, c), convert the result to a 32-bit hex value and reorder the bytes in little-endian format. The next step is to convert the individual hex bytes to their corresponding characters, which gives you a string similar to the one shown in the last column. Finally, you have to append the strings together.
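The byte reordering and appending can be sketched with Python's struct module. The definition of f(x, c) is not reproduced here, so the sketch takes already-computed 32-bit values as input. The values in the usage lines are placeholders, not real table entries, and the name pack_table is made up for this example.

```python
import struct


def pack_table(values):
    """Concatenate 32-bit values as little-endian byte strings.

    `values` stands in for the results of f(x, c); computing them is
    outside the scope of this sketch.
    """
    return b"".join(struct.pack("<I", v & 0xFFFFFFFF) for v in values)


# Placeholder values, only to show the byte order.
sample = pack_table([0x12345678, 0x000000FA])
print(sample.hex())  # 78563412fa000000
# Decoding as Latin-1 maps each byte to one character, e.g. 0xFA to 'ú'.
text = sample.decode("latin-1")
```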

After appending all of the strings together, the result will appear as shown below.

úñèßÖ….

Let me show what happens to the values in each of the five steps in this algorithm with http://www.microsoft.com as input. Remember, this is the main search query, not the input to this algorithm. The input to this hash algorithm will come from another function; I’ll get to that later.

This entry was posted on Tuesday, December 30th, 2008 at 08:53.
