News

Welcome to End Point’s blog

Stuff you can do with the PageRank algorithm

I've attended several interesting talks so far on my first day of
RailsConf, but the one that got me the most excited to go out and
start trying to shoehorn it into my projects was Building
Mini Google in Ruby by Ilya Grigorik.

In terms of doing Google-like stuff (which I'm not especially
interested in doing), there are three steps, which occur in order of
increasing level of interestingness. They are:

Crawling (mundane)

Indexing (sort of interesting)

Rank (neato)

Passing over crawling, Indexing is sort of interesting. You can do
it yourself if you care about the problem, or you can hand it over to
something like ferret or sphinx. I expect it's probably time for me
to invest some time investigating the use of one or more of these,
since I've already gone up and down the do it yourself road.

The interesting bit, and the fascinating focus of Ilya's
presentation were the explanation of the PageRank algorithm and the
implementation details as well as some application ideas. Hopefully I
don't mess this up too badly, but as I understand it, it simplifies
down to something like this.

A page is ranked to some degree by how many other pages link to
it. This is a bit too simple, though, and trivially gamed. So, you
make it a little more complex by modeling the following behavior, a
random surfer will surf from one page to another by doing one
of two things. They will either follow a link or randomly go to a
non-linked page (sort of how I surf Wikipedia). There is a much
higher probability (.85) that they will follow a link than that thay
will teleport (.15). If you model this (hand waving here) then you
come up with a nice formula (more hand waving) that can be used to
calculate the page rank for a page in a given data set. A data set in
this case is a collection of crawled pages.

For large data sets, these calculations can be somewhat intensive,
so we are recommended to the good graces of the Gnu Scientific Library
and the appropriate Ruby wrappers and the NArrary gem to do the
calculations and array management.

One suggestion of a practical applications of this technology is to
apply it to sets of products purchased together in a shopping cart to
provide recommendations of the sort for 'people who bought that also
bought this.' I'm pretty excited to try to implement this in Spree. But...

...what really piqued my interest was the idea that this could be
applied to any graph. The Taxonomies/Taxons/ProductGroups with
products could give me a nice big (depending on the size of the data
set of course) directed graph to play with. The question, I suppose,
is what the PageRank applied against such a graph means.