Scientist Finds PageRank-Type Algorithm from the 1940s

February 17, 2010

Google’s PageRank algorithm was developed in 1998. But a project to trace the history of such algorithms reveals an example from the 1940s.

The PageRank algorithm is a key part of Google’s method of ranking web pages in search results. It uses the network of links between web pages to determine their value and, famously, judges a page to be important if it is linked to by other important pages.

One crucial feature of this idea is that it is circular. Computing it therefore requires an iterative approach that repeatedly re-evaluates the value of each page as the importance of the others changes. Iterative ranking algorithms have since become an important part of network theory.
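To make the circular definition concrete, here is a minimal Python sketch of PageRank computed by power iteration. It assumes the commonly quoted damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly across all pages. The function name `pagerank`, its parameters and the toy link graph are illustrative only; this is not Google's production implementation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    damping: probability of following a link rather than jumping to a random page.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Every page receives the "random jump" share regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its current rank equally to the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:  # stop once the scores no longer change noticeably
            break
    return rank


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, page C, which is linked to by every other page, ends up ranked highest. Page D, which nobody links to, keeps only the baseline share (1 - 0.85)/4.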

PageRank was developed in 1998 by Google’s founders, Sergey Brin and Larry Page, and its impact has been such that it’s easy to forget that the approach was not entirely novel. Massimo Franceschet at the University of Udine in Italy points out that the idea has been successfully exploited a number of times in 20th-century science, even before Brin and Page were born. Today, he presents a short history of iterative ranking algorithms and charts their evolution prior to Google’s emergence.

He begins, in reverse chronological order, with the work of Jon Kleinberg, a computer scientist at Cornell University, who developed a closely related approach shortly before PageRank appeared. Brin and Page even cite his work in their famous paper introducing PageRank.

Kleinberg called his algorithm Hyperlink-Induced Topic Search, or HITS. It treats web pages as “hubs” and “authorities” and rests on a circular definition: authorities are pages that are pointed to by hubs, and hubs are pages that point to authorities. Resolving that definition requires an iterative approach.
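The mutual reinforcement between hubs and authorities can be sketched in a few lines of Python. This is a simplified illustration that assumes a fixed number of iterations and Euclidean normalisation after each step. Kleinberg's full method first builds a query-specific subgraph of the web, which is omitted here, and the function name `hits` is hypothetical.

```python
import math


def hits(links, iterations=100):
    """Hub/authority iteration in the style of HITS (simplified sketch).

    links: dict mapping each page to the list of pages it points to.
    Returns (hub, auth) dicts of scores, each normalised to unit length.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalise so the scores stay bounded; only their proportions matter.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
    hub, auth = hits(graph)
    print("authorities:", {p: round(v, 3) for p, v in auth.items() if v > 0})
    print("hubs:", {p: round(v, 3) for p, v in hub.items() if v > 0})
```

Here a1, which every hub points to, becomes the strongest authority. h1 and h2, which point to both authorities, score higher as hubs than h3.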

In the heady days of the dotcom boom in the late 20th century, before Google became so successful, Kleinberg’s work received considerable media coverage.

Franceschet also examines the work of Gabriel Pinski and Francis Narin, who developed a way of ranking journals. Their rule was that a journal is important if it is cited by other important journals. Like PageRank and HITS, this requires an iterative method that exploits the structure of citations between journals to produce a ranking.

Long before this, however, Charles H Hubbell at the University of California, Santa Barbara, was analysing social networks in a similar way. In 1965, he published a technique for determining the importance of individuals based on the importance of the people who endorse them. This again has the characteristic circular definition and iterative solution. Hubbell is acknowledged by many, including Kleinberg, as a pioneer in iterative ranking theory.
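Hubbell's idea can be written as a linear equation: a person's status is a baseline amount plus the weighted status of everyone who endorses them, s_j = e_j + Σ_i w_ij s_i. The Python sketch below solves it by repeated substitution. It assumes the form of Hubbell's index usually quoted in network-analysis texts, and the names `hubbell_status`, `endorsements` and `exogenous` are illustrative. The iteration is only guaranteed to settle when the total endorsement weight each person hands out is below 1.

```python
def hubbell_status(endorsements, exogenous, tol=1e-12, max_iter=10_000):
    """Iterate s_j = e_j + sum_i w_ij * s_i until the scores stop changing.

    endorsements: dict mapping (endorser, endorsee) -> weight w_ij.
    exogenous: dict mapping person -> baseline status e_j.
    Assumes each endorser's outgoing weights sum to less than 1 (ensures convergence).
    """
    people = set(exogenous) | {p for pair in endorsements for p in pair}
    status = {p: exogenous.get(p, 0.0) for p in people}
    for _ in range(max_iter):
        # Start each round from the baseline, then add weighted endorsements.
        new = {p: exogenous.get(p, 0.0) for p in people}
        for (endorser, endorsee), weight in endorsements.items():
            new[endorsee] += weight * status[endorser]
        diff = max(abs(new[p] - status[p]) for p in people)
        status = new
        if diff < tol:
            break
    return status


if __name__ == "__main__":
    # Two people who endorse each other: the circularity is resolved by iteration.
    print(hubbell_status({("ann", "bob"): 0.5, ("bob", "ann"): 0.5},
                         {"ann": 1.0, "bob": 1.0}))
```

In the mutual-endorsement example, each person's status converges to 2, the solution of s = 1 + 0.5 s.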

But the big surprise is Franceschet’s discovery of an even earlier forerunner to PageRank in the work of the Harvard economist Wassily Leontief. In 1941, Leontief published a paper in which he divides a country’s economy into sectors that both supply and receive resources from each other, although not in equal measure. One important question is: what is the value of each sector when they are so tightly integrated? Leontief’s answer was to develop an iterative method of valuing each sector based on the importance of the sectors that supply it. Sound familiar? In 1973, Leontief was awarded the Nobel Prize in economics for this work.
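Leontief's framework is usually taught through linear input-output equations. The following Python sketch uses the textbook open-model price equation, in which the price of sector j equals its value added plus the cost of the inputs it buys from supplying sectors, p_j = v_j + Σ_i a_ij p_i, and solves it by repeated re-evaluation. This is an assumption about how to illustrate the idea, not a reconstruction of Leontief's 1941 calculation or of the exact formulation Franceschet discusses. The function name and the two-sector coefficients are made up.

```python
def leontief_prices(a, value_added, tol=1e-12, max_iter=10_000):
    """Solve p_j = v_j + sum_i a[i][j] * p_i by fixed-point iteration.

    a[i][j]: units of sector i's output used to make one unit of sector j's output.
    value_added[j]: value added (e.g. wages) per unit of sector j's output.
    Assumes every column of `a` sums to less than 1, which guarantees convergence.
    """
    n = len(value_added)
    prices = list(value_added)  # first guess: value added alone
    for _ in range(max_iter):
        # Re-value each sector from the current prices of its suppliers.
        new = [value_added[j] + sum(a[i][j] * prices[i] for i in range(n))
               for j in range(n)]
        diff = max(abs(x - y) for x, y in zip(new, prices))
        prices = new
        if diff < tol:
            break
    return prices


if __name__ == "__main__":
    # Hypothetical economy: sector 0 = agriculture, sector 1 = manufacturing.
    coefficients = [[0.2, 0.3],
                    [0.4, 0.1]]
    print(leontief_prices(coefficients, [1.0, 2.0]))
```

For these made-up coefficients the exact solution is p = (17/6, 19/6), which the iteration approaches geometrically because each sector spends less than one unit's worth of inputs per unit of output.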

What’s clear is that the ideas behind PageRank have a venerable history; the surprise is that they date back to at least the 1940s. It’ll be interesting to see whether anybody can find similar work that predates this.