Placement, PageRank, and the PGR: Part 1

Suppose you pick a philosophy PhD program at random and you go visit
their website. There you pick a random person from the faculty list and
see where they got their PhD. Then you go to that program’s website and
repeat the exercise: pick a random faculty member, see where they did
their PhD, and go to that program’s website. And again, and again.

You’d come back to some programs more often than others in the long run.
Which ones? Those whose graduates are in demand at other programs whose
graduates are in demand at still other programs etc. In other words, the
programs you keep coming back to are central in the hiring network of
philosophy PhD programs.

This is the idea behind Google’s famous “PageRank” algorithm. In a
classic
paper,
Google’s founders imagined a web surfer starting at a random page,
following a random link from there, then following a random link found
on the new page, and so on. Pages where the surfer winds up more often
are more likely to be of interest to the users of a search engine.

Wallace’s idea was to apply the same algorithm to the
APDA’s placement data. And he found
that the resultant rankings correlated closely with the Philosophical
Gourmet Report’s ratings from
2006.

These posts expand on the idea in two parts. This post explains the
theory behind the PageRank algorithm, and reproduces Wallace’s rankings.
The next post considers the possibility of
predicting a department’s PageRank, using either past PGR scores, or
past placement data fed into the PageRank algorithm itself.

Motivation

Why care about a department’s PageRank? You might think it’s an
indicator of a program’s “quality”, but I’m more interested in its
potential use to students. Some want to become professors at programs
where they will train PhDs who go on to do the same.

I’m also just intrigued by the PageRank algorithm itself. It’s
mathematically very nifty, and it has a delightfully philosophical
flavour too. It takes a problem that looks like a vicious regress and
shows how to give it a virtuous grounding.

So these posts are partly an exercise for me, to learn about the
algorithm and the math behind it. Plus I just like playing with data.

Theory

There are two ways to understand the PageRank algorithm. First is the
random surfer idea already described. The second we might call the “vote
of confidence” model.

When program X hires a graduate of program Y, that’s a vote of
confidence for program Y. But this vote carries more weight if program X
itself is well regarded. So we have to see where program X’s own
graduates have been placed, and determine how much confidence people
have in those programs. And so on.

The threat of regress looms. How do we break out? With a dash of high
school algebra.

Imagine we have just three programs, A, B, and C. Program A got 80% of
its faculty from B, and 10% each from C and from A itself. B is similar,
except… well, here’s the whole story:

A’s Faculty

B’s Faculty

C’s Faculty

PhD from A

10%

20%

30%

PhD from B

80%

70%

40%

PhD from C

10%

10%

30%

Thinking row-wise, A cast 10% of its votes for A so to speak; B cast 20%
of its votes for A; and C cast 30% of its votes for A. But how much
weight does a vote from program A carry, compared to say a vote from B?
Labeling these unknown weights $w_A$, $w_B$, and $w_C$, we get a linear
equation: $$ w_A = .1 w_A + .2 w_B + .3 w_C. $$ Applying the same
formula to B and C, and we get a system of three linear equations, in
three unknowns: $$
\begin{aligned}
w_A &= .1 w_A + .2 w_B + .3 w_C.\\
w_B &= .8 w_A + .7 w_B + .4 w_C.\\
w_C &= .1 w_A + .1 w_B + .3 w_C.
\end{aligned}
$$

Solving this system we find that $(w_A, w_B, w_C) = (0.19, 0.68, 0.12)$
is the only solution with positive weights. So we’ve determined how much
weight a vote from each program carries. And unsurprisingly, votes from
program B carry by far the most weight.

Will the same reasoning work no matter how many programs we have, and no
matter what their hiring patterns are like? More or less yes. We have to
upgrade the math a bit and add a small tweak. But the heart of the idea
is pretty much the same.

General Solution

In the general case, we have $n$ programs and we want to give each
program $i$ a weight $w_i$ that reflects the votes of confidence it’s
received. Placement data tells us for any two programs $i$ and $j$ the
portion of faculty at program $j$ hired from program $i$, call it
$m_{ij}$. Since the weight of program $i$ is the weighted sum of these
numbers, we have a set of $n$ linear equations with $n$ unknowns:
$$
\begin{aligned}
m_{11} w_1 + m_{12} w_2 + \ldots + m_{1n} w_n &= w_1,\\
m_{21} w_1 + m_{22} w_2 + \ldots + m_{2n} w_n &= w_2,\\
\vdots & \\
m_{n1} w_1 + m_{n2} w_2 + \ldots + m_{nn} w_n &= w_n.
\end{aligned}
$$
In matrix terms, we’re looking to solve the equation
$\mathbf{M} \mathbf{w} = \mathbf{w}.$ This is an
eigenvector
equation $\mathbf{M} \mathbf{w} = \lambda \mathbf{w}$, where the
eigenvalue $\lambda = 1$. And a famous
theorem
guarantees that a unique solution exists, given assumptions to be
discussed momentarily.

The same theorem also guarantees that this solution is the result of the
random-surfing exercise we opened with. This is why we can understand
PageRank using either of our two interpretations. The weight $w_B$ will
also be the frequency with which our random surfer comes back to program
B’s faculty page.

Why? Well suppose we start on some random program’s page, say program A.
We represent this with the vector $\mathbf{p} = (1, 0, 0)$, because
right now we’re on program A’s page with probability 1. Then we start
our random walk. What is the probability we’ll arrive at A, or B, or C
next? To find out we multiply $\mathbf{M}$ against $\mathbf{p}$: $$
\begin{aligned}
\mathbf{M} \mathbf{p}
&=
\left(
\begin{matrix}
0.1 & 0.2 & 0.3\\
0.8 & 0.7 & 0.4\\
0.1 & 0.1 & 0.3
\end{matrix}
\right)
\left(
\begin{matrix}
1\\
0\\
0
\end{matrix}
\right)\\
&=
\left(
\begin{matrix}
0.1\\
0.8\\
0.1
\end{matrix}
\right).
\end{aligned}
$$ To find out where we’ll probably be at the next step, we multiply
this result by $\mathbf{M}$ again, i.e. we compute
$\mathbf{M}(\mathbf{M}\mathbf{p})$. Since matrix multiplication is
associative, this is the same as $\mathbf{M}^2 \mathbf{p}$. And in
general the probabilities of the $k$-th step are given by
$\mathbf{M}^k \mathbf{p}$.

Our theorem guarantees now (given conditions still to be specified):
$$\lim_{k \rightarrow \infty} \mathbf{M}^k \mathbf{p} = \mathbf{w},$$ where
$\mathbf{w}$ is the solution to $\mathbf{M}\mathbf{w} = \mathbf{w}$ we
found earlier. In other words, as the random walk progresses, the
portion of time spent at a program’s page converges to the weight that
program’s votes carry.

Importantly, this convergence happens no matter where the random walk
starts. In fact we get the same convergence for any probability vector
$\mathbf{p}$ we might start with. (A probability vector is a list of
non-negative numbers that sum to one.)

So what conditions guarantee this happy convergence? Our matrix
$\mathbf{M}$ is
stochastic, meaning
its columns are probability vectors: they contain nonnegative numbers
that sum to one. Given that, it suffices for $\mathbf{M}$ to be
regular, meaning
$\mathbf{M}^k$ has all positive entries for some positive integer $k$.
This is equivalent to it being possible to get from the page of any one
program to any other, with positive probability.

In reality this condition may well fail, but there’s any easy fix. We
just have our random surfer occasionally start over, picking a new
program at random to start their surfing from. Then there’s always a
positive chance of ending up anywhere, however briefly.

Application

Let’s put this theory into practice. We need a list of PhD programs, and
the number of hires from each of them, by each of them. Happily the APDA
provides a table of just such data up through
2016.

It also includes “selfies”: when a program hires one of its own PhDs.
Some selfies are certainly legitimate for our purposes, but others
probably aren’t the sort of thing we’re after here. For example, KU
Leuven stands out as having far more selfies than any other program save
Oxford. Judging from Leuven’s website, most of these posts are not
permanent positions, nor the sort of highly desirable fellowships Oxford
often hires its own graduates into.

In this analys I’m going to exclude selfies. This is a bit arbitrary,
but not entirely. It has a big, negative impact on KU Leuven, not nearly
so much Oxford. Disclosure: it also slightly favours the University of
Toronto, where I work.

That in mind let’s see our top 10 PageRank programs, alongside Wallace’s
results for comparison:

Wallace

Weisberg

New York University

New York University

Columbia University

Columbia University

Princeton University

Princeton University

Yale University

Yale University

Katholieke Universiteit Leuven

University of California, Berkeley

University of California, Berkeley

Rutgers University

University of Oxford

University of Pittsburgh (HPS)

Rutgers University

University of Toronto

University of Pittsburgh (HPS)

University of Oxford

University of Toronto

Harvard University

The match is quite close. And the two big differences (Oxford and
Leuven) are explained by the exclusion of selfies. So we seem to have
implemented the algorithm correctly.

Next time we’ll look at predicting PageRanks, based on past PGR ratings,
and past placement data. Meanwhile, here’s the full listing: