How is PageRank calculated?

The method used to calculate PageRank has evolved since Larry Page, a co-founder of Google, first introduced it in the original PageRank patent.

“Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers”

Matt Cutts

The original PageRank calculation divided a page's PageRank equally among the outbound links found on that page. As illustrated in the diagram above, page A has a PageRank of 1 and two outbound links, to page B and page C, so pages B and C each receive 0.5 PageRank.

However, we need to add one more aspect to our basic model. The original PageRank patent also describes what is known as the damping factor, which deducts approximately 15% of the PageRank each time a link points to another page, as illustrated below. The damping factor prevents rank from concentrating artificially within loops of the web and is still used today for PageRank computation.
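To make the basic model concrete, here is a minimal sketch in Python of the classic calculation with a damping factor of 0.85 (that is, the roughly 15% deduction described above). The function names, the three-page graph and the iteration count are assumptions made for illustration, and this is the textbook formulation rather than whatever Google runs today.

```python
def rank_passed_per_link(rank, outbound_count, damping=0.85):
    """PageRank one link passes on: the page's rank, damped, split equally."""
    return damping * rank / outbound_count


def pagerank(links, damping=0.85, iterations=50):
    """Classic iterative PageRank over a dict of page -> list of linked pages.

    Assumes every linked page also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # pages with no outbound links pass nothing in this sketch
            share = rank_passed_per_link(ranks[page], len(targets), damping)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical site: A links to B and C, and both link back to A.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
ranks = pagerank(graph)
```

In the page A example, a PageRank of 1 split across two links with damping applied passes 0.425 through each link instead of 0.5.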

PageRank and the reasonable surfer model

The way PageRank is currently worked out is likely far more sophisticated than the original calculation. A notable example is the reasonable surfer model, which may adjust the amount of PageRank allocated to a link based on the probability that it will be clicked. For instance, a prominent link placed above the fold is more likely to be clicked than a link found at the bottom of a page, and therefore may receive more PageRank.
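As a rough illustration of the idea, the sketch below splits a page's PageRank in proportion to an estimated click likelihood for each link rather than equally. The weights, link names and function name are all made up for this example; how Google actually weights links is not public.

```python
def weighted_shares(rank, click_weights):
    """Split `rank` across links in proportion to their click weights."""
    total = sum(click_weights.values())
    return {link: rank * weight / total for link, weight in click_weights.items()}


# Hypothetical weights: the above-the-fold link is assumed to be
# four times as likely to be clicked as the footer link.
shares = weighted_shares(1.0, {"above_the_fold_link": 4.0, "footer_link": 1.0})
```

With these assumed weights the prominent link receives 80% of the PageRank that the page passes on, and the footer link receives 20%.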

PageRank simplified

An easy way to understand how PageRank works is to think that every page has a value and the value is split between all the pages it links to.

So, in theory, a page that has attained quality inbound links and is well linked to internally has a much better chance of outranking a page that has very few inbound or internal links pointing to it.

How to harness PageRank?

If you don’t want to waste PageRank, don’t link to unimportant pages!

Following on from the previous explanation of PageRank, the first solution to harness PageRank is to simply not link to pages you don’t want to rank, or at the very least reduce the number of internal links that point to unimportant pages. For example, you’ll often see sites that stuff their main navigation with pages that don’t benefit their SEO, or their users.

However, some sites are set up in a way that makes it challenging to harness PageRank. Below are some implementations and tips that can help you get the most out of PageRank in these kinds of situations.

# fragments

What is a # fragment?

The # fragment is often added at the end of a URL to send users to a specific part of a page (called an anchor) and to control indexing and distribution of PageRank.

How to use # fragments?

When the goal is to prevent a large number of pages from being indexed, and to direct and preserve PageRank, # fragments should be added after the most important folder in your URL structure, as illustrated in example A.

We have two pages:

Example A

www.example.com/clothing/shirts#colours=pink,black

URL with a # fragment

Example B

www.example.com/clothing/shirts,colours=pink,black

URL without a # fragment

There is unlikely to be much, if any, specific search demand for a combination of pink and black shirts that warrants a standalone page. Indexing these types of pages will dilute your PageRank and potentially cause indexing bloat, where similar variations of a page compete against each other in search results and reduce the overall quality of your site. So you’ll be better off consolidating and directing PageRank to the main /shirts page.

Google will consider anything that's placed after a # fragment in a URL to be part of the same document, so www.example.com/clothing/shirts#colours=pink,black should return www.example.com/clothing/shirts in search results. It’s a form of canonicalisation.

if page.php#a loads different content than page.php#b , then we generally won't be able to index that separately. Use "?" or "/"
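The behaviour described above can be sketched with Python's standard library, which can drop everything after the # in a URL. The helper name indexable_url is made up for this example; it only illustrates which URL the fragment version collapses to, not how Google processes URLs internally.

```python
from urllib.parse import urldefrag


def indexable_url(url):
    """Return the URL with any # fragment removed."""
    return urldefrag(url).url


example_a = indexable_url("https://www.example.com/clothing/shirts#colours=pink,black")
example_b = indexable_url("https://www.example.com/clothing/shirts,colours=pink,black")
```

Example A collapses to the main /shirts URL, whereas example B has no fragment and therefore remains a separate URL.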

Pros:

# fragment URLs should consolidate PageRank to the desired page and prevent pages you don’t want to rank from appearing in search results.

Crawl resource should be focused on pages you want to rank.

Cons:

Adding # fragments can be challenging for most frameworks.

Using # fragments can be a great way to concentrate PageRank to pages you want to rank and prevent pages from being indexed, meaning # fragment implementation is particularly advantageous for faceted navigation.

Canonicalisation

What is canonicalisation?

rel="canonical" ‘suggests’ a preferred version of a page and can be added as an HTML tag or as an HTTP header. rel="canonical" is often used to consolidate PageRank and prevent low-quality pages from being indexed.

How to use canonicalisation?

Going back to our shirt example...

We have two pages:

Example A

www.example.com/clothing/shirts/

Category shirt page

Example B

www.example.com/clothing/shirts,colours=pink,black

Category shirt page with selected colours

Page B type pages can often come about as a result of faceted navigation, so by making the rel="canonical" URL on page B, mirror the rel="canonical" URL on page A, you are signalling to search engines that page A is the preferred version and that any ranking signals, including PageRank, should be transferred to page A.
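For reference, the two forms a canonical hint can take look like this. The sketch below builds both as strings in Python; the helper names are invented for illustration, and how you output the tag or header will depend on your own platform.

```python
from html import escape


def canonical_link_tag(url):
    """HTML tag for the <head> of the page being canonicalised."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url):
    """Value for an HTTP Link response header carrying the same hint."""
    return f'<{url}>; rel="canonical"'


# Page B (shirts with selected colours) would point at page A.
page_a = "https://www.example.com/clothing/shirts/"
tag = canonical_link_tag(page_a)
header = canonical_link_header(page_a)
```

Page A would carry the same tag pointing at itself, so that both pages name page A as the preferred version.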

However, there are disadvantages with a canonicalisation approach as discussed below.

A canonical tag is a suggestive signal to search engines, not a directive, so they can choose to ignore your canonicalisation hints. You can read the following webmaster blog to help Google respect your canonicalisation hints.

Google has suggested that canonicals are treated like 301 redirects and in combination with the original PageRank patent, this implies that not all PageRank will pass to the specified canonical URL.

Even though canonicalised pages are crawled less frequently than indexed pages, they still get crawled. In certain situations, such as large-scale faceted navigation, the sheer number of overly dynamic URLs can eat into your website's crawl budget, which can have an indirect impact on your site's visibility.

Make sure internal links return a 200 response code

Arguably one of the quickest wins in preserving PageRank is to update all internal links on a website so that they return a 200 response code.

We know from the original PageRank patent that each link has a damping factor of approx. 15%. So in cases where sites have a large number of internal links that return response codes other than 200, such as 3xx, updating them will reclaim PageRank.

As illustrated below, there is a chain of 301 redirects. Each 301 redirect results in a PageRank loss of 15%. Now imagine the amplified loss in PageRank if there were hundreds, or thousands of these redirects across a site.
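To see how the loss compounds, here is a small sketch that applies the 15% figure used in this post to each hop in a redirect chain. The function name is made up, and the fixed 15% loss per redirect is this post's working assumption rather than a published Google figure.

```python
def rank_after_redirects(rank, hops, loss_per_hop=0.15):
    """PageRank left after passing through `hops` redirects in a row."""
    return rank * (1.0 - loss_per_hop) ** hops


direct_link = rank_after_redirects(1.0, hops=0)  # link already returns a 200
one_redirect = rank_after_redirects(1.0, hops=1)
three_redirects = rank_after_redirects(1.0, hops=3)
```

Under that assumption a chain of three redirects leaves roughly 61% of the original PageRank, which is why pointing internal links straight at the final URL is worthwhile.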

This is an extremely common issue for sites that have undergone a migration, although it is not exclusive to them. The exception to the rule of losing 15% PageRank through a 301 redirect is when a site migrates from HTTP to HTTPS. Google has been strongly encouraging sites to migrate to HTTPS for a while now, and as an extra incentive, 3xx redirects from HTTP to HTTPS URLs will not cause PageRank to be lost.

Reclaiming PageRank from 404 pages

Gaining inbound links is the foundation for increasing the amount of PageRank that can be dispersed across your site, so it really hurts when pages that have inbound links pointing to them return a 404 error. Pages that return a 404 error no longer exist and therefore can’t pass on PageRank.

Use tools such as Moz’s Link Explorer to identify 404 pages that have accumulated inbound links and 301 redirect them to an equivalent page to reclaim some of the PageRank.

However, avoid redirecting 404 pages to your homepage unless there is no appropriate equivalent page to redirect these URLs to. Redirecting pages to your homepage will likely result in little, if any, PageRank being reclaimed, due to the differences between the original content and your homepage.

Things to avoid

rel="nofollow"

The use of rel="nofollow" is synonymous with an old-school SEO tactic, whereby SEOs tried to ‘sculpt’ the flow of PageRank by adding rel="nofollow" to internal links that were deemed unimportant. The goal was to strategically manage how PageRank gets distributed throughout a website.

The rel="nofollow" attribute was originally introduced by Google to fight comment link spam, where people would try to boost their backlink profile by inserting links into comment sections on blogs, articles or forum posts.

This tactic has been redundant for many years now, as Google changed how rel="nofollow" worked. Now, the PageRank sent out with every link is divided by the total number of links on a page, rather than the number of followed links.

However, adding rel="nofollow" to a link means that the PageRank allocated to that link will not benefit the destination page, which results in PageRank attrition. This attrition also applies to URLs you've disallowed in your robots.txt file, and to pages that have had a noindex tag in place for a while.
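The difference between the old and current behaviour can be sketched as follows. The function name and the ten-link page are made up for this example, and damping is left out to keep the arithmetic simple.

```python
def nofollow_split(rank, total_links, nofollow_links, legacy=False):
    """Return (PageRank per followed link, PageRank lost to nofollow links)."""
    followed = total_links - nofollow_links
    if legacy:
        # Old behaviour: rank was divided among followed links only.
        return rank / followed, 0.0
    # Current behaviour: rank is divided among all links, and the shares
    # assigned to nofollow links benefit no one.
    per_link = rank / total_links
    return per_link, per_link * nofollow_links


# Hypothetical page with 10 links, 5 of them nofollowed.
old_per_link, old_lost = nofollow_split(1.0, 10, 5, legacy=True)
new_per_link, new_lost = nofollow_split(1.0, 10, 5)
```

In this example each followed link used to receive 0.2; it now receives 0.1, and half of the page's PageRank is lost to the nofollowed links.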

Conclusion

PageRank is still an influential ranking signal, and preserving, directing and ultimately harnessing this signal should be a part of any plan to boost your organic search visibility. Every website’s situation is unique, and a one-size-fits-all approach will not always apply, but hopefully this blog highlights some potential quick wins and tactics to avoid. Let me know in the comment section if I’ve missed anything.

About the author

Joel joined Distilled as an analyst in August 2018, having spent two years in a previous in-house SEO role. Joel graduated with a degree in marketing, and it was in his internship year that he stumbled upon the world of search. Outside of work, Joel...