Relevancy Modified MozRank: A Smarter Metric for Rank Analysis

We have been working for a while now on our own internal correlation study in partnership with Trident Marketing and Fuzzy Logix. In working on this project, time and time again it shocked me how crude our current ranking metrics are. We finally have good raw data from sources like SEOMoz and Majestic SEO but we have only begun to scratch the surface of how Google uses these types of data to organize and create search rankings.

In the same way that SEOMoz identified a simple statistical modeling technique known as Latent Dirichlet Allocation as a likely candidate for how Google models topic relevancy, we have been looking for similar statistical techniques that Google is likely to use in turning raw link data into metrics more suitable for ranking pages. The easiest way to do this has been to look at the language of SEOs and try to translate what we intuitively believe into statistical algorithms.

Anchor Text Relevancy

SEOs generally believe that exact match anchor text links are one of the most important ranking metrics. SEOMoz’s correlation study seems to bear this out. However, many SEOs will go on to explain that “relevant” anchor text matters as well, and SEOMoz’s study similarly tries to back this up with “partial match” anchor text (e.g., baseball card would be a partial match to either baseball team or birthday card).

Google’s anchor text relevancy algorithm is almost certainly more sophisticated than a string-in-string search, so we decided to look to statistics for string comparison tools with which we can modify the mozRank passed by a link in a way that provides extra value to more relevant anchor text. We call this Relevancy Modified MozRank.

The Contenders

There are several statistical methods we can use to model anchor text relevancy, three of which we discuss here.

Levenshtein Distance: The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character (via Wikipedia). For example, if we were to compare the anchor text “baseball” to “baseball card”, the Levenshtein Distance would be 5 (adding a space and the 4 letters c, a, r and d). The Levenshtein Distance between “Baseball” and “Baseball” would be 0. This is also called the Edit Distance.
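
The “baseball” example can be checked with a short dynamic-programming sketch. This is a plain textbook implementation written for illustration, not the code used in the study; the function name levenshtein is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("baseball", "baseball card"))  # 5
print(levenshtein("Baseball", "Baseball"))       # 0
```

Note that a lower score means a closer match here, which is the opposite direction from the two measures that follow.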

Jaro Winkler Distance: The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match. (via Wikipedia)
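
For readers who want to experiment, here is a minimal sketch of the standard Jaro–Winkler formula. The function names are our own, and the prefix weight of 0.1 and the 4-character prefix cap are the commonly cited defaults, assumed here rather than taken from the study.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 means no similarity, 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for strings sharing a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for char1, char2 in zip(s1[:4], s2[:4]):  # prefix capped at 4 chars
        if char1 != char2:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Because the score rewards a shared prefix, anchor text that starts with the ranking keyword will score higher than anchor text that merely contains it.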

Smith Waterman Algorithm: The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. (via Wikipedia) We found this to be a unique way to find relationships between parts of strings such as word stems, tense, etc.
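
Applied to anchor text, the same idea can be sketched by aligning characters instead of nucleotides. The scoring values below (+2 for a match, -1 for a mismatch, -1 for a gap) are illustrative assumptions, not the settings used in the study, and the function name is our own.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between any substring of a and of b."""
    columns = len(b) + 1
    previous = [0] * columns
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * columns
        for j in range(1, columns):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                       # start a fresh local alignment
                previous[j - 1] + step,  # align a[i-1] with b[j-1]
                previous[j] + gap,       # gap in b
                current[j - 1] + gap,    # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman("baseball", "baseball card"))  # 16
print(smith_waterman("abc", "xyz"))                 # 0
```

With these assumed weights the score is unbounded and grows with the length of the shared region, so it sits on a different scale from the 0 to 1 Jaro–Winkler score.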

Notably Not Included: Latent Semantic Analysis – working on this one, it is a little bit more complicated 😉

Computing Relevancy Modified MozRank

Since the three measurements above render different scores on different scales, we have to compute them differently. First, we compute the Levenshtein Distance, Jaro Winkler Distance, or Smith Waterman score for the ranking keyword and the anchor text used.

Smith Waterman modified MozRank (SWAmmR): We use a simple measurement of raw mozRank multiplied by the Smith Waterman score (rmR*SWA)
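
A rough sketch of the rmR*SWA calculation, summed over a page's backlinks, might look like the following. The summing step, the lowercasing of both strings, the Smith Waterman weights, the function names, and the sample backlink values are all assumptions made for illustration; they are not data or code from the study.

```python
from typing import Callable, Iterable, Tuple


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score (assumed weights, character level)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(0, previous[j - 1] + step,
                             previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def relevancy_modified_mozrank(
    keyword: str,
    backlinks: Iterable[Tuple[float, str]],
    relevancy: Callable[[str, str], float] = smith_waterman,
) -> float:
    """Sum of (raw mozRank passed * anchor text relevancy) per backlink.

    backlinks is an iterable of (raw_mozrank_passed, anchor_text) pairs.
    """
    return sum(
        raw_mozrank * relevancy(keyword.lower(), anchor_text.lower())
        for raw_mozrank, anchor_text in backlinks
    )


# Made-up backlink data, purely for illustration.
sample_backlinks = [
    (0.5, "Baseball Cards"),
    (0.25, "click here"),
]
print(relevancy_modified_mozrank("baseball card", sample_backlinks))
```

Passing a different function as relevancy swaps in another string measure, although a distance such as Levenshtein would first need to be inverted so that closer matches produce larger multipliers.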

In the above picture you can see the LD, JWD, and SWA modified mozRanks of various pages on the right-hand side. Notice that we have no external exact match anchor text to work with in the left columns, but Google has plenty of relevancy data to work with by using these kinds of anchor text relevance measurements.

Takeaways

We have long thought that Google is using the relevancy of anchor text in determining how much link juice to pass. You don’t need the exact anchor text to look relevant to Google; in fact, you don’t need any exact match anchor text at all. This does not mean exact match links don’t help; it merely means that your strategy shouldn’t rely solely upon them. We will keep you all updated as we find more sophisticated measures and start to compare the correlation of these types of modified mozRanks to actual rankings.

6 Comments

An excellent study. Can you foresee these types of modified MozRank results being the new standard and being adopted by SEOMoz in their tools and APIs? That would be very useful to see.

Author Response: While it would be cool to see something like this in the API, I highly doubt it. Each Relevancy Modified MozRank is scored independently, so it is not as if they could simply store all of this data – it would probably be calculated on the fly and cached. It is probably better to simply sign up for the Site Intelligence Pro API and calculate the figures yourself.

Great work. I’ve been a bit leery of delving too deep into the more serious side of how we build analytic assumptions, but my formal BS/MS degrees are social science related, so I need to dig back into what I learned in the days of textbooks.

Read this on a Droid X walking in the pitch dark with reading glasses, so wasn’t optimal environment for thinking through take-aways.

Really interesting experiment! Hoping you can clarify one point for me…

You mention “In order to compute the Relevancy Modified MozRank of a page, we merely find all of the backlinks, get the mozRank passed of those backlinks, pass the anchor text for each link through a relevancy measurement tool, and then add them all together.”

It sounds from that as though you are adding MozRank values (from different links) to get a final (adjusted) MozRank score for a URL. However, MozRank isn’t linear (I believe the log base is ~8.5), so you can’t add values. Did you account for that? If not it probably threw out your results, in which case I’d love to see a repeat of the experiment!

Thanks again for the good read.

Author Response: We use Raw MozRank rather than Pretty MozRank. (ie: UMRR rather than UMRP in the API). Good question.

Great post and I love the concept of factoring in a relevancy score. It really backs my stance that relevancy is a KEY ranking metric right now. The bottom line is that you need links from relevant sites. It is no longer enough to just get links from authority sites; they also have to be relevant!