Does the Ownership of Redirected URLs Matter to Search Engines?

Webmasters sometimes move web sites from one domain to another, change the URL structures pointing to their web pages, or rename those pages themselves.

Changing the URLs for pages isn’t something that should be done without a lot of thought, and without very good reasons, especially if there are many links and references on the Web to the old URLs. See Cool URIs don’t change for a number of technical ideas on planning what to use for your URLs so that it’s less likely that you might need to change them.

Regardless, webmasters do sometimes change the URLs for pages found on the Web.

This can sometimes happen when the owner of a site decides to change its name, or to rebrand its products, or merges or acquires another site or business and wants to consolidate the web pages from the other site under one name. It can also happen when a blogger decides to change the permalink structure of their URLs. Sometimes product lines are renamed, and the sellers of those products want people looking for them to find the products under the new names. There are many other reasons why the URLs to pages change.

To make it easy for visitors, including search engines, to find those sites and pages at their new addresses on the Web, webmasters will set up redirects so that visitors to the original URLs, including search engines, arrive at the new URLs. Search engines may find a redirect for a URL, and have to decide which page to show information about in search results.

The kind of redirect that is often used for this kind of address change is a permanent, or 301 redirect, but it’s not the only type of redirect. If you are planning on using a redirect to let visitors and search engines know about the change of address of a page, or of the pages of a site, it’s important to know why and how different types of redirects are used. Regardless of the type of redirect being used, there are other issues that search engines might consider when they come across a redirect.
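As a rough illustration of how redirect types differ, the sketch below sorts HTTP status codes into permanent and temporary redirects. The function name `redirect_kind` is made up for this example; the status codes themselves (301, 302, 303, 307, and 308) are standard HTTP codes.

```python
# Standard HTTP redirect status codes, grouped by whether the move is meant to last.
PERMANENT_CODES = {301, 308}       # Moved Permanently, Permanent Redirect
TEMPORARY_CODES = {302, 303, 307}  # Found, See Other, Temporary Redirect


def redirect_kind(status_code: int) -> str:
    """Return 'permanent', 'temporary', or 'not a redirect' for a status code."""
    if status_code in PERMANENT_CODES:
        return "permanent"
    if status_code in TEMPORARY_CODES:
        return "temporary"
    return "not a redirect"
```

A change of address like the ones described above would normally be answered with one of the permanent codes.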

Sometimes redirects happen for less than legitimate reasons.

A newly published patent application from Yahoo explores how it might examine redirects, and attempt to understand if the owners of the original URL are the owners of the URL being targeted by the redirect.

An example from the patent filing describes one of the concerns that Yahoo has about redirects:

Redirecting URLs (uniform resource locators) is a very common phenomenon on the web. In dealing with redirects, a search engine, such as Yahoo!.RTM., has to come up with well-specified policies on which URL to index the content under. The search engine must also decide the appropriate URL to display as part of the search results. The problem is nontrivial, as can be seen from the following two examples: http://www.rational.com (source URL) redirects to http://www-306.ibm.com/software/rational/ (target URL) as of Oct. 23, 2007, because IBM bought Rational Software; and spam websites like http://www.somespam.com (source URL) redirect to http://www.yahoo.com (target URL) as of Oct. 23, 2007.

In the first example of redirection, the search engine would like to index the anchor text under both the source URL and target URL. The search engine may also like to display the source URL in search results because the source URL is a root page and may, therefore, improve user experience.

On the other hand, in the second example, the search engine would not like to associate the anchor text from the source (somespam.com) with the target (yahoo.com). In case of a content match, the search engine would not care to show the source URL, but would rather show the target URL.

A method and apparatus are provided for identifying if two websites are co-owned. In one example, the method includes obtaining redirect URL (uniform resource locator) pairs from the Internet, constructing a training set using the redirect URL pairs, constructing a feature set based on the training set, and learning co-ownership decisions based on the feature set and the training set.

Determining Ownership of Redirects

The easiest ways to learn whether an original URL and the URL that it redirects to are owned by the same person or organization would be to compare the registration information for the two sites using whois, or to have a person visit both sites and compare them. However, there are so many pages on the Web, and so many possible redirects, that a better approach is to find a way to automate the process.

A web crawler browses the web for redirect pairs of URLs. When it finds them, it sends information about those pairs to a training set. The training set is used to create a set of rules to try to decide if the original source URL and the URL that is targeted through the redirect are owned by the same person or organization.

The search engine could look at whois information to try to determine if the source and targeted URLs are co-owned, or have a person manually attempt to decide if the pages are owned jointly.
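To make the idea of a labeled training set a little more concrete, here is a minimal sketch. The `RedirectPair` structure and the `build_training_set` function are my own inventions for illustration, not anything named in the patent filing; the labels are assumed to come from whois checks or human review as described above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RedirectPair:
    """One redirect found by a crawler (hypothetical structure)."""
    source_url: str
    target_url: str
    co_owned: Optional[bool] = None  # None means nobody has labeled it yet


def build_training_set(
    pairs: Iterable[RedirectPair],
    labels: Dict[Tuple[str, str], bool],
) -> List[RedirectPair]:
    """Keep only the pairs that have a known label, with the label attached."""
    labeled = []
    for pair in pairs:
        key = (pair.source_url, pair.target_url)
        if key in labels:
            labeled.append(replace(pair, co_owned=labels[key]))
    return labeled
```

Using the two examples from the patent filing, the rational.com to ibm.com pair would carry a label of True, and the somespam.com to yahoo.com pair a label of False.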

This training set could then be used to explore other pairs of redirected URLs, in an automated process to decide whether redirected URLs share a common ownership. The algorithm used would take what it can learn from the training set to build a “feature set” about the ownership of pages at different URLs that redirect from one page to another. The patent application tells us that:

A feature set is a is essentially a set of rules for training the system to get to the ideal of human editorials discussed above with reference to FIG. 1. Referring again to FIG. 1, after the training set constructor device constructs the training set, the system learns co-ownership decisions by using features derived from the web-graphs and from the inlinks to the URLs of the training set. The feature set constructor device receives the training set and constructs a feature set of co-ownership decisions.

The patent includes a number of examples of features that it might examine to decide whether redirected URLs are shared by the same owner:

URL overlap of the redirect URL pairs – The characters (letters and possibly numbers) within the source and target URLs are tokenized and compared to a dictionary of tokens, which might be organized by finding the most commonly occurring words in those tokens.

For example, the URL “http://www.example.com/blog/” is found, and seen to be redirecting to “http://blog.example-site.com/”

An analysis would break the characters in each URL into tokens, such as “www,” “example,” “com,” and “blog” for the first URL, and “blog,” “example,” “site,” and “com” for the second.

The analysis might see that both URLs contain “blog,” and “example” and determine that there is a fair amount of overlap (a statistically significant amount) between the original (or source) URL, and the URL that is the target of the redirect.
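The comparison described here can be sketched in a few lines. This is only my guess at the general shape of such an analysis: the patent filing doesn’t say how overlap is scored, so the Jaccard-style ratio, the `COMMON_TOKENS` list, and the function names below are all assumptions.

```python
import re

# Hypothetical list of tokens too common to say anything about ownership.
COMMON_TOKENS = {"http", "https", "www", "com", "net", "org"}


def url_tokens(url: str) -> set:
    """Split a URL on non-alphanumeric characters and drop very common tokens."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return {t for t in tokens if t and t not in COMMON_TOKENS}


def token_overlap(source_url: str, target_url: str) -> float:
    """Share of distinct tokens that appear in both URLs (0.0 to 1.0)."""
    source, target = url_tokens(source_url), url_tokens(target_url)
    if not source or not target:
        return 0.0
    return len(source & target) / len(source | target)
```

For the two URLs in the example above, the tokens “blog” and “example” are shared and only “site” is not, so two of the three distinct tokens overlap.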

DNS (domain name server) overlap – The IP addresses of the domain name servers used by the two websites are reviewed to see whether they overlap.
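A minimal sketch of that comparison follows. It assumes the name server IP addresses have already been looked up elsewhere, and the scoring rule (shared addresses divided by the size of the smaller set) is my own assumption rather than something the patent filing specifies.

```python
import ipaddress
from typing import Iterable


def dns_overlap(source_ns_ips: Iterable[str], target_ns_ips: Iterable[str]) -> float:
    """Share of name server addresses the two sites have in common (0.0 to 1.0)."""
    # Parsing normalizes each address, so equivalent spellings compare as equal.
    source = {ipaddress.ip_address(ip) for ip in source_ns_ips}
    target = {ipaddress.ip_address(ip) for ip in target_ns_ips}
    if not source or not target:
        return 0.0
    return len(source & target) / min(len(source), len(target))
```

Shared name servers are weak evidence on their own, since many unrelated sites use the same hosting company, which may be why this is only one feature among several.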

URL-anchor text overlap – The link text, or anchor text, used by inlinks pointed to the domains are viewed and compared to words found within the URLs. Since search engines collect information about links to pages such as the URLs and the anchor text used by those links, this information is often readily available to search engines. For example, the anchor text “SEO by the Sea” might be used in a link to “http://www.seobythesea.com.” Using the kind of tokenized analysis and comparison described above would find that there is a statistically significant overlap between that anchor text and the URL.

Because redirects are sometimes used to spam search engines, a method like this is included to try to uncover spam. If anchor text pointed to the original URL matches well with that URL, but doesn’t match well with the URL that is the target of the redirect, then there may be a problem. For example, anchor text in a link might be the words “yahoo webmaster guidelines” and the original URL might be “http://www.yahoo.com/webmaster-guidelines” but the redirected URL might be “http://www.example.com/prescription-drugs/”
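Here is one way the anchor text comparison might be sketched. The approach of checking each anchor word against the squashed-together host and path is my own simplification, and the name `anchor_url_overlap` is hypothetical. Very short words such as “by” can match by accident, so a real system would presumably need something more careful.

```python
import re
from urllib.parse import urlsplit


def anchor_url_overlap(anchor_text: str, url: str) -> float:
    """Share of anchor text words that appear somewhere in the URL's host or path."""
    parts = urlsplit(url.lower())
    # Remove punctuation so "seobythesea" can match "seo", "by", "the", "sea".
    haystack = re.sub(r"[^a-z0-9]", "", parts.netloc + parts.path)
    words = re.findall(r"[a-z0-9]+", anchor_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in haystack)
    return hits / len(words)
```

With this sketch, “yahoo webmaster guidelines” matches every word against the yahoo.com URL in the example and none against the example.com prescription drugs URL, which is the kind of mismatch that might flag a redirect as suspicious.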

The patent application tells us:

Spamminess of anchor text is an important consideration with the present invention. The system of the present invention utilizes machine learning to predict the co-ownership of two websites. Because the methods carried out by the system will be public information, the system is wide-open to be manipulated by spammers. Spammers could fairly easily designate several URLs to point to a spam webpage and have these several URLs falsely describe the spam webpage as being a non-spam webpage, such as the Yahoo!.RTM. home page.

The spammer could thereby easily setup an instance of cloaking spam. Cloaking is getting a search engine to record content for a URL that is different than what a searcher will ultimately see, often done intentionally by spammers. To counter this problem, the system employs trust information about the anchor text that the system may use for cloaking spam that creates a false match. The system may employ, for example, the same kind of definitions that a search engine uses in a typical web search.

Spamness/goodness measures – Any type of measure of how spammy or how trustworthy each of the two web sites from the source and target URLs may be viewed. If the source site is a spam web site and the target site is not a spam web site, then the URL redirect pair is more likely not to be co-owned. There are a number of ways that a search engine might use to try to decide whether a page is spam or not, from looking at the link structure associated with pages to a review of the content of those pages as well as a combination of both. The patent application doesn’t provide details of any specific method that might be used.

The title in the webpage of the target URL – The title of the page at the target URL might be compared to the title of the page at the source URL. If the titles match, then there may be a presumption that the URLs are co-owned.
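A title comparison could be as simple as the sketch below, which uses Python’s standard `difflib.SequenceMatcher` to score similarity after lowercasing and collapsing whitespace. The patent filing doesn’t say whether titles need to match exactly or only approximately, so treating this as a similarity score is an assumption on my part.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def title_similarity(source_title: str, target_title: str) -> float:
    """Similarity of two page titles, from 0.0 (nothing shared) to 1.0 (identical)."""
    matcher = SequenceMatcher(None, normalize_title(source_title), normalize_title(target_title))
    return matcher.ratio()
```

Titles that differ only in capitalization or spacing score 1.0, while unrelated titles score much lower.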

Conclusion

When a webmaster uses a redirect to send visitors (and search engines) to a new address for a site, a search engine might look at more than just the existence of that redirect to decide whether it might pass along visitors to the new address in its search index.

The search engine may follow a policy that explores such things as whether the source URL and the target URL of the redirect are owned by the same owner.

This decision on co-ownership of the source and target URLs may also determine whether or not a search engine will associate the anchor text used for the first URL with the URL that it is being redirected to. If the URLs are determined to be co-owned, the search engine might associate the anchor text of the first URL with the second one.

The patent filing doesn’t discuss whether or not link popularity (such as PageRank) might be passed along to a new URL through a redirect based upon a determination of co-ownership, but it’s an idea worth thinking about.

Comments

Cool! Now I won’t worry anymore when migrating my Blogger blog to WordPress. This is very useful information, because beginning bloggers often choose Blogspot as their platform since it’s free, but its limited customization leads them to WordPress. The only things that bother me are the PageRank and the time it takes my site to get indexed. Will it be sandboxed again?

Thanks. I like that line of thought – how a related process might be used to further explore links. I could see a search engine going through a similar post-crawl analysis of discovered URLs, with a set of rules to determine an indexing order, and potential filtering of some pages. The exploration of redirects could be a side process of such an analysis.

Thank you. I liked this patent application because it gave us some insight into the processes behind decisions such as what URL a search engine might show in search results when it saw a redirect, and whether or not a search engine might pass along the value of anchor text for those redirects. It’s great to be able to get a glimpse at some of the thoughts behind those decisions.

Whois information does tend to be a mess. Usually patents provide a fairly high level overview of a process, and may give us some illustrative examples without going into a great amount of depth, or including an exact roadmap of how they might implement the methods and processes that they cover. I’d love to hear more on some of the things that they touched upon, such as how they would rate the original and the target sites on the basis of “spamness/goodness,” but I do think there’s some potential behind this approach.

I don’t believe that this patent filing is aimed at any specific community whether SEOs, webmasters, plumbers, or any others. It’s aimed at providing a method for the search engine to try to achieve the best experience possible for searchers, by helping the search engine decide which page to show when it comes across a URL that redirects to another URL.

I wonder if we could develop some rules which might tell us where a site would fall on a scale of spamness/goodness to give us an idea what might cause a search engine to rate the new target link as being spam by analyzing a typical spam site? For example, an incomplete About page, frequently deleted posts, no method of contact, and a large number of pingbacks on other sites may begin to indicate spamminess. I imagine that we should think about if we have built a good site for the page to be redirected too.

A set of rules would be nice, but I think there are lots of things that a site can do to make it appear to visitors to be something of high quality, like those that you describe. I’m also a fan of the Stanford Credibility Guidelines – http://credibility.stanford.edu/guidelines/index.html

There are probably some hints and ideas in white papers from Yahoo about things they might be looking for in determining where a page or site might stand on the spamness/goodness scale. Here are a few of those:

Three of those focus primarily upon links and link spam, rather than an analysis of the content found on pages. Given that, a couple of steps likely in the right direction can be to attract and pursue links from quality pages, and to be careful about where you might link out to. Also, building a high quality site filled with interesting, engaging, useful, and credible content increases the likelihood that you’ll attract links from quality pages.

This is a good overview, but I was hoping someone could shed some light on how 301 redirects are treated differently by search engines than 302s. I understand that a 302 is supposed to be a temporary redirect only, but does it also affect the SEO rankings? Is it better, or worse?

There are some cases where a website owner wants to legitimately do a 302 temp redirect, but I don’t know what the implications are.

With the right server set up, redirecting pages on one domain to another can be fairly simple. Sometimes it can be a little more complex though, depending upon the server software being used, the configuration of the server, and the amount of access the owner of a site has to making changes on the server it’s hosted upon.

The process described in the patent filing could potentially have an impact upon how a search engine might decide which URL to include in its index when it finds a redirect. If you want to dig deeper, there is a link to the patent application above.

A search engine might perceive a temporary (302) redirect as something that is only transitory, and might change back to the original. Because of that, it’s possible that the original URL may be shown in search results rather than the targeted URL, and it’s also possible that the search engine might not associate any link equity (or PageRank) or relevance based upon anchor text pointed to the old URL with the new targeted URL.

A permanent (301) redirect may seem more likely to evidence an intent to keep the new URL around, and it’s more likely (but not absolutely certain) that a search engine will show the new URL in search results rather than the one that is being redirected, and it’s more likely that the search engine will associate link equity (or PageRank) and relevance based upon anchor text with the new targeted URL. However, as this patent points out, there are things that a search engine might consider before it makes that decision, such as whether or not the URLs seem to belong to the same owner.

There are times when a website owner might want to legitimately do a 302 temp redirect, such as when the redirect is meant to be temporary.

Many thanks for your terrific overview Bill (and you saved me from giving the USPTO $3 for the entire patent). It sounds like Yahoo is revealing their process (and likely the others) on determining the intention behind the redirect. I suspect that this is why it takes the search engines so long to transfer the PageRank from one URL to the other.

Google allows one to inform them that you have moved your page. I’d assume it helps them index a new page and pass PageRank and keyword fluff, as I call it. However, I wonder if the PR and fluff would pass if a site got some kind of penalty and the owner set up a new page with similar content on a new URL, but let’s say changed the structure of the site, maybe to confuse Google. I have read on a forum that people have tried to set up hundreds of 301s from banned sites to existing sites with average PR and good keyword fluff to try and deposition them in the SERPs, and the target webpages didn’t move. If that’s so, setting up a 301 page to get clients that are used to visiting a certain URL, that has now been banned, looks safe?

You’re welcome. Not sure why you’re paying the USPTO for patents when they are viewable and available online. We are being given a peek behind the curtains, but possibly not at everything. It can take a long time for the search engines to pass along PageRank through a redirect – which may indicate that even more is going on in those situations than just determining ownership.

I’m not sure that trying to use a redirect from a page or site that is banned or penalized in some way is really going to help or hurt a site being redirected to regardless of whether the site being redirected to is new or has been around for a while.

The patent application above from Yahoo doesn’t describe some of those scenarios specifically, but it does provide an example (which I’ve included in a quote above) about the use of a redirect from a site to attempt to maliciously associate certain anchor text with another site owned by someone else. Using a redirect from a banned site to try to harm a competitor’s site shouldn’t be something that people should be able to do, and I would guess that it’s the kind of harm that this patent application is attempting to avoid.

Using a redirect from a banned site to another site, regardless of who owns the other site, to try to circumvent a penalty or banning and pass along PageRank likely won’t be helpful either, but that’s probably regardless of whether or not the sites are co-owned or not. If a site is banned or penalized, the way to try to start making things better is more likely to fix the problem and go through some kind of reinclusion process, rather than trying to use some kind of redirection.

This does raise a very good point, though. It’s quite possible that situations like these are things that the search engines have had to grapple with for a while, and the use of a redirect to pass along PageRank and hypertext relevancy probably isn’t quite as simple a process as it might seem on the surface.

Again Bill, you blow me away with the way you breakdown information within your articles. I have learned more from your site than any other SEO site online. You must be applauded with your understanding of how to decipher SE’s patents and the ability to share your wealth with others. I have been doing quite a lot of 301 testing within my own network of sites to see the outcomes. This article has just backed up my results. Keep em coming…!

Unfortunately, people sometimes make changes to things like URLs and linking structures without being aware of the implications of their actions, and without exploring the right way to use and set up redirects, or being aware that even if they do know how to use redirects correctly, that there is a risk in making such changes, and that search engines don’t always do what you might expect them to when you do make those kinds of changes.

There are things that one can do to attempt to mitigate risks when making changes, such as identifying old links, and changing the ones that you have control over, such as directory listings, and asking people who have linked to you to update their links to your new address. Pursuing new links at the new address during such a change isn’t a bad idea either. Letting people know beforehand that you will be at a new address can be a smart move, as well. There are other steps that one can take to reduce losses in traffic and rankings, but as I wrote at the start of this post, “Changing the URLs for pages isn’t something that should be done without a lot of thought, and without very good reasons.”

Definitely, making changes to page titles and URLs, or changing the domain name altogether, is not for the faint of heart and takes a lot of careful preparation. In most instances, your traffic and rankings are going to drop temporarily until Google reapplies the acquired PageRank to the new equivalent page. With advance planning, monitoring of Webmaster Central for 404 errors, and correct use of 301 redirects, the site’s rankings should rebound. It is very important, though, to make sure the changes are absolutely necessary. If not, they shouldn’t be touched.
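One piece of that preparation can be checked before anything goes live: whether the planned redirect map contains long chains or loops. The sketch below is a hypothetical helper that works on a plain dictionary of old URL to new URL, with no network access; the function name and the `max_hops` limit are my own choices.

```python
from typing import Dict, Tuple


def resolve_redirect(redirects: Dict[str, str], url: str, max_hops: int = 10) -> Tuple[str, int]:
    """Follow a map of old URL -> new URL and return (final URL, number of hops).

    Raises ValueError if the map loops back on itself or the chain is too long.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops
```

Any old URL that needs more than one hop to reach its final address is a candidate for pointing directly at the final URL instead.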

I have a client that needed to change the domain name and I was a bit nervous to make such a drastic change. Thanks for supplying the patent info. It calms my nerves a bit. 🙂

I agree with you completely – making changes to domains and URLs are filled with some risk. The old cliche, “measure twice, cut once” probably isn’t cautious enough. Careful planning is called for, and extensive checking after the fact, to make sure that nothing went wrong. Even if you get all the redirects right, it still pays to look at everything else.

For example, sometimes (often) people make changes to a site at the same time that they move to a new domain, and they do that on a development server. During development, they may have set their robots.txt file so that search engines don’t crawl any of their pages. When they move the site off the development server to the new location, sometimes it can be easy to forget to change that robots.txt file, so that search engines can crawl the new site at the new location. It happens…
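A quick check for that leftover robots.txt problem can be written with Python’s standard `urllib.robotparser` module. The function name `blocks_all_crawling` and the example.com default are placeholders of mine; the sketch takes the robots.txt contents as text, so fetching the file is left to you.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawling(robots_txt: str, site_root: str = "https://www.example.com/") -> bool:
    """Return True if this robots.txt text stops a generic crawler at the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "*" stands for any crawler that has no rules of its own in the file.
    return not parser.can_fetch("*", site_root)
```

A development-style file containing “User-agent: *” followed by “Disallow: /” returns True here, which is the situation to catch before a relaunch.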

I’m not sure that this patent might be all that helpful to people using redirects to spam search engines. It does provide some hints at the analysis behind the decision on which page to show in search results, but I don’t think it will help people who are trying to manipulate search results.

Bill, you posted a unique Weblog entry about a topic about which I don’t often see Weblog entries online. Moreover, I concur with most of the commenters here.

I understand the dilemma of any Webmaster (whether he or she is inexperienced or a veteran) who moves his or her Web sites from one domain to another one, at which point he or she must point the old URLs to the new Web pages or rename those Web pages.

Because of the potential implications of a single fatal error, it is imperative that concerned Webmasters and members of the SEO community consult with an experienced and reputable colleague before attempting a permanent (301) redirect or a temporary (302) redirect.

I have seen a number of websites that I have visited quite often in the past get snapped up by competitors who bought them out. The new owner added a URL redirect to the index page of the URL purchased, thus passing the traffic to the URL of the purchaser.

Unfortunately, the content of the new owner turned out to be of much lower quality than the original site that was purchased. It was only a very short matter of time before I stopped visiting altogether.

In that light, I can see why search engines are so interested in heavily analyzing redirects. I would imagine that search engines would pay very close attention to the bounce rate of the page receiving the traffic from the redirect. A change in ownership can make all of the difference in terms of quality as I have personally seen.

It is possible for the value of a link to decay over time when someone purchases an older domain and sets up a redirect to a different site with different content. While some of that might be from people removing links to sites that have changed like that, some of it may also come about because the search engines do analyze the redirect and the content found on the new page being redirected to.

Google’s patent on information retrieval based on historic data provided a couple of examples of that kind of analysis, such as the anchor text in the link that gets redirected no longer being as relevant for the new page as it was for the old page.