Over the last couple of years, Twitter silently changed the way they treat any links you include in tweets. In doing so, they have given themselves a very nice competitive advantage in lots of areas, but they’ve also silently taken away the ability for search engines to follow the links you post to Twitter.

Here’s what Twitter changed:

In the past, clicking a link within Twitter took you directly to the destination.

Today, any link you click within Twitter first takes you invisibly to Twitter’s ‘t.co’ URL redirect. Once there, Twitter record various information about the click, before taking you on to your destination. All of this takes a tiny fraction of a second.

For example, clicking this link: http://t.co/1nKSjDDRhd will take you first to ‘t.co’, where Twitter will record the fact that you clicked it, and then you’ll be moved on to the destination URL (in this case, a previous blog post I wrote).
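
To make the mechanics concrete, here is a minimal sketch of a click-tracking redirect in Python. It is not Twitter’s code: the short code, the destination URL, `SHORT_LINKS`, `click_counts` and `handle_click` are all made up for illustration, and the 301 status is only an assumption about how such a redirect might respond.

```python
from collections import Counter

# Hypothetical lookup table: short code -> destination URL.
# Both values are placeholders, not real t.co data.
SHORT_LINKS = {"abc123": "https://example.com/some-post"}

# Hypothetical click log: how many times each short code was followed.
click_counts = Counter()


def handle_click(code):
    """Return (status, headers) for a request to a short link."""
    destination = SHORT_LINKS.get(code)
    if destination is None:
        return 404, {}
    # Record the click first, then send the visitor on their way.
    click_counts[code] += 1
    return 301, {"Location": destination}


status, headers = handle_click("abc123")
print(status, headers.get("Location"), click_counts["abc123"])
```

The point is only the order of events: the click is logged before the browser is told where to go next, so the visitor never notices the extra hop.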

This is a very clever, simple way of allowing Twitter to gather piles of data on which links are most popular, who shares them, who clicks them, etc. As an illustration of how big this is: as a result, Alexa ranks ‘t.co’ as the 66th most popular website in the world.

The Oddity

The oddity here is this – the robots.txt file Twitter have created to tell all search engines what they can/cannot do with t.co links (http://t.co/robots.txt):

Roughly translated into English, the first 2 lines there say:

“Twitterbot, there is nothing you are disallowed from crawling.” (i.e. Twitterbot is allowed to crawl everything)

The second block of 2 lines says:

“All other bots: You are disallowed from crawling anything.” (i.e. unless you’re “Twitterbot”, you are not allowed to crawl anything at all on t.co)
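
If you want to check that reading for yourself, Python’s standard `urllib.robotparser` module can interpret the rules. The sketch below uses a robots.txt reconstructed from the two blocks described above; it is an assumption about what the file contains, not a verbatim copy of http://t.co/robots.txt, and the user-agent names passed to `can_fetch` are just examples.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the description in this post (an assumption,
# not a verbatim copy of the live file).
ROBOTS_TXT = """\
User-agent: twitterbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# Parse the rules from the string instead of fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())

link = "http://t.co/1nKSjDDRhd"

# An empty Disallow value means nothing is off limits for that agent.
twitterbot_allowed = parser.can_fetch("Twitterbot", link)
# Any other crawler falls through to the catch-all block.
googlebot_allowed = parser.can_fetch("Googlebot", link)

print("Twitterbot allowed:", twitterbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With the rules as described, the first check should come back as allowed and the second as blocked, which matches the plain-English translation above.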

Twitter could make this information available in other ways – for example via their API – but they famously cut off Google from full access to this.

So What?

This is sensible from Twitter’s point of view, as it means they don’t have Google and other search engines crawling every URL posted to Twitter, eating their bandwidth.

But from a website owner’s point of view, and a user’s point of view, it means that Twitter have blocked Google (and any other search engine) from following the links you post to Twitter.

Thought-provoking post, Dan. One extra view onto this is that, even though the link is shortened in the tweet, Twitter’s API will return results for a URL or a text match within a URL. This is true even if you’ve pre-shortened with Bit.ly. They’re not unhappy for users to see what’s being tweeted, but you’re right, it seems like a competitive ‘FU’ to the search engines.

It’s a slightly different use of the robots.txt file, because there’s no actual site there – it’s just a series of one-dimensional pages with a redirect.

Finally – does this pour scorn on the idea that tweeting a link will help with its appearance in search rankings? How else could they know, if they’re blocked from the API? Maybe they’re using DataSift or a similar data middleman 😉