In this study we explored the way that Google indexes tweets. The reason we embarked on this was to determine the likelihood that Google might use signals from Twitter for ranking purposes, but we found lots of other interesting information in the process. Spoiler alert:

Google does not index a particularly significant percentage of tweets at all

The tweets it indexes are highly biased to people who have 1 million followers or more.

Even for those high authority accounts, indexing is not particularly fast!

Read on for the details of what we found!

Basic Twitter Indexing Information

In Twitter’s IPO filing, it was reported that Twitter is handling more than 500 million tweets per day on average. The following image shows a pair of Google search queries we used to attempt to find out how many Twitter pages Google has in its index:

Between these two queries, we see fewer than 1.5 billion pages, which is a pretty small number when you consider that there are 500 million tweets per day – it’s less than three days’ worth. However, the data from these two queries is not necessarily that accurate, so we decided it was worth trying to break this down further and see how many Twitter pages Google was indexing per month.
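As a quick sanity check on that comparison, here is a minimal Python sketch of the arithmetic. It uses only the two figures quoted above; the variable names are ours, and the 1.5 billion figure is treated as an upper bound because the queries returned fewer pages than that.

```python
# Figures quoted in the text above.
TWEETS_PER_DAY = 500_000_000                # from Twitter's IPO filing
INDEXED_PAGES_UPPER_BOUND = 1_500_000_000   # the two site: queries showed fewer than this

# How many days of tweeting would it take to produce that many pages?
days_of_tweets = INDEXED_PAGES_UPPER_BOUND / TWEETS_PER_DAY

print(f"Upper bound equals {days_of_tweets:.1f} days of tweets")
```

Since the real page count is below the upper bound, the indexed total corresponds to less than three days of tweets.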

To accomplish that, all you need to do is use the advanced search query operators as shown in this graphic.

First click on “Search tools,” then “Any time,” and then “Custom range.” Then you can use the calendar feature to pick a range of dates. We did this on a month-by-month basis for each month January 2012 through June 2014. Here are the results we got:

First, the disclaimer. The site: query is known to be rather imprecise. However, even allowing for a large degree of error, this data suggests that the indexing rate of tweets is actually quite low. This already makes a pretty strong statement about the value Google places on the information in the average tweet (i.e., fairly close to zero).
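For anyone who wants to repeat the month-by-month check, the date ranges can be generated rather than clicked through by hand. The following is a rough Python sketch, not the tooling we used. Two things in it are assumptions: the query string `site:twitter.com` is only an illustration of a site: query, and the `tbs=cdr:1,cd_min:...,cd_max:...` URL parameter is the commonly observed URL form of the "Custom range" setting, not a documented API, so it may change or stop working.

```python
import calendar
from datetime import date
from urllib.parse import urlencode


def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) for every month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def google_date(d):
    """Format a date as M/D/YYYY, the form seen in custom-range URLs."""
    return f"{d.month}/{d.day}/{d.year}"


def date_restricted_search_url(query, first_day, last_day):
    # ASSUMPTION: "tbs=cdr:1,cd_min:...,cd_max:..." mirrors the "Custom range"
    # search tool. It is observed behavior, not an official interface.
    tbs = f"cdr:1,cd_min:{google_date(first_day)},cd_max:{google_date(last_day)}"
    return "https://www.google.com/search?" + urlencode({"q": query, "tbs": tbs})


# January 2012 through June 2014, as in the study.
ranges = list(month_ranges(2012, 1, 2014, 6))
urls = [date_restricted_search_url("site:twitter.com", a, b) for a, b in ranges]
print(len(ranges), "monthly ranges")
```

Each URL can then be opened manually to read off the result count for that month. The counts themselves carry the same imprecision as any site: query.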

As a further note, you may be wondering how many tweets are actually retweets, which could potentially make them a lot less valuable to separately index. That’s debatable, of course, as a retweet of your tweet would be an indicator of greater value and essentially behaves like the link graph in the world of Twitter (we can call it the “Retweet Graph”). But, in any event, according to Dan Zarrella’s analysis of 5 million tweets, retweets make up about 1.4 percent of the total number of tweets.

Detailed Research on Indexing of Tweets

Back in December, we published a study on the potential impact of Facebook on SEO, and in it we studied how Google indexes content from highly prominent Facebook profiles. It showed the updates from influential Facebook profiles only getting indexed at around 59 percent. Today we are reporting on a similar study for Twitter.

In this part of the study, we included an analysis of the indexation of posts for 963 different Twitter accounts. We used the Twitter and Google APIs to pull the last 20 tweets from each of these accounts and tracked their indexation levels in a number of different ways. The follower numbers of the accounts included in the study were broken into these categories:

More than 5M followers – 26 accounts

3M to 5M followers – 9 accounts

1M to 3M followers – 23 accounts

500K to 1M followers – 20 accounts

100K to 500K followers – 71 accounts

10K to 100K followers – 199 accounts
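To make the bucketing concrete, here is a small Python sketch that assigns an account to a tier by follower count and tallies the accounts listed above. The tier labels and the function name are ours. How exact boundary values (for example, precisely 5,000,000 followers) are assigned is an assumption, since the list does not say.

```python
# (lower bound, label, number of accounts listed in the study)
TIERS = [
    (5_000_000, "5M+", 26),
    (3_000_000, "3M+", 9),
    (1_000_000, "1M+", 23),
    (500_000, "500K+", 20),
    (100_000, "100K+", 71),
    (10_000, "10K+", 199),
]
TOTAL_ACCOUNTS = 963  # total accounts in this part of the study


def tier_for(followers):
    """Return the label of the highest tier whose lower bound is met.

    ASSUMPTION: boundaries are treated as inclusive lower bounds.
    """
    for lower_bound, label, _count in TIERS:
        if followers >= lower_bound:
            return label
    return "under 10K"


listed_accounts = sum(count for _, _, count in TIERS)
unlisted_accounts = TOTAL_ACCOUNTS - listed_accounts

print(f"{listed_accounts} accounts fall in the listed tiers")
print(f"{unlisted_accounts} accounts fall outside them")
```

The listed tiers cover 348 of the 963 accounts. The other 615 are not in any listed tier, which presumably means they have fewer than 10,000 followers.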

Aggregate Indexing of Tweets Over Time

The first look we took at the data was to see what percentage of tweets were being indexed, without regard to the number of followers. Here is what we saw over the first seven days:

In aggregate, there were 10,453 tweets that we saw within the last seven days, and 326 of them were indexed, for an indexation level of 3.12 percent. This is actually pretty consistent with what we saw in the first part of our study where we used simple site: queries to check indexation levels over time.

We also looked at the indexation levels for tweets that were more than one week old:

Once again, you see that the indexation level is relatively low. There were 19,389 total tweets checked, with 701 of them being indexed, for an indexation level of 3.62 percent. Total indexation in our data peaked at about week four. Given the depth of our data, I’d conclude that indexation of tweets increases over time and peaks between two and four weeks, and then it starts to decline after that. It may not be as low as the 0.1 percent levels we saw with the site: query tests we did, but at best it’s a small percentage of total tweets.
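The two indexation levels quoted above are simple ratios, and they can be reproduced from the raw counts. This is a minimal Python sketch using only the numbers given in the text; the function name is ours.

```python
def indexation_rate(indexed, total):
    """Percentage of checked tweets that were found in the index."""
    return 100.0 * indexed / total


# Tweets seen within the last seven days.
recent_rate = indexation_rate(326, 10_453)

# Tweets more than one week old.
older_rate = indexation_rate(701, 19_389)

print(f"Last seven days: {recent_rate:.2f} percent")
print(f"Older than one week: {older_rate:.2f} percent")
```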

Breakout of Indexation Levels by Follower Count

We also broke out the data by follower count. The results were very interesting, as shown here:

As you can see, the indexation level of tweets for people with 1 million or more followers is actually quite high. As soon as you drop below 1 million followers though, it plummets. This decline continues, and for accounts under 10,000 followers, the indexation rate is only 0.22 percent, a level that is pretty consistent with the data we found with our site: queries.

Given the nature of how we identified our Twitter accounts, it’s clear that we were heavily biased toward larger accounts. In our test, 36.1 percent of the accounts we tested had 10,000 followers or more (the remaining 63.9 percent had fewer), and those are very lofty numbers. The overwhelming majority of accounts have far fewer than 10,000 followers, and that also suggests that our site: test data is not that far off.

Indexation of Major Influencer Tweets Over Time

We looked at the indexing of tweets from very influential accounts over time in a more detailed manner. When we ran our test program, we noted when we researched the tweet, and the time of the tweet. This allowed us to see indexing levels of the tweets over time on a day-by-day basis. Here is what we saw:

What is really interesting about this is that the tweets from these very high profiles are not indexed particularly quickly. It has long been believed that Twitter is used by Google for news discovery, but this data suggests that Google is not particularly fast at indexing tweets even from the most influential profiles.
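The day-by-day view described above comes down to comparing two timestamps per tweet: when it was posted and when we checked it. Here is a hedged Python sketch of that grouping. It is not our test program; the function names and the three sample observations are hypothetical and exist only to show the mechanics.

```python
from collections import defaultdict
from datetime import datetime


def age_in_days(tweeted_at, checked_at):
    """Whole days elapsed between the tweet and the index check."""
    return (checked_at - tweeted_at).days


def indexation_by_day(observations):
    """Map tweet age in days to the percentage of tweets indexed at that age.

    Each observation is (tweeted_at, checked_at, was_indexed).
    """
    counts = defaultdict(lambda: [0, 0])  # age -> [indexed, total]
    for tweeted_at, checked_at, was_indexed in observations:
        age = age_in_days(tweeted_at, checked_at)
        counts[age][1] += 1
        if was_indexed:
            counts[age][0] += 1
    return {age: 100.0 * i / t for age, (i, t) in sorted(counts.items())}


# HYPOTHETICAL sample data, for illustration only.
sample = [
    (datetime(2014, 6, 1, 12), datetime(2014, 6, 2, 11), False),  # 23 hours old
    (datetime(2014, 6, 1, 12), datetime(2014, 6, 2, 9), True),    # 21 hours old
    (datetime(2014, 6, 1, 12), datetime(2014, 6, 3, 13), True),   # 2 days old
]
by_day = indexation_by_day(sample)
print(by_day)
```

A tweet checked 23 hours after posting lands in day 0, so "indexed within the first 24 hours" corresponds to the day 0 bucket.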

What Causes Tweets to Get Indexed?

We broke the tweets down into a variety of categories to see how that might impact indexation. For purposes of this analysis, we concentrated on the five Twitter profiles with the largest number of followers, and the five Twitter profiles that had the most inbound links. We broke it down this way so we could see if high follower count had more of an impact on a profile’s indexation rate than the profile having a large number of inbound links. Please note that the sample size for this test was small; a total of 92 tweets were checked at this level of detail.

For the five profiles with the highest follower counts, we found that 80 percent were indexed, and for the five profiles with the strongest link profiles, we found that only 20 percent were indexed.

We then went a little further to examine all the indexed tweets to see what types of tweets they were. For example, for the five profiles with the most followers, we found that 20.3 percent of the indexed tweets were newsy or very topical in nature, and 43.2 percent of the indexed tweets had a link in them. Inbound links to the tweet seemed to enhance the probability of being indexed as well, as 71.6 percent of the indexed tweets had inbound links to them.

We also looked to see what percentage of the indexed tweets were news oriented OR had a link in them OR had a link pointing to them, and that aggregated total was 86.5 percent. Here is the chart showing that data in a bit more detail:

Note that in this data an indexed tweet may have a link in it, an image, AND links pointing to it – there can be some real overlap. We repeated the investigation by examining the makeup of all the non-indexed tweets. Among the five profiles with the highest follower counts, only 16.7 percent of the non-indexed tweets were news oriented, had a link out, or had an inbound link pointing to them, and that number dropped to 13 percent among the five profiles that had the most inbound links.
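Because of that overlap, the "news OR link OR inbound link" figure is not the sum of the individual percentages. A short Python sketch makes the point. The `Tweet` class, the `share` helper, and the four sample tweets are all hypothetical and do not come from our data set.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    is_news: bool
    has_link: bool
    has_inbound_link: bool


def share(tweets, predicate):
    """Percentage of tweets for which the predicate is true."""
    return 100.0 * sum(1 for t in tweets if predicate(t)) / len(tweets)


# HYPOTHETICAL sample, for illustration only.
tweets = [
    Tweet(is_news=True, has_link=True, has_inbound_link=True),    # all three traits
    Tweet(is_news=False, has_link=True, has_inbound_link=False),  # link only
    Tweet(is_news=False, has_link=False, has_inbound_link=True),  # inbound link only
    Tweet(is_news=False, has_link=False, has_inbound_link=False), # none
]

news = share(tweets, lambda t: t.is_news)
link = share(tweets, lambda t: t.has_link)
inbound = share(tweets, lambda t: t.has_inbound_link)
any_trait = share(tweets, lambda t: t.is_news or t.has_link or t.has_inbound_link)

print(news, link, inbound, any_trait)
```

In this toy sample the individual shares add up to 125 percent, while the combined share is 75 percent, because the first tweet is counted once in the combined figure but three times across the individual ones.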

We took one last slice at this data, which was to focus on it by category. In other words, among the five profiles with the highest follower counts, and the five profiles with the highest inbound link counts, what percentage of news oriented tweets were indexed? Interestingly enough, it was 100 percent. It actually looked like 100 percent of image tweets were indexed as well, but the sample size for that was exceedingly small.

I need to emphasize that the data in this section is based on a small total sampling of only about 152 tweets from very high profile accounts. As a result, I would not try to draw any deep conclusions from it and offer it purely to fuel speculation, and perhaps to provide grist for a future, more in-depth study on the topic.

Twitter Links are NoFollowed

We also looked at the source code for a tweet. As with Facebook, the link in the tweet is NoFollowed, so no PageRank is passed by the link:

This is common on social media networks, largely because the content is user generated, and this makes the value of that “endorsement” suspect. Google’s John Mueller had this to say about these types of links:

I think it’s always a bit tricky for us when we can recognize that it’s a user generated content site and we’re not really sure how to trust those links within there.
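If you want to check for NoFollowed links yourself, the test is whether an anchor tag carries `nofollow` in its `rel` attribute. Below is a minimal Python sketch using the standard library's `html.parser`. The class name and the sample markup are hypothetical; the markup is not copied from Twitter's actual page source.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect (href, is_nofollow) for every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((attributes.get("href"), "nofollow" in rel_tokens))


# HYPOTHETICAL markup, for illustration only.
sample_html = (
    '<p>Post: <a href="http://example.com/a" rel="nofollow noopener">one</a> '
    '<a href="http://example.com/b">two</a></p>'
)

collector = LinkRelCollector()
collector.feed(sample_html)
print(collector.links)
```

The first link is reported as NoFollowed and the second is not. Running the same collector over saved page source would show which links on a real page carry the attribute.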

Conclusions

In summary, these charts show us that overall indexing of content of tweets by Google is quite low, but they do in fact index a reasonably high percentage of tweets from more influential accounts (up to 50 percent). However, the indexation is not as rapid as we would have expected. Given the conventional belief that Google might use shared links in Twitter as a potential indicator of a hot news event, we would have expected that indexation rate would be more rapid.

However, the data does not necessarily support this. Even for accounts with more than 5 million followers, only six percent of tweets are indexed within the first 24 hours, and this only climbs to 15 percent by the end of 48 hours. This is nothing to write home about! However, Google could actually be crawling the tweets, not indexing them, but still using URLs it finds as a means of discovery. We just don’t know.

In summary, to me the evidence suggests that Google does not currently use activity in Twitter as a ranking signal. If I am wrong about that, and they do extract some ranking signals from Twitter, then the evidence suggests that they are doing that primarily from accounts with 1 million or more followers – i.e., the absolute cream of the crop.

Andrew – it does mean most followed, which we are using here as a proxy for influential. Hopefully, this post does not inspire people to go buy followers! As you suggest, I am sure that the way Google makes that decision is more sophisticated than all of that.

Coming from the perspective of someone optimizing a website, conventional wisdom implies Google needs a certain amount of information to determine what a page is actually about in order to rank it properly. In other words, you need to have a reasonable amount of text.

It could well be that Google don’t index that many tweets, because the 140 character limit restricts the ‘quality’ of the pages that are created.

Given that you need one heck of a lot of followers before you start to see even some of your tweets being indexed, aside from the obvious benefits of reaching a lot of people quickly, it does seem that now is the time that we can turn round to people and say, “No, Twitter does not affect SEO.”

Have not seen any connection of this to Twitter though. Are you saying that Google used to show authorship photos for tweets when they showed them in the results? I never noticed anything like that, but yes, that would be done now.

Eric,
Good exercise and good story about tweets indexing and their capacity to influence or not rankings. Not conclusive though, in my very personal opinion.

Have you thought at any time that Google may be using a different criteria (qualitative) in order to honor rankings from social platform participation? Along the lines of agent and author? Just a thought. If my suspicions (not data backed) are somewhat certain, then these aspects cannot be observed with the methodology above.

I’m interested in knowing what conclusions IMEC Lab (http://moz.com/rand/imec-lab/) is drawing from their own experiments, and probably compare/complement with this one of yours.

As I have pointed out for years, Google’s date-range queries do NOT provide complete results for things indexed during the given period. This is an admirable amount of work but you’re piling chompy numbers on top of chompy numbers.

I think a smaller scale experiment would provide better insight into what Google is doing. It would also enable you to check for secondary factors.

In my own research I have found that Google is following external links to the Tweets. They can either follow links to the accounts (more links drive more crawling) or they can follow links to the status pages themselves. Poorly linked user accounts are less likely to have their Tweets indexed regardless of activity.

Hi Michael – the actual data on the 19,000 tweets is measured through direct examination as to whether or not the tweets are indexed, so that’s hard core data. As I noted in the beginning, the site: queries numbers are not the crux of the work done. Cheers,

Hi Dan – Actually, in the study, we did examine whether large number of followers had more impact than links to the profile. Based on our analysis the number of links did not seem to correlate as well with indexation as the number of followers.

However, for that part of the study, as I mention in the post, it was a pretty small data set.

I think the indexing of tweets depends entirely on high authority accounts. Though I don’t have any high authority in SEO, the percentage of my tweets that get indexed is still a bit high. The reason might be that I am answered by high authority accounts.

One of my readers commented on my article and showed me the indexation of his Twitter account. He linked his account with relevant social media (Google+, LinkedIn), always used the Google link shortener, has a tidy follower list and always tweets regularly (every day or every second day). By following these rules he has a lot of tweets indexed by Google, with only 97 followers. Could these personal rules of his be another signal for Google to index his tweets?
I look forward to hearing from you!
Regards
Julian

Great post, thanks Eric. And overall, I’d say looks like Google’s getting it about right… indexing news from people that more people are following and ignoring (all-but) images, mentions and the rest of the world that are (largely) tweeting conversations about right-now.

PS Full marks for such a patient response to queries & points; maybe we need another abbreviation – gb;r-r.

Conflating indexation with results displayed is probably a mistake. Google indexes TONS of content that they do not display in the SERPs. Google is probably keeping millions of times more data in their index than they will ever actually display in results. This study is about results displayed, not indexation.

Just because a page (or tweet) isn’t displayed in results doesn’t mean it’s not impacting the algorithm elsewhere. A link on a page that you can’t find in the SERPs can still pass PageRank. We can actually prove this by using the <meta name="robots" content="noindex, follow"> tag. The links on the page with that tag will pass PageRank, but the page itself will not show in the SERPs.

Hi Micah – actually, we tested indexation of the posts by taking the URL and doing an info: query to see if Google had it in its index. The info: query is a straight indexation test, not a display test. However, it’s certainly possible that Google crawled pages that are not in its index, and it’s certainly possible that they might use links in such tweets for purposes of discovery, even if they chose not to index the content.

An interesting question! I just tested several times clicking on a link on a Twitter post in Google search showing for one of our blog posts, and watched it come in through the Real Time function of Google Analytics. Each time, the click was attributed to Twitter, not organic search.