The Moz Blog

Why New Content Briefly Flickers Out Of Google

The author's posts are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

A few days after new content shows up in Google, it will sometimes flicker out of the SERPs for a few hours. Apparently, this is common knowledge to some SEOs. It is not common knowledge to programmers like me, and I nearly made a tin foil hat in preparation for the googlicopters when I learned that the project I'd worked on for months (Linkscape) had disappeared from all SERPs on the Friday evening of the week it launched. Fortunately, Rand responded to the internal email thread and, just as he predicted, 12 hours later Linkscape was back in Google's SERPs like nothing had happened.

This raised the question, "Why would the engines drop new content out of the search results for a few hours after it has been in the results for a few days?" I don't know, but let me make an educated guess: sometimes there is a brief gap between pages falling out of a smaller (but quicker to build) index and a larger (but slower to build) index being rebuilt with those pages in it.

Not having worked at Google, I have no solid evidence they have multiple indices, but let me make the case that they probably do. Linkscape currently takes over a month to go from crawling a page to having it appear in the results. There is some low hanging fruit to reduce this to more like a couple of weeks, but for the foreseeable future we aren't going to have turnaround time on the order of hours like the engines do, because our index is large enough that it just takes a lot of computers and a lot of hours to compute it. So... how do the engines get around this issue? They could make their indices support random inserts, but this would make them more complex and less efficient. The other option is to have two indices. That way they can have one index that is small and quick to update, and another that is large and slow to update. The small index would try to hold the difference between what has been crawled and what is in the big index. At query time, they would then need to check both. Of course they could have more sizes of indices than just two, but that doesn't affect the basic point that Google presumably has more than one.
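To make the large+small idea concrete, here is a minimal sketch of a two-index setup where a query checks both. Everything in it (the `TwoTierIndex` class, its method names, the example URLs) is made up for illustration; it is a guess at the general shape, not a description of how Google or Linkscape actually work.

```python
class TwoTierIndex:
    """Hypothetical large+small index pair (illustration only)."""

    def __init__(self):
        self.large = {}  # term -> set of URLs; big, rebuilt rarely
        self.small = {}  # term -> set of URLs; small, rebuilt often

    def add_fresh(self, url, terms):
        # Newly crawled pages land in the small index only.
        for term in terms:
            self.small.setdefault(term, set()).add(url)

    def rebuild_large(self, crawled):
        # Batch rebuild from everything crawled so far.
        # In real life this is the step that takes lots of machines and hours.
        rebuilt = {}
        for url, terms in crawled.items():
            for term in terms:
                rebuilt.setdefault(term, set()).add(url)
        self.large = rebuilt

    def search(self, term):
        # Query time: check both indices and merge the results.
        return self.large.get(term, set()) | self.small.get(term, set())


index = TwoTierIndex()
index.rebuild_large({"example.com/old": ["seo", "links"]})
index.add_fresh("example.com/new", ["links"])
results = index.search("links")
print(sorted(results))
```

The point of the design is that `add_fresh` is cheap while `rebuild_large` is expensive, so fresh pages can show up in `search` long before the big rebuild finishes.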

Google could remove a page from the small index only after it is in the big index, but then it would be in both indices for a while until the small index was rebuilt. This overlap means the small index is larger than necessary, so it can't rebuild as quickly as possible, and so won't be as fresh as possible. So perhaps they try to time it perfectly so there is neither an overlap nor a gap. The problem is that as they crawl faster, grow their indices, add complexity to their indexing or let the intern check in his summer project, it is easy for a small gap to form. So maybe it is just hard to ensure that there is never any gap unless one is willing to waste resources by letting the indices overlap.
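The overlap-versus-gap tradeoff can be sketched as a toy timeline. The function names and the hour values below are assumptions picked for illustration (the 12 simply echoes the roughly 12 hours Linkscape was missing); nothing here is measured from a real engine.

```python
def visible(hour, small_drops_at, large_live_at):
    """True if the page can be found in either index at the given hour."""
    in_small = hour < small_drops_at   # small index still lists the page
    in_large = hour >= large_live_at   # rebuilt large index now lists it
    return in_small or in_large


def gap_hours(small_drops_at, large_live_at):
    """Hours during which neither index has the page (0 if they overlap)."""
    return max(0, large_live_at - small_drops_at)


# Safe but wasteful: the small index keeps the page 6 hours longer than needed.
print(gap_hours(small_drops_at=30, large_live_at=24))  # 0

# Perfectly timed handoff: no overlap and no gap.
print(gap_hours(small_drops_at=24, large_live_at=24))  # 0

# Large index arrives late: the page flickers out until it goes live.
print(gap_hours(small_drops_at=24, large_live_at=36))  # 12
```

Any slip in when the large index goes live turns directly into hours where `visible` is false, which is why a perfectly timed handoff is fragile.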

Chas (the developer who sits next to me) manages some indices with a large+small model that, for the record, never has gaps. And he contributes the fact that his large index starts rebuilding at midnight on Friday because load is lighter on the weekend. However, his computers are set to GMT, which means it starts Friday 5PM Pacific time. Well, it was a bit after 5PM on a Friday when Jane first noticed Linkscape had dropped from Google's SERPs (I received her email at 5:28PM). Google has fewer CPU constraints than Chas, but they do have bandwidth constraints, and bandwidth is what's needed to push new indices out to lots of computers.
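The time zone arithmetic is easy to check. One caveat: midnight GMT at the start of Saturday is 5PM Friday on the US west coast only while daylight time (UTC-7) is in effect; under standard time (UTC-8) it would be 4PM. The date below is just a Saturday picked for illustration, since the post does not give one.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
PST = timezone(timedelta(hours=-8), "PST")  # Pacific Standard Time

# Assumed date: an arbitrary Saturday, 00:00 GMT (the first seconds of the weekend).
rebuild_start = datetime(2008, 10, 11, 0, 0, tzinfo=UTC)

start_pdt = rebuild_start.astimezone(PDT)
start_pst = rebuild_start.astimezone(PST)

print(start_pdt.isoformat())  # 2008-10-10T17:00:00-07:00 (Friday 5PM)
print(start_pst.isoformat())  # 2008-10-10T16:00:00-08:00 (Friday 4PM)
```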

So the theory is that Google had two indices that were supposed to go live in the first seconds of the weekend GMT. First was the new large index that added our page. Second was the new small index that dropped our page. Only the small index was on time.

Or, at least, that is the best theory I can come up with. What do you guys think?

p.s. from Rand - This post is Ben Hendrickson's first on SEOmoz. He's been with us nearly a year, working on Linkscape, and before that with Microsoft & his own technology startup project. I'm thrilled to have him contributing to the blog. Hopefully, he'll get a photo up sometime soon :-)

27 Comments

1.) There is plenty of solid evidence that Google uses multiple indices. About a year ago, you couldn't go anywhere without hearing (from SEOs and from Google) about the Supplemental Index.

2.) Since Google has many (redundant) data centers around the world, they have the ability to reroute traffic away from a particular data center as it is being updated. There should never be a gap of time during which the indexed data for a given URL is completely unavailable.

3.) After Google did their recent PageRank update, there was a brief period where the Toolbar PageRank values went back to how they were before the update. Then the update "reoccurred." In other words, this phenomenon is not limited to the index, but may also include other sets of data (such as PageRank/the link graph).

BTW... my ideas don't align to point to any particular theory. I have multiple personalities, and I gave them all a chance to voice their opinions.

Kind of agree with Paul Pedersen above - I think it was originally a Google conspiracy to get rid of SEOs by provoking heart attacks (when discovering your new website, briefly ranking No. 3, has disappeared without trace.)

Since that doesn't scare anybody anymore, I'm terrified of what they'll think up next...

I was sure Google was trippin. I would write an article, build some links and within a few hours I would see it ranked on page 1. The next day when I looked for the content it would be gone. The time it was missing varied from a few hours to a few days, but it typically returned with a good ranking. Thanks for the explanation.

I changed the titles of some of my pages, and they suddenly appeared on the first page. That could be due to what you mention, or they may also be tracking CTR, bounce rate, time on page, and some other factors to decide whether your page should be among the top ones for that keyword.

Great post, Ben (get yourself a profile picture already, I seem to be saying that a lot recently). Reading your writing is just like hearing you talk - isn't it funny how it's like that for some people. I have no evidence either way on what you describe, but it sounds plausible...

Does this thought sound accurate to you: If there is always an active crawl for the small index, and some sort of delayed or slower crawl for the large index, then new links might 'freshen' a page that might already be in the large index, but if the links are found within seconds of the switchover between the small and large index, that might cause an overlap of sorts and contribute to the disappearance of new content in the SERPs? Does that make sense? Or is this simply a crawl issue regardless of incoming links during the first few hours/days of the new content?

Lol - I have to give you my disclosure that I'm still trying to learn the SEO ropes and I don't claim to know any or very much SEO so if the above question is completely incoherent, that's my excuse. ;)

Since we all come at this from our own points of view, perspectives and experience I'm not sure we are going to come up with a consensus.

I see your idea of more than one index, with the lag time attributed to a gap in switching over.

I am inclined to go with a combination of Darren and Ben. Something more like multiple indices, set to serve regions that overlap as they rebuild on a timed schedule. Like waves going over the globe, being built like an incoming tide. It comes onshore a little farther, then falls back as it goes offline, and an older version holds the place before the new wave comes in.

Data can be fed in and it goes to a category, but then that index goes down to rebuild. While it is down, another index overlaps the service area, but it is fed on a different cycle so it is not as up to date yet. Next cycle through, it will have caught up with the one it replaces, but not with the newest data it has scraped.

That's my theory. I can't prove it.

BTW... I don't have multiple personalities but I do hear the voices in Darren's head. I can't say any more. They might hear me.

I have a website that shows up top 5 one day, then it will go to like 13-15 the next day and then back again. This has been happening for about a month now and I have no explanation why. Maybe this is related? Maybe not... it doesn't completely disappear.

I would agree. Today I searched for a term that showed 2.5 million results, but the odd thing was it showed only ONE result and then right after it there was the gooooooogle to view the next pages. A few minutes later it was back to normal. Now that could have been a separate bug, but it could also have been along the lines of what you are saying.