Minty Fresh Indexing

When I joined Google in early 2000, we had a stretch where we didn’t update our index for 3-4 months or more. At the time, that wasn’t bad for a search engine; I remember one search engine around then that wasn’t updated for over a year. Starting in mid-2000, Google updated our index pretty much every month. People used to use the phase of the moon to predict timing of the next “Google Dance.”

Now raise your hand if you remember “Update Fritz” from summer 2003. That was the Google Dance where Google switched from a monthly batch update to an incremental update. That means that our crawl/indexing team updated a fraction of our index daily or near-daily. Back then we had not only the normal crawl but also a “fresh crawl,” and if documents were in the fresh crawl then Google would sometimes show a date in our snippet.

The Google crawl/indexing team has continued working hard, and several people have noticed Google’s index getting fresher and fresher. Now some documents can show up in minutes instead of hours or days.

I’ve noticed that as search engines have gotten better (fresher, bigger, more relevant), people keep adjusting their expectations upwards. I can’t imagine waiting over a month for search engines to update their index with news events any more, but just a few years ago that’s how things worked. And it only takes a few encounters with a fresh index until you ratchet up your expectations. My previous mental model was “normally it takes a day or so to show up in many search engines,” but I had my own “Zoiks! That’s fast!” experience tonight, which I’ll describe for you.

I was feed-grazing in Google Reader, as I am wont to do, when I saw a message that there was an update to Reader’s code for offline reading (Google Gears). In my experience, if I move on to the next feed, I lose that little message with the link to update the code (not sure why, but that’s a different post). So I clicked the link and updated my code for offline reading.

In the process, I lost the post that I was currently reading, which was Rich Skrenta’s post about Persai. I wasn’t done reading the post, so what do I do? I go to Google and search for [skrenta blog] so that I can find Skrenta’s blog and finish reading the post.

And what did I see in my search results? The snippet from Skrenta’s blog was showing the post that he did at 7:54 p.m. Pacific time. It was about 8:44 p.m. Pacific time when I did the search. So from Rich hitting the “Post” button to me being able to see it in Google’s main search index was well under an hour.

Don’t believe me? Here’s the bottom of Rich’s post, showing that it went live at 7:54 p.m.:

I double-checked that Rich’s blog was on Pacific time by leaving a quick comment on one of Rich’s other posts.

And here’s Google showing a snippet from Rich’s post within an hour after it went live:

Now that’s a minty fresh index. It takes a lot of good design and infrastructure to be able to refresh large numbers of pages that fast. Congrats to the Googlers who are improving Google’s ability to re-crawl, index, and score web documents quickly.

Update: I was only checking every 10 minutes or so, but this post was crawled/indexed/searchable in half an hour or less:

Hope you are feeling better.
I have come to expect instant indexing with Google Blog Search, and when I am working on something time-sensitive I will check google.com to see when something posts there, and expect to see it there within a day. Obviously it sometimes takes longer. With the number of people who rip off content, and duplicate content penalties, fast crawling and indexing of new pages becomes more and more important.
I am glad to see the speed!
dk

David, thanks for the well-wishes. I’ve been lying down most of today, and I’m feeling a little better.

Blogsearch is great, and I’m a huge fan of blogsearch.google.com and that entire team, but it’s true that many regular users don’t really want to search in 15 different places. That’s why I’m glad to see some of the indexing speed of blogsearch “rub off” on the main Google search. And that’s not even counting when we can pull blog posts into the main Google search results, either in the main index or as a onebox that’s displayed at the bottom of a search page.

Ah, the good old days of the dance. There was a bit of excitement involved back in those days, and I can remember some long nights manually going through the searches to see what had happened in the latest ‘dance’. Good times, although I’d agree that quicker is better these days.

On that note, care to ponder why after a month and a half this page http://www.travellerspoint.com/guide/ is still not ranking for the term ‘wiki travel guide’? There are only 4 decent ones in the world, so you’d think this should make it to at least the top 20 in G. Seeing as you specifically mention “search engines have gotten better (fresher, bigger, more relevant)”, this should be a concern to G I’d think. Yahoo has no problem with the page/site and ranks it decently.

Dave (original), different people within Google have different opinions. Some people want to get rid of it, some people want to show it all the time to show how fresh Google is. In some countries, freshness is more/less important, so it’s not a black/white subject. My personal opinion is that as long as we’re consistent, I’m pretty mellow either way. My hunch is that we’ll run some experiments and come up with some reasonable behavior.

Harith, certainly a post that’s 30 minutes old is going to have fewer links, for example, but just getting the content indexed/searchable is often half the battle, because people search for long-tail/specific stuff so often.

Combining the technology of Google News with the technology of blog search and incorporating all that information into the organic SERPs – this benefits EVERYONE. The various databases are talking to one another via SOA and more efficient programming.

Even the Image Search is benefiting from this immediacy – the news images are being incorporated into the organic search images.

Dave (Original), I think if you look at my comment you’ll see that you can’t use the words ‘prove’ and ‘always’ in that last comment. There’s millions of examples where Google isn’t providing the most relevant fresh results, so proving something requires addressing all those cases too. Whether or not google is ahead of their competitors in terms of the actual index is a different question, but there is still plenty of work to do from what I’m seeing.

I agree with your DMOZ comment though. 3 to 4 months is nothing, Kevin must be an editor, hehehe

I’d like to at least see the update date when hovering over a result so I don’t have to view the cached version for the date. The title attribute for the anchor tag could be used for that. Something like “magenta stew — indexed on Aug 7, 2007.”
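As a rough sketch of what the commenter is suggesting, the indexed-date hint could ride along in the result link’s markup. The URL here is made up for illustration; the label text is the commenter’s own example:

```html
<a href="http://example.com/magenta-stew"
   title="magenta stew — indexed on Aug 7, 2007">magenta stew</a>
```

Hovering over such a link would show the crawl date as a browser tooltip without adding any visible clutter to the results page.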

Thanks for the post; I noticed something similar a few weeks ago. I updated my profile on our blog and it was pretty much instantly in the main index as the top result for a search on my name. Needless to say, I jumped around the room etc. etc.

Funny thing was, that lasted a day or so and then the cache reverted back to the pre-update profile (which had no text at all) and no ranking on my name, which it has remained since then. Is this likely to be a crawl bug or just an anomaly that will rectify itself over time? The cache still has the old version, which I agree shouldn’t rank at all as it has no text.

Well, that is some very impressive technology. Internet search is still in its infancy in terms of length of existence when compared to other computer technologies, but yet you guys seem to be ahead of the normal development curve. As Google approaches near real-time indexing, an interesting situation may develop as the web ages: the weighting of archived documents in relation to the fresh ones. The library may have the best sellers in the front, but in the back they’ve got some great things written in 1903 as well; as the web ages, more and more well-written material will no longer be updated but will need to be accessed by searchers. I’d hate to go to the library hoping to check out an autobiography by a WWII veteran landing on Omaha Beach only to be offered travel brochures to Normandy with plenty of ads to look at.

Now I don’t have one of these sites that get updated every 10 minutes in the main index, so excuse me if my questions seem ignorant, as I cannot test for it myself, but it does raise a few questions in my mind. Is this recrawl triggered by anything like a feed ping or a sitemap update and ping? Or does the crawler download and compare a copy of your site every 10 minutes waiting for an update? The latter seems like an incredible amount of resources; then again, a pinging-type method seems easily gamed. Perhaps you determine the update frequency on more than just a mere PageRank calculation, such as noted and logged actual content changes. I’d hate to think that between July 30th and August 6th your homepage got downloaded and compared every 10 minutes in hope that you made a new post. Seems to me those are valuable resources that could have been used on some other sites that may not have a PageRank of 7 but have updated their sites.

Is this new ability available due to increased resources or from better allocation of resources? Is the growth of the supplemental index related to this capability? As more pages are deemed unnecessary to crawl regularly and not important enough to be parsed for obscure terms, that would free up bandwidth and processing power to update very popular sites every 10 minutes.
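The trade-off the two comments above describe – polling every page on a fixed schedule versus reacting to pings versus adapting the revisit interval to how often a page actually changes – can be sketched roughly. This is purely an illustrative toy, not Google’s actual scheduler; every class name, number, and policy choice here is made up:

```python
import time

class RecrawlScheduler:
    """Toy adaptive recrawl scheduler (illustrative only).

    The revisit interval shrinks when a fetch finds changed content and
    grows when it doesn't, so fast-moving pages get crawled often without
    polling every page every few minutes. A ping just moves the next
    visit forward; the fetch itself still decides the future interval,
    which limits how much a gamed ping can accomplish.
    """
    MIN_INTERVAL = 600          # 10 minutes
    MAX_INTERVAL = 30 * 86400   # 30 days

    def __init__(self):
        self.interval = {}      # url -> seconds between visits
        self.next_visit = {}    # url -> earliest next crawl time

    def record_fetch(self, url, changed, now=None):
        now = time.time() if now is None else now
        old = self.interval.get(url, 86400)  # default: daily
        # Change detected: revisit twice as often; unchanged: back off.
        new = old / 2 if changed else old * 2
        self.interval[url] = min(max(new, self.MIN_INTERVAL), self.MAX_INTERVAL)
        self.next_visit[url] = now + self.interval[url]

    def record_ping(self, url, now=None):
        # A ping (e.g. from a blog) claims fresh content: crawl ASAP.
        now = time.time() if now is None else now
        self.next_visit[url] = now

    def due(self, now=None):
        now = time.time() if now is None else now
        return [u for u, t in self.next_visit.items() if t <= now]
```

Under this kind of policy, a blog posting several times a day converges toward the minimum interval while a static brochure page drifts toward the maximum, which matches the behavior commenters describe without requiring a 10-minute poll of every page on the web.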

All in all, I think it’s great for the overall web experience; near real-time ranking allows for better access, as with the I-35W tragedy, but conversely it also means that you must be able to just as quickly remove content deemed undesirable, which I’m sure also excites your own group of spam fighters.

My soccer club is Benfica. And we signed Freddy Adu a few days ago. So I searched “Freddy Adu” every day during the negotiation period. I did notice that on this keyphrase the results were quite recent. And the News feature at the top of the results was very useful.

I also found an annoying trend in the news business. Someone writes the original news. Then a few hundred other websites copy that news. Some add an additional paragraph. Others just copy the original news. And I’m not referring to RSS feeds.

Is there any connection between minty fresh indexing and sitemaps in Google Webmaster Tools?
I tried it several times and noticed that with a sitemap included, the spiders arrive in a few seconds, and in a few minutes the site is indexed in search.
Is this related in any way to the site’s PR?

I have decided to come out from lurking and finally make a comment here.

I think I noticed this about a week or so ago. I was searching for news about Google Analytics and how there were some errors in the reporting. Completely forgetting to check the GA official blog, I searched Google to find the first result had been posted and then indexed by Google in less than an hour. I was very impressed.

One question though: is it only blogs which get indexed this quickly? And is it perhaps due to the blogs “pinging” Google to say that the content has been updated? I would like to see if this applies to a non-blog site, e.g. how quickly a major newspaper gets their pages added into the index.

I have an “Updates” section on my homepage that links to pages with significant updates. I wish Googlebot knew that meant I want them freshly indexed. Unfortunately, my homepage was spidered yesterday but my other updated page is still shown as the old version in Google’s cache. I suppose it may have been spidered but not updated in the cache. Webmaster Tools doesn’t say when pages other than the homepage were indexed.

I stopped using Google sitemaps because I had stopped updating my sitemap for a while and didn’t want to take the time to make sure I knew how to do it correctly, and I was always afraid that I’d forget to use the special filename that I thought up that would prevent private pages from being indexed. I also hadn’t heard Google sitemaps mentioned in a while, but after I deleted it I heard people talking about it fairly regularly, as if to taunt me. I may go back to it.

Philipp Lenssen, yes, having an RSS feed definitely helps with faster indexing. RSS is a fantastic tool for webmasters because it’s a standard (except for some feeds like eBay or Microsoft who always feel they have to extend the basic functionality with their own signature). I highly recommend using RSS on any site, and I certainly have noticed that Google is doing an incredible job of picking up new content while balancing older established content.

Even though I’m a bit upset about the PR toolbar not updating yet to give that warm and fuzzy feeling webmasters love, I’m starting to learn to love a new warm and fuzzy feeling from seeing numerous articles I’ve authored up on page one in the SERPs. I’ve also noticed that content may get indexed on the first page, but when other more current content comes in it bumps the older content down. For me, as an obsessive blogger, I have to say a big huge thumbs up to this new Google system. Excellent!!!

btw, thanks Matt for an SEO-related post. I know you get tired like all of us.

btw, did you see the spinoff of the lolcats and loldawgs internet meme some blogger created called “lolcutts” ?? It’s fairly amusing. You’d get a chuckle out of it…

Maybe I’m missing something but on the screencap of the Rich Skrenta search the time “8:44 PM” is listed next to his result. You mentioned that 8:44 was when you conducted the search on Google.com but why would the results page list the time that the search was conducted next to one of the results?

I still worry that it’s the RSS news feed that triggers the fast response but then both the news feed and the corresponding web page both get into the regular index. If this is true then this can create duplicate content issues and the possibility of the wrong one ending up in the Supplemental Index. If this is correct, then the robots should adopt some different strategy in their wanderings.

Sam wrote:
“but there is still plenty of work to do from what I’m seeing.”

Yes, as Google always wants to do better. You asked why your page was not ranked higher a post above that one, so it appears that just because your page is not ranked, Google has “plenty of work to do”. Might I suggest you change your front-page anchor text to read “travel guide” instead of the “destinations” that is there now, if you want to rank on “wiki travel guide”.

Matt… while reading your post I started thinking about what would happen to search engines if they offered a “newest on top” search return. I am sure that type of algo would be very complicated and the amount of crawling would vastly increase to keep up, but it may provide for some interesting changes in how websites are maintained.

The race for fresh content would be at the top of most everyone’s list and possibly have the natural effect of making spam-oriented sites go away.

Overall I think faster indexing speeds are a good thing; however, I worry over the number of times I have gone to bed and thought “damn, I wish I had not said that” and then managed to change it before the spiders indexed the content.
I guess it will teach us to be more diligent: no more live testing and, most important of all, publish only whilst sober.

@Dr. David Klein: I am not sure what others’ opinions on the issue you mentioned are, but personally I provide a subnav or a crumb trail for extremely long pages. Usually the key headings form the anchor text I target.

Always keep in mind that your users come first – such navigation helps them, if not search engines, and I have noticed a positive effect.

Matt, I want to catch up with you and Google more formally, but maybe you can clarify more now – because it’s pretty confusing.

Google has a fresh crawler which has always put content into the index within hours, right? So you’re saying that crawler might be getting stuff in within minutes now, correct? For what percentage of the index?

Google also has a news crawler, separate from the fresh crawler, right? And news index content will flow into regular web search results independent of universal search news result insertion, correct?

Then you’ve got blog search, which is pulling in RSS feed content. That seems to be getting listing into the main search results as well, independent of the fresh, news or main crawlers.

So if we swing back to Rich — you make it sound as if Rich’s post hit the main index within minutes, because the main index is so fast. But really, it seems more like this…

Rich did a post, on a URL independent of his blog’s home page, which populated out to Google Blog Search and might have shown up as a standalone page within the regular search results. Or not — it’s not clear.

You searched and found that the home page of Rich’s site was recently indexed. OK, home pages have long been spidered on a frequent basis, but that’s not the same as getting the actual post page itself in (and that page might have much more information than shows on a home page, if a blog only shows a summary).

I’ve got a compilation of other things like this — where people assume the main index is minty fresh, yet when you poke at it, it seems like it might be more that the news crawler or blog search is inserting into the main SERPs some listings. Hence my confusion.

I think at least sometimes search query popularity may speed up things. A while ago when gazillions of Webmasters were searching for quoted phrases from the faked search quality emails I put up a post with the complete text. The *post*, not the blog’s homepage, got Web search traffic in under two hours (judging from the number of hits probably way faster but I didn’t watch my stats that closely).

I am just wondering: I wanted to see when this post was cached, and all I got was
“Your search – cache:http://www.mattcutts.com/blog/minty-fresh-indexing/ – did not match any documents.”
Then how is it possible that it showed up in Google’s index?
How is it possible to be indexed without being cached?

Semi-related: lately more and more RSS feeds have been showing up in Google search results and they frequently have page rank. This is an oddity from one’s own site with the promise of duplicate content penalties, but when Feedburner has the feed it seems to be the equivalent of a scraper site.

As a result, Feedburner now offers a variety of options including a noindex tag on the feed page and various redirects on the RSS feed links, but there’s a severe shortage of real knowledge about what this all means and various theories abound. Could you possibly comment on this topic?

Danny Sullivan raises some good questions. When I make a blog post (WordPress), in most cases content appears in the index rather quickly. When someone makes a news post on our site its typically indexed in Google News within minutes (been like that for a few years). When I add or update an old-fashioned web 1.0 static HTML page I don’t notice the same results.

This is only tangentially related to your post, but I thought other Reader users may appreciate this comment too. You don’t actually have to click the “Refresh” link when the “new version available” message shows up in Reader. A regular reload later (when you’re not in the middle of an item) will work just as well. Looks like we should tweak the wording on that message.

Matt,
The speed with which new content is indexed is great.
I see it especially on my blog, a huge advance on times past.

Sadly, though, it seems a shame that so many older pages are now effectively missing in the supplemental index; it seems the old is being thrown out for the new, which is a shame, as I see so much quality content being dumped.

Barry, remember that you’re a real power user though; many users don’t care when exactly something was crawled and refreshed, and you don’t want to overwhelm people. I noticed this with my iPhone this morning. I was calling a colleague and my iPhone said “64 contacts.” I was thinking to myself “Okay, that’s 64 people, but I have multiple phone numbers for some people. I wonder what the total number of phone numbers is? I wish my iPhone said ’64 contacts, 100 phone numbers’ or something like that.” I was thinking what I wanted the UI to look like when I realized that I was an atypical user. Apple chose to show only the important info to keep from cluttering the interface. Bringing it back to Google, you can always access the cached link and look in the message at the top to find the crawl time, so it’s possible to get the info. It’s debatable whether users want that extra information or whether it risks cluttering things.

Philipp Lenssen, I honestly haven’t talked to that team to see if RSS/pings are a factor. I just noticed the freshness as a user and thought I’d mention it.

Danny Sullivan, I wouldn’t claim to be an expert on this particular topic, but remember that one of the points of “Big Daddy” was to unify our crawler. Crawling from main web search, blog search, image search, etc. all go through a single crawl interface. My understanding is that the main web index can be minty fresh while being separate from (say) the blog search.

jeff Hall, it’s good to remember to ask “Do I really want to say this?” before putting it live on the web. There’s always the url removal tool in the webmaster console, but it’s hard to take things back once it’s live on the web.

Michael Martinez, thanks. I agree that savvy folks have noticed this already; I just wanted to add a little bit of context.

Mihai Parparita, many thanks for stopping by to clarify. That’s great to know, and I appreciate you mentioning it. That will save me a little bit of stress the next time I see that message.

My result is one to three days… which is a poor result. I think Google crawls sites that appear in Google Reader and FeedBurner faster than others: the crawler reads the sites that more people read first, and the rest afterwards.

Your answer to Danny Sullivan may be of help. You said, “My understanding is that the main web index can be minty fresh while being separate from (say) the blog search.” Does that mean that the main web index does not contain RSS newsfeeds, which are the dataset for the Blogsearch as I understand it?

I hadn’t read in here for a while, but it’s pretty ironic – a few weeks ago, I looked up something as a general search and the first 2 spots were both posts that had been up not even an hour or so. I knew this kind of indexing has been happening faster and faster, but I was blown away by just HOW fast the index results were showing up… Always being impressed.

Sam I Am, if a SERP I’m looking at shows a fresh tag, it proves to me that it’s fresh (not that I really care). In regards to “always”, I used the word to describe the index as a whole, not one particular SERP.

David, if by “green” you’re talking about money, I’m sure that money isn’t involved. I suspect that the crawl folks try to do some really smart things to update the right pages at the right time. If you want to read more about the idea of smarter index updating, there was a non-Google paper from several years ago that discussed a few of the issues with updating search indices. Let’s see… ah, here it is: http://portal.acm.org/citation.cfm?doid=857166.857170

It’s unrelated to Google, but I remember enjoying that paper back in the day.

BTW, the “minty freshness” is a bit of an inside joke/allusion for old-time SEOs. Back in 2003, GoogleGuy used that term to describe the daily update at Google, so I thought it would be fun to use that term again.

“BTW, the “minty freshness” is a bit of an inside joke/allusion for old-time SEOs. Back in 2003, GoogleGuy used that term to describe the daily update at Google, so I thought it would be fun to use that term again. :)”

Great job, engineering team! Though users immediately spot new features, the behind-the-scenes engineering can make the overall experience much better. For example, at Wikipedia it is a major experience bonus to see the change you make immediately on the website; your search example is awesome in the very same way.

Of course, this may take resources to execute, and one always has to weigh what has to go in and what should not – but that’s tuning for you: making the web better, not necessarily in ways visible immediately. That’s what engineers are for.

Dave (original), so does Scope or Listerine – one of those mouthwash products anyway, or was it Doublemint gum? Wow, my latest blog post was indexed so fast it’s more than minty fresh, it’s Googlicious!!!

Hi Matt,
All this is about pages with higher PageRank that Google crawls frequently, but what about the other ones? Still, I am seeing that nowadays Google’s crawling is more frequent than before.

I very rarely disagree with you, but this is slightly misleading. While the current indexing is faster and more powerful than it was way back in the day, it’s not “minty fresh”…yet. The problem is that you’ve used a relatively small dataset to come to your conclusion, and a dataset containing 2 of the more popular sites online.

If you could come up with a way to index newer sites that don’t necessarily have the authority or the inbound links of established sites, that would really help. My own experience suggests that at the present time, it’s between 12-24 hours… which is very good, don’t get me wrong (about 1-2 weeks faster than MSN and about 4-6 weeks faster than Yahoo!). But there’s still work to be done (e.g. consolidating multiple search sources such as Blog Search into the main index… that would be a very good start).

I’m not really complaining either, although it sounds like I am. I’m just pointing out that this is a good step, but there’s still a fair chunk of work to be done yet.

The question may be general, and in trying to find an answer I found this post. My question is how to resolve a unique “Google Dance”. I am asking in general terms where to look and where to go. One day our search results are fine; the next day they have disappeared. A day later they are fine. A week later the problem starts again.

[quote]Well, we have noticed this on blogs and news-related websites: they keep getting indexed on Google very fast.

But for other types of websites, this fresh index doesn’t work as you say.[/quote]

Dario,

Is this not logical? The nature of blogs and news sites is more immediate, while with other types of sites content may be updated or added very infrequently, depending on purpose.

So it is entirely logical that the former gets spidered at warp speed and the latter not.

Any bot has “x” resources at any given point in time – so if it was my bot it would be visiting a worthy news site continuously, and a more static site as often as “necessary” – thus making best use of available resources to serve up what the user requires.

Why doesn’t Google provide minty-fresh pagerank in the toolbar or the other power user features such as link:, related:, etc.? Why quarterly? Is this more an issue that people could then tweak their tactics to their advantage? I probably answered my own question huh? 😀

Remember the old Infoseek? They used to let you update by e-mailing them up to 50 URLs at a time, and those URLs immediately went to the head of the search pile, since Infoseek was big on freshness. At least I think it was Infoseek; the 90’s are such a blur. My memory puts Infoseek between WebCrawler and AltaVista in the search engine parade, before Google came along and rained on them all :-)

Google’s blitz crawling/indexing is going to put tremendous pressure on ethical SEO for sure. Now we need to do more testing of our SEO work in a test environment before uploading anything, because less than 30 minutes later the whole thing will be crawled and indexed.

Can you imagine those SEOs working on large news sites? Now they need to educate not only web developers on Google-blitz-SEO but also those “proud” journalists. What a nightmare!

Please give us back the good old days of Google Dance. No more Mmmm–minty freshness for me

Matt, I have noticed a very fast reaction from Google on some of my articles too, but something seems to have changed: new articles, whether relevant or not for some keywords, seem to be listed at higher positions than older articles that are more relevant to the topic?

This is something that we discussed last month (see http://www.browsermedia.co.uk/2007/07/19/is-google-on-speed/) and it seems to be gathering even more speed as it took just over a minute for a news article we posted this morning to appear in no.3 slot in the SERPS for a relatively competitive phrase.

Whilst this is indeed very impressive, it does make me wonder whether it opens up the door for more spammy results as it must be a whole lot harder for Google to work out the degree of authority / quality that the page must have (since it is unlikely to have built up any links in this short time).

It can only mean that your overall domain authority will become increasingly important, as I can’t see what other factor Google can consider when deciding where to rank brand-spanking-new pages.

It is great that it encourages sites to keep content up to date (as we have generally seen that you rocket to very high positions then fade away) but I just hope that we don’t see the SERPS full of spammy pages created on a regular basis just for ranking purposes.

I like the fast updates. I was looking for info on Mike Vick at the top of the G results I saw fresh news and Web sites. It was a good combo. Google is getting smarter all the time. Some day we will just think about what we want and it will pop up in our Google hologram.

It’s been over three weeks since my consumer protection page was cached. It has a PageRank of 3 and doesn’t change often, but I don’t think Google is at the point of minty freshness yet. Maybe fresh as a slightly speckled banana, when it comes to non-blog, non-news pages.

You know what would be really cool? If every page online were checked every 5 minutes to see if it was updated. Then we could optimize, upload, check, try something else, upload, check, etc. There would sometimes even be time to have a coffee while waiting for the next update…

Be prepared for new PageRank-like scenarios when you start updating that often…

Now a lot of us dedicate much of our time to online activities such as blogging, networking, IM, email, browsing; the list goes on. As a society, are we starting to forget “real life”, constantly distracted by the new era of the Internet?

Blogging too much?

My opinion is that the Internet provides us with a fantastic means of communication; however, at some point we all need to decide when and how much time we devote to building and maintaining networks and online diaries – at the cost of our family life in some cases. How many times has your partner asked you to leave that damn PC alone? I know mine does. What about the kids, do they understand the addiction? Is it an addiction?

Are we well and truly of the WWW DOT age now, where it is such an integral part of our lives that it has to be factored into day to day living?

In less complicated terms, I would ask you. Who gives a fuck? We are here because you are a Googler, we think you may give us insight, reality dictates that you don’t… Interesting as your blog may be, all I want is information, not which harry potter book you last read, not what the last meal you cooked for your wife was, just tell me, now, please, the secret to all things Google….. Do we care? Obviously yes.. Do we? Do we? Do we?

It’s the magic ingredient, like oregano in a spag bol; you are the spoiler, the maker and the reason to exist. However, how much does Google pay you to blog? I do it for free.

I noticed this a couple of months ago. I keep a site which is basically made up of static pages. Then, I decided to add a blog. Within a month all my blog posts (which weren’t a lot) got indexed and were actually ranking higher than the main site. Some of my main site’s pages still haven’t been indexed to this day.

Great job Googlers! I always keep an eye out about how Google ranks sites. The minute I noticed it, I started recommending blogs to my optimization clients.

Yes, Google is getting very, very fast. I wanted to call someone yesterday who was trying to sell their home using another Realtor. Their home listing had just expired yesterday, and just after 7am they posted their house FSBO on Craigslist. I Googled their phone number from the MLS sheet at about 7:50am, and not only did the phone book entry show up in the search, but so did the Craigslist listing from less than an hour earlier. Very, very fast!!!

But Google is always on a “best-efforts” basis – unacceptable in the real world of professional computing. Try using the Google Webmaster pages when the server farm is busy – no error messages, no “sorry, we’re busy”, just *false* *results*. Quietly presenting *the* *wrong* *results* is BAD! Real bad…

Google guys – learn something about the KPIs that real-world computing needs. You cannot be smug here. Otherwise your objectives of getting professional users to entertain the use of your service are doomed to failure.

A question for Matt or others: I too have noticed Google’s fast indexing; often things that I post to my website are indexed within a few hours. But something has changed in the last week and I’m not sure what it is. My ‘news’ page, which I update most frequently, hasn’t been indexed for a week now. I realize (and hope) that this could just be some erratic behavior in the indexing algorithm, but the superstitious mind grasps for reasons. I’m hoping that someone can put my mind at ease.

Two things make me suspicious that the delay isn’t just accidental. The first is the last post Google indexed had a link to pictures to the Madison, WI farmer’s market. I called it “Peace, Love, and Food Porn.” Is there a chance that this would make Googlebot think the post was spam, and that would slow down future indexing? The other possible culprit is that the Googlebot hit a bad link in my sitemap, the result of a tag that no longer exists. Is there a chance that this has slowed things down?

Thanks for the great information. As a novice to the world of SEO, I really appreciate you offering this kind of information. Since there are so many “SEO experts” out there today, I never really know who to listen to. However, you are one person I don’t have to think twice about listening to.

Great post. Glad to see that Google is indexing with great speed. It gives webmasters more incentive to keep their content fresh. Although in the past few months I had noticed updates within days, which wasn’t bad either.

Hey Matt, I was just looking at your website’s new post (closing the loop on malware) and I see that your website is frequently crawled by Google, but for other websites it takes a lot of time to get crawled. Is there any secret formula behind this?

In 2000, I knew more than one search engine that had some or all of their indexes out of date by more than 6 months. For example, the one for which I was responsible… I watched in dismay as our various teams had “fatal” problems that prevented our search engine (I won’t say the name if you don’t “ask”) from getting updated for a good, long, long time. Can you say “serial dependency == serial failure == failure”?

Our (abject) failure to produce a remotely fresh index, however, resulted in a surprising outcome: we were also free of many of the simplest (at the time) “spamdexing” tricks. You know, cloaking, jump pages, etc. Sometimes being behind the curve lets others like Google learn how “sharp” the curve is. Mostly though, it’s just lame to be behind.

Incremental indexing is a neat trick. Incremental, prioritized indexing that works properly, however, is pure magic. Knowing what will have changed is easy; guessing at what probably changed is not that hard; but doing it in such a precise way as to be able to keep up in most cases is the magic part. Sitemaps, of course, is the only (putatively) deterministic approach to the problem (assuming of course that all webmasters are honest :-). And to Google’s credit, Sitemaps continues to grow and become a resource for webmasters that totally justifies any minimal effort required to support it. Google has, in the last few years, completely mastered, and yes, finessed, a problem that was just a few years ago nothing more than simple, brute force. Wow. I continue to be in awe.
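The idea of incremental, prioritized recrawling that the comment above describes can be sketched as a queue ordered by when each page is next expected to have changed. This is purely a toy illustration under invented assumptions (the class, its scoring, and the example URLs are all made up for this sketch, and bear no relation to how Google actually schedules crawls):

```python
import heapq
import time

class RecrawlScheduler:
    """Toy sketch of prioritized incremental recrawl: pages that change
    often (or matter more) get revisited sooner. The scoring is a
    made-up illustration, not any real crawler's policy."""

    def __init__(self):
        self._heap = []  # min-heap of (next_due_time, url)

    def schedule(self, url, change_interval, importance=1.0):
        # Higher-importance pages are revisited more aggressively:
        # an importance of 2.0 halves the wait between crawls.
        due = time.time() + change_interval / importance
        heapq.heappush(self._heap, (due, url))

    def next_due(self):
        # Pop the page whose recrawl is due soonest.
        due, url = heapq.heappop(self._heap)
        return url

sched = RecrawlScheduler()
sched.schedule("http://example.com/news", change_interval=300, importance=2.0)
sched.schedule("http://example.com/about", change_interval=86400)
print(sched.next_due())  # the frequently-changing news page comes up first
```

A real system would also have to learn `change_interval` from observed page histories, which is where the “guessing at what probably changed” part the commenter mentions gets hard; a sitemap’s lastmod data is one way to shortcut that guess.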

Right on!

(P.S. It could be a subtle “cause-and-effect” indicator that my first attempt at posting revealed an error in my addition of 7 + 9 :-{ )

We get postings all the time from people complaining about how we are screwing them and keeping them out of the Google directory — when in fact, Google has not done an update from us in more than a year.

That’s a DMOZ editor commenting, and a guy who actually has some semblance of a clue (unlike most of them).

It takes about one day with my main website; with my websites that are four months old, two to three days. I imagine once I rank higher my postings will appear quicker, but I cannot complain – it’s still fast.

I agree that search engines are now much more relevant than at any time in the past, except for one element in the algorithm: the domain name. MSN is the worst offender for this, with Yahoo! not too far behind. Many people are simply buying keyword-rich domains to get to the top of the engines fast.

A new site for property bangkok got to the #1 position on both engines within weeks of launch purely because of this element; its SEO is extremely weak. This site is nowhere on Google and probably never will be.

However, perhaps I’m wrong, because on a recent search for web design bangkok, our main key phrase, the domain webdesignbangkok is at #2. This is a site whose hosting has expired and which has very few backlinks.

I’ve been noticing this lately too… indexing does seem to be getting faster and faster. I’ve noticed some of my blog entries on my sites showing up in the search engines (well Google anyway) almost immediately after I post them. That’s really cool. It’s especially good of course for people who do a lot of current events type stuff… not really my bag. Maybe it should be?

Being somewhat of an old-timer, I finally understand the benefits of blogging, though I have yet to use it on my site. What concerns me, however, is that if blog posts are being indexed this quickly, would it not be theoretically possible to cull a web site by using a blog?

I don’t see your point, multi-worded Adam. I am talking about how often the Google Directory is updated, not being kept out of it.

The DMOZ editor is saying that the Google directory hasn’t been updated in over a year. That’s why it’s there.

Hey Matt, I just realized why I had the issue with “minty freshness”, and I’m wondering what the effect would be on your blog. The issue comes up because I usually prewrite my posts the night before, as opposed to publishing when I’m done writing. While the site is submitted to Blogsearch, which appears to update at the time of the post, the main index takes a few hours from post time.

I’ve done a few posts where I published immediately and saw the post in SERPs within minutes.

If you (as in Google) could merge your Blogsearch results with your main results, that would likely solve the problem. Just a thought.

Hey Matt – wow, what a great article. I started noticing the quicker indexing by Google… read your article… tried it – and the rest is history!!! Google seems to almost be running in REAL TIME – that quantity of data is fairly cool – minty-fresh cool…

I think there is a lot of theory regarding the topic, but there’s also a fair amount of luck involved. It’s like the stock market: pretty much everyone speculates and has ‘theories’ as to how it works, but in the end it is not much more than a gamble once you’ve done ‘the basics’. There was a brilliant book written about this topic, actually, called ‘Fooled by Randomness’.

My last site (which launched only months ago) indexed very quickly but my newer site (www.APigeonCalledFrank.com) is having some difficulty, despite them both having the same kind of articles.

One issue with index-updating timeliness is that older content is not negatively scored in search results. So many times I have searched for information on something and could tell neither whether the source was new nor where the latest information was.
Things like news, reviews or product information from 2000 will be at the top of the search results because they have been around and linked to for a very long time. Yet the shorter the time a website has been around, the more of a negative it is when it comes to indexing updates.

I heard a lot about the ‘Google dance’ but can’t actually understand one thing:
if a site’s internal page dropped from the index, does this mean that its PR won’t be restored in the near future? Don’t judge me too harshly, I’m a technical newbie.

I noticed too that Google seems to index web pages a lot faster. I recently submitted a few pages and they were indexed within 48 hours. I am not sure if this happened because there is a high-PageRank web site linking to them.

Matt, that’s good for everyone when Google updates its index more frequently. My experience with Google’s indexing rate on Blogger posts is that they are indexed within 30 minutes. That’s awesome! However, I have noticed an exception that I would be interested in reading your comments about.
Here is the exception: when the blog filename is changed while using the “publish to FTP” option, Google seems to “forget” to do any subsequent indexing. In other words, if the initial blog file name is changed, and Google indexes it, what happens when additional posts are made using the new filename? For example:

Great that that particular post was indexed so fast. But not all of us webmasters have the same experience online. The site you mentioned, which got a post indexed so quickly, must be one that already has a lot of authority with the search engines and is therefore crawled frequently. The big question is: what does it take to bring a one-man blog to such status?

This article is news again!
At Google Zeitgeist 2009, Larry Page actually said that he always thought Google needed to index the web every second to allow realtime search. And when he told his team, they laughed and said that a couple of minutes should do. Larry said no, we need to index the web every second. He stated that now, with Twitter indexing in realtime, his team gets the idea.

Google will be launching Google Wave, and it has a feature where you see people typing the words.
So I guess this is what Larry means by realtime indexing.

My blog (started Jan 09) was being indexed really fast, usually within the hour. Then in around the middle of May and for no reason I can see, it suddenly stopped. Since then my traffic has plummeted. Google Webmaster tools is reporting no problems whatsoever. My blog is all original content and I’ve spent five weeks trying to work out what has gone wrong. I am still clueless. Can anybody help?

For under-half-hour indexing, Google has to love your blog so much for it to be indexed faster than your stomach can crush that apple. A track back into Google’s indexing history says a lot about where we are headed.

My interest is in the technological perspective. How does Google manage to update its index so fast, given there are so many websites out there that need to be sorted out?
As a Google insider, perhaps you could give us some insight. 😉

Matt
There has been a lot of discussion lately on the Webmaster Help Forum about severe indexing problems with blogs that were just fine a week ago. A Google search for a quoted phrase returns the home page only, whereas a ‘site:domain URL’ search says the post IS indexed. On most occasions, it has to do with duplicate content on scraper sites. It appears as though Google were having problems determining the original content source, and it even throws original-content websites out of its index.
This is getting widespread.
Could you please let us know what’s going on in this respect?
Thanks in advance!

Great post. I think that since blogging has become so popular, most website frameworks will be redeveloped around blogs because of their indexing power. My personal company site, which was built on a popular blogging platform, got indexed well before another website of mine that was just built on “the norm” platform. Do you think that the use of WordPress, Drupal, and Joomla will take over from normal hand-coded sites? Thanks for the great post; I am an avid reader.

How will this new template for a website be outdone? What is the next step in the evolution? I feel that we need to get away from .com and go with more industry-specific URLs. I noticed that only .com and .gov URLs usually come up in general searches – no .net and .co and so on.

I feel I have been a part of the creation of the web. I remember as a kid seeing my best friend’s dad on the web back in the Commodore 64 era. Then I remember creating my first website back in the late 1990s, when go.com, yahoo.com, excite.com, etc. were flying high – and then boom, Google powering the search on Yahoo (shame on Yahoo for not forcing some major ownership deal – LOL), and boom, this basic storefront was ruling the world of search engines.

Now we have crazy indexing theories where we use backlinking, Domain Authority, Page Authority, etc. The funny part is people are always trying to crack how Google is indexing. At first we had to buy links, etc. – then Google found a way to kill those. I just wonder sometimes, when you are an artist, if it is smarter to stop, step back and look. What is the purpose of Google, and how can it accomplish what it wants without making SEO people go crazy? How can they have the best search results in the world to keep a happy base while yielding the financial gain they desire from their happy clients? I am one to say I am excited about their new Gmail ad system – yes, why not ask what ads we want to see? Just because I am reading about a mortgage in my email does not mean I want to see mortgage ads. I might not even be looking for a mortgage for myself. So get to know me by asking, then watch me – I am fine with you watching me. I want you to know me, so you can better help me. Now back to Google’s engine.

I hate it and I love it. I have tried to leave it because it is so big and powerful, but truth be told I love the results way better than Bing’s. I like the name Bing, as it reminds me of Southwest Airlines – Bing, you are free to search the web. Hey Bing, make sure you pay me for that one-liner. LOL.

I do not think Google should give much weight to anything but .com, .gov, and .org, because those three basically and honestly cover it all. In my profession I see people with the same domain name, but one is .com, another .net, another .co, etc., and I find that to be such a mess. The last thing I want to do is type a name I think is right into Google and have a dozen to choose from. I want to go with the original one.

Early in the last decade, we observed that our website pages were indexed in Google searches after a long time, about one month, and sometimes we had to submit them for indexing ourselves. Then Google started indexing these pages on a weekly basis, and now it is really amazing: when we launched a new website, i.e. Compare Panda, we saw new pages indexed very fast, because changes take place on a daily basis in Google searches. Google automatically includes newly updated pages, and if we submit an updated page in Google Webmaster Tools it takes only 1-2 hours to be indexed in searches. I really want to thank the Google indexing team!