The November Mozscape index is launching a few days later than scheduled. A miscalculation in the amount of crawl data initially included, combined with the fact that our crawlers are extremely efficient, caused our first index attempt this month to come out at about twice the size of our 77 billion URL goal. Had we not made this miscalculation, we would have hit our original release date of 11/5, but restarting the index caused the release date to slip a few days.

Another hiccup we ran into this month was processing 76 billion URLs, which took a bit longer than our previous October index of only 55 billion URLs. This became glaringly apparent in one specific step of our index processing. Periodically throughout processing, we checkpoint the files that have been processed so we can roll back if something catastrophic occurs (a machine failure, file corruption, etc.). With the larger index this month, these checkpointing steps were taking noticeably longer; in some cases, it took days to checkpoint some of the larger steps. Two of the genius engineers on the Mozscape team, Martin and Brandon, came up with a solution that drastically reduced the time spent checkpointing. With Martin's update to the processing software, the time spent in some of these steps was cut from days to just minutes! Once again, taking a step back brought the Mozscape team two steps forward.
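To illustrate the general idea of checkpointing between processing steps, here is a minimal Python sketch. It is not the Mozscape processing software: the function name `run_pipeline`, the JSON checkpoint file, and the step functions are all hypothetical, and a real index would checkpoint large files rather than a small in-memory state.

```python
import json
import os


def run_pipeline(steps, state, checkpoint_path):
    """Run steps in order, saving progress after each one.

    If a checkpoint file already exists (for example after a crash),
    resume from the first step that has not completed yet.
    All names here are hypothetical, for illustration only.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        start, state = saved["next_step"], saved["state"]

    for i in range(start, len(steps)):
        state = steps[i](state)
        # Write to a temporary file first, then rename it into place,
        # so a failure mid-write cannot corrupt the last good checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_step": i + 1, "state": state}, f)
        os.replace(tmp_path, checkpoint_path)

    return state
```

If a step fails, rerunning `run_pipeline` with the same checkpoint path picks up after the last completed step instead of starting over. The cost of writing each checkpoint grows with the amount of data saved, which is why checkpoint time matters more as an index gets larger.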

The Mozscape team is continuing to make significant progress on finalizing our private cloud solution in Virginia. We are on track to have indices produced in both the AWS cloud and our own private cloud by the end of the year. Now that a test index has completed successfully, the first Mozscape index is in progress in our own private cloud. It's an exciting achievement for the Mozscape team!

Here are the metrics for this latest index:

76,734,608,461 (76 billion) URLs

776,343,422 (776 million) Subdomains

134,499,372 (134 million) Root Domains

878,838,592,381 (878 billion) Links

Followed vs. Nofollowed

2.69% of all links found were nofollowed

56.69% of nofollowed links are internal

43.31% are external

Rel Canonical - 13.65% of all pages now employ a rel=canonical tag

The average page has 71 links on it

61.28 internal links on average

10.13 external links on average

And the following correlations with Google's US search results:

Page Authority - 0.36

Domain Authority - 0.19

MozRank - 0.24

Linking Root Domains - 0.30

Total Links - 0.25

External Links - 0.29

This histogram shows the crawl date and freshness of results in this index:

The freshest data in this index is from October 16th (when processing began), and a good portion of the link data is from late September to mid-October. The index reflects link data dating back to about mid-September, but the majority of it comes from the first few weeks of October. As we continue to reduce the time it takes to process an index, this freshness will keep improving!

Another exciting announcement is our new App Gallery that launched a few weeks ago. Check out all of the great tools our users are building on top of our Mozscape data. If you have a free tool that you would love to see added to this page, submit a request to have it added to the gallery - we'd love to hear about it!

As always, we'd love your feedback. Hope to hear from you in the comments, where the big data team will be reading and responding as usual.

I'm always excited to see a new Mozscape update. I think the best part is that you guys were able to do it $600,000 cheaper this month. I can't wait to see what happens when you fully migrate to the hybrid structure.

We are working now on improving Page and Domain Authority and hope to have something out near the end of the year or the beginning of next year. That said, we noticed that all of our domain metrics dropped and all of the page metrics increased, likely due to Google algorithm shifts in the last few months, so the correlations may still be lower than before. I blogged about this a few months back if you'd like to read more: http://www.seomoz.org/blog/mozscape-correlation-analysis-google-algorithm-changes

Yep - we do have some big changes planned for the future - we've got some really smart engineers working on the Mozscape project! Cutting down our processing time in order to provide fresher index data is always one of our top priorities. Historically we've been releasing indexes every 4 weeks, but we're working on making those releases more and more frequent. The ultimate goal is to work toward more of a live index, but that probably won't be available for 9 months to a year.

Carin, really impressive statistics. Great to hear that you are planning a release at least every 4 weeks. I am waiting for that. And please publish an infographic once a month. Once again, thanks a lot, Carin.

Thanks a lot for the index, the info on the update, and the light shed on your inner workings.

For every update I see, I always wonder how to interpret the correlations of the index to the Google US results. I tried to find an article about it on SEOmoz, but couldn't. Please direct me to it, if available.

In finance and other fields, the correlation coefficient ranges as follows:

1: Perfect positive correlation, the two move in lockstep.

0: No correlation, the two move at random relative to one another.

-1: Perfect negative correlation, the two move in opposite lockstep.

These are the current index correlations to Google US results:

Page Authority - 0.36

Domain Authority - 0.19

MozRank - 0.24

Linking Root Domains - 0.30

Total Links - 0.25

External Links - 0.29

Looking at the correlation coefficient, I don't really see that much of it. I would think your goal would be to get close to a 1 or perfect correlation, so that the index results are a better mirror of how Google sees our site, right?

Would it be possible to have SEOmoz's take on this?

Also, is the negative sign in front of the correlation numbers just a separation character? If so, it would be wise to use another character like ":", so that there is no possible confusion.

Yes - those hyphens are just separators. In terms of the values, a correlation with rankings of 1.0 or close to it would suggest that Google's entire algorithm is based only on relatively simplistic link metrics. We use correlation more as an indication that we're crawling good stuff and that our algos for PA and DA are approximating what Google thinks is important in a page's link profile. Hope that helps!
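For readers who want to see how a correlation between a link metric and rankings can be computed, here is a small Python sketch. The post does not say which coefficient is used, so this assumes a Spearman rank correlation (a common choice when one variable is a ranking position), and the numbers are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: search positions 1-5 and a made-up metric per result.
positions = [1, 2, 3, 4, 5]
metric = [90, 70, 80, 40, 30]

# Position 1 is the best result, so negate positions to make
# "higher is better" on both sides before correlating.
rho = spearman([-p for p in positions], metric)
print(round(rho, 2))
```

A value near 1 would mean the metric orders results almost exactly as the search engine does, a value near 0 would mean no relationship, and a negative value would mean the metric tends to order them the opposite way.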

Been waiting for that update, because I've done a lot of hard work this month! I have a question: is the basic rule for the Mozscape update once a month? Or is it more random, so that it could even happen twice a month?

Hey Yoav! We plan for a release at least every 4 weeks, unless something knocks us off schedule, like this month. We're continually working to decrease processing time - bigger hardware, improvements to our software, etc. Once an index is complete, we'll release it - so if things run smoothly and it finishes up in 3 weeks, we'll release a week early so you guys can get fresh data!

Awesome! In the far future, do you think the Mozscape index could be as dynamic as weekly updates? I've tried every index in the book, and nothing measures a site's trust and authority as well as MozAuthority.

Great to hear about our authority metrics!! We do have plans to have index updates as often as weekly, or even more often, in the far future. It's a big task, and not one we can accomplish with the current software we have in place. However, we've got some really amazing Mozscape engineers working on our solution for the future!

What tools are you using besides the Mozbar? The Mozbar, Open Site Explorer and the PRO campaigns are all pulling from the latest index, but if the applications you're using are not managed by SEOmoz, there might be a delay while they update their application. Are you still seeing the discrepancies?

Hey there! Unfortunately, http://moonsy.com is a third party application that we don't manage - we don't have any control over when they update their application with our new metrics. The MozBar and Open Site Explorer are SEOmoz managed tools, meaning they will immediately update with a new index release. Hope that helps clarify - feel free to reach out to our help team, help@seomoz.org, if you have more questions!! Thanks, Carin

All sounds great. It would be pretty cool to have some of those key metrics charted over time for some perspective. The fact that you have indexed 76,734,608,461 URLs and that 2.69% of those are nofollow is mind-boggling (and will probably be an infographic by morning). However, some perspective on how that compares to previous crawls would be good.