Majestic Million CSV now free for all, daily.

Today we are releasing the Majestic Million CSV for FREE on a regularly updated basis, for anyone to download and use under a free Creative Commons license.

Way back on Christmas Day, we released a copy of the Majestic Million database as a one-off. We decided it was time to do something for FREE again that would help the web-o-sphere, and thought that updating this for you every day and giving it away free would fit the bill 🙂

The Majestic Million is a list of the top 1 million websites in the world, ordered by the number of referring subnets. A subnet is a bit complex, but to a layman it is basically anything within an IP range, ignoring the last group of (up to three) digits of the IP number. So 192.0.2.10 and 192.0.2.200 would count as the same subnet.
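As a rough illustration of the layman's definition above, the sketch below treats a subnet as everything sharing the first three groups of an IPv4 address (a /24 range) and counts how many distinct subnets a set of linking servers falls into. The function names and the sample addresses are made up for this example, and it is only a sketch of the idea, not the code used to build the Majestic Million.

```python
import ipaddress

def subnet_key(ip: str) -> str:
    """Return the /24 network that an IPv4 address belongs to."""
    # strict=False lets us pass a host address rather than a network address
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network)

def count_referring_subnets(linking_ips):
    """Count the distinct /24 subnets among the IPs that link to a site."""
    return len({subnet_key(ip) for ip in linking_ips})

# Hypothetical IPs of servers linking to one site (reserved documentation ranges)
ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
print(subnet_key("192.0.2.10"))       # 192.0.2.0/24
print(count_referring_subnets(ips))   # 3
```

The first two addresses collapse into one subnet, which is why four linking servers only count as three referring subnets here.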

When we launched the Majestic Million, it did get a good reaction, but we have not seen it used as widely as we would like. We hope that, with the data freely available as a CSV, programmers and API fanatics will download it on a regular basis and use it for interesting new ideas.

If you want to see what data is available in the Majestic Million download, I would urge you first to go and play with our own web implementation of Majestic Million, because a CSV file with 1 MILLION LINES is NOT something you want to download on a whim! However, recent versions of Excel can cope with 1 million lines if you have a computer with enough memory to handle it. Most people, though, would want to import the data into a database first.
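For anyone taking the database route, here is a minimal sketch of loading the file into SQLite with Python's standard library. It assumes only that the CSV starts with a header row; the column names are taken from the file itself rather than guessed, and the table name and file names are placeholders chosen for this example.

```python
import csv
import sqlite3

def import_csv(csv_path, db_path, table="majestic_million"):
    """Load a CSV with a header row into a SQLite table; return rows inserted."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # column names come from the file itself
        # Quote identifiers so unusual header names are still valid SQL
        cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        marks = ", ".join("?" for _ in header)
        rows = (row for row in reader if row)  # skip blank lines
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # one transaction for the whole import
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({cols})')
                cur = conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({marks})', rows
                )
                return cur.rowcount
        finally:
            conn.close()

# Example call (placeholder file names):
# import_csv("majestic_million.csv", "majestic_million.db")
```

Every column is stored as untyped text in this sketch; for real use you would probably want to declare numeric columns and add an index on whichever field you query most.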

Regular Updates

We have set up a cron job so that every day we will recompile the data, which is based on our Fresh Index. It is POSSIBLE that the data is the same on two consecutive days if the Fresh Index did not update for some reason, but usually the data will change daily. Please do not try to download the data more than once every 24 hours. Otherwise you will simply end up getting banned, or we will have to reconsider giving the data away, or put it behind a walled garden.
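One way to stay inside the once-every-24-hours request is to check the age of your local copy before fetching again. The sketch below is an assumption about how a client might do that: the helper names are invented for this example, and the URL is passed in by the caller.

```python
import os
import time
import urllib.request

ONE_DAY = 24 * 60 * 60  # seconds

def is_stale(path, max_age=ONE_DAY, now=None):
    """True if the local copy is missing or at least max_age seconds old."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age

def fetch_if_stale(url, path):
    """Download url to path only if the local copy is stale; return True if fetched."""
    if not is_stale(path):
        return False
    tmp = path + ".part"
    urllib.request.urlretrieve(url, tmp)  # download to a temporary name first
    os.replace(tmp, path)                 # then swap it into place
    return True
```

Calling fetch_if_stale from your own daily cron job means an accidental second run within 24 hours does nothing instead of hitting the server again.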

Download Location

Be wary scrapers… this will do your head in if you weren’t expecting it… The Majestic Million CSV can regularly be downloaded here:

If you have made use of this data for a benevolent purpose, please mention it in the comments. We are dying to see what you use it for. We cannot promise to publish all uses, and the CSV is of course provided without warranties or support, unless you are on a paid plan for other API options.

Comments

I’m not completely sure I understand what the exact criteria are for saying these are the top million sites by subnet, i.e. is it determined by the number of incoming links? I ask as many casual readers may assume that traffic (visitors or page views) is the determining criterion.

As a side note, LibreOffice’s Calc also supports 1 million rows, although I’ve not yet actually put this to the test. I suspect that neither it nor Excel would be happy with column headers + 1 million data lines :-).

I think there is a blog post about why we chose subnets from about a year back, but in short, when looking at which sites may be creating the most “waves” or “influence” in the world, raw link counts are not ideal, because sitewides can distort rankings. Similarly, we found some servers may have thousands of sites/domains linking to a site that are controlled by the same person or group, so there are some false positives there as well. By ranking based on IP ranges, rather than on raw link or domain counts, the list looks more robust.

On the traffic front, we are measuring something different, and I think that whilst inevitably one would expect a correlation between traffic and this order, it is ultimately probably the outliers (in either direction) that would be of interest. I wonder who will be the first to highlight an interesting aspect of the outliers between our list and (say) Alexa’s list?