Sat Feb 21 2015 21:34
Minecraft Archive Project: 201502 Capture
I've done a new capture of data for the Minecraft Archive Project, my big 2014 project to archive the early history of Minecraft before it disappeared. My goal for the refresh was to capture what has happened in the past year while doing as little work as possible, and I met my goal. The whole thing took about two weeks, and most of that was a matter of letting things run overnight. Most of the actual work was refactoring the code I wrote the first time to make future captures even easier.

Top-line numbers: I've archived another 150 gigabytes of good stuff, including 18k maps and schematics, 1k mods, 11k skins, 7k texture packs (resource packs now, I guess), and 100k screenshots. I was able to archive about 73% of the maps. Four percent of the maps were just gone, and 23% I didn't know how to download.

The 201404 Minecraft Archive Project capture contains data from four sites. The new 201502 capture is limited to two sites: the official Minecraft forum and the huge Planet Minecraft site. I started archiving maps, mods, and textures for Minecraft Pocket Edition, and was able to pick up about 5500 MCPE maps.

Now that I've done this twice without getting into trouble, I'll give a little more detail about the process. I've got scripts that download the archives of the Minecraft forum and Planet Minecraft. I find all the threads/projects modified since the last capture, download the corresponding detail pages (e.g. the first page of a forum thread--I'm only after the original post), and extract all the links.
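The find-what-changed-then-extract-links step can be sketched with nothing but the Python standard library. This is a minimal illustration, not the scripts used for the capture: it assumes the thread/project list has already been reduced to hypothetical `(url, modified)` pairs, and it only collects `href` and `src` attributes from `<a>` and `<img>` tags on the detail page.

```python
from datetime import datetime
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects <a href> and <img src> URLs, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes map to None, so use .get()
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.links.append(urljoin(self.base_url, attrs["src"]))


def threads_to_refresh(threads, last_capture):
    """Hypothetical input: (url, modified) pairs. Keep those changed since the last capture."""
    return [url for url, modified in threads if modified > last_capture]


def extract_links(html, base_url):
    """Return every link found in one detail page (e.g. a thread's first post)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Resolving relative links with `urljoin` matters because forum posts often point at images on the same site with a bare path.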

Then it's a matter of archiving as many of those links as possible. I've written recipes for archiving images and downloads. These six recipes take care of the vast majority of items:

Two file hosts: Mediafire and Dropbox

Four image hosts: imgur, Photobucket, TinyPic, and postimage.org

There's also a general catch-all for people who host things on normal home pages, as Tim Berners-Lee intended. If your URL looks like the URL to an image or a binary archive, I will ask for that URL. If you serve me the image or the binary instead of an HTML file telling me to click on something, then I'll archive the file.
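One way to picture the recipe setup is a dispatch table keyed on host, with the catch-all as the fallback. The sketch below is an assumption-laden illustration: the recipe names, the domain matching, and the `DIRECT_EXTENSIONS` set of "looks like an image or archive" extensions are all hypothetical, and the actual per-host download logic is left out entirely.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical host-to-recipe table for the six hosts named above.
RECIPES = {
    "mediafire.com": "mediafire",
    "dropbox.com": "dropbox",
    "imgur.com": "imgur",
    "photobucket.com": "photobucket",
    "tinypic.com": "tinypic",
    "postimage.org": "postimage",
}

# Assumed list of extensions that "look like" an image or a binary archive.
DIRECT_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".zip", ".rar", ".7z", ".schematic", ".jar"}


def pick_recipe(url):
    """Return the name of the recipe that should handle this URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for domain, recipe in RECIPES.items():
        # Match the domain itself or any subdomain (i.imgur.com, www.dropbox.com, ...).
        if host == domain or host.endswith("." + domain):
            return recipe
    ext = posixpath.splitext(parsed.path)[1].lower()
    if ext in DIRECT_EXTENSIONS:
        return "catch-all"
    return None


def worth_keeping(content_type):
    """Catch-all rule: keep the response only if it isn't an HTML page."""
    return not content_type.split(";")[0].strip().lower().startswith("text/html")
```

The Content-Type check is what separates "you served me the file" from "you served me a page telling me to click on something."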

I decode most link shorteners except for the ones that make you click through ads, mainly adfoc.us and adf.ly. The 2014 archive had about 18,000 maps behind adf.ly links, and I spent a lot of time running Selenium clients clicking through the ads to discover the Mediafire links. I think that took a month. This time there were about 3000 new maps behind adf.ly links and I just didn't bother.
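Decoding a plain link shortener amounts to following redirects until you land somewhere that doesn't redirect, and giving up when the chain hits an ad-gated service. This sketch assumes a hypothetical `get_redirect(url)` callable that returns a redirect's target (or `None`), so the logic can be shown without touching the network.

```python
from urllib.parse import urlparse

# Shorteners that hide the real link behind an ad page; these are skipped.
AD_GATED = {"adf.ly", "adfoc.us"}


def resolve(url, get_redirect, max_hops=10):
    """Follow redirects until reaching a URL that doesn't redirect.

    get_redirect(url) is a hypothetical callable returning the redirect
    target for url, or None if url doesn't redirect. Returns None if an
    ad-gated shortener is reached or the chain is longer than max_hops.
    """
    for _ in range(max_hops):
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in AD_GATED:
            return None
        target = get_redirect(url)
        if target is None:
            return url
        url = target
    return None
```

With the `requests` library, a real `get_redirect` might call `requests.head(url, allow_redirects=False)` and return the response's `Location` header; the hop limit guards against redirect loops.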

There are two big blind spots in my dataset, and they're the same as last time. One is mods. A lot of mods are hosted on GitHub and CurseForge, two big sites I didn't write recipes for. There's also the issue of mod packs, which have been steadily growing in popularity and complexity as development on core Minecraft winds down. Thanks to things like the Hardcore Questing Mod, mod packs are entering the "custom challenge" territory previously occupied solely by world archives.

There are sites that list mod packs, but I don't want to spend the time figuring out how to archive all the mod packs. There's also the problem that mod packs are huge.

The second blind spot is servers. It's theoretically possible to join a public Minecraft server with a modded client and automatically archive the map, but realistically it ain't gonna happen. I complained about this last time, but now I've done an assessment of what's being lost.

Planet Minecraft has a big server list that mentions the last time it was able to ping any particular server. There doesn't seem to be any purging of dead servers, so I'm able to get good measurements of the typical lifecycle.

Of the 136k servers in the list, 12k are "online" (the most recent Planet Minecraft ping was successful), 51k are "offline" (the most recent ping failed, but there was a successful ping less than two weeks ago), and 73k I declare "dead" (the last successful ping was more than two weeks ago).
It seems really weird that so many servers (51k, about four times as many as are currently online) went offline in the past two weeks, so something's going on there. Maybe Planet Minecraft's ping process is unreliable, or it just takes a long time to check every server, or servers go up and down all the time.
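The online/offline/dead split is a simple rule on two fields of a server-list entry. Here it is as a small function, assuming each entry has been reduced to a hypothetical `(last_ping_ok, last_success)` pair; the field names are illustrative, not Planet Minecraft's.

```python
from collections import Counter
from datetime import datetime, timedelta

# The two-week cutoff described above.
DEAD_AFTER = timedelta(days=14)


def classify(last_ping_ok, last_success, now):
    """last_ping_ok: did the most recent ping succeed?
    last_success: datetime of the most recent successful ping."""
    if last_ping_ok:
        return "online"
    if now - last_success < DEAD_AFTER:
        return "offline"
    return "dead"


def tally(servers, now):
    """servers: iterable of (last_ping_ok, last_success) pairs (hypothetical layout)."""
    return Counter(classify(ok, last, now) for ok, last in servers)
```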

Anyway, the median lifetime for a public Minecraft server is 434 days, a little over a year. These things go online, people do a bunch of work on them, and then they disappear. I've kind of gotten to 'acceptance' on this, but it's still obnoxious.
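A median lifetime can be computed from when each server was first seen and last seen. This is a sketch under an assumption the post doesn't spell out: lifetime is taken as `last_seen - first_seen` in days. Servers that are still up haven't finished their lifetimes, so this naive version understates them; a proper survival analysis would treat them as censored.

```python
from datetime import date
from statistics import median


def median_lifetime_days(servers):
    """servers: iterable of (first_seen, last_seen) dates.

    For a server that is still online, last_seen is only its most recent
    ping, so its lifetime is understated here.
    """
    return median((last - first).days for first, last in servers)
```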

One final thing: I thought I'd check if I could see the result of Mojang's June announcement of rules for how you can make money by hosting servers (and, more importantly, how you can't). I wanted to see if these rules had a chilling effect on the formation of new servers or caused a lot of old servers to shut down.

And... no, not really. Here's a chart showing two sixty-day periods around June 12, the date of the Mojang blog post. For each day I show 'births' (the number of servers first seen on that day) and 'deaths' (the number of servers last seen on that day). There's a drop-off in new servers around the end of July, but then it picks up again stronger than before. I don't have an explanation for it, but I don't think there's anything in here you can pin on a blog post. The Mojang rules were probably intended to go after a small number of large obnoxious servers, and everyone else either doesn't care or flies under the radar.
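The births/deaths counts behind a chart like this are two per-day tallies over a window. A minimal sketch, assuming the same hypothetical `(first_seen, last_seen)` date pairs as before and a window of sixty days on either side of June 12, 2014:

```python
from collections import Counter
from datetime import date, timedelta

ANNOUNCEMENT = date(2014, 6, 12)
WINDOW = timedelta(days=60)


def births_and_deaths(servers, center=ANNOUNCEMENT, window=WINDOW):
    """Count servers first seen (births) and last seen (deaths) on each day
    within `window` either side of `center`.

    servers: iterable of (first_seen, last_seen) dates.
    """
    births, deaths = Counter(), Counter()
    start, end = center - window, center + window
    for first, last in servers:
        if start <= first <= end:
            births[first] += 1
        if start <= last <= end:
            deaths[last] += 1
    return births, deaths
```

A server that is still online has a recent last-seen date, so it never shows up as a death inside a 2014 window.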

(Screenshot is from World #57 by Art_Fox. I didn't archive the map because it's behind an adf.ly link, but I got the screenshot.)

PS: Congratulations to Anticraft, the oldest public Minecraft server I could find that's still online, added to Planet Minecraft on February 28, 2011.

Update: I fixed up the adf.ly code and let it run for another two weeks (!), saving another 2000 Minecraft maps and 700 MCPE maps. I probably won't do this again because it's a huge pain, but I said the same thing earlier in this post and ended up doing it anyway out of some sense of obligation to the future. So maybe obligation will strike again, who knows.