Manhattancraft: The quest to make a full-size city of Minecraft blocks

A server cluster, Python, and a sandbox game come together to rebuild New York.

The current state of Christopher Mitchell's Minecraft version of Manhattan.

Christopher Mitchell

Building a model of a well-known physical form in Minecraft is an old fascination for us by now. Early in the days of the game's meteoric rise in popularity, we were stunned by scale models of the USS Enterprise, the Taj Mahal, and the underwater city Rapture from Bioshock.

But even beyond those massive structures, there are still limits to push. And computer science PhD student Christopher Mitchell has found one: a 1:1 model of the island of Manhattan, down to perfect replicas of the individual buildings.

Plenty of video games are set in New York: Crysis 2, Crysis 3, Spider-Man 2, Grand Theft Auto IV, and The Godfather II among them. But even when their versions feel right, they tend to be highly stylized and compressed, unable to commit to the full scale of the city. By harnessing a significant amount of processing power and a number of algorithms, Mitchell hopes to eventually create a much more faithful portrayal, albeit one composed of parts that Minecraft players normally use to frantically defend themselves against Creepers.

Mitchell first posted about his project a year ago in the Cemetech forums, where programmers and hobbyists go to discuss coding for graphing calculators, among other things. In the forums, Mitchell exhibited models of a handful of iconic New York buildings like the Flatiron, the Chrysler Building, the New York Public Library, and Manhattan's side of the Brooklyn Bridge. Some of the structures are clearly thrown off due to their softer and more nebulous surfaces, like the trees surrounding the library. But at a distance, they are decent replicas of their real-life counterparts.

A model of the New York Public Library in midtown Manhattan, generated from Mitchell's code.

Christopher Mitchell

While generating a single building is tough but doable, the challenges of replicating an entire city composed of those building models are gargantuan. The buildings are one expression of a larger system Mitchell is pulling together, which he has termed "SparseWorld." According to a paper Mitchell wrote for the magazine Technophilic, the system combines "orthoimagery, bathyspheric, and elevation data from the USGS EROS service, and 3D buildings from Google’s 3D Warehouse" to create models. It takes a server cluster with 300 cores and 200GB of RAM a few hours to render Mitchell's current best model of Manhattan.

"I owe some mentorship on how to interact with USGS data to the author of the TopoMC tool," Mitchell told Ars via e-mail. The TopoMC project itself handles models best when scaling them 1:6, but Mitchell rewrote TopoMC to better handle 1:1 scaling for SparseWorld. In Minecraft, one block is considered equal to one cubic meter, so that becomes the limit on how granular the Manhattan model can be.

The entire SparseWorld system is written in Python and uses Google 3D Warehouse data in the Collada format, as well as the PyCollada library, to create voxelized virtual models of available buildings. Initially, the system figured out the rectangular prism that contained the building in question and then filled it in with Minecraft stone. A later version "iterates over the surface of the [building] mesh rather than the volume of the enclosing prism" to reduce complexity.
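To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not Mitchell's code. It assumes a building mesh is simply a list of triangles with vertices in meters (so one unit is one block), and it samples points across each triangle rather than reading a Collada file through PyCollada. The names bounding_prism_voxels, surface_voxels, and box_mesh are hypothetical.

```python
import math


def box_mesh(w, d, h):
    """Hypothetical test shape: twelve triangles forming a w x d x h box at the origin."""
    c = [(x, y, z) for x in (0, w) for y in (0, d) for z in (0, h)]
    quads = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
    triangles = []
    for p, q, r, s in quads:
        # Split each rectangular face into two triangles.
        triangles.append((c[p], c[q], c[r]))
        triangles.append((c[p], c[r], c[s]))
    return triangles


def bounding_prism_voxels(triangles):
    """Earlier approach: every 1 m block inside the box that encloses the mesh."""
    xs, ys, zs = zip(*(v for tri in triangles for v in tri))
    return {
        (x, y, z)
        for x in range(math.floor(min(xs)), math.floor(max(xs)) + 1)
        for y in range(math.floor(min(ys)), math.floor(max(ys)) + 1)
        for z in range(math.floor(min(zs)), math.floor(max(zs)) + 1)
    }


def surface_voxels(triangles, step=0.25):
    """Later approach: only the blocks touched by points sampled on the mesh surface."""
    voxels = set()
    for a, b, c in triangles:
        longest = max(math.dist(a, b), math.dist(b, c), math.dist(c, a))
        n = max(1, math.ceil(longest / step))
        for i in range(n + 1):
            for j in range(n + 1 - i):
                # Barycentric weights; all three are non-negative and sum to 1.
                u, v, w = i / n, j / n, (n - i - j) / n
                point = (u * a[k] + v * b[k] + w * c[k] for k in range(3))
                # Round away float noise so a point lying on a block boundary
                # does not slip into the neighbouring block.
                voxels.add(tuple(math.floor(round(p, 9)) for p in point))
    return voxels


if __name__ == "__main__":
    mesh = box_mesh(10, 10, 10)
    print(len(bounding_prism_voxels(mesh)), "blocks in the enclosing prism")
    print(len(surface_voxels(mesh)), "blocks touched by the surface pass")
```

For a 10-meter box, the enclosing-prism pass visits all 1,331 blocks, while the surface pass touches only the outer shell and never visits interior blocks such as (5, 5, 5). The saving grows with building size, since the surface scales with area and the prism with volume.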

Where Mitchell runs into completion problems is in the building of models to populate the entire city. Rather than generating the city procedurally or using pre-fab models, he wants to build a full replica of the city, brick by Minecraft brick. "Google's 3D Warehouse has provided many excellent models to test and hone my conversion algorithms against," Mitchell told Ars. "But it only includes models of well-known landmarks, and even then the models are of varying quality."

Where to get replicable data for every building in New York City? "Completion is reliant on getting models for every building on every street, and to my knowledge, only Google has that much information," Mitchell said of Google's Earth and Maps combination of products. Here.com, which used to be Nokia Ovi Maps, and Microsoft's Bing are other possible sources.

The problem is that these companies' data is under license and encrypted, which Mitchell doesn't want to mess with under the table. "I've considered reverse-engineering the encrypted format that Google Earth uses to fetch building models from the server and just keep the building models to myself," Mitchell said, but he still worries about violating the Google Earth license. He's had trouble getting in touch with the right people at Google to discuss accessing the data and using it for the academic pursuit of a Minecraft Manhattan. He has also reached out to the Here.com and Bing teams.

The version of Manhattan that SparseWorld currently generates is 277 square meters of terrain, with 71 billion cubic meters of information compressed into the generated map. But because of the lack of building data, the map mostly lacks in dimension.

Going forward, Mitchell hopes to eventually figure out a way to identify repeated patterns in buildings to help with rendering windows and to use land-cover data from the USGS to place trees in the world (very necessary if Minecraft players are to actually play in Manhattan). He also knows that Python is not sustainable for the system and wants to eventually transition to a compiled language.

Mitchell hopes to complete the map as an expression of how far computing power has come. In the past, video games have relied on either the painstaking work of artists or a procedural algorithm to create a setting. With a functional SparseWorld, games could, theoretically, be set in familiar locations without having to be an artist's rendering limited by the constraints of a single system.

I think that this shows more about how people think outside the box than about how far computing has come. (Computing has come very far, though, considering people can attempt something like this in a game.)

Minecraft is hideously ugly. This should look awesome. But it doesn't, it's what Manhattan would look like if the universe had only just discovered the third dimension.

Minecraft is not a game you play for the latest in graphics, though this kind of thing does require a lot of processing power. It is a game where your imagination is the limit and, like the article says, it is a work in progress, so it will improve.

In the past, video games have relied on either the painstaking work of artists or a procedural algorithm to create a setting. With a functional SparseWorld, games could, theoretically, be set in familiar locations without having to be an artist's rendering limited by the constraints of a single system.

I still have a vision of a racing game or driving sim based on real locations and topography, thanks to Streetview technology and the like. Imagine taking your favorite drive in a Ferrari without any consequences, or perhaps just planning your next vacation by sampling the most scenic routes. Perhaps it's still years away, but I can totally see it happening.

The minecraft modeling is incredible, but I think that this article highlights an issue I always have and is incredibly troubling to me:

No longer are the best datasets provided by USGS or NOAA, they're all private. And since the USGS and other organizations just partner with Microsoft or Google, there is no reason for them to replicate the data. So now we are slowly losing large public datasets. For big organizations it's not huge, but for hackers and tinkerers we are quickly losing access to raw data. (Think experimental research quadcopter navigation using Google's 3D buildings instead of crappy urban GPS reception)

(Also, I know the USGS never did street view or photogrammetry, but the USGS might have better free and public datasets if companies weren't doing it)

Minecraft is hideously ugly. This should look awesome. But it doesn't, it's what Manhattan would look like if the universe had only just discovered the third dimension.

If Minecraft had smaller blocks, the game would be less fun to play.

In my opinion Minecraft strikes a perfect balance between visuals and playability.

I haven't had the chance to play it yet but Everquest Landmark Next (or whatever it's called) looked quite awesome in an introduction I saw, I think on Nerd³. For most games a crafting system like the one in Rust would be acceptable. And a game that is more similar to Minecraft (minus the redstone) would be 7 Days to Die, which offers far better visuals with (IMO) better playability.

The version of Manhattan that SparseWorld currently generates is 277 square meters of terrain, with 71 billion cubic meters of information compressed into the generated map. But because of the lack of building data, the map mostly lacks in dimension.

Something about those numbers doesn't make sense to me. I thought it was generating at 1:1? Or did I miss something in the article? 277 square meters seems... Small. 277 million square meters makes more sense, considering the maximum build height is 256, it would line up with the 71 billion cubic meters figure.

The minecraft modeling is incredible, but I think that this article highlights an issue I always have and is incredibly troubling to me:

No longer are the best datasets provided by USGS or NOAA, they're all private. And since the USGS and other organizations just partner with Microsoft or Google, there is no reason for them to replicate the data. So now we are slowly losing large public datasets. For big organizations it's not huge, but for hackers and tinkerers we are quickly losing access to raw data. (Think experimental research quadcopter navigation using Google's 3D buildings instead of crappy urban GPS reception)

(Also, I know the USGS never did street view or photogrammetry, but the USGS might have better free and public datasets if companies weren't doing it)

EDIT: Grammar

They might if it fit within their mandate. Something like "street view" might fit better in say a city planning department.

In the past, video games have relied on either the painstaking work of artists or a procedural algorithm to create a setting. With a functional SparseWorld, games could, theoretically, be set in familiar locations without having to be an artist's rendering limited by the constraints of a single system.

I still have a vision of a racing game or driving sim based on real locations and topography, thanks to Streetview technology and the like. Imagine taking your favorite drive in a Ferrari without any consequences, or perhaps just planning your next vacation by sampling the most scenic routes. Perhaps it's still years away, but I can totally see it happening.

The Test Drive Unlimited game does this for Hawaii and Ibiza. It's not very well executed, but still a pretty decent game. I wouldn't advise buying it unless you find a really cheap copy.

It has a road network of over 3,000km, all loosely based on real roads/geography, and 176 real world cars, mostly sports cars but they've got everything from the Citroen 2CV to the Koenigsegg Agera R. And it's massively multiplayer, which is pretty cool for meeting random strangers from all over the world.

The game developer (Eden/Atari) has gone under and EA games is taking over, with an update coming some time this year or next year. Hopefully they'll do a good job. I don't have my hopes up.

I always wanted to build a replica of Chicago in Minecraft, but the height limit was always an issue for me. Even with the new build limit of 256 there'd be a number of buildings that get cut off. I would imagine that it'd have to be even worse for Manhattan since there are more 800+ft skyscrapers there.

I'm curious if he's staying on older versions and using a mod to increase the world height to 1024 so that the new World Trade Center will be fully rendered. (The building is 546m + it's a few meters above sea level + water level at some height above bedrock)

He also knows that Python is not sustainable for the system and wants to eventually transition to a compiled language.

That makes no sense. This quote makes me feel like the programmer is not super technically competent.

There are projects like the various python llvm compilers alone which make that statement absurd, but more to the point he should look at refactoring his code before he blames the language for RAM and other inefficiencies.

He also knows that Python is not sustainable for the system and wants to eventually transition to a compiled language.

That makes no sense. This quote makes me feel like the programmer is not super technically competent.

There are projects like the various python llvm compilers alone which make that statement absurd, but more to the point he should look at refactoring his code before he blames the language for RAM and other inefficiencies.

It's possible he might want pre-runtime type checking more than the processing speed?

I always wanted to build a replica of Chicago in Minecraft, but the height limit was always an issue for me. Even with the new build limit of 256 there'd be a number of buildings that get cut off. I would imagine that it'd have to be even worse for Manhattan since there are more 800+ft skyscrapers there.

I'm curious if he's staying on older versions and using a mod to increase the world height to 1024 so that the new World Trade Center will be fully rendered. (The building is 546m + it's a few meters above sea level + water level at some height above bedrock)

To answer your question, I modified the 1.5.x client and server to support chunks of arbitrary height; you can see on the DynMap for the project that there are black areas where structures have gone above Y=256. Unfortunately, I haven't yet found an easily-maintainable way to continue the client and server (ideally as a Bukkit modification) up to the current version. I would love if this article and/or the project helps push Mojang to support higher chunks, as the Anvil format already supports it and the game modifications are minor.
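The height arithmetic in this exchange can be sketched in a few lines of Python. The 546 m figure and the 256 and 1024 limits come from the comments above. The water level of Y=62 and the 3 m ground elevation are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

BUILD_LIMIT = 256   # build limit cited in the comments
SEA_LEVEL_Y = 62    # assumption: height of the in-game water level above bedrock


def top_y(height_m, ground_above_sea_m=0, sea_level_y=SEA_LEVEL_Y):
    """Y coordinate of a building's top block at 1 block = 1 m."""
    return sea_level_y + ground_above_sea_m + math.ceil(height_m)


def blocks_cut_off(height_m, ground_above_sea_m=0, limit=BUILD_LIMIT):
    """How many blocks of the building would sit above the height limit."""
    return max(0, top_y(height_m, ground_above_sea_m) - limit)


if __name__ == "__main__":
    # 546 m tower on ground assumed to be 3 m above sea level.
    print("top block at Y =", top_y(546, 3))
    print("cut off at 256:", blocks_cut_off(546, 3))
    print("cut off at 1024:", blocks_cut_off(546, 3, limit=1024))
```

Under these assumptions the tower tops out at Y=611, so a 256-block world loses 355 blocks of it, while a 1024-block world holds it with room to spare.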

The version of Manhattan that SparseWorld currently generates is 277 square meters of terrain, with 71 billion cubic meters of information compressed into the generated map. But because of the lack of building data, the map mostly lacks in dimension.

Something about those numbers doesn't make sense to me. I thought it was generating at 1:1? Or did I miss something in the article? 277 square meters seems... Small. 277 million square meters makes more sense, considering the maximum build height is 256, it would line up with the 71 billion cubic meters figure.

The Technophilic article linked says 277 million square meters.

Also, the number sounds a little more normal if you just call it 277 square kilometers.
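The correction is easy to check with a quick calculation. This sketch assumes one block per cubic meter and a uniform 256-block column over the whole area, which is how the commenter above arrived at the figure. It is not taken from SparseWorld itself.

```python
AREA_KM2 = 277        # area as restated in the comments
BUILD_HEIGHT = 256    # build height cited in the comments, in blocks

# 1 square kilometer is 1,000 m x 1,000 m, so 1,000,000 one-meter columns.
area_m2 = AREA_KM2 * 1_000_000
volume_m3 = area_m2 * BUILD_HEIGHT

if __name__ == "__main__":
    print(f"{area_m2:,} columns x {BUILD_HEIGHT} blocks = {volume_m3:,} blocks")
    print(f"about {volume_m3 / 1e9:.0f} billion cubic meters")
```

The product is 70,912,000,000, which rounds to the 71 billion quoted in the article and supports reading the area as 277 million square meters.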

The minecraft modeling is incredible, but I think that this article highlights an issue I always have and is incredibly troubling to me:

No longer are the best datasets provided by USGS or NOAA, they're all private. And since the USGS and other organizations just partner with Microsoft or Google, there is no reason for them to replicate the data. So now we are slowly losing large public datasets. For big organizations it's not huge, but for hackers and tinkerers we are quickly losing access to raw data. (Think experimental research quadcopter navigation using Google's 3D buildings instead of crappy urban GPS reception)

(Also, I know the USGS never did street view or photogrammetry, but the USGS might have better free and public datasets if companies weren't doing it)

EDIT: Grammar

On the other hand, if a government department says "we want to make a 1m resolution volumetric (or surface) scan of major US metropolitan areas", there may be some response from privacy advocates.

On the other hand, if a government department says "we want to make a 1m resolution volumetric (or surface) scan of major US metropolitan areas", there may be some response from privacy advocates.

Not likely, since the government already has 10cm resolution surface scans, and all privacy groups already know and accept this.

Apple's 3D mapping technology, which was originally C3 technologies (a subsidiary of Saab AB — an aerospace/defence company), is a civilian version of technology that was developed for cruise missile targeting and other military use.

Cruise missiles do not use GPS, because satellites can easily be shot down during war time. Instead they require a high resolution map of the entire ground area that they will fly over. They scan the ground as they fly at a few times faster than the speed of sound, using that data to know when they need to adjust altitude to avoid crashing into the side of a mountain while flying close to the ground to avoid radar detection. They also use the same system to know when they've reached the target and should hit the ground.

This technology has been in place for decades and every major military force in the world already has high resolution maps of the whole world.

Here's a video from a few years ago demonstrating it: https://www.youtube.com/watch?v=3apAXzf3JTg The only thing innovative about that technology is that it allows accurate and up to date scans to be created at will, instead of relying on ones that were taken a year or three ago.

> "Completion is reliant on getting models for every building on every street, and to my knowledge, only Google has that much information,"

*opens maps.app on my mac, zooms in on new york* Well that's obviously wrong. Apple maps actually has sub-meter 3D resolution models of the suburb of Stockholm where I live! Google doesn't as far as I know.

So this is why Minecraft is so popular, then: it's like digital Legos. It seems most things I read about Minecraft are about building things rather than actually playing the game. I have no idea what the gameplay is, but I wonder why the toy company did not come up with this idea before.

If you take a look at Minecraft, it is basically blocks, and it lets people build everything they want with them. It's truly like Legos for computers. It's rather amazing that this game is pretty much new and nobody came up with an idea so simple but fun before.

The problem I have with this is Minecraft's incredibly crap render distance. I believe it's only up to 16 chunks, so about 256m... The top-down map looks awesome, but then you see the screenshots where you can't even see 1 block at a time.

So this is why Minecraft is so popular, then: it's like digital Legos. It seems most things I read about Minecraft are about building things rather than actually playing the game. I have no idea what the gameplay is, but I wonder why the toy company did not come up with this idea before.

If you take a look at Minecraft, it is basically blocks, and it lets people build everything they want with them. It's truly like Legos for computers. It's rather amazing that this game is pretty much new and nobody came up with an idea so simple but fun before.

The only "game-play" in minecraft is walking around to gather building materials, some of which are very hard to find:

"Saddles are a rare item in the game; they are uncraftable, being found only in chests, inside dungeons, abandoned mineshafts, Nether Fortresses, Desert and Jungle Temples, in blacksmith chests found in NPC villages, or by trading with a villager. Also, saddles can be "caught" with a fishing rod."

And the other gameplay is staying alive from monsters that randomly appear at night time. You can avoid this completely by either setting the game mode to "peaceful" or going to sleep in a bed at sunset. Monsters will appear wherever it's dark, so they'll appear during daylight hours if you go down into a poorly lit mineshaft or a cave with no lighting at all.

There really is no gameplay in minecraft except building stuff. It's like lego except cheaper (blocks are pretty much free, except the ones that take time to find), and you can share your creations with friends in another city or even strangers if you're really keen.

On the other hand, if a government department says "we want to make a 1m resolution volumetric (or surface) scan of major US metropolitan areas", there may be some response from privacy advocates.

Not likely, since the government already has 10cm resolution surface scans, and all privacy groups already know and accept this.

Apple's 3D mapping technology, which was originally C3 technologies (a subsidiary of Saab AB — an aerospace/defence company), is a civilian version of technology that was developed for cruise missile targeting and other military use.

Cruise missiles do not use GPS, because satellites can easily be shot down during war time. Instead they require a high resolution map of the entire ground area that they will fly over. They scan the ground as they fly at a few times faster than the speed of sound, using that data to know when they need to adjust altitude to avoid crashing into the side of a mountain while flying close to the ground to avoid radar detection. They also use the same system to know when they've reached the target and should hit the ground.

This technology has been in place for decades and every major military force in the world already has high resolution maps of the whole world.

Here's a video from a few years ago demonstrating it: https://www.youtube.com/watch?v=3apAXzf3JTg The only thing innovative about that technology is that it allows accurate and up to date scans to be created at will, instead of relying on ones that were taken a year or three ago.

Yes, I am aware of TERCOM (and its roots in the SLAM program). Militaries have no maps of the world at 100mm resolution, as they would be enormous datasets and utterly pointless for guidance (where multi-metre resolution is preferable both for storage size and algorithmic requirements, and to avoid noise from transient changes like cars or buildings). Do you have a source for the claim that there even exists a wide-area 100mm-resolution scan of the US? Gathering good coverage 100mm imagery is difficult enough.

Do you have a source for the claim that there even exists a wide-area 100mm-resolution scan of the US? Gathering good coverage 100mm imagery is difficult enough.

No, I don't have a source. But it's generally agreed that it exists.

I disagree with your claim that it's hard to gather. It's far easier than imagery, you just fly an airplane over the ground a few times from different directions and send the data back to a server. That's it, job done. 100mm scan created.

The amount of data required to cover the USA is insignificant compared to PRISM. I'd be surprised if it doesn't exist, but the data is probably out of date in most places. There's no real reason to repeat the scan regularly.

The minecraft modeling is incredible, but I think that this article highlights an issue I always have and is incredibly troubling to me:

No longer are the best datasets provided by USGS or NOAA, they're all private. And since the USGS and other organizations just partner with Microsoft or Google, there is no reason for them to replicate the data. So now we are slowly losing large public datasets. For big organizations it's not huge, but for hackers and tinkerers we are quickly losing access to raw data. (Think experimental research quadcopter navigation using Google's 3D buildings instead of crappy urban GPS reception)

(Also, I know the USGS never did street view or photogrammetry, but the USGS might have better free and public datasets if companies weren't doing it)

EDIT: Grammar

Back when the monopoly AT&T of old still existed, Bell Labs did a lot of research "for the good of all," as part of their monopoly deal with the U.S. government. (Yes, they also got a tax write off for some of that research, I'm sure.)

There's no reason we cannot extract similar agreements from Google, Microsoft, and others, to share their vast datasets at low or no cost for hobbyists and academics, while charging retail prices for commercial use. Especially given that any use of GPS (as well as public roads for maps and street view) is, by definition, subsidized by the American taxpayer, I believe we the people have some leverage here.

The minecraft modeling is incredible, but I think that this article highlights an issue I always have and is incredibly troubling to me:

No longer are the best datasets provided by USGS or NOAA, they're all private. And since the USGS and other organizations just partner with Microsoft or Google, there is no reason for them to replicate the data. So now we are slowly losing large public datasets. For big organizations it's not huge, but for hackers and tinkerers we are quickly losing access to raw data. (Think experimental research quadcopter navigation using Google's 3D buildings instead of crappy urban GPS reception)

(Also, I know the USGS never did street view or photogrammetry, but the USGS might have better free and public datasets if companies weren't doing it)

EDIT: Grammar

Back when the monopoly AT&T of old still existed, Bell Labs did a lot of research "for the good of all," as part of their monopoly deal with the U.S. government. (Yes, they also got a tax write off for some of that research, I'm sure.)

There's no reason we cannot extract similar agreements from Google, Microsoft, and others, to share their vast datasets at low or no cost for hobbyists and academics, while charging retail prices for commercial use. Especially given that any use of GPS (as well as public roads for maps and street view) is, by definition, subsidized by the American taxpayer, I believe we the people have some leverage here.

About as much leverage as any user of technology. Just ask your favorite patent troll.

Especially given that any use of GPS (as well as public roads for maps and street view) is, by definition, subsidized by the American taxpayer, I believe we the people have some leverage here.

GPS is not something that can be "used up". It's just a passive radio receiver.

And if the US suddenly decided to stop funding it and the satellites gradually drifted out of orbit, that would be no problem at all since pretty much all modern GPS units and smartphones are also receiving location data from GLONASS, which is pretty much the same (although slightly less accurate).

I agree that government could, theoretically, require corporations to hand over the data they've spent billions of dollars gathering. But let's be honest, it's never going to happen. No major US political party would vote for any such bill.

A more successful and practical technique to gain access to data is to ask the supreme court to rule on whether or not google maps/google street view is actually eligible for copyright at all. Believe it or not this is a grey area. Copyright does not apply to "facts". A map of the world is not a creative work, it is simply a recording of factual information - and this means it theoretically is not eligible for copyright at all.

This is especially true for something like Google street view or satellite photos. There is a lot of creativity in a map, the colour coding of the lines and decisions about what streets should be visible at various zoom levels, but none of that applies to a satellite photo. A photographer carefully considers the best angle or lighting conditions for each photo, but a satellite or airplane doesn't do any of that creative stuff when it passes over the earth snapping photos constantly.

If you ask a copyright lawyer whether or not this type of data is eligible for copyright, they will probably tell you they don't know the answer, but it would be good business sense to assume they are copyrightable in order to avoid an expensive legal fight.

If you ask the supreme court whether or not satellite photos and automatically generated terrain maps are eligible for copyright there is a good chance they'll say that there is no copyright at all on this data, anyone can scrape it and re-use the data for any purpose.

Ironically, the more accurate the mapping data is, the closer the map will be to true fact instead of rough approximations and interpretations of fact. This means modern highly accurate mapping data really is at risk of being declared non-copyrightable. I personally avoid the whole issue by flagging all my contributions to open street map as public domain (http://wiki.openstreetmap.org/wiki/Public_Domain_Map).

So this is why Minecraft is so popular, then: it's like digital Legos. It seems most things I read about Minecraft are about building things rather than actually playing the game. I have no idea what the gameplay is, but I wonder why the toy company did not come up with this idea before.

If you take a look at Minecraft, it is basically blocks, and it lets people build everything they want with them. It's truly like Legos for computers. It's rather amazing that this game is pretty much new and nobody came up with an idea so simple but fun before.

The only "game-play" in minecraft is walking around to gather building materials, some of which are very hard to find ...

Good grief, one of the best selling games of all time gets labeled with "game-play" in quotes, as if there isn't any real gameplay? Yes, exploring the world (gameplay), crafting (gameplay), and building structures (gameplay) are major aspects of the game. Those are the main draw and those are excellent examples of gameplay.

But there is more traditional gameplay there as well. Playing on hard (or even normal), the mobs can sometimes be a challenge, especially in places like Nether Fortresses. Minecraft actually has a final boss and the requirements for getting there are not simple.

There's also taming, raising, and breeding several creatures, a farming system, and a system for creating automated machines. And then there's the adventure system. You can download maps created by other players that can include puzzles and quests.

So, yeah, there is a ton of gameplay in Minecraft. The very first version of Minecraft was basically what is now "creative" mode. You could run around and place and remove blocks. That was fun, but that version would never have become a worldwide hit. The actual gameplay elements are what did that.

But there is more traditional gameplay there as well. Playing on hard (or even normal), the mobs can sometimes be a challenge, especially in places like Nether Fortresses. Minecraft actually has a final boss and the requirements for getting there are not simple.

And in thousands of hours of playing Minecraft I've never seen the final boss, except when I cheated to get there once out of curiosity. For me the gameplay just isn't a big part of the game, I just build stuff.

Now if you'll excuse me, I think I'm going to go work on my current minecraft world.