From Cholera to the Cloud: The Emergence of Big Data and What It Means for Mapping

Simon Hochberg, operations, Mapsense

The story of John Snow and the first geographic information system is famous in the spatial analysis community. As cholera spread through nineteenth-century London, Snow, a physician, plotted the locations of cholera deaths on a map of the city’s streets, buildings, and water pumps. From this spatial diagram, he observed that the people affected by the deadly disease had all drawn their water from a particular pump on Broad Street. Snow plotted only around 120 points, but those points were enough to slow the outbreak, lay the foundation for epidemiology, and create the first geographic information system. Set against the 3,143 counties and 74,134 tracts counted in the 2010 United States Census, those 120 points, momentous as they were, would be a tiny dataset today.

The McKinsey Global Institute called big data the “next frontier for innovation, competition, and productivity,” and the truth is that big data keeps getting bigger. With the proliferation of smartphones and other geo-transmitting devices, massive geo-tagged datasets are constantly being created and expanded. By the midpoint of 2013, more location data had been created in the preceding two years than in all of human history before them. With this unprecedented influx of data come exciting new possibilities for analyzing and understanding our world. Geographic information systems – which sit right at the intersection of data and cartography – are in a prime position to make sense of this new location data.

However, traditional GIS infrastructures cannot handle massive and continuously growing amounts of data. With millions of rows streaming in every hour from people’s smartphones and vehicles, many of the datasets generated today are too big even to fit on an analyst’s desktop computer, and they are orders of magnitude larger than traditional GIS tools can easily handle and map. Big location data is also gathered on more than just humans. Researchers place GPS trackers on migratory animals to better understand their travel patterns, and the latitude, longitude, genus, and species of every government-maintained tree in San Francisco are recorded and stored. Fortunately, companies both new and old are working on this challenge. Industry stalwart ESRI is exploring ways to geoprocess big data by connecting ArcGIS with Apache Hadoop. San Francisco startup Mapsense is building powerful cloud-based tools that can stream and visualize billions of records in real time to create dynamic vector tile maps for on-the-fly spatial analysis. These solutions will make it easier for citizens, businesses, governments, and researchers to understand their dynamic worlds and access the value in their large amounts of location data.

Workflows are changing too. GIS analysts producing static maps and PDFs are giving way to developers and data scientists who correlate billions of latitude and longitude coordinate pairs while building location-based services that answer questions about location in real time. The technology used to process and visualize this data has to modernize in order to scale with it. When mapped, geo-tagged datasets millions of rows long are put into context and become far easier to understand. These visualizations let analysts recognize anything from where and when certain types of crimes occur in San Francisco to exactly where in a wildlife preserve endangered California condors like to roost. Marketers can filter social media posts by location to better understand whom they are trying to reach, where those people are, and what matters most to them. City governments can see where certain types of service requests are coming from to better ensure that they respond to their citizens quickly and fairly. The existence of big location data in nearly every field and sector underscores just how vital it is that spatial analysis tools evolve to serve modern data demands.

The creation, collection, and visualization of big location data offer tremendous potential for better preserving nature, understanding consumers, and making sense of the world and the people and phenomena that share this planet. As data grows and modernizes, the tools used to map and display it should grow and modernize as well. GIS has come a long way since it was first used to plot water pumps in London; with big data come even bigger questions, and as the tools emerge, so too will the answers.