Building Data Visualization Tools

How to Work with Maps

The content of this blog is based on examples, notes and experiments related to the material presented in the “Building Data Visualization Tools” module of the “Mastering Software Development in R” Specialization (Coursera) created by Johns Hopkins University [1].

Required Packages

ggplot2, a system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.

gridExtra, provides a number of user-level functions to work with “grid” graphics.

dplyr, a tool for working with data frame like objects, both in memory and out of memory.

viridis, the viridis color palette.

ggmap, a collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps).

Data

The ggplot2 package includes some datasets with geographic information. The ggplot2::map_data() function retrieves map data from the maps package as a data frame (use ?map_data for more information).

Specifically, the italy dataset [2] is used for some of the examples below. Please note that this dataset was prepared around 1989, so it is out of date, especially the information pertaining to provinces (see ?maps::italy).

# Get the italy dataset from ggplot2
# Consider only the following provinces: "Bergamo", "Como", "Lecco", "Milano", "Varese"
# and arrange by group and order (ascending order)
library(dplyr)  # provides %>%, filter() and arrange()
italy_map <- ggplot2::map_data(map = "italy")
italy_map_subset <- italy_map %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)

Each observation in the dataframe defines a geographical point with some extra information:

long & lat, longitude and latitude of the geographical point

group, an identifier of the specific polygon the point is part of

a map can be made of different polygons (e.g. one polygon for the mainland and one for each island, one polygon for each state, …)

order, the order of the point within the specific group

how all of the points that are part of the same group should be connected in order to create the polygon
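As a quick check of this structure, the sketch below counts the polygons and points per region and confirms that order increases within each group. It assumes the ggplot2, maps and dplyr packages are installed; the names polygon_summary and order_ok are introduced here only for illustration.

```r
library(ggplot2)  # map_data()
library(dplyr)    # %>%, group_by(), summarise(), arrange()

# map_data() needs the 'maps' package to be installed
italy_map <- ggplot2::map_data(map = "italy")

# One row per region: number of polygons (groups) and number of points.
# 'polygon_summary' is an illustrative name, not part of the original post.
polygon_summary <- italy_map %>%
  group_by(region) %>%
  summarise(n_polygons = n_distinct(group), n_points = n()) %>%
  arrange(desc(n_polygons))

head(polygon_summary)

# Within each group, 'order' should be strictly increasing once sorted
order_ok <- italy_map %>%
  arrange(group, order) %>%
  group_by(group) %>%
  summarise(increasing = all(diff(order) > 0))

all(order_ok$increasing)
```

Regions with more than one polygon are the ones where the group column matters most when drawing, since each polygon has to be traced separately.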

How to work with maps

Having spatial information in the data gives the opportunity to map the data or, in other words, to visualize the information contained in the data in a geographical context. R offers several ways to map data, from ordinary plots using longitude/latitude as x/y to more complex spatial data objects (e.g. shapefiles).

Mapping with ggplot2 package

The most basic way to create maps with your data is to use ggplot2: create a ggplot object and then add a specific geom, mapping longitude to the x aesthetic and latitude to the y aesthetic [4] [5]. This simple approach can be used to:

Author: Pier Lorenzo Paracchini

He is a generalist with a passion for people, data and technology. He has a Master of Science in Electronic Engineering from the Politecnico Di Milano and works as an enthusiast developer with a data scientist twist in the software innovation sector in Statoil. His journey in data science and machine learning started in 2014.


The geom_path function is used to create such plots. From the R documentation, geom_path “… connects the observation in the order in which they appear in the data”. When plotting with geom_path it is important to consider, for each point in the map, the polygon it belongs to and its order within that polygon.

The points in the dataset are grouped by region and ordered by order. If information about the region is not provided, the points are connected in the sequential order of the observations and, for this reason, “unexpected” lines are drawn when moving from one region to the next. On the other hand, if information about the region is provided by mapping region to the group or color aesthetic, the “unexpected” lines are removed.
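A minimal sketch of the cases described above, assuming the ggplot2, dplyr, gridExtra and maps packages are installed. The subset is rebuilt here so the block runs on its own, and the plot object names are illustrative only.

```r
library(ggplot2)
library(dplyr)
library(gridExtra)

italy_map_subset <- ggplot2::map_data(map = "italy") %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)

# No grouping: all points are joined in row order, so extra lines
# appear where one polygon ends and the next one begins
plot_no_group <- ggplot(italy_map_subset, aes(x = long, y = lat)) +
  geom_path() +
  ggtitle("No group aesthetic")

# Grouping by polygon: each group is drawn as a separate path
plot_group <- ggplot(italy_map_subset,
                     aes(x = long, y = lat, group = group)) +
  geom_path() +
  ggtitle("group = group")

# Colouring by region also splits the path, one path per region
plot_color <- ggplot(italy_map_subset,
                     aes(x = long, y = lat, color = region)) +
  geom_path() +
  ggtitle("color = region")

grid.arrange(plot_no_group, plot_group, plot_color, nrow = 1)
```

Mapping a discrete variable such as region to color makes ggplot2 group the data by that variable, which is why the third plot behaves like the second one.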

With ggplot2 it is also possible to create more sophisticated maps, such as choropleth maps [3]. The example below, taken from [1], shows how to visualize the percentage of Republican votes in 1976 by state.

# Get the USA state map from ggplot2
us_map <- ggplot2::map_data("state")

# Use the 'votes.repub' dataset (maps package), containing the percentage of
# Republican votes by state, with one column per election year
# (the 1976 column is the one plotted here). Note
# - the dataset is a matrix so it needs to be converted to a data frame
# - the row names define the relevant state
votes.repub %>%
  as.data.frame() %>%
  mutate(state = tolower(rownames(votes.repub))) %>%
  right_join(us_map, by = c("state" = "region")) %>%
  ggplot(mapping = aes(x = long, y = lat, group = group, fill = `1976`)) +
  geom_polygon(color = "black") +
  theme_void() +
  scale_fill_viridis(name = "Republican\nVotes (%)")

Maps with ggmap package, Google Maps API and others

“A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps,..). It includes tools common to those tasks, including functions for geolocation and routing.” R Documentation

The package makes it possible to create and plot maps using Google Maps and a few other service providers, and to perform other interesting tasks such as geocoding, routing and distance calculation. The maps are actually ggplot objects, which makes it possible to reuse ggplot2 functionality such as adding layers or modifying the theme.

“The basic idea driving ggmap is to take a downloaded map image, plot it as a context layer using ggplot2, and then plot additional content layers of data, statistics, or models on top of the map. In ggmap this process is broken into two pieces – (1) downloading the images and formatting them for plotting, done with get_map, and (2) making the plot, done with ggmap. qmap marries these two functions for quick map plotting (c.f. ggplot2’s ggplot), and qmplot attempts to wrap up the entire plotting process into one simple command (c.f. ggplot2’s qplot).” [4]

How to create and plot a map…

The ggmap::get_map function is used to get a base map (a ggmap object, a raster object) from different service providers like Google Maps, OpenStreetMap, Stamen Maps or Naver Maps (the default is Google Maps). Once the base map is available, it can be plotted using the ggmap::ggmap function. Alternatively, the ggmap::qmap function (quick map plot) can be used.

The zoom argument (default value "auto") of the ggmap::get_map function can be used to control the zoom of the returned base map (see ?get_map for more information). Please note that the range of possible values for the zoom argument changes with the source.

# An example using Google Maps as a source
# and different map types
base_map_ter <- get_map(location = "Varese", maptype = "terrain")
base_map_sat <- get_map(location = "Varese", maptype = "satellite")
base_map_roa <- get_map(location = "Varese", maptype = "roadmap")

grid.arrange(
  ggmap(base_map_ter) + ggtitle("Terrain"),
  ggmap(base_map_sat) + ggtitle("Satellite"),
  ggmap(base_map_roa) + ggtitle("Road"),
  nrow = 1
)

How to change the source for maps…

While the default source for maps with ggmap::get_map is Google Maps, it is possible to change the map service using the source argument. The supported map services/sources listed here are Google Maps, OpenStreetMap, Stamen Maps and CloudMade Maps; the exact set depends on the installed ggmap version (see ?get_map for more information).

# An example using different map services as a source
base_map_google <- get_map(location = "Varese", source = "google", maptype = "terrain")
base_map_stamen <- get_map(location = "Varese", source = "stamen", maptype = "terrain")

grid.arrange(
  ggmap(base_map_google) + ggtitle("Google Maps"),
  ggmap(base_map_stamen) + ggtitle("Stamen Maps"),
  nrow = 1
)

How to geocode a location…

The ggmap::geocode function can be used to find the latitude and longitude of a location based on its name (see ?geocode for more information). Note that the Google Maps API limits the number of queries per day; geocodeQueryCheck can be used to determine how many queries are left.
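A hedged sketch of a geocoding call, assuming a recent ggmap release in which Google requests require an API key registered with register_google(). The helper geocode_places and the environment variable GOOGLE_MAPS_KEY are hypothetical names introduced for this illustration; without a key the helper skips the request instead of failing.

```r
library(ggmap)

# Hypothetical helper: geocode a character vector of place names.
# Assumes a Google Maps API key is stored in the (hypothetical)
# environment variable GOOGLE_MAPS_KEY.
geocode_places <- function(places, key = Sys.getenv("GOOGLE_MAPS_KEY")) {
  if (!nzchar(key)) {
    message("No API key found; skipping the geocoding request.")
    return(NULL)
  }
  register_google(key = key)  # assumption: ggmap version providing this function
  coords <- geocode(places, source = "google")  # one row per place (lon, lat)
  cbind(data.frame(place = places), coords)
}

locations <- geocode_places(c("Varese, Italy", "Milano, Italy"))
locations
```

Each call to geocode counts against the daily quota mentioned above, so it may be worth geocoding a set of places once and saving the resulting data frame for reuse.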

The ggmap::route function can be used to find a route from Google using different possible modes, e.g. walking, driving, … (see ?ggmap::route for more information).

‘The route function provides the map distances for the sequence of “legs” which constitute a route between two locations. Each leg has a beginning and ending longitude/latitude coordinate along with a distance and duration in the same units as reported by mapdist. The collection of legs in sequence constitutes a single route (path) most easily plotted with geom_leg, a new exported ggplot2 geom…’ [4]
