Mapping Walmart Growth Across the US using R

Have you seen the Walmart growth map video at FlowingData and wished you could create similar animated movies using R? Well, you came to the right place. In this post, you will learn how to create point maps at the zip code level and then animate them into a movie using your favorite R libraries, such as ggplot2 and dplyr. The bonus, of course, is recreating the Walmart growth maps in R.

Let’s get started then.

Loading Libraries

You can see from the following commands that I owe a tremendous amount of gratitude to Hadley Wickham, for he has made many programming tasks easier with his R packages. ggplot2 has to be one of the best plotting packages, and with dplyr, manipulating and aggregating data has become less tedious and more enjoyable. He has also created packages for string manipulation (stringr) and date manipulation (lubridate). We need all of them.

library(ggplot2)   # for plotting, of course
library(ggmap)     # to get the US map
library(dplyr)     # for data manipulation like a ninja
library(readr)     # to read the data in csv
library(lubridate) # to play with dates
library(scales)    # for number labeling
library(stringr)   # to play with strings
library(Cairo)     # for anti-aliasing images on Windows
library(zipcode)   # to clean up and geocode zipcodes

Gathering, Cleaning-up, Summarizing the Data

Let’s load up the zip code data using the following command:

data(zipcode)

Next, use the fantastic ggmap library to get the US map. Here, I’m using the toner-lite type of map from stamen maps, a great resource for fantastic maps.

# you can change the map type to get something more colorful
us_map <- get_stamenmap(c(left = -125, bottom = 25.75, right = -67, top = 49), zoom = 5, maptype = "toner-lite")

A couple of clean-up items: convert the date column to the date format and create opening year and month columns. I also created another date column for looping through and creating maps for each month. You may ask, “Why did you not use just the opening date?” Good question. I am not using the opening date because, after creating a map, I save the map using that same value for the file name. We are using ffmpeg here to create a movie out of the images, and ffmpeg likes sequential numbering of files. On macOS, the file name is irrelevant, because we can use the glob argument of ffmpeg.
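The clean-up code itself is not reproduced in this post, so here is only a minimal sketch of the steps just described. It is written in base R so that it runs without extra packages; lubridate's mdy() and floor_date() could replace the as.Date() and format() calls. The rows are made up, and the column names opendate, OpYear, and OpMonth are assumptions for illustration. Only OpDate and mapid appear in the code later in the post.

```r
# Made-up rows for illustration; not the real Walmart data.
wm <- data.frame(
  zip      = c("00001", "00002", "00003", "00004"),
  opendate = c("7/2/1962", "8/1/1964", "8/20/1964", "3/1/1967"),
  stringsAsFactors = FALSE
)

# Convert the text column to a proper Date, then derive year and month.
wm$opendate <- as.Date(wm$opendate, format = "%m/%d/%Y")
wm$OpYear   <- as.integer(format(wm$opendate, "%Y"))
wm$OpMonth  <- as.integer(format(wm$opendate, "%m"))

# A second date column pinned to the first day of the month, used for looping.
wm$OpDate <- as.Date(format(wm$opendate, "%Y-%m-01"))

# A sequential id per month, in chronological order, for the file names.
wm$mapid <- match(wm$OpDate, sort(unique(wm$OpDate)))
```

Two stores that opened in the same month share one OpDate and one mapid, which is what gives ffmpeg a gap-free sequence of frames.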

Creating maps

The fun part: to actually see the data on the US map. Since we have to create a map for each month, we are using a function here for repetition and ease of use. I’ve explained the code using R comments.

A note about color: I played with a few tools such as paletton to find good contrasting colors for the individual dots. After spending crazy amounts of time just trying different combinations, I just picked one of the blue shades from R named colors.
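The plotting function is not reproduced in this post, so the following is only a rough sketch of what a function with the same three arguments might look like, not the original my_zip_plot. The column names longitude and latitude, the dodgerblue color, the point sizes, and the img_ file prefix are all assumptions. To stay free of network calls, it draws on a blank canvas rather than on the ggmap(us_map) base layer.

```r
library(ggplot2)

# Hypothetical stand-in for my_zip_plot(); names and styling are assumptions.
my_zip_plot_sketch <- function(df, plotdate, mapid, outdir = "maps") {
  # keep every store opened on or before the month being drawn
  shown <- df[df$OpDate <= plotdate, ]
  # flag stores that opened in this month so they can be drawn larger
  shown$is_new <- shown$OpDate == plotdate

  p <- ggplot(shown, aes(x = longitude, y = latitude)) +
    geom_point(aes(size = is_new), colour = "dodgerblue", alpha = 0.6) +
    scale_size_manual(values = c("FALSE" = 1.5, "TRUE" = 4), guide = "none") +
    labs(title = format(plotdate, "%B %Y")) +
    theme_void()

  # save the frame under a zero-padded, 7-digit sequence number
  dir.create(outdir, showWarnings = FALSE)
  outfile <- file.path(outdir, sprintf("img_%07d.png", as.integer(mapid)))
  ggsave(outfile, plot = p, width = 8, height = 5, dpi = 100)
  invisible(outfile)
}

# Made-up coordinates and dates, written to a temporary directory.
toy <- data.frame(
  longitude = c(-94.2, -92.4, -90.7),
  latitude  = c(36.4, 34.7, 35.8),
  OpDate    = as.Date(c("1962-07-01", "1964-08-01", "1964-08-01"))
)
frame_file <- my_zip_plot_sketch(toy, as.Date("1964-08-01"), 2, outdir = tempdir())
```

Drawing the newly opened stores larger than the existing ones is one simple way to make each month's additions stand out in the movie.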

Loop through the data frame for each month and create a map for that month. Warning: these commands will take about 5 minutes and create a lot of maps. Make sure that you have created a directory named maps inside the directory of this file and that you have enough disk space to store all the maps.

wm_op_data_smry %>%
  # create a group id for each group defined using OpDate, i.e. a month
  mutate(mapid = group_indices_(wm_op_data_smry, .dots = 'OpDate')) %>%
  # note: when you are using user-defined functions in dplyr, you have to use do
  group_by(OpDate) %>%
  # pass the summary data frame, the date, and the map number to the function
  do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpDate), unique(.$mapid)))
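If the pipe above is hard to follow, or if your dplyr version complains about group_indices_() and do(), which newer releases deprecate or supersede, the same iteration can be written as a plain loop. This is a sketch on made-up dates, with a stub in place of the plotting function; frame_stub is a hypothetical name, and it only counts the stores that a frame would show.

```r
# Made-up opening months; the real summary data frame has more columns.
wm_op_data_smry <- data.frame(
  OpDate = as.Date(c("1962-07-01", "1964-08-01", "1964-08-01", "1967-03-01"))
)

# Stub with the same three arguments as the plotting function.
frame_stub <- function(df, plotdate, mapid) {
  sum(df$OpDate <= plotdate)  # stores opened up to and including this month
}

# One iteration per distinct month, numbered 1, 2, 3, ... in date order.
months <- sort(unique(wm_op_data_smry$OpDate))
stores_per_frame <- integer(length(months))
for (mapid in seq_along(months)) {
  stores_per_frame[mapid] <- frame_stub(wm_op_data_smry, months[mapid], mapid)
}
```

The loop index plays the role of mapid, so the frames come out numbered sequentially without a separate grouping step.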

Creating the Walmart Growth Movie

This is the fun part in which we put all the images together to make a “motion picture.” For this to work, you will need ffmpeg installed on your computer. On Windows, for ease of use, install it on the C drive; the command below assumes it lives in C:/ffmpeg.

# prepare the command for execution
# the framerate argument controls how many frames per second we want to see. Increase that number for a faster transition between months.
# since we used 7 digits as the fixed length of filenames, the %7d pattern will match those filenames.
makemovie_cmd <- paste0("C:/ffmpeg/bin/ffmpeg -framerate 5 -y -i ", paste0(getwd(), "/maps/img_%7d.png"), " -c:v libx264 -pix_fmt yuv420p ", paste0(getwd(), "/maps/"), "movie.mp4")
# for mac os, you can use the glob argument and obviate the need for sequential numbering of files.
#makemovie_cmd
# the system command will execute the ffmpeg command and create the final movie.
system(makemovie_cmd)
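If ffmpeg is not installed under C:/ffmpeg, one option is to let R look it up on the PATH. The sketch below builds the same arguments as the command above and only runs ffmpeg when both the program and some frames are found. The helper name build_ffmpeg_args is hypothetical, and the sketch assumes the frames follow the img_ plus 7 digits naming used in this post.

```r
# Assumption: ffmpeg is on the PATH; Sys.which() returns "" when it is not.
ffmpeg_path <- unname(Sys.which("ffmpeg"))

# Hypothetical helper: the same flags as the command above, as a vector.
build_ffmpeg_args <- function(frame_dir, fps = 5, out_file = "movie.mp4") {
  c("-framerate", as.character(fps), "-y",
    "-i", shQuote(file.path(frame_dir, "img_%7d.png")),
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    shQuote(file.path(frame_dir, out_file)))
}

frame_dir   <- file.path(getwd(), "maps")
ffmpeg_args <- build_ffmpeg_args(frame_dir)

# Only call ffmpeg when it exists and there is at least one frame to encode.
has_frames <- length(list.files(frame_dir, pattern = "^img_[0-9]{7}\\.png$")) > 0
if (nzchar(ffmpeg_path) && has_frames) {
  system2(ffmpeg_path, ffmpeg_args)
} else {
  message("ffmpeg or the frames were not found; nothing was run.")
}
```

Passing the arguments as a vector to system2() avoids pasting one long string together, and shQuote() keeps paths with spaces intact.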

Play!

What do you think? I like this approach because it offers the full flexibility of ggplot: you can adjust the sizes of new points, and you can adjust the frame rate. Another advantage: you can change the data source relatively easily and use the same code for any other mapping purpose.