data news + journalism + diy hackery

Our new Empire Blog reporter, Colby Hamilton, dropped by my desk the other day wondering whether we could map contributions to presumptive NYC mayoral candidates by zip code.

He was going to post about it after lunch. I said I'd be ready with a map.

Thanks to Fusion Tables and a little Ruby magic, I had one ready when his story was done shortly after lunch, and we updated it into the evening as the candidates' filings were made available by the NYC Campaign Finance Board.

How I Did It

For anyone looking to do something similar, here's what I did:

-- I downloaded each candidate's donation as an Excel spreadsheet from the homepage of the Campaign Finance Board.

-- I uploaded the spreadsheet to Google Fusion Tables (if it's an Excel file more than 1MB, you have to save it as comma-separated-values, or .CSV, before uploading).

-- I used Fusion Tables' fantastic aggregattion function -- View -> Aggregate -- to sum the contributions by zip code. Then I exported that file as a .CSV, which gave me a file for each candidate with the columns: ZIP, SUM and COUNT -- SUM being the total donations and COUNT the number of donations for the zip code.

-- I re-uploaded that aggregation export back to Fusion Tables. (If anyone knows how to save an aggregation in FT without exporting it and uploading it again, I'm all ears.)

-- Now that I had the contributions by zip code, I need the zip code shapes to go with them. The US Census has zip code shapefiles by state FIPS code, and for the entire United States. (Quick note: Census zip code data and US Postal Service zip codes aren't exactly the same, though we felt comfortable using the Census version for this project.)

-- I uploaded the New York state zip code shapefile to Fusion Tables, too, using Shpescape. (If you're working with New York State, you can save some work and just use mine.)

-- I opened the ZIP-SUM-COUNT file in Fusion Tables and merged it with the zip code shapefile, linking them with the ZIP field in the first file and ZTCA5CE10 in the second file.

-- Using Visualize -> Map, I could see all of the relevant zip codes for that candidate. By using the Configure Styles link, and then tinkering with Fill Color -> Buckets settings, I shaded the map according to total donations.

This map is ready to be embeded!

The Trouble with Tables

An admission, though: I didn't use the Fusion Tables embeddable map for this story. I did share the FT map with Colby, which let us know we had a couple of good stories. FT is great and fast for that. It also works in production with smaller data sets.

But the long time it takes Fusion Tables to populate a map with large data sets can make for a frustrating user experience. That's compounded by the fact there's no way (yet) to "listen" for a sign that the layer has fully loaded, which would let me display a "Please wait ..." sign until it did.

So in this case, I built my own KML, or Keyhole Markup Language, file (5 of them, actually; one for each candidate). I then compressed those files in to much smaller KMZ files, which are just "zipped" KML files, so they load quickly. I then used those files as layers with Google Maps' KmlLayer() constructor. I also used a "listener" to find out when the layer is fully loaded, and display an alert to the user until it is.

More to Come

As for how I built the KML file, I'm going to share that in another post once I clean up the Ruby code I used to automate the process. (If your project can't wait for that post, drop me a note and I'll try to help.)

But the basics are these:

1. The layout of a KML file, and the format for using different styles to color different shapes, is pretty straightforward and nicely documented. In my code, I changed the style name for a given shape based on the value of the "SUM" variable.

2. The hardest part of writing a KML file is defining each shape in the proper format. But the merged file I made linking the ZIP-SUM-COUNT data and the shapefile actually has that information! The "geometry" column of that table is straight KML! (Thank you, Shpescape.) Export that merged file as a .CSV, and you've got all of the building blocks for your map.

If you have ideas, improvements or questions about this post, please don't hesitate to drop a note in the comments or drop me a note via email.

John Keefe

Adventures in journalism-doing, mapmaking, code mangling and hardware hacking.
Much of what I do is for WNYC's Data News Team, and I'm one of the DIY co-conspirators on Team Blinky. Opinions here are mine alone.