"…we are not pans and barrows, nor even porters of the fire and torchbearers, but children of the fire, made of it…" — Ralph Waldo Emerson, The Poet

Menu

Monthly Archives: April 2014

Recently I was working with a shapefile dividing the state of MA into rectangular grids. I wanted to position the tooltip relative to a rectangluar svg path on a Leaflet map overlay. Usually when I want to provide info on a path it is a data line and I place the tooltip based on the mouse coordinates, as discussed in my previous post about tooltips. In this case, I wanted to place it centered over the data, where the highlighted region below is an SVG “path”, and not a “rect” element:

It turns out the path description is pretty simple for a rectangle. In geoJSON the rectangle geometry element is one shape (geometry.coordinates[0]). The rectangle is encoded with 5 data points, where the first and last are the same, as show in the screen capture below:

The GeoJSON coordinates here are WGS84.

Inspecting the DOM for one of these paths after rendering shows that the SVG path has only the required 4 data points and is closed by the “Z”. Here the values are SVG coordinates.

<path d="M293,160L222,160L222,89L293,89Z"></path>

I wrote the following function to calculate the top center position of the rectangle, for the purpose of positioning the tooltip.

In the end I didn’t like the tooltip moving all over the screen and obstructing the map, so I used a statically positioned data table that updated on mouseover. Even though I didn’t use it, the approach above was so much better than positioning based on the mouse and obstructing the region of interest that I thought I’d share it anyway.

Together with Paul Schimek and Kim Ducharme, I submitted an entry for the 37 Billion Mile Data Challenge. MAPC made available transit data on MA driving from the DOT from 2008 – 2012. They also divided the state into ~15 acre grid cells (250mx250m), for which they provided information on driving, estimated C02 emissions, land use metrics, school accessibility, population from the 2010 census. I created an interactive map to explore this data statewide aggregated at the zip code level and also at the 15 acre grid cell level. Paul ran a multivariate regression model to explore possible explanatory variables independent of other effects. Kim provided excellent styling assistance as always.

The map is amazingly interesting to explore: down to the neighborhood level you can explore population density, average income, jobs, transit access, miles driven per day per person and more. Check out the live version to explore the interactive map.

I ran into an interesting problem updating a legend rendered with d3 today. I am displaying different data sets on a map and adjusting the colors and cutoff values based on what variable is selected. The image below shows two example states of the legend, the third is the unexpected result when I transition from the first to the second and the “200-300″ case fails to update.

In the legend update function I selected the appropriate legend string list from “legendStrings” based on the index of the data being displayed and attempted to create an appropriate compare function as discussed previously. However, this failed as shown in the picture above:

My not so clever compare function included the index of the data, “ind”, so I thought it would enter new elements whenever the index was changed. However, this doesn’t work, because the compare function used is not bound to the data. So when update is run, the new compare function is used to compare the old data with the new data and the old compare function doesn’t exist anymore… The solution is to only use the attributes of your data (stored in d, and i) in the compare function, so in this case refactor the data representation so that it includes information about which variable is being stored. Or, you can always delete the elements and re-render them:

legend.selectAll("rect").remove();
/* then render new elements as above */

Which seems about the same to me in this case and was certainly a lot simpler. I guess I am still waiting to see the use case where I get excited about the d3 enter and exit functionality…

I decided to document the tooltip approach I’ve used on some recent projects. Basically a combination of the styling approach as seen in the New York Times article: Four Ways to Slice Obama’s 2013 Budget Proposal (this is a wonderful visualization I’ve studied a lot) and David Walsh’s css triangles trick for the call out triangle. It is a common enough need to have a tooltip with some styling and a nice indicator arrow pointing at my data, something like this:

HTML

First, define the div structure for the tooltip and classes to be used, with some text for testing the layout. It helps to have a high level container for positioning and then individual classes to format different fields. In this example I used a couple of spans to have left and right justified text for the data display. I usually just put the tooltip div in the body in html (rather than generate it programmatically):

CSS

Next up, style the tooltip! I usually set the #tooltipContainer to “display: block” when testing so I can see how the tooltip renders with the dummy data. I often use crazy colors to debug as well (it can be helpful to see exactly where the triangle div overlaps the main part of the tooltip for example). Here is the css used to generate the example tooltip above:

Positioning the tooltip with Javascript relative to SVG element

There are two cases I’ve used for positioning the tooltip, either using page coordinates, or svg coordinates. The former, shown above, is best for adding a tooltip on a path (for example a data line), where you want the tooltip to be where the mouse is mousing over the line. The latter approach (svg coordinates), is best when you want to position the tooltip relative to the element you are mousing over, for example, directly above or to the right of an svg:circle.

In the case of an svg circle for example, you can add an argument to pass the element into the update function and calculate the offset coordinates as follows, NOTE in this case the tooltip div is declared inside the SVG (is that crazy, whatever, it works):

That’s the idea anyway. There are so many ways to approach tooltips. I just wanted to share what is working for me at the moment. This approach works well if you don’t want a drop shadow on the tooltip, if you want a border of some sort, then you can use an image instead of the css triangle as was done in the NYT article mentioned above.

I am looking into using d3 in conjunction with Leaflet and worked through Mike Bostock’s tutorial on d3 and leaflet. Using the MAPC provided shapefile for municipalities, so far it seems really slow on zoom, I think because of the super high resolution shapes being recalculated. Check out the shape of Boston!

Can I just say that Boston is genuinely a weird shaped town? I particularly love these oddities:

I was having some trouble visualizing the geoJSON files I generated in the previous post in d3. Does this look like Somerville in blue to you? No?! Well obviously the pink is Cambridge though…

Turns out the problem was the wrong projection. The MAPC data (or most of it) is using the NAD83(HARN)/Masssachusetts Mainland EPSG:2805 (QGIS reports this), also known as, “NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001″ from the .prj in the shape file.

Apparently, pyshp doesn’t handle reading the .prj files very well, it kinda ignores them on purpose (https://code.google.com/p/pyshp/issues/detail?id=3). So using pyshp to convert the shapfiles to GeoJSON probably wasn’t a good choice (no projection info will be transferred to the GeoJSON).

GeoJSON allows for specifying a CRS, but assumes use of the default of WGS84 if none is specified according to the spec. I tried to explicitly set the CRS in the geoJSON, but that doesn’t seem to be working, it looks like the d3 code is ignoring the “crs” property and assuming WGS84 as well.

I then tried saving the layer as GeoJSON with QGIS (version 1.7.5), despite the very nice save dialog and carefully selecting the desired projection, it doesn’t work at all:

The GeoJSON was not reprojected to the desired coordinate system (pretty similar to this reported QGIS issue), and

However, saving the layer as an ESRI shapefile you can change the projection using QGIS.

I was able to use ogr2ogr to convert the GeoJSON generated by pyshp with no CRS specified to WGS84 (“EPSG:4326″), after which it renders correctly in d3. (Note when I tried different target projections there was still no crs property in the GeoJSON! It looks like support for different crs in GeoJSON is pretty spotty). Here is the ogr2ogr command I used for the simple projection change:

At this point really what is the point of using any tool other than ogr2ogr? For example, the following command converts the Massachusetts municipalities shape file to GeoJSON, reprojects it correctly, and filters out all the cities except cambridge and somerville.

MAPC provided a shape file dividing the state on a 250m grid (~15 acre blocks). The file is pretty unwieldy, dividing the state into 355,728 segments, for which only 67,919 have data (less than 20% of the grids have a metric for “mipdaybest” which I used to filter the data). Here is a picture of the state with only the grids with data colored:

To accomplish this rudimentary image I converted the provided shape file to GeoJSON in python using the pyshp library, filtering out records with no data. This code is based on an example by M. Laloux, modified to use an iterator over the records and drop any with no data (here I am using “mipday_phh” as a proxy for no data).

I then opened the file in QGIS (Why did I convert it to GeoJSON you ask? Because I am planning to work with it in d3 next). With QGIS I can’t see the grid color by default because it is overwhelmed by the default black border on each shape. This post discusses how to remove the outline. I ended up using the last option (the old Symbology), but what a pain!

Besides the difficulty of working with such large files (did I mention with all the yak shaving I am doing here I got a bigger hard drive too?). The grid data is also awkward to link to other data sets, which more typically use zip code information. (Update: The companion data set Tabular section includes zip code information for the centroid of each grid cell). Since the grids are regular, many span multiple zip codes/municipalities. Furthermore, addresses were mapped to grid locations based on estimates of street location, so for some rural areas some data was mapped to the wrong grids etc. Regardless of those challenges, the pictures below show clearly that driving patterns vary greatly within a city or zip code.

Here are a few plots looking at the data in QGIS showing Miles Per Day per Household, where the blues are under 35 miles per day, yellow around 75 and orange is over 100 miles per day. It’s a horrible viz, but I can’t bring myself to spend time on the color config in QGIS (way too painful). Note the municipality boundaries shown are drawn from a separate shapefile layer provided by MAPC.

The same image is shown below zoomed in on the Boston area. There are unsurprising large swaths of long distance car commuter communities ringing the city. I included a legend here FWIW, the colors were picked manually from the QGIS color picker (which leaves a bit to be desired), hence the abysmal scheme. The divisions were automatic using the equal bin option. This data is clearly flawed! We can see from the legend that there is at least one grid cell with an average of over 6 thousand miles a day! I screen grabbed this legend from the layers toolbar in QGIS and gimp-shopped it in for your viewing pleasure.

Next up, working on a workflow for rendering nice images and figuring out what data and stories to tell.