"…we are not pans and barrows, nor even porters of the fire and torchbearers, but children of the fire, made of it…" — Ralph Waldo Emerson, The Poet

Category Archives: Visualization

Recently I was working with a shapefile dividing the state of MA into rectangular grids. I wanted to position the tooltip relative to a rectangular svg path on a Leaflet map overlay. Usually when I want to provide info on a path it is a data line and I place the tooltip based on the mouse coordinates, as discussed in my previous post about tooltips. In this case, I wanted to place it centered over the data, where the highlighted region below is an SVG “path”, and not a “rect” element:

It turns out the path description is pretty simple for a rectangle. In GeoJSON the rectangle geometry element is one shape (geometry.coordinates[0]). The rectangle is encoded with 5 data points, where the first and last are the same, as shown in the screen capture below:

The GeoJSON coordinates here are WGS84.

Inspecting the DOM for one of these paths after rendering shows that the SVG path has only the required 4 data points and is closed by the “Z”. Here the values are SVG coordinates.

<path d="M293,160L222,160L222,89L293,89Z"></path>

I wrote the following function to calculate the top center position of the rectangle, for the purpose of positioning the tooltip.
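A minimal sketch of that kind of calculation, assuming the path's "d" attribute uses only absolute M and L commands closed by Z, as in the example above. The function name topCenterOfPath is hypothetical, and this is an illustration of the idea rather than the exact code used on the map:

```javascript
// Hypothetical helper: top-center point of a rectangular SVG path.
// Assumes the "d" string contains only absolute M and L commands.
function topCenterOfPath(d) {
  const points = [];
  // Capture the x,y pair that follows each M or L command.
  const re = /[ML]\s*(-?\d+(?:\.\d+)?)[\s,]+(-?\d+(?:\.\d+)?)/g;
  let m;
  while ((m = re.exec(d)) !== null) {
    points.push({ x: parseFloat(m[1]), y: parseFloat(m[2]) });
  }
  if (points.length === 0) return null;

  const xs = points.map(p => p.x);
  const ys = points.map(p => p.y);
  // SVG y grows downward, so the top edge is the smallest y value.
  return {
    x: (Math.min(...xs) + Math.max(...xs)) / 2,
    y: Math.min(...ys)
  };
}

const pos = topCenterOfPath("M293,160L222,160L222,89L293,89Z");
// pos is { x: 257.5, y: 89 }
```

The result is in SVG coordinates, so the tooltip container still needs to be offset by the position of the SVG overlay on the page.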

In the end I didn’t like the tooltip moving all over the screen and obstructing the map, so I used a statically positioned data table that updated on mouseover. Even though I didn’t use it, the approach above was so much better than positioning based on the mouse and obstructing the region of interest that I thought I’d share it anyway.

Together with Paul Schimek and Kim Ducharme, I submitted an entry for the 37 Billion Mile Data Challenge. MAPC made available transit data on MA driving from the DOT from 2008 – 2012. They also divided the state into ~15 acre grid cells (250 m x 250 m), for which they provided information on driving, estimated CO2 emissions, land use metrics, school accessibility, and population from the 2010 census. I created an interactive map to explore this data statewide aggregated at the zip code level and also at the 15 acre grid cell level. Paul ran a multivariate regression model to explore possible explanatory variables independent of other effects. Kim provided excellent styling assistance as always.

The map is amazingly interesting to explore: down to the neighborhood level you can explore population density, average income, jobs, transit access, miles driven per day per person and more. Check out the live version to explore the interactive map.

I decided to document the tooltip approach I’ve used on some recent projects. Basically a combination of the styling approach as seen in the New York Times article: Four Ways to Slice Obama’s 2013 Budget Proposal (this is a wonderful visualization I’ve studied a lot) and David Walsh’s css triangles trick for the call out triangle. It is a common enough need to have a tooltip with some styling and a nice indicator arrow pointing at my data, something like this:

HTML

First, define the div structure for the tooltip and classes to be used, with some text for testing the layout. It helps to have a high level container for positioning and then individual classes to format different fields. In this example I used a couple of spans to have left and right justified text for the data display. I usually just put the tooltip div in the body in html (rather than generate it programmatically):

CSS

Next up, style the tooltip! I usually set the #tooltipContainer to “display: block” when testing so I can see how the tooltip renders with the dummy data. I often use crazy colors to debug as well (it can be helpful to see exactly where the triangle div overlaps the main part of the tooltip for example). Here is the css used to generate the example tooltip above:

Positioning the tooltip with JavaScript relative to an SVG element

There are two cases I’ve used for positioning the tooltip, either using page coordinates, or svg coordinates. The former, shown above, is best for adding a tooltip on a path (for example a data line), where you want the tooltip to be where the mouse is mousing over the line. The latter approach (svg coordinates), is best when you want to position the tooltip relative to the element you are mousing over, for example, directly above or to the right of an svg:circle.

In the case of an svg circle, for example, you can add an argument to pass the element into the update function and calculate the offset coordinates as follows. Note that in this case the tooltip div is declared inside the SVG (is that crazy? Whatever, it works):
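The two positioning cases can be sketched as plain functions. This is a minimal illustration, not the code from the projects mentioned: the function names, the tip size object and the gap parameter are all assumptions made for the example.

```javascript
// Case 1 (page coordinates): center the tooltip above the mouse pointer.
// pageX/pageY would come from the mouse event; tip is the tooltip's size.
function positionFromMouse(pageX, pageY, tip, gap) {
  return { left: pageX - tip.width / 2, top: pageY - tip.height - gap };
}

// Case 2 (svg coordinates): center the tooltip above an svg:circle,
// using the circle's own cx, cy and r attributes instead of the mouse.
function positionFromCircle(cx, cy, r, tip, gap) {
  return { left: cx - tip.width / 2, top: cy - r - tip.height - gap };
}

const tip = { width: 120, height: 40 }; // hypothetical tooltip size in px
const atMouse = positionFromMouse(300, 200, tip, 8);
const atCircle = positionFromCircle(100, 100, 10, tip, 8);
// atMouse is { left: 240, top: 152 }, atCircle is { left: 40, top: 42 }
```

The gap leaves room for the callout triangle between the tooltip body and the point it refers to.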

That’s the idea anyway. There are so many ways to approach tooltips. I just wanted to share what is working for me at the moment. This approach works well if you don’t want a drop shadow on the tooltip. If you want a border of some sort, you can use an image instead of the css triangle, as was done in the NYT article mentioned above.

MAPC provided a shapefile dividing the state on a 250m grid (~15 acre blocks). The file is pretty unwieldy, dividing the state into 355,728 segments, of which only 67,919 have data (less than 20% of the grids have a metric for “mipdaybest” which I used to filter the data). Here is a picture of the state with only the grids with data colored:

To accomplish this rudimentary image I converted the provided shapefile to GeoJSON in Python using the pyshp library, filtering out records with no data. This code is based on an example by M. Laloux, modified to use an iterator over the records and drop any with no data (here I am using “mipday_phh” as a proxy for no data).
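Since the next stop for this data is d3, the same filtering idea can also be sketched in JavaScript against an already-converted GeoJSON object. This is only an illustration: the function name is hypothetical, and it assumes "no data" shows up as a missing, null or non-numeric property value, which may not match how the real file encodes it.

```javascript
// Hypothetical helper: keep only features whose property `key` is a number.
// Assumption: records without data have null, undefined or non-numeric values.
function dropFeaturesWithoutData(collection, key) {
  return {
    type: "FeatureCollection",
    features: collection.features.filter(f => {
      const v = f.properties ? f.properties[key] : undefined;
      return typeof v === "number" && Number.isFinite(v);
    })
  };
}

// Tiny made-up collection, just to exercise the filter.
const sample = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { mipday_phh: 42.5 }, geometry: null },
    { type: "Feature", properties: { mipday_phh: null }, geometry: null },
    { type: "Feature", properties: {}, geometry: null }
  ]
};
const filtered = dropFeaturesWithoutData(sample, "mipday_phh");
// filtered.features.length is 1
```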

I then opened the file in QGIS (Why did I convert it to GeoJSON you ask? Because I am planning to work with it in d3 next). With QGIS I can’t see the grid color by default because it is overwhelmed by the default black border on each shape. This post discusses how to remove the outline. I ended up using the last option (the old Symbology), but what a pain!

Besides the difficulty of working with such large files (did I mention with all the yak shaving I am doing here I got a bigger hard drive too?), the grid data is also awkward to link to other data sets, which more typically use zip code information. (Update: The companion data set Tabular section includes zip code information for the centroid of each grid cell). Since the grids are regular, many span multiple zip codes/municipalities. Furthermore, addresses were mapped to grid locations based on estimates of street location, so for some rural areas some data was mapped to the wrong grids etc. Regardless of those challenges, the pictures below show clearly that driving patterns vary greatly within a city or zip code.

Here are a few plots looking at the data in QGIS showing Miles Per Day per Household, where the blues are under 35 miles per day, yellow around 75 and orange is over 100 miles per day. It’s a horrible viz, but I can’t bring myself to spend time on the color config in QGIS (way too painful). Note the municipality boundaries shown are drawn from a separate shapefile layer provided by MAPC.

The same image is shown below zoomed in on the Boston area. Unsurprisingly, there are large swaths of long-distance car commuter communities ringing the city. I included a legend here FWIW; the colors were picked manually from the QGIS color picker (which leaves a bit to be desired), hence the abysmal scheme. The divisions were automatic using the equal bin option. This data is clearly flawed! We can see from the legend that there is at least one grid cell with an average of over 6 thousand miles a day! I screen grabbed this legend from the layers toolbar in QGIS and gimp-shopped it in for your viewing pleasure.

Next up, working on a workflow for rendering nice images and figuring out what data and stories to tell.

MAPC and the Mass DOT recently released data from the Vehicle Census of Massachusetts for use in the 37 Billion Mile Data Challenge. I am just warming up with the data, so I thought I’d take a look at some simple questions. I’d love to look at something of personal interest, like MIT solar cars (back in 1999 we registered our three wheeled flying saucer as an “experimental motorcycle”) or the Solectria Force (since I wrote software for the Force when I was working at Azure Dynamics). Unfortunately, these vehicles are among the thousands in the dataset that have no manufacturer information because they were not part of the commercial database used to decode VINs in the dataset. Although, to be fair, there are over 230 different makes identified. Besides, the solar car didn’t even have an odometer.

So how about investigating the Deloreans of Massachusetts? Have you ever seen one on the road? I saw one in Cambridge some years back… I wonder what the story behind it was. Did you know they made gold plated Deloreans? Pretty cool, but according to Wikipedia the remaining ones are all in museums.

There were 20 Deloreans registered in the state of Massachusetts between 2008 and 2011. According to Wikipedia there are approximately 6,500 vehicles still existing, which means MA accounts for ~0.3% of the surviving Deloreans. Of the 20, 12 were manufactured in 1981, 7 in 1982 and 1 in 1983.

I posit that most of these vehicles are being happily garaged and protected from the elements. Not unexpectedly, they are all relatively low mileage vehicles for 30+ year old cars.

MAPC provided daily mileage averages based on odometer readings reported at inspection stations. They also did a lot of work cleaning the data and anonymizing it so we can’t track down specific owners of vehicles (at least they tried). Looking at the calculated daily mileage I immediately suspect two outliers:

In one record, the data starts in 2007 and there is only one odometer reading for that vehicle, 40350, which seems to have been credited entirely to the 403 day period between inspections, resulting in an outrageous 102.69 mile per day average.

In another case it looks like the ten-thousands digit was mis-entered by the inspection station (can’t really blame them, they are inspecting a Delorean after all). So an extra 30,000 miles is being credited for a 394 day period (which accounts for 76.14 of the average daily miles reported). Correcting for this error brings the average down to a reasonable 0.89 miles/day.
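The arithmetic behind that correction is just miles credited divided by days between inspections. A small sketch (the function name is hypothetical, and this is not MAPC's actual calculation, only the division described above):

```javascript
// Average daily miles = miles credited to the period / days in the period.
function averageDailyMiles(milesCredited, daysBetweenInspections) {
  return milesCredited / daysBetweenInspections;
}

// The extra 30,000 miles spread over the 394 day period:
const extraPerDay = averageDailyMiles(30000, 394);
// extraPerDay is roughly 76.14 miles/day
```

Subtracting that spurious contribution from the reported average is what leaves the 0.89 miles/day figure.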

It was interesting when looking at the average daily miles to note that of the 64 records for these 20 vehicles, 25 have unavailable odometer data and therefore report a daily mileage of zero. Another two had the obvious errors discussed above. Who knows what the fidelity of the rest is or how this reflects the total data set. Clearly the data isn’t perfect, but the image below seems to indicate there are a few people regularly driving their Deloreans.

After discarding data as discussed above, I plotted the average daily mileage and the days between inspections used to calculate the average for the remaining records (37 of them). I colored data points for a few vehicles that seemed to show consistent behavior. You can see a common inspection interval around 400 days (Delorean drivers push their inspections to the end of the month+ just like the rest of us). The consistency between years for many of the “higher” average mileage vehicles indicates those numbers are real and there are probably a half dozen or more Deloreans doing a daily commute in Massachusetts!

A few vehicles’ data are colored to show consistent average daily mileage and regular inspections for those vehicles. It looks like a few Deloreans in the state were used pretty regularly from 2008 – 2011.

A few closing thoughts on my first dive into the data. Ugh, I forgot how painful matplotlib is to make anything pretty, so I didn’t bother here! I tried out IPython Notebook a little bit, but I don’t think I can handle editing code in the browser. Pandas? Not ready to pass judgement, but I didn’t have the patience to figure out if it could do what I wanted today… I did use it to generate the following stats on the average miles per day data above:

More than half of the data points are doing less than 2 miles per day of driving, but hey, my car sits about that much too.

I am doing some work now with Mark Schindler of GroupVisual.io. He recently presented at a DataViz meetup about his ideas and motivation for a more intuitive treemap variant. I am going to give a shot at creating it. Here is an example of the color-prioritized treemap concept:

Essentially, this layout abandons the traditional category groupings in a treemap in favor of a more pleasing organization based on the same metric used for color. The other challenges will be creating a pleasing organic shape approximating the hand designed one above. The target is to do this in javascript/d3 for use in web apps.

My first pass approach is to sort the elements based on the color metric, then toss them in bins (columns) of approximately equal area and render it using a stacked bar chart concept using d3:
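A rough sketch of that first pass, leaving out the d3 rendering. The item shape ({ metric, area }) and the function name are assumptions made for the example, not the code behind the screenshot:

```javascript
// Sort items by the color metric, then fill columns left to right,
// starting a new column once the current one has reached its share of area.
function binIntoColumns(items, columnCount) {
  const sorted = [...items].sort((a, b) => a.metric - b.metric);
  const totalArea = sorted.reduce((sum, d) => sum + d.area, 0);
  const target = totalArea / columnCount; // ideal area per column

  const columns = [[]];
  let filled = 0; // area placed in the current column so far
  for (const item of sorted) {
    if (filled >= target && columns.length < columnCount) {
      columns.push([]);
      filled = 0;
    }
    columns[columns.length - 1].push(item);
    filled += item.area;
  }
  return columns;
}

const columns = binIntoColumns(
  [
    { metric: 5, area: 4 },
    { metric: 1, area: 1 },
    { metric: 3, area: 3 },
    { metric: 2, area: 2 },
    { metric: 4, area: 2 }
  ],
  2
);
// columns[0] holds the metrics 1, 2, 3 and columns[1] holds 4, 5
```

Because a column only closes after it has met or passed its target, earlier columns tend to run slightly over and the last one ends up short.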

It’s a start. Next step, make the columns a bit more equal and fill them on a diagonal, to get a better controlled gradient. In the following screen shot the working area was divided into a grid and filled from the bottom left to the top right on the diagonal. For example, a 3×3 grid is filled in the following order:

4 7 9
2 5 8
1 3 6
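The numbering above can be generated by walking the diagonals outward from the bottom-left corner. A minimal sketch (the function name is hypothetical):

```javascript
// Returns a rows x cols grid of fill-order numbers, top row first.
// Cells are numbered along diagonals starting at the bottom-left corner.
function diagonalFillOrder(cols, rows) {
  const grid = Array.from({ length: rows }, () => new Array(cols).fill(0));
  let n = 1;
  // s is the diagonal index: column + row counted up from the bottom.
  for (let s = 0; s <= cols + rows - 2; s++) {
    const firstCol = Math.max(0, s - (rows - 1));
    const lastCol = Math.min(s, cols - 1);
    for (let col = firstCol; col <= lastCol; col++) {
      const rowFromBottom = s - col;
      grid[rows - 1 - rowFromBottom][col] = n++;
    }
  }
  return grid;
}

const order = diagonalFillOrder(3, 3);
// order is [[4, 7, 9], [2, 5, 8], [1, 3, 6]]
```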

Still has the problem that the first column will be full and the last one most likely not, as each preceding column is slightly overfilled.

The following shows a first step at fixing the balance issue, simply centering each column vertically along a slight diagonal for a bit more pleasing shape:

That’s it for baby steps, next up: time to work on squaring up these little sliver rectangles and become more “tree” like.

I had an interesting conversation recently where I realized that I haven’t actually used the D3 data exit for any of the projects I’ve done with D3 yet. I guess I got away with that so far because either:

adding data only, for example to chart user submissions for the Jaybridge Challenge (however, this data was only added on a refresh of the page, so effectively never modified from a D3 standpoint).

I realized that although I have read about enter() and exit() and heard many people lament that they don’t really understand d3, I hadn’t personally investigated very deeply. So I did a little more study…

A good starting point with enter()/exit() seems to be the three little circles demo, which is a nice illustration of what is happening. Inspecting the code that makes these animations is useless, though (because it’s not actually using enter and exit directly to demonstrate how they work, it’s not that simple). Also, this demo waits till the very end to introduce the compare function associated with the data, which understates the importance of the compare function for anything more than the simplest enter()/exit(). This is consistent with all the other demos and code I have written, where no compare function is specified, because lots can be done without ever using exit.

So the question that was posed to me was essentially: “If you enter the data [1 2 3] and then later make the same selection and enter the data [2 3 4], what happens?” Well, my initial answer was that no new elements will be created and the data will be overwritten as [2 3 4], which is absolutely correct. In order for D3 to do something smarter, like exit the now obsolete “1” element and enter a new element with data “4”, you must specify a compare function (what the D3 documentation calls a key function) with the data, as in the “3 little circles” demo. An easy example is to use the built-in “String” function for your comparison, which works well in this trivial example:
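To make the difference concrete without pulling in the library, here is a simplified model of the join in plain JavaScript. It is not D3's implementation and joinData is a hypothetical name; it only mimics which data end up in the update, enter and exit groups when matching by index versus by key.

```javascript
// Simplified model of a data join (NOT D3's actual code).
// oldData: data bound to the existing elements; newData: the incoming array.
// Without a key, data are matched by index; with a key, by key(datum).
function joinData(oldData, newData, key) {
  const result = { update: [], enter: [], exit: [] };
  if (!key) {
    newData.forEach((d, i) =>
      (i < oldData.length ? result.update : result.enter).push(d));
    result.exit = oldData.slice(newData.length);
    return result;
  }
  const oldKeys = new Set(oldData.map(d => key(d)));
  const newKeys = new Set(newData.map(d => key(d)));
  newData.forEach(d =>
    (oldKeys.has(key(d)) ? result.update : result.enter).push(d));
  result.exit = oldData.filter(d => !newKeys.has(key(d)));
  return result;
}

const byIndex = joinData([1, 2, 3], [2, 3, 4]);
// byIndex: update [2, 3, 4], enter [], exit []
const byKey = joinData([1, 2, 3], [2, 3, 4], String);
// byKey: update [2, 3], enter [4], exit [1]
```

In D3 itself the key is passed as the second argument to selection.data(), e.g. .data([2, 3, 4], String).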

In order to remove the “exiting” elements you must call exit().remove() as shown here:

circles.exit().remove();

I looked at the D3 code that runs on enter and exit and frankly it seems like a lot of code and hardly seems worthwhile. In a simple example like this one, why would you want to bother to make the comparisons? I guess to animate transitions on exit and enter, or if there was a lot of data and configuration of the elements was costly, but from my testing it seems that all attribute configuration is re-run on all the elements anyway, so I don’t see the practical advantage there yet.

I created a simple page to play with enter and exit. It allows you to change the data in the array, toggle the use of a “String” compare and also select whether you want to remove the “exiting” elements. Leaving the exiting elements around is useful if you want to see what is going on. It was amusing to me to play with which elements are “exited” and which ones are grabbed again if more nodes are added, particularly the order of things in the DOM, which varies depending on whether you have a compare function or not. Overall I don’t think there is much utility to this, as it is effectively broken code to see what d3 is doing.

Continuing to explore the GAPRI data on access to pain medication around the world, I decided to try a choropleth map (with country colors now corresponding to morphine/death). Check out the working interactive choropleth map.

I got to use a few new tools in getting this to work:

In order to use d3.geo.path, which converts GeoJSON to SVG for display in a web page, I first needed to convert my shapefile with the GAPRI data into GeoJSON format. For future use I will probably explore ogr2ogr to do this transformation, but for my initial test I used the web-based MyGeodata Converter, which worked like a charm.

d3 Winkel Tripel Projection: This projection is in the geo.projection d3-plugin. I had some trouble getting it to work, I think because parts of this are in transition with d3.v3. I ended up using versions based on this example.

d3 HSL color interpolation: Pretty simple. You first need to scale your data to the range 0 to 1, then pass that into the color interpolator. I found the docs a little confusing on this point, so here is an example:
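A library-free sketch of the two steps, so it runs without d3. The names normalize and interpolateHslSketch are hypothetical, and the interpolator is a simple linear blend standing in for d3.interpolateHsl (which also handles hue wrap-around and parses color strings):

```javascript
// Step 1: map a value in [min, max] onto [0, 1], clamping out-of-range input.
function normalize(value, min, max) {
  if (max === min) return 0;
  return Math.min(1, Math.max(0, (value - min) / (max - min)));
}

// Step 2: stand-in HSL interpolator between two {h, s, l} colors
// (h in degrees, s and l in percent). Returns a function of t in [0, 1].
function interpolateHslSketch(a, b) {
  return t => {
    const h = Math.round(a.h + (b.h - a.h) * t);
    const s = Math.round(a.s + (b.s - a.s) * t);
    const l = Math.round(a.l + (b.l - a.l) * t);
    return `hsl(${h}, ${s}%, ${l}%)`;
  };
}

// Made-up color ramp and data range, purely for illustration.
const color = interpolateHslSketch({ h: 60, s: 100, l: 90 }, { h: 0, s: 100, l: 40 });
const fill = color(normalize(5, 0, 10));
// fill is "hsl(30, 100%, 65%)"
```

With d3 loaded, the same shape would be a linear scale with domain [min, max] and range [0, 1] feeding the function returned by d3.interpolateHsl.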

Tooltips: I used svg:title for the tooltips. Super simple, just append(“svg:title”) to each path and set the .text() to what you want to display. I spent more time handling the special case of “No data”.

d3 Zoom Behavior: The zoom behavior leaves something to be desired, but it was simple enough to get the d3.behavior.zoom() to work based on this example, the important code is svg.call() and the redraw() function.

Todo:

Connect the table legend and the map interactively.

Use a better zoom functionality that is more discoverable (zoom buttons and grab icon for panning).