Posts Tagged ‘visualization’

[Editor’s note: Chrys Wu handles Web 2.0-style aggregation and promotion at The Washington Post. She has a blog, and her recent posts have focused on infographics; I highlight several below. Nut graf: Don’t think about one platform first. Think about all platforms available simultaneously. Also: “Infographics is not art, it is a conveyance of information.”]

As promised in a previous post on learning information graphics (sometimes shortened to “infographics”), I’m posting my raw notes from Day 1 of an information graphics workshop taught last month by Alberto Cairo and Xaquin G.V., two leading practitioners.

Even though subways are a fuel-efficient way to move people around congested urban areas, Americans make poor use of them, probably because they are poorly funded and often don’t travel where we want to go. Right now, of the five most-used subway systems in the country, only New York City’s attracts as many riders as the five largest foreign subway systems.

[Editor’s note: Continuing my theme of traffic flow visualization (1 | 2), here’s a video by FlightSuite, NHAW, Technorama, and NASA showing animated world flight patterns over a 24-hour period as colored yellow dots traveling from city to city. I’d tell you more, but I can’t dig up any other information about this visualization. Tufte has a neat section on this topic. Thanks Seba!]

[Editor’s note: Looking at the previous post on visualizing subway level of service, I was reminded of one of David Alpert’s great posts from March 2008 on the same topic, but about the frequency and reliability of bus routes. David based his graphic on observed data, then extrapolated it a bit and graphed the resulting histogram. Also see this crazy art project showing a 3D data sculpture of the Sunday Minneapolis / St. Paul public transit system, where the horizontal axes represent directional movement and the vertical represents time.]

The L2 bus travels along Connecticut Avenue from Friendship Heights, detours through Adams Morgan, down 18th and New Hampshire through Dupont, and then along K Street to McPherson Square. It also runs right past my window. I started keeping track of its actual times and compared them to the schedule. (Click image for larger version).

This chart shows how much time you are likely to wait at 18th and S based on when you show up. The darkest area is 10% of the buses: for example, at exactly 8:00, 10% of the time a bus will come within 3 minutes, but 90% of the time it will take longer than 3 minutes. The lightest area is 100% of the buses (that I’ve observed); at 8:00, 100% of the time a bus will come within 8 minutes (not bad).

The red dotted line represents the schedule. The WMATA trip planner reports that this bus should arrive at 7:48, 8:04, 8:23, 8:36, 8:48, 9:01, and 9:16. If all buses showed up exactly on time, the entire chart would coincide with the red line.

You can see that many of the triangular areas deviate to the left of the red line. That means the bus often shows up early. If you get to the stop at 8:46, two minutes before the scheduled 8:48 arrival, 30% of the time the bus will show up within four minutes, but 70% of the time it will take 12 minutes or more, because 70% of the time this bus shows up before 8:46. And it’s been as early as 8:41 (that’s where the tall light blue spike appears), which means that to be safe and avoid risking a 23-minute wait for a 9:01 bus that may show up at 9:04, you have to arrive at the stop seven minutes ahead of time.

The tighter the triangle, the more consistent the bus’s arrivals. As you can see, the 8:04 is pretty good, deviating to the left (early) only occasionally, and then not by much. It’s not often late, either; the big dark triangle means the bus usually isn’t more than a couple minutes behind. On the other hand, its light colored spike is very high, meaning that occasionally, even if you show up a minute early, you might be stuck waiting 28 or 32 minutes if the 8:23 is late.

The 8:23 and 8:36 appearances aren’t very consistent, leading to the lack of visible shape in those areas. Those buses are often early and often late, and several times have shown up within one minute of each other.

You can see all my data on this Google Spreadsheet. The first tab is my direct observations; the second tab is the calculated data that generated the chart.
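For anyone who wants to reproduce this kind of chart from their own observations, here is a minimal sketch in Python. The observation times below are invented for illustration (the real data lives in the spreadsheet above): for a given show-up time, compute the wait until the next observed bus on each day, then take percentiles across days to get the shaded bands.

```python
# Sketch: compute wait-time percentile bands from observed bus arrival
# times. The observations here are invented for illustration; the real
# chart uses the times logged in the Google Spreadsheet.

def waits_at(show_up, day_arrivals):
    """Minutes until the next bus if you arrive at minute `show_up`."""
    upcoming = [t for t in day_arrivals if t >= show_up]
    return upcoming[0] - show_up if upcoming else None

def percentile(sorted_vals, pct):
    """Nearest-rank percentile of a sorted list of waits."""
    idx = max(0, int(round(pct / 100.0 * len(sorted_vals))) - 1)
    return sorted_vals[idx]

# Arrival times in minutes after 7:00 for three observed days.
observed_days = [
    [48, 64, 83, 96],   # close to schedule
    [46, 63, 86, 99],   # a bit early, then late
    [41, 66, 84, 101],  # very early first bus
]

show_up = 46  # arrive at 7:46
waits = sorted(w for day in observed_days
               if (w := waits_at(show_up, day)) is not None)
print(percentile(waits, 10))   # best-case band (darkest area)
print(percentile(waits, 100))  # worst-case band (lightest area)
```

Repeating this for every show-up minute across the rush-hour window yields the full triangular bands in the chart.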

In conclusion, the 8:04 is fairly reliable, while the later buses are not so much. WMATA is working on offering real-time bus info which would help since someone could see how much time actually remained until the next bus, and see this before leaving home. The other big recommendation I see from this data is for the drivers to try harder to avoid being early. They should wait at certain key stops until the correct departure time. That way, commuters could at least know for certain that if they showed up a minute or two before the bus’s scheduled arrival, they wouldn’t be left waiting at the stop for 20 minutes.

[Editor’s note: Washington DC has the second busiest subway system in the US after New York City. This series of visual diagrams shows the network’s topology and how to optimize routing to achieve a better level of service (quicker train frequency). Note how the time scale has been reduced to modulo 12, easy to understand as a train every 2 to 6 minutes based on the number of colored lines through each subway station.

DC’s Metro rail subway is convenient and affordable. But with more people using transit every year, the system is beginning to show signs of strain (heck, it’s over 30 years old now!). The federal gov’t just gave one of the last nods to construct a new line (dubbed the “Silver”) connecting downtown DC to Dulles International Airport and farther out into the exurbs. But this does little to alleviate crowding on the original 5 rail lines.

How to squeeze the most capacity out of existing tunnels and switches? These excellent maps from Track 29 chart the current system and show how it might be tweaked to optimize the flow of passengers from point A to point B, primarily on the Orange line, the most overcrowded, where a switching problem reduces train frequency through the downtown central business district (CBD).]

The first diagram represents WMATA’s current service pattern during rush hours. Colors represent each of the subway routes. More lines along a colored route represent better (more frequent) service. [Ed: not all stations shown, based on "rush" peak service.]

Based on the 135-second headway, WMATA can run 5 trains through a given segment of track every 12 minutes. Each of the diagrams below represents a 12-minute interval during rush hour. Each of the lines on the diagram represents a train in each direction. Therefore, a trackway with two lines (like between Stadium Armory and Largo) represents a headway of 6 minutes (12 ÷ 2). In other words, you’ll be waiting up to 6 minutes for a train, while on the Red line it would be only 2.5 minutes. [Ed: map seems to undercount Green line service.]
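The headway arithmetic behind the diagrams is simple division, which a tiny sketch (numbers taken from the text) makes explicit:

```python
# Headway = service window / trains per window, as in the Track 29
# diagrams: each diagram covers a 12-minute rush-hour interval, and a
# trackway drawn with 2 lines means 2 trains per direction per window.

WINDOW_MIN = 12

def headway(trains_per_window):
    """Minutes between trains on a segment."""
    return WINDOW_MIN / trains_per_window

print(headway(2))  # Stadium Armory to Largo: 6.0-minute headway
print(headway(5))  # a maxed-out segment: 2.4 minutes between trains
```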

There are several choke points in the system, including at the Rosslyn tunnel where the Orange and Blue lines converge and travel under the Potomac River into the District of Columbia.

The chief limitation for the Orange Line, as you can see here, is the 4-minute headway on the Vienna-Rosslyn segment. Adding one train would reduce headways to 3 minutes and would add a capacity of 1,000-1,400 passengers for every 12-minute period. Any additional capacity is sorely needed, but the segment of track between Rosslyn and Stadium Armory is essentially at capacity.

Hence the so-called “Blue Line Split.” Here’s what WMATA is proposing: [Ed: WMATA runs the Metrorail subway in DC.]

This results in better service on the Orange line, and equivalent service on the Blue, except at Arlington Cemetery station, which is primarily used by tourists (there are no homes or offices at that station). Many Blue line riders actually need to transfer at Metro Center or L’Enfant stations or want to get to eastern downtown faster, so this may actually be a boon for them, too.

But while that squeezes out extra service, it muddles the lines’ naming convention. Some propose renaming / rerouting the Blue and Yellow lines like so (below). This map reflects that proposal along with planned Silver line service.

Greater Greater DC has a full discussion of adding even more commuter rail service to the nation’s capital.

In early 2007, when the Los Angeles Times launched its Homicide Report blog — an effort to chronicle every homicide in Los Angeles County — it was clear that there were important geographic and demographic dimensions to the information that a blog format wouldn’t fully capture. What we needed was a ChicagoCrime.org-style map that would let users focus on areas of interest to them, with filters that would enable them to “play” with the data and explore trends and patterns for themselves. Problem was, the web staff (of which I was a part) lacked the tools and the expertise to build such a thing, so the blog launched without a map. (Sound familiar?)

It took several months to secure the tech resources and a couple more months to create wireframes and spec out requirements for what would become the Homicide Map, with the help of a couple of talented developers and a project manager on part-time loan from the website’s IT department. We were fortunate, of course: We actually had access to this kind of expertise, and since then we’ve hired a couple of dedicated editorial developers. I’m aware that others might not have it so good.

Last week, Robert Niles argued that news organizations should be in the business of creating “killer apps”. Put another way, there is a need to develop tools that hew to the content rather than the other way around. But creating the functionality Robert describes takes a closer connection between news thinking and tech thinking than is possible within news organizations’ traditional structures and skill sets.

In this post, I’ll try to squeeze some wisdom out of the lessons we learned in the process of assembling the Times’ Data Desk, a cross-functional team of journalists responsible for collecting, analyzing and presenting data online and in print. (Note: I left the Times earlier this month to work on some independent projects. I am writing this piece with the blessing of my former bosses there.)

Here, then, are 10 pieces of advice for those of you building or looking to build a data team in your newsroom:

Find the believers: You’ll likely discover enthusiasts and experts in places you didn’t expect. In our case, teaming up with the Times’ computer-assisted reporting staff, led by Doug Smith, was a no-brainer. Doug was publishing data to the web before the website had anybody devoted to interactive projects. But besides Doug’s group, we found eager partners on the paper’s graphics staff, where, for example, GIS expert Tom Lauder had already been playing with Flash and web-based mapping tools for a while. A number of reporters were collecting data for their stories and wondering what else could be done with it. We also found people on the tech side with a good news sense who intuitively understood what we were trying to do.

Get buy-in from above: For small projects, you might be able to collaborate informally with your fellow believers, but for big initiatives, you need the commitment of top editors who control the newsroom departments whose resources you’ll draw on. At the Times, a series of meetings among senior editors to chart a strategic vision for the paper gave us an opportunity to float the data desk idea. This led to plans to devote some reporting resources to gathering data and to move members of the data team into a shared space near the editorial library (see #8).

Set some priorities: Your group may come from a variety of departments, but if their priorities are in alignment, disparate reporting structures might not be such a big issue. We engaged in “priority alignment” by inviting stakeholders from all the relevant departments (and their bosses) to a series of meetings with the goal of drafting a data strategy memo and setting some project priorities. (We arrived at these projects democratically by taping a big list on the wall and letting people vote by checkmark; ideas with the most checks made the cut.) Priorities will change, of course, but having some concrete goals to guide you will help.

Go off the reservation: No matter how good your IT department is, their priorities are unlikely to be in sync with yours. They’re thinking big-picture product roadmaps with lots of moving pieces. Good luck fitting your database of dog names (oh yes, we did one of those) into their pipeline. Early on, database producer Ben Welsh set up a Django box at projects.latimes.com, where many of the Times’ interactive projects live. There are other great solutions besides Django, including Ruby on Rails (the framework that powers the Times’ articles and topics pages and many of the great data projects produced by The New York Times) and PHP (an inline scripting language so simple even I managed to learn it). Some people (including the L.A. Times, occasionally) are using Caspio to create and host data apps, sans programming. I am not a fan, for reasons Derek Willis sums up much better than I could, but if you have no other options, it’s better than sitting on your hands.

Templatize: Don’t build it unless you can reuse it. The goal of all this is to be able to roll out projects rapidly (see #6), so you need templates, code snippets, Flash components, widgets, etc., that you can get at, customize and turn around quickly. Interactive graphics producer Sean Connelley was able to use the same county-level California map umpteen times as the basis for various election visualizations in Flash.

Do breaking news: Your priority list may be full of long-term projects like school profiles and test scores, but often it’s the quick-turnaround stuff that has the biggest immediate effect. This is where a close relationship with your newsgathering staff is crucial. At the Times, assistant metro editor Megan Garvey has been overseeing the metro staff’s contributions to data projects for a few months now. When a Metrolink commuter train collided with a freight train on Sept. 12, Megan began mobilizing reporters to collect key information on the victims while Ben adapted an earlier Django project (templatizing in action!) to create a database of fatalities, complete with reader comments. Metro staffers updated the database via Django’s easy-to-use admin interface. (We’ve also used Google Spreadsheets for drama-free collaborative data entry.) … Update 11/29/2008: I was remiss in not pointing out Ben’s earlier post on this topic.

Develop new skills: Disclaimer: I know neither Django nor Flash, so I’m kind of a hypocrite here. I’m a lucky hypocrite, though, because I got to work with guys who dream in ActionScript and Python. If you don’t have access to a Sean or a Ben — and I realize few newsrooms have the budget to hire tech gurus right now — then train and nurture your enthusiasts. IRE runs occasional Django boot camps, and there are a number of good online tutorials, including Jeff Croft’s explanation of Django for non-programmers. Here’s a nice primer on data visualization with Flash.

Cohabitate (but marriage is optional): This may be less of an issue in smaller newsrooms, but in large organizations, collaboration can suffer when teams are split among several floors (or cities). The constituent parts of the Times’ Data Desk — print and web graphics, the computer-assisted reporting team and the interactive projects team — have only been in the same place for a couple months, but the benefits to innovation and efficiency are already clear. For one thing, being in brainstorming distance of all the people you might want to bounce ideas off of is ideal, especially in breaking news situations. Also, once we had everybody in the same place, our onetime goal of unifying the reporting structure became less important. The interactive folks still report to latimes.com managing editor Daniel Gaines, and the computer-assisted reporting people continue to report to metro editor David Lauter. The graphics folks still report to their respective bosses. Yes, there are the occasional communication breakdowns and mixed messages. But there is broad agreement on the major priorities and regular conversation on needs and goals.

Integrate: Don’t let your projects dangle out there with a big ugly search box as their only point of entry. Weave them into the fabric of your site. We were inspired by the efforts of a number of newspapers — in particular the Indianapolis Star and its Gannett siblings — to make data projects a central goal of their newsgathering operations. But we wanted to do more than publish data for data’s sake. We wanted it to have context and depth, and we didn’t want to relegate data projects to a “Data Central“-type page, something Matt Waite (of Politifact fame) memorably dubbed the “data ghetto.” (I would link to Waite’s thoughtful post, but his site unfortunately reports that it “took a dirt nap recently.”) I should note that the Times recently did fashion a data projects index of its own, but only as a secondary way in. The most important routes into data projects are still through related Times content and search engines.

Give back: Understand that database and visualization projects demand substantial resources at a time when they’re in very short supply. Not everyone in your newsroom will see the benefit. Make clear the value your work brings to the organization by looking for ways to pipe the best parts (interesting slices of data, say, or novel visualizations) into your print or broadcast product. For example, some of the election visualizations the data team produced were adapted for print use, and another was used on the air by a partner TV station.

When I shared this post with Meredith Artley, latimes.com’s executive editor and my former boss, she pointed to the formation about a year ago of the interactive projects team within the web staff (Ben, Sean and me; Meredith dubbed us the “cool kids,” a name that stuck):

“For me, the big step was creating the cool kids team — actually forming a unit with a mandate to experiment and collaborate with everyone in the building with the sole intention of creating innovative, interactive projects.”

And maybe that should have been my first piece of advice: Before you can build a data team, you need one or more techie-journalists dedicated full-time to executing online the great ideas they’ll dream up.

What else did I miss? If you’ve been through this process (or are going through it, or are about to), I hope you’ll take a minute to share your insights.

[Editor’s note: The number of people using Facebook and other social networks is astounding. This 3D globe visualization movie shows Facebook users friending each other, commenting on each other’s posts, and otherwise interacting, all geolocated via IP addresses. I like the flight paths and pulses best. Thanks Lynda!]

A group of Facebook engineers – Jack Lindamood, Kevin Der and Dan Weatherford – have created a small project called Palantir at a Facebook Hackathon event. The project is named after The palantír of Orthanc, a crystal ball-like object from The Lord Of The Rings (yep, they’re nerds).

Anyway, it’s a video of the earth showing Facebook activity visually and geographically. One view shows activity as dots of light that flow upward. Another view shows connections between people around the globe as they occur. The images above show a little of it, but you really have to see the video to appreciate it. You can see it here and here.

Facebook says they are strongly considering productizing this, but for now it isn’t on the roadmap. If they do go forward with it, presumably you’ll be able to watch friend connections happening all over the world.

How people spend their discretionary income – the cash that goes to clothing, electronics, recreation, household goods, alcohol – depends a lot on where they live. People in Greece spend almost 13 times as much money on clothing as they do on electronics. People living in Japan spend more on recreation than they do on clothing, electronics and household goods combined. Americans spend a lot of money on everything.

I’m [ed: Zach Johnson] a little late on this, so I hope it’s old news to most readers that Universal Mind, where I’ve worked for the past 2 months, just launched a technology preview of the SpatialKey visualization system. This is a big deal.

Andrew Powell, Doug McCune, and Brandon Purcell have already posted great introductions to SpatialKey, so I won’t go through all that here. But just so’s you know: SpatialKey is a visualization system for geotemporal (location + time) data, developed primarily in Flex, that lets you filter and render thousands of points very quickly, all client-side in your browser.

This is not a formal release. We’re in a technology preview for now, which means you just get to see some sweet examples, but soon we’ll release a version, SpatialKey Personal, into which you can load and visualize your own data. Here are links to three of my favorite examples (for more, check out our Gallery page, or this post on the SpatialKey blog).

As I said, other, better introductions have been written on SpatialKey; I just want to focus on a few of my favorite features or attributes.

not a single, do-it-all application

SpatialKey is based around a collection of visualization templates. Each offers a unique view of the data, with specialized visualizations, filters, and UI controls. Since the templates are specialized, each one is pretty easy to learn and begin using.

chorodot symbolization

You don’t see these much, but I think they’re really effective. The “heat grid” symbolization in SpatialKey is a modern implementation of a technique put forth by Alan MacEachren and David DiBiase in 1991.

Aggregating points to arbitrary but regularly-shaped polygons, or binning, was an extant graphical practice at the time, but the geographic application and their particular methods created an effective cartographic symbology. Other than SpatialKey, I haven’t seen this symbolization in a geographic visualization context, but I think it’s very effective at presenting large datasets that require aggregation. The heat grid symbolization in SpatialKey extends the approach by allowing grid renderings of attributes of the data (like house prices or temperature) in addition to aggregation of the count of points.

SpatialKey grid symbolization showing a data attribute (average home prices) in Sacramento county
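To make the binning idea behind the heat grid concrete, here is a rough sketch in Python. The points and cell size are invented for illustration, not SpatialKey’s actual implementation: points are snapped to a regular square grid, and each cell carries either a point count or the mean of an attribute such as home price.

```python
# Sketch of grid binning ("heat grid"): aggregate scattered points
# into regular square cells, either counting points per cell or
# averaging an attribute (e.g., home price). Data here is invented.
from collections import defaultdict

def bin_points(points, cell_size, value_key=None):
    """points: list of dicts with 'x', 'y' and optional attributes.
    Returns {(col, row): count} or {(col, row): mean of value_key}."""
    cells = defaultdict(list)
    for p in points:
        key = (int(p["x"] // cell_size), int(p["y"] // cell_size))
        cells[key].append(p[value_key] if value_key else 1)
    if value_key:
        return {k: sum(v) / len(v) for k, v in cells.items()}
    return {k: len(v) for k, v in cells.items()}

homes = [
    {"x": 1.2, "y": 3.4, "price": 300_000},
    {"x": 1.8, "y": 3.9, "price": 500_000},
    {"x": 7.5, "y": 0.2, "price": 250_000},
]
print(bin_points(homes, cell_size=5))                     # counts per cell
print(bin_points(homes, cell_size=5, value_key="price"))  # mean price per cell
```

Swapping the count for a mean is exactly the extension described above: the same grid can show point density or an attribute surface.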

small multiples / map comparison

I’ve always been a fan of the small multiples depiction of change, illustrated so well by Edward Tufte in The Visual Display of Quantitative Information and Envisioning Information. Though the SpatialKey Map Comparison template shows two multiples, it qualifies (and we can easily plug in more maps for specialized templates).

D.C. construction in the SpatialKey Map Comparison template

Both the maps and the time charts are live-linked. Mousing over an area on one of the maps or a bar on one of the time charts reveals the tooltip for both displays, allowing you to easily retrieve specifics for different time periods or areas.

complex temporal filtering and focusing via the heat index chart

The time chart, shown in the first screenshot above, is great for revealing linear temporal trends in a dataset, and for enabling linear filtering. But some datasets evince more complex temporal trends — for example, some crimes may be more common on a certain day of the week and at a certain time of day. Such trends are lost when data is aggregated in a linear fashion to, say, days or weeks.
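The cyclical aggregation can be sketched in a few lines of Python (invented timestamps; SpatialKey’s heat index chart does this interactively): bucket events by (weekday, hour) instead of by date, so a Saturday-night pattern survives the aggregation.

```python
# Sketch: aggregate event timestamps by (weekday, hour-of-day) rather
# than linearly by date, so cyclical patterns (e.g., Saturday-night
# crimes) are not averaged away. Timestamps here are invented.
from collections import Counter
from datetime import datetime

def heat_index(timestamps):
    """Count events per (weekday, hour); weekday 0 = Monday."""
    return Counter((t.weekday(), t.hour) for t in timestamps)

events = [
    datetime(2008, 9, 6, 23, 15),   # Saturday, 23:00 hour
    datetime(2008, 9, 13, 23, 40),  # Saturday, 23:00 hour
    datetime(2008, 9, 10, 14, 5),   # Wednesday, 14:00 hour
]
counts = heat_index(events)
print(counts[(5, 23)])  # two Saturday-23:00 events
```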

in closing

I was late to the game on this one, joining Universal Mind in June. SpatialKey was developed by the brilliant team of Doug McCune, Ben Stucki, and Andrew Powell, led by Brandon Purcell and Tom Link, with product manager Mike Connor. It’s a privilege working with such a talented crew.

Our goals for this technology preview are modest (blowing minds, getting feedback), but we’re excited to continue developing SpatialKey and SpatialKey Law Enforcement. And we’ll be releasing updates, new examples, and SpatialKey Personal in the near future. So stay tuned to the SpatialKey blog, and please contact us if you have any feedback on our technology preview.

[Editor’s note: Fascinating proof-of-concept showing how to create and display heat maps in Google Maps for Flash/Flex (AS3), using a PHP back end for calculation and Flash for the front end. More information on using Google Maps in Flash CS3: download and reference, and tutorial. Similar to some nifty work Zach Johnson is working up at Universal Mind for spatialkey.com.]

Well, for a long time I’ve wanted to give this a try, and yesterday I had the time to experiment a bit. The idea was to display GBIF’s available data as a heat map over Google Maps. Here’s a screenshot for Quercus ilex:

And if you want to try it for yourself, here it is (one usability issue: the search box is in the bottom right corner):

1) Get the data: I am using the so-called “Density tables” from GBIF. You can access them through the GBIF web services API at http://es.mirror.gbif.org/ws/rest/density, for example with a query like this one for Quercus ilex (of course, you need to get the taxonconceptkey from a previous request to the services):

This works fine but has some problems. The first is that GBIF goes down almost every evening; Tim can maybe explain why. That’s why I am using the Spanish mirror (look at the URL), and I recommend you do the same.

The second problem is the verbosity of the XML schema being used. Downloading Animalia (probably the biggest concept you can request) returns 14.1 MB of XML. And that’s just to get a list of cellIds with counts on them (if anybody is interested, we can post details about cellIds): exactly an array of 34,871 numbers. Even worse is handling that on a web client like this one; parsing such a huge XML response kills the browser. The GBIF web services API deserves its own blog post, I would say, together with Tim.

But what is new is that I have supercow powers on GBIF: I am working for GBIF right now and have access to a test database. In this testing environment I developed a little server app that publishes the same density service but using the AMF protocol. I used AMFPHP for this, if anybody is interested. There are two good things about using AMF: the output is now around 150 KB for the same thing, and AMF is natively supported by Flash, so there is no parsing step; the data goes straight into memory as AS3 objects.

2) Create a Heat Map from the data: Once the data is on the client, I make use of a class from Jordi Boggiano called HeatMap.as that produces Sprites as the result. In my case I decided to create a Sprite (think of it as an image) of 1 pixel per cellId, giving a 360×180 pixel image (a cellId is equivalent to a 1-degree box).
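As an illustration of the 1-pixel-per-cellId idea, here is a sketch in Python. The row-major cellId scheme below is my assumption for the example; check the GBIF documentation for the real cellId definition.

```python
# Sketch of the 1-pixel-per-cellId image: each cellId names a 1-degree
# box, so the whole globe fits in a 360x180 grid. The row-major cellId
# scheme here is an assumption, not necessarily GBIF's actual scheme.

WIDTH, HEIGHT = 360, 180

def cell_to_pixel(cell_id):
    """Map a row-major 1-degree cellId to an (x, y) image pixel,
    with y = 0 at the top (north)."""
    row, col = divmod(cell_id, WIDTH)
    return col, HEIGHT - 1 - row  # flip so north is at the top

def render(density):
    """density: {cellId: count}. Returns a HEIGHT x WIDTH intensity grid."""
    img = [[0] * WIDTH for _ in range(HEIGHT)]
    for cell_id, count in density.items():
        x, y = cell_to_pixel(cell_id)
        img[y][x] = count
    return img

img = render({0: 7})       # cellId 0: the south-west corner cell
print(img[HEIGHT - 1][0])  # 7
```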

3) Overlay the image on Google Maps: When you have the Sprite (or even earlier, but that’s too much detail), you overlay it on Google Maps for Flash using a GroundOverlay object, which takes care of the reprojection and adapts it to the map. The GroundOverlay is explained in the docs as a way to overlay images, but it actually accepts any Sprite.

Done! (almost)

4) OK, there are some problems: It’s not perfect yet; these are the pending issues:

The GroundOverlay does not seem to reproject the Sprite I generate correctly; in the far north and south, things are not correctly overlaid.

The resolution of the heat map is a little poor, but it actually reflects the quality of the data we have. Some interpolation could be done to make it look nicer.

The colours of the heat map do not fit well with the underlying Google Maps layers. Where there is little data, you can hardly see it.

I still don’t feel confident enough in the code to release it yet. I hope I can work on it a little more so that I can be proud of it, but if you desperately need it, let me know.

One more notice: yesterday Universal Mind released a preview of a new product, SpatialKey. I am always impressed with what these people do, and I follow their developers’ blogs (like this one and this one). They are kind of my RIA and web GIS heroes. The new product they have released actually looks very much like what I wanted to do in Biodiversity Atlas for data analysis: it lets people explore huge datasets geographically and temporally. Tim suggested I contact them, and I will. In any case, it is great to have such a tool available for ideas on interaction design. Good job, Universal Mind, you really rock.

We want to see your comments!

Update:

Some people asked for different quality settings on the heat map. I have modified the application so that you now get a set of controls to define different quality and drawing options. By default, the app tries to figure this out based on the number of occurrences, but maybe that’s not the best; it depends on how the data is distributed. In a final product, I think I would NOT expose this functionality to the user; it’s too much for my taste. You know, less is more.

Some weeks ago I posted my first experiment with heat maps over Google Maps for Flash. It was well received by the community of Google Maps developers, and several people asked for the code. I did not publish it then because there were still some things I did not understand; they were somehow just magic that I had to tweak. The biggest problem was that the heat map was not overlaying correctly on the map. It was clearly a projection problem: I was plotting the coordinates in an image without reprojecting them to the Mercator projection used by Google Maps.

OK, now I have solved this by using the GoogleMapUtility.php class on the server after getting the latitudes and longitudes of my points from the database.

The GoogleMapUtility.php class can be found here. I am not sure who developed it, but it is widely available all over the web.

In this class I set the tile size to 360, but it could have been more or less anything, as long as on the Flex side you use the same size when creating the Sprite (check the Flex source code for more details).
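The reprojection itself is the standard spherical (Web) Mercator formula. Here is a sketch in Python for illustration (the constant 360 mirrors the tile size mentioned above; GoogleMapUtility.php does the equivalent in PHP):

```python
# Sketch of the spherical (Web) Mercator projection: plain lat/lon
# must be warped this way before a heat map lines up with Google Maps.
import math

TILE_SIZE = 360  # matches the tile size used above

def latlon_to_pixel(lat, lon):
    """Project WGS84 lat/lon to pixel coords on a TILE_SIZE world tile."""
    x = (lon + 180.0) / 360.0 * TILE_SIZE
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * TILE_SIZE
    return x, y

x, y = latlon_to_pixel(0.0, 0.0)  # equator / prime meridian
print(round(x), round(y))         # center of the world tile
```

Plotting each point at these pixel coordinates, rather than at raw (lon, lat), is what makes the overlay line up with the base map.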

What I would really like is to be able to do this reprojection in Flex, as normally I just transfer coordinates to the client and then represent them in different ways: heat map, grids, markers, etc.

I will try to port the GoogleMapUtility class soon to AS3 and publish it here.

I am already using this code in the widget I am developing for GBIF. It covers only a small area and I don’t have much data, but I am happy with it.