Friday, 14 January 2011

Data Journalism Background

There is a quiet revolution underway in the field of journalism that is changing the way we tell stories. Data journalism is an emerging and increasingly valuable journalistic discipline that involves finding stories buried in statistics and visualizing them in a way that makes them easy to comprehend.

Data journalism is an offshoot of computer-assisted reporting (CAR), which has been around for decades and was once the exclusive purview of reporters with programming knowledge or, at minimum, a geek side. But the ubiquity of computers today and the advent of online journalism have resulted in programmers developing tools that put data analysis and visualization within the reach of every journalist. Increasingly, stories are being published in both mainstream and online media outlets in which the database is the focal point of the work, as opposed to merely a starting point for an interpretive print article or television story.

“Journalists need to be data savvy. It used to be that you would get stories by chatting to people in bars... But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together and what’s going on in the country.” — Tim Berners-Lee

Technological advances mean that journalists need no longer be working on a project on the scale of Wikileaks or have the programming chops of Adrian Holovaty who developed Chicago Crime (now EveryBlock Chicago) in order to publish informative hyperlocal data journalism. Accessibility is improving by the day as more municipalities embrace the open data philosophy and take steps to release public data to improve the transparency and accountability of local government.

Goal

The goal of this experimental portfolio was to explore the extent to which new computer tools and open data are making investigative opportunities more accessible to community journalists. This was analyzed by using available public data for the neighbourhood of Birch Cliff in Toronto to develop data visualizations for a hyperlocal blog which is under development for the neighbourhood. It was truly experimental in that prior to enrolling in the MA Online Journalism program, I was unfamiliar with the term data journalism and had certainly never heard of the open data movement. My only experience with internet maps involved searching Google Maps for directions and printing the results. I have used spreadsheets in the past to view television budgets and tally student grades, but I have never created one and have no knowledge of formulas. I have no experience with HTML, XML, KML, JSON or geotagging.

Method

My work began on the day that the City of Toronto launched its open data initiative, which coincided with an international hackathon involving 73 cities on five continents. I decided I would hold my own private hackathon -- one inexperienced hack and no hackers to assist.

I studied Toronto’s open data website to determine if there was data relevant to Birch Cliff in an XLS format, as I had been warned to avoid working with XML and Shapefiles as they would pose too big a challenge. I was pleased to find official results from the 2010 municipal election and decided to create a map showing the poll-by-poll breakdown of vote results in Ward 36, which includes Birch Cliff.

My research began with many hours of fiddling with spreadsheets in Numbers, Excel and Google Docs and exploring appropriate mapping programs including UMapper, Mapalist, OpenStreetMap and Google Maps. I also researched APIs and KML. I studied similar maps produced by The Electoral Map and Torontoist and was particularly impressed by the work of Patrick Cain, a Toronto journalist who makes maps for a living.

Given that the detailed Ward 36 map is only available as a PDF, it was apparent that I needed to copy the poll boundaries by hand into Google Maps. I was ultimately successful in creating a map but I was unable to label it or add overlays to show how the different polls voted. In short, I hated it.

Rethink Required

I set aside the election map and decided the best approach was to revert to first principles -- I needed to walk before I could run. Since the goal was to create visualizations suitable for a hyperlocal website, I started over and created a simple hand-drawn Google map showing the coverage area of the Birch Cliff Blog.

With two hand-drawn maps under my belt, I returned to the election map and created a proper version which includes coloured overlays to indicate how each poll voted, as well as click-through statistics breaking the vote down by candidate and stating the voter turnout.

In addition to being interactive and allowing users to see how the vote played out in their neighbourhood, the Ward 36 election map is a useful journalistic tool because it visualizes a trend that might not be obvious from the statistics alone. The western portion of the ward, a relatively affluent part of the community, is solidly light blue, indicating most people voted for Robert Spencer. The apartment towers and selected other polling districts in the east are purple, suggesting that people in less affluent parts of the community voted for Dianne Hogan. This polarization allowed Gary Crawford, who won the polls coloured red, to come up the middle and win the election.

The voter statistics could be explained by the candidates' geographic power bases, but I also wanted to explore whether there was a socioeconomic trend, and so I tried to drill down. The City of Toronto, however, only releases demographic information at the ward level, and Statistics Canada breaks down demographics according to census tracts, which don't match the polling boundaries. So I turned to other methods to better visualize the voter breakdown, which led me to Google Fusion Tables. I uploaded the election data, tidied up the statistics, reconfigured columns, and generated the following pie chart and bar graph.
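For readers curious what that tidying and reconfiguring actually involves, here is a minimal sketch in Python. The numbers and column names are my own invented placeholders, not the city's real export format; the point is the reshaping from one column per poll into one row per candidate-poll pair, which is the layout charting tools generally want.

```python
import csv
import io

# Hypothetical sample mimicking a poll-by-poll export:
# one row per candidate, one column per polling subdivision.
raw = """candidate,poll_001,poll_002,poll_003
Crawford,120,95,210
Spencer,180,60,40
Hogan,30,140,95
"""

# Reshape from "wide" (one column per poll) to "long"
# (one row per candidate-poll pair).
rows = list(csv.DictReader(io.StringIO(raw)))
long_form = [
    {"candidate": r["candidate"], "poll": col, "votes": int(r[col])}
    for r in rows
    for col in r
    if col != "candidate"
]

# Total votes per candidate across all polls.
totals = {}
for rec in long_form:
    totals[rec["candidate"]] = totals.get(rec["candidate"], 0) + rec["votes"]
print(totals)
```

Spreadsheet tools do the same thing with copy-paste and transpose; the script just makes the before-and-after shapes explicit.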

The pie chart was designed to be interactive so that the user can choose a poll and see how residents there voted. The embedded version below, however, is only partially interactive. Users can click on a colour and see the vote totals, but the legend that allows for the poll-by-poll breakdown is missing. For full interactivity, you need to go here, click on "visualize" at the top and choose pie chart. I joined the Google Fusion users group to determine if the embed code was faulty and was told by the administrator that this feature isn't available yet. I was invited to make an implementation request which I have done.

2010 Election Results - Ward 36 Councillor Poll-By-Poll Breakdown

The embedded bar graph, below, functions as intended, allowing users to click on individual bars to compare the overall election results.

2010 Election Results - Ward 36 Overall Vote

My next step was to merge the Ward 36 Councillor election data with the results of the Mayor’s race. Toronto elected a controversial right-wing politician as mayor and I wanted to see if any patterns could be found there. Despite several hours of fiddling with the configuration of the data, I could not merge the Councillor spreadsheet with the Mayor spreadsheet because Google Fusion Tables only allows the merging of rows, not columns. I tried unsuccessfully to switch the x-axis and y-axis, and further research appeared to confirm that this is not possible. Although I was unable to compare the Councillor vote to the Mayor vote, I produced a pie chart with a clickable poll-by-poll breakdown of the vote for Mayor. Like the previous pie chart, for full interactivity, you need to go here.
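The merge I was after is, at bottom, a join of two tables on a shared poll ID. Outside of Fusion Tables that operation is straightforward, as this sketch shows; the candidate names and vote counts here are invented for illustration.

```python
# Hypothetical poll-level totals; keys are polling-subdivision IDs.
councillor = {"poll_001": {"Crawford": 120, "Spencer": 180},
              "poll_002": {"Crawford": 95, "Spencer": 60}}
mayor = {"poll_001": {"Ford": 200, "Smitherman": 90},
         "poll_002": {"Ford": 70, "Smitherman": 110}}

# Join the two result sets on the shared poll ID -- the same
# key-based merge Fusion Tables performs on rows, done by hand
# so the columns from both tables end up side by side.
merged = {
    poll: {**councillor[poll], **mayor.get(poll, {})}
    for poll in councillor
}
print(merged["poll_001"])
```

With the two votes in one table per poll, a scatter plot or side-by-side bar chart comparing them becomes possible.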

2010 Election Results - Ward 36 Mayor Vote Poll-By-Poll

Leaving politics behind, I turned my attention back to mapping because I wanted to progress beyond creating maps by hand and experiment with latitude and longitude and geotagging. There was no appropriate open data, so I decided to create a restaurant map because market research for the Birch Cliff Blog indicates there is a high demand for restaurant information among potential users. It's not journalism, but rather a good skill-building exercise. After researching various mapping tools, I decided to use BatchGeo, which offers the ability to create an interactive, embedded map with built-in geocoding and no need for latitude and longitude coordinates. The map was successful but the process was still time-consuming in that quite a bit of data entry was required to enter the addresses. A dozen restaurants inexplicably migrated 20 kilometres west of Birch Cliff and the fix required dragging the markers by hand to their proper position. I was pleased that I could filter the restaurants based on cuisine, which was one of the goals I set for the map.
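That cuisine filter is conceptually simple: the map tool keeps a table of records and shows only the rows matching the chosen value. A sketch, with made-up restaurant names and addresses standing in for my real data:

```python
# Hypothetical restaurant records like those uploaded for geocoding:
# one row per restaurant, with a "cuisine" column to filter on.
restaurants = [
    {"name": "Birch Cliff Bistro", "address": "1500 Kingston Rd", "cuisine": "Italian"},
    {"name": "Lakeside Thai", "address": "1620 Kingston Rd", "cuisine": "Thai"},
    {"name": "Cliffside Grill", "address": "2450 Kingston Rd", "cuisine": "Italian"},
]

def by_cuisine(records, cuisine):
    """Return only the restaurants serving the given cuisine."""
    return [r for r in records if r["cuisine"] == cuisine]

italian = by_cuisine(restaurants, "Italian")
print([r["name"] for r in italian])
```

Putting the cuisine in its own spreadsheet column is what makes the filter possible; a mapping tool can only filter on fields it can see.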

I wanted to create a version of my map in Google Earth and embed it on my website for the simple reason that it looks cool and because it would require exploration of KML and APIs. I signed up for a Google Earth API key and was directed to a developer’s guide that indicated this course of action was clearly beyond my skill set:

This documentation is designed for people familiar with JavaScript programming and object-oriented programming concepts. The Google Earth API is modelled after KML, so you should also consult Google's KML documentation.

I began a search for workarounds and discovered a plugin that was not an option because my blog is hosted at wordpress.com. I tried TakItWithMe, but that wouldn’t work because my Google Earth map doesn’t have a URL. Ultimately I found Google’s KML embed gadget and realized that I could generate a KML file from the BatchGeo site where I had created the map. The KML needed to be hosted on the internet, however, which may be easy for someone with more experience, but instructions to “put it online somewhere” posed a problem for me. I tried for hours, posting it on Google Docs and Google Sites among other things, but in the end expert help was needed and the KML found a home on Dropbox. The map still doesn't fly in like it's supposed to (you have to zoom manually), but I'm working on it.
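For the curious, a KML file is less mysterious than it sounds: it is XML with one Placemark per point, and the exported file is just text you can open and read. This sketch builds a minimal two-placemark KML; the coordinates are invented, and note that KML puts longitude before latitude.

```python
# Invented points standing in for the exported restaurant markers.
places = [("Birch Cliff Bistro", -79.264, 43.692),
          ("Lakeside Thai", -79.258, 43.690)]

# One <Placemark> per point; KML coordinates are lon,lat,altitude.
placemarks = "".join(
    f"<Placemark><name>{name}</name>"
    f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
    f"</Placemark>"
    for name, lon, lat in places
)
kml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<kml xmlns="http://www.opengis.net/kml/2.2">'
    f"<Document>{placemarks}</Document></kml>"
)
print(kml.count("<Placemark>"))
```

The "put it online somewhere" step exists because the embed gadget fetches the file by URL; any publicly readable host (like the Dropbox link I ended up with) will do.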

Data journalism can take many forms and in an effort to explore as many visualisation styles as possible, I researched Dipity, a very simple program to create engaging, interactive timelines using embedded documents, photos and videos. I immediately saw its value as a journalistic tool to help explain the development history of the Quarry, an issue of vital importance to many residents of Birch Cliff. There is a plan on the table to erect a residential development on the Quarry lands, comprised of seven 24 to 27-storey apartment towers totalling 1,455 units. The community has been fighting the proposal for 40 years and the background is lengthy and complicated. One need only look at the FAQ I wrote almost a year ago in Microsoft Word and compare it to the Dipity timeline in order to see the value of the latter.

I wanted my final experiment to be a mashup generated by Yahoo Pipes, a useful and user-friendly feed aggregator and manipulator that would enhance a hyperlocal blog and requires no coding ability. My goal was to create a pipe compiling RSS feeds related to the coverage area of the Birch Cliff Blog. The project took several hours because the pipes continually came up empty, for both feeds and Flickr images, but the problem was not so much technical as it was content related. Birch Cliff simply doesn’t get media coverage which is part of the rationale for establishing the blog in the first place. The final product is an RSS pipe of feeds pertaining to the larger geographic area of Scarborough.
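At its core, what a pipe does with RSS is pull the item entries out of each source feed and combine them into a single output feed. A sketch with two tiny hand-written feeds standing in for the real Scarborough sources:

```python
import xml.etree.ElementTree as ET

# Two miniature RSS feeds; titles are invented examples.
feed_a = """<rss version="2.0"><channel><title>A</title>
<item><title>Quarry hearing set</title></item></channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>B</title>
<item><title>Kingston Rd streetscape plan</title></item></channel></rss>"""

# Extract the <item> elements from each feed and merge them
# into one combined list -- the essence of a feed aggregator.
items = []
for feed in (feed_a, feed_b):
    root = ET.fromstring(feed)
    items.extend(root.findall("./channel/item"))

titles = [item.findtext("title") for item in items]
print(titles)
```

Yahoo Pipes wraps this same fetch-parse-merge loop in a visual editor, which is why no coding was needed; the empty results I kept getting were a content problem, not a parsing one.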

Conclusion

Now that I’ve established some base knowledge about data journalism it is my plan to continue accessing open data at toronto.ca/open in order to generate content for the Birch Cliff Blog. It seems to me, however, that the service is targeted more to programmers than lay journalists and I’ve started a dialogue with the experts behind the initiative on how people with my level of experience can best use the site.

Local data journalism for someone with no programming experience is achievable but not for the faint of heart. In my case, it required innumerable hours of research and a “can do” mindset that refused to be intimidated. My advice to others is to do extensive research, read the instructions, view the video tutorials and then do it all over again. Start with something simple and build your way up to more complicated projects. That being said, in order to progress in the field of data journalism, in the near future I plan to enroll in a series of courses, starting with beginner HTML.

If you've read this far, thanks very much. I would appreciate any comments you might have, especially about things I might try going forward. Please check out my regular blog here.