Tag Archives: data journalism

This is a write-up on how I made a slideshow for the Under-17 World Cup.

The U-17 World Cup is the first-ever FIFA tournament to be hosted by India. Like many of you, I’ve seen plenty of men’s World Cups, but never a U-17 one. To try and understand how the U-17 tournament might be different from the ‘senior’ version, I compared data from the last U-17 World Cup held in Chile in 2015 and the last men’s World Cup in Brazil in 2014.

The data was taken from Technical Study Group reports that are published by FIFA after every tournament. (The Technical Study Group is a mixture of ex-players, managers and officials associated with the game. You can read more about the group here.)

In particular, I used the reports for the 2014 World Cup and the 2015 U-17 World Cup. The data was taken pretty much as is, and thankfully didn’t have to be processed much. An example of the data available in the report can be seen in the image below. It shows how the 171 goals in the 2014 World Cup came about.

The main takeaway from the comparison with the men’s World Cup is that the U-17 World Cup might see more goals and fewer 0-0 draws on average. The flipside is that there could be more cards and penalties too. For more details, check the slideshow.

BE LESS INTIMIDATING FOR READERS

I know just using one World Cup each to represent men’s and U-17 football may not be particularly rigorous. We could have also used data from the previous three or four World Cups in each age format. But if I did that, I was scared the data story would become more dense and intimidating for readers. I wanted to make this easy to follow along and understand, which is why I simplified things this way.

Another thing I did to make this easier to digest was to stick to one main point per card (see image above). The main point is in the headline, then you get a few lines of text below showing how exactly you’ve arrived at the main point. The figures that have been calculated and compared are put in a bold font. Then there is an animated graphic below that, which visually reinforces the main point of the slide.

The data story tries to simulate a card format, one that you can just flick through on a mobile. I used the slideshow library reveal.js to make the cards. But I suspect there is a standard, more established method that mobile developers use to create a card format. I will have to look into this further.

The animations were done with D3.js, with help from a lot of examples on stackoverflow and bl.ocks.org. If you’re new to D3 and want to know how these animations were done, here’s more info.

ANIMATING THE BAR CHART

The D3 ‘transitions’ or animations in this slideshow are basically the same. There’s (a) an initial state where there’s nothing to see, (b) the final state where the graphic looks the way you want and (c) a transition from the initial state to the final state over a duration specified in milliseconds.

For example, in the code snippet for the bar animation above, you see two attributes changing for the bars during the transition—the ‘height’ and ‘y’ attributes changing over a duration of 500 milliseconds. You can see another example of this animation at bl.ocks.org here.
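To make the idea of a transition concrete, here is a minimal sketch in Python (not the D3 code used in the slideshow). It assumes a plain linear interpolation, whereas D3 applies an easing curve by default, and the chart height and bar height are made-up numbers. It shows why both ‘height’ and ‘y’ have to change together: in SVG the y-axis points downwards, so a bar that grows upwards from the baseline needs its top edge (‘y’) to move up as its ‘height’ increases.

```python
def interpolate(start, end, t):
    """Linear interpolation between start and end, for t between 0 and 1."""
    return start + (end - start) * t


def bar_state(elapsed_ms, duration_ms=500, chart_height=200, bar_height=120):
    """Return the bar's attributes at a given moment of the transition.

    chart_height and bar_height are hypothetical values for illustration.
    """
    # Fraction of the transition completed, clamped to the range [0, 1].
    t = min(max(elapsed_ms / duration_ms, 0.0), 1.0)
    # Initial state: zero height, top edge sitting on the baseline.
    # Final state: full height, top edge raised by the bar's height.
    height = interpolate(0, bar_height, t)
    y = interpolate(chart_height, chart_height - bar_height, t)
    return {"height": height, "y": y}


for ms in (0, 250, 500):
    print(ms, bar_state(ms))
# 0 {'height': 0.0, 'y': 200.0}
# 250 {'height': 60.0, 'y': 140.0}
# 500 {'height': 120.0, 'y': 80.0}
```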

ANIMATING THE STACKED BAR CHART

This animation was done in a way similar to the one above. The chart is called a ‘normalised stack chart’ and the code for this was taken from the bl.ocks.org example here.

The thing about this chart is that you don’t have to calculate the percentages beforehand. You just feed in the raw data (see image below) and you get the final percentages visualised in the graphic.
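The calculation that a normalised stack chart does for you is simple enough to sketch in Python. This is only an illustration of the arithmetic, not the code from the bl.ocks.org example, and the category names and counts below are invented placeholders rather than figures from the FIFA reports.

```python
def normalise(row):
    """Convert raw counts in one row into percentages that add up to 100."""
    total = sum(row.values())
    if total == 0:
        # Avoid dividing by zero when a row has no data at all.
        return {key: 0.0 for key in row}
    return {key: 100 * value / total for key, value in row.items()}


# Hypothetical raw counts, one row per stacked bar.
raw = {
    "Tournament A": {"open play": 30, "set pieces": 10},
    "Tournament B": {"open play": 45, "set pieces": 5},
}

for name, row in raw.items():
    print(name, normalise(row))
# Tournament A {'open play': 75.0, 'set pieces': 25.0}
# Tournament B {'open play': 90.0, 'set pieces': 10.0}
```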

ANIMATING THE LINE CHART

The transition over here isn’t very sophisticated. In this, the two lines and the data points on them are basically set to appear 300 milliseconds and 800 milliseconds respectively after the card appears on screen (see the code snippet below).

A cooler line animation would have been ‘unrolling’ the line as seen in this bl.ocks.org example. Maybe next time!

ANIMATING THE PIE CHART

Won’t pretend to understand the code used here. I basically just adapted this example from bl.ocks.org and played around with the parameters till it came out the way I wanted. This example is from Mike Bostock, the creator of D3.js, and in it he explains his code line by line (see image below). Do look at it if you want to fully understand how this pie chart animation works.

ANIMATING THE ISOTYPE CHART

Yup, this chart is called an isotype chart. This animation is another one where the transition uses delays. So if you look in the gif, you see on the left side three cards being filled one after the other.

They all start off with an opacity of 0, which makes them invisible (or transparent, technically). What the animation does is make each of the cards visible by changing the opacity to 1 (see image above). This is done after different delay periods of 200 milliseconds for the bottom card, 400 for the card in the middle and 600 milliseconds for the card on top.
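The staggered delays can be sketched in Python as follows. This is a simplified model rather than the D3 code itself: it assumes each card flips straight from invisible to visible once its delay has passed, and the card names are just labels for the example.

```python
# Delay before each card becomes visible, in milliseconds (from the text above).
DELAYS_MS = {"bottom": 200, "middle": 400, "top": 600}


def opacity_at(elapsed_ms, delay_ms):
    """Opacity of a card: 0 (invisible) before its delay, 1 (visible) after."""
    return 1.0 if elapsed_ms >= delay_ms else 0.0


def visible_cards(elapsed_ms):
    """List the cards that are visible at a given time after the slide appears."""
    return [name for name, delay in DELAYS_MS.items()
            if opacity_at(elapsed_ms, delay) == 1.0]


for ms in (100, 300, 500, 700):
    print(ms, visible_cards(ms))
# 100 []
# 300 ['bottom']
# 500 ['bottom', 'middle']
# 700 ['bottom', 'middle', 'top']
```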

FINAL WORD

If you’ve never worked with D3 before, hope this write-up encourages you to give it a shot. You can look at all the code for the slideshow in the github repo here. All comments and feedback are welcome! 🙂

So I created an interactive for Wionews.com (embedded below) on the assembly elections taking place in five states. This write-up goes into how I did the interactive and the motivations behind it.

The interactive is embedded below. Click on Start to begin.

The interactive looks at three things:

where each party won in the last assembly election in 2012 in each of the five states, visualised with a map.

where each party won in the last Lok Sabha (LS) election in 2014, if the LS seats were broken up into assembly seats. This was also done with a map.

the share of seats won by each major party in previous assembly elections, done with a line chart.

I got all my data from the Election Commission website and the Datameet repositories, specifically the repositories with the assembly constituency shapefiles and historical assembly election results.

Now these files have a lot of information in them, but since I was making this interactive specifically for mobile screens and there wouldn’t be much space to play with, I made a decision to focus just on which party won where.

As mundane as that may seem, there are still some interesting things you get to see. For example, from the break-up of the 2014 Lok Sabha results, you find out where the Aam Aadmi Party has gained influence in Punjab since the last assembly elections in 2012, when they weren’t around.

The interactive page on the AAP in Punjab, 2014

ANALYSING THE DATA

While I got the 2012 election results directly from the election commission’s files, the breakdown of the 2014 Lok Sabha results by assembly seat needed a little more work with some data analysis in python (see code below) and manual cross-checking with other election commission files.
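As a rough idea of what such an analysis involves, here is a minimal Python sketch. It is not the original analysis code; it assumes the Lok Sabha votes are available as rows of (assembly segment, party, votes), and the segment and party names are placeholders. The party with the most votes in each segment is treated as the ‘winner’ of that assembly seat.

```python
from collections import defaultdict


def leading_party_by_segment(rows):
    """Return the party with the most votes in each assembly segment.

    rows is an iterable of (segment, party, votes) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for segment, party, votes in rows:
        # Add up votes in case a party appears more than once per segment.
        totals[segment][party] += votes
    return {segment: max(parties, key=parties.get)
            for segment, parties in totals.items()}


# Hypothetical rows, for illustration only.
rows = [
    ("Seat A", "Party X", 41000),
    ("Seat A", "Party Y", 38500),
    ("Seat B", "Party X", 22000),
    ("Seat B", "Party Y", 30500),
]

print(leading_party_by_segment(rows))
# {'Seat A': 'Party X', 'Seat B': 'Party Y'}
```

The output of a step like this would still need the manual cross-checking mentioned above.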

PUTTING IT ALL ONTO A MAP

The next thing to do was put the data of which party won where onto an assembly seat map for each state.

To get the assembly seat maps, I downloaded the assembly constituency shapefile from the datameet repository and used the software QGIS to create five separate shapefiles, one for each state. (Shapefiles are what geographers and cartographers use to make maps.)

A screenshot of the QGIS software separating the India shapefile into separate ones for the states.

The next task is to make sure the assembly constituency names in the shapefiles match the constituency names in the election results. For example, in the shapefile, one constituency in Uttar Pradesh is spelt as Bishwavnathganj while in the election results, it’s spelt as Vishwanathganj. These spellings need to be made consistent for the map to work properly.

I did this with OpenRefine, which has a lot of inbuilt tools to detect and correct these kinds of inconsistencies.

The purist way would have been to do all this with code, but I’ve been using OpenRefine, a graphical tool, for a while now and it’s just easier for me this way. Please don’t judge me! (Using graphical tools such as OpenRefine and QGIS makes it harder for others to reproduce your exact results and is less transparent, which is why purists look down on a workflow that is not entirely in code.)
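For anyone who does want to try the code route, here is a minimal sketch using Python’s standard difflib module. This is a different technique from OpenRefine’s own clustering tools, the 0.8 similarity cutoff is an arbitrary assumption, and the suggestions it produces would still need to be checked by hand. Apart from the Vishwanathganj example above, the names are only there for illustration.

```python
import difflib


def suggest_matches(shapefile_names, result_names, cutoff=0.8):
    """Suggest the closest election-result spelling for each shapefile name.

    Returns a dict mapping each shapefile name to its best match,
    or to None when nothing is similar enough.
    """
    suggestions = {}
    for name in shapefile_names:
        close = difflib.get_close_matches(name, result_names, n=1, cutoff=cutoff)
        suggestions[name] = close[0] if close else None
    return suggestions


shapefile_names = ["Bishwavnathganj"]
result_names = ["Vishwanathganj", "Lucknow West"]

print(suggest_matches(shapefile_names, result_names))
# {'Bishwavnathganj': 'Vishwanathganj'}
```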

After the data was cleaned, I merged or ‘joined’ the 2012 and 2014 election results with the shapefile in QGIS. I then converted the shapefile into the geojson format, which is easier to visualise with javascript libraries such as D3.js.
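The join itself was done in QGIS, but the idea can be sketched in a few lines of Python working on a geojson structure. Everything named here is an assumption made for the example: the property name ‘AC_NAME’, the ‘winner’ field and the seat and party names are all hypothetical. The sketch also shows why the spellings have to match: any seat whose name is not found in the results ends up without a winner.

```python
import copy


def join_results(geojson, winners, key="AC_NAME"):
    """Attach the winning party to each constituency feature.

    geojson: a FeatureCollection as a Python dict.
    winners: dict mapping constituency name to winning party.
    key: the (hypothetical) property that holds the constituency name.
    Returns a joined copy and a list of names with no matching result.
    """
    joined = copy.deepcopy(geojson)  # leave the input untouched
    unmatched = []
    for feature in joined["features"]:
        name = feature["properties"].get(key)
        feature["properties"]["winner"] = winners.get(name)
        if name not in winners:
            unmatched.append(name)
    return joined, unmatched


# A tiny made-up FeatureCollection; geometries are omitted for brevity.
seats = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"AC_NAME": "Seat A"}, "geometry": None},
        {"type": "Feature", "properties": {"AC_NAME": "Seat B"}, "geometry": None},
    ],
}

joined, unmatched = join_results(seats, {"Seat A": "Party X"})
print(joined["features"][0]["properties"])  # {'AC_NAME': 'Seat A', 'winner': 'Party X'}
print(unmatched)  # ['Seat B']
```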

I then chose the biggest three or four political parties in the 2012 assembly and 2014 LS election results for each state, and created icons for them using the tool Inkscape. This can be done by tracing the party symbols available in various election commission documents.

Some of the party icons designed for the interactive

HOW IT’S ALL VISUALISED

The way the interactive works is that if you click on the icon for a party, it downloads the geojson file which, to put it crudely, has the boundaries of the assembly seats and the name of the party that won each seat.

The interactive map showing the NPF in Manipur in 2014

You then get a map with the seats belonging to that party coloured in yellow. And each time you click on a different party icon, a new map is generated. (If I’ve understood the process wrong, do let me know in the comments!)

I won’t go into the nitty gritty of how the line chart works, but essentially every time you click on one of these icons, it changes the opacity of the line representing the party to 1, making it visible, while the opacity of every other line is reduced to 0, making them invisible.

Now I haven’t gone into the complexity of much of what’s been done. For example, if you see those party symbols and the tiny little shadows under them (they’re called drop shadows), it took me at least two days to make that happen.

It took two days to get these drop shadows!

MOTIVATIONS BEHIND THE INTERACTIVE

As for the design, I wanted something that people would just click/swipe through, that they wouldn’t have to scroll through, and also limit the data on display, giving only as much as someone can absorb at a glance.

My larger goal was to try and start doing data journalism that’s friendlier and more approachable than the stuff I’ve been doing in the past such as this blogpost on the Jharkhand elections.

I actually read a lot on user interface design, after which I made sure that the icons people tap on their screen are large enough for their thumbs, that icons were placed in the lower half of the screen so that their thumbs wouldn’t have to travel as much to tap on them, and adopted flat design with just a few drop shadows and not too many of what are called skeuomorphic effects.

Another goal was to allow readers to get to the information they’re most interested in without having to wade through paras of text by just tapping on various options.

The sets of options available to the user while in the interactive

I hacked a lot of D3.js examples on bl.ocks.org and stackoverflow.com to arrive at the final interactive. I’m still some way away from writing d3 code from scratch, but I hope to get there soon.

Because I’m not a designer, web developer, data scientist or a statistician, I may have violated lots of best practices in those fields. So if you happen to come across some noobie mistake, do let me know in the comments. I’m here to learn, thanks! 🙂

Shijith Kunhitty is a data journalist at WION and former deputy editor of IndiaSpend. He is an alumnus of Washington University, St. Louis and Hindu College, Delhi.

Mumbai saw its third data meet on 26th October, 2014 with a total of 14 participants, in spite of it being a Diwali weekend. This time around we decided to try out a new place and the venue was a rooftop place located at Chium Village, Khar West. A nice cozy place, but a tad difficult to find for people who are not familiar with the area.

This time too, the crowd was tilted heavily towards the tech side.

The speaker was Sanjay Bhangar, co-founder of CAMP, who has been a web developer for the past 8 years and has extensive experience in online video and mapping technologies. He first gave a short introduction to Data Meet, its founders Thej and Nisha, how it now operates as a trust, and its aim of encouraging the open data movement among data enthusiasts.

1. Introduction to our video archival platforms – they have been running this for the last five years. He explained how to gather metadata about all Indian films ever made, general video analysis tools (timeline generation, cut detection), etc.

He explained the use of https://pad.ma and how it is an online tool for saving videos.

2. Mapping schools in Karnataka – he explained how they have been collecting data on schools in Karnataka and are working with the Akshara Foundation, which runs a lot of programs in schools and has a lot of child-level data that allows you to track the performance of children in schools across the state. It was suggested that they could also map crime data, highlighting the recent crimes against children in Bangalore schools.

3. He showed us an example of how he worked on a project mapping historical data for the New York Public Library.

The next data meet will be held on 29th November, 2014. Please follow the Mumbai Meet-Up Group for details.

A few weeks ago we held an Intro to Data Journalism Workshop. Josephine Joseph was in attendance. She regularly writes for Citizen Matters, Bangalore’s local paper that knows all. She was working on this story and published it last week with Citizen Matters. I’m very happy to crosspost it here as a great example of local data journalism.

The East Bangalore area, particularly the Whitefield – KR Puram – Mahadevapura area, is on the prime real estate map. What are the projects coming up next? What are the implications?

Investing in real estate in Bangalore is a dream of any investor. However, is the growth of this sector in tune with the infrastructure that the city can handle?

A close look by Citizen Matters at 26 constructions coming up in the Whitefield – KR Puram area in East Bengaluru throws up some alarming numbers. When the 8,000 flats are fully occupied, the new residents will need 10,662.87 KL of water a day (the equivalent of about 1,780 water tankers of 6,000 litres each). More than 19,697 cars will be added to Whitefield traffic.

Ministry of Environment and Forests (MoEF) rules make builders of projects with more than 20,000 sqm of built-up area apply for an Environmental Clearance (EC) from the state, along with all the other permissions and NOCs from BBMP, BWSSB and the Karnataka Ground Water Authority (KGWA) to drill borewells, prior to the commencement of construction.

The State Expert Appraisal Committee (SEAC) receives the applications and recommends checks and balances, prior to recommending a project for EC to the State Environment Impact Assessment Authority (SEIAA).

The SEIAA reviews project details, clarifies issues and only then is the EC issued. In cases where construction has begun without an EC, the builder is served with a show cause notice. The KSPCB can file cases against builders under the Environment Protection Act if they proceed with construction without an EC.

Last Sunday, August 31st, Thej and I worked with Economic Times journalist Jayadevan PK to design an intro to data journalism workshop. For a while now there has been quite a bit of interest in and discussion of data journalism in India. Currently there are a few courses and events promoting data journalism, and we thought there was definitely room to start building a few modules on working with data for storytelling. Given that we have not done too many of these, we decided to do an introduction and limit it to a few people.

We had four storytellers with us, from various backgrounds. We spent the morning on introductions: what their experience with data was, what their definition of data journalism is and why they wanted to take this workshop. Then we had them put up some expectations so we could gauge what the afternoon should focus on.

We then had Jaya go through the context of data journalism in terms of the world scale and the new digital journalism era.

Then we spent some time going over examples of good data journalism and bad.

After that, we went through resources people can use to get data. We touched upon the legal and copyright issues around using data. Then we discussed accuracy and how to properly attribute sources.

Visualization Roadmap
The participants thought understanding how to visualize would be helpful. So we went through a sort of visualization roadmap. We then went through stories they were working on to see how we would create a visualization, and also how to examine the data and come up with a data strategy for each story.

People wanted another day to let the lessons be absorbed and some more hands-on time with the tools. Also, even at the intro level, it is important to make people come prepared with stories, so they have something to apply the ideas to.

To say we learned a lot is an understatement. We will definitely be planning more intro workshops and hopefully more advanced workshops in the future. We hope to continue to learn what people think is important, and will keep track of what kinds of stories come out of these learning sessions.

If you want a particular workshop feel free to request one here. Stay tuned to the blog and to the list to hear about the next one.