Thursday, September 27, 2012

The road safety chart in De Morgen 27/09/2012

The front page of "De Morgen" today (27/09/2012) features an article on public transportation and its relationship with road safety. The headline says "The less public transport, the more victims in traffic". To support this, the front page shows a chart that I found difficult to understand. I had to look at it for several minutes before I understood what was going on. But I'll let you be the judge:

First let me say that I applaud the fact that such an important subject is covered on the front page. Secondly, I'm always happy when I see that an article supported by statistical material is so prominent in the news. That said, there are quite a few problems with the chart:

The circles represent the different statistics, but each has its own scale, so you need to be very careful when comparing them.

The independent variables (different measures of public transport use in the cities) are ordered from small to large, except the first one (percentage of inhabitants using a car for work and school related trips). I guess that's because all other independent variables measure the popularity of public transport in some way and the journalist wanted to have a hit parade of public transport friendly cities.

The dependent variable is the number of victims per 10,000 inhabitants. Its pies are all drawn larger than the others, although there is no reason to do so other than that it is the variable of interest.

The dependent variable is not ordered from low to high or from high to low, but the position of the pies represents latitude and longitude. It took me a while to realize that because there's no underlying map of Belgium or Flanders behind it.

The colors represent the cities. There are more colors for the number of victims than for the independent variables, because not every city has a value for every independent variable.

Even for a trained eye it is not at all obvious that there is a strong correlation between use of public transport and road safety.

I'm not a transport expert, but I seriously doubt that the % of inhabitants with a subscription for a public transport service is a valid indicator for Hasselt. That city has basically free public transportation. So either that number is not correct, or it is low precisely because public transport is free for the inhabitants of the city of Hasselt and they have no need for a subscription.

By the way, I don't question the thesis that good public transportation has an impact on road safety as measured by the number of casualties. I just think the graph is not clear and, as far as I can judge, does not support the thesis.

Is there a better way of representing the data? Hardly. In my mind, a simple table still does best in this case:

At least it reveals that lots of figures are missing. Another, more colourful option is a series of horizontal bar charts with the bars sorted by the main variable of interest (i.e. victims). In this case I'm using the complement of the percentage of inhabitants that use a car for work or school (100% minus that percentage), so that all independent variables point in the same direction. The graph is produced with Tableau, the visualization software that iVOX, the company I work for, is experimenting with for its reporting needs.

This chart shows that the correlation that the journalist wants to convey is not that clearly present.

Finally, you could assume that all independent variables are indicators of a latent variable that represents the usage and availability of public transport. There are many statistical techniques that can be used for that. In this case, with such a low number of observations and so many missing values, I prefer to use a very simple approach:

First, I'm using the complement of the percentage of inhabitants that use a car for school or work.

Secondly, I have rescaled all independent variables to z-scores. Z-scores are obtained by subtracting the average from each observation and dividing by the standard deviation. Now they can be compared between cities and across the different measures. Negative z-scores are values below the average and positive z-scores are above the average.
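For readers who want to see the rescaling step spelled out, here is a minimal sketch in Python. The function name, the use of None for missing figures and the choice of the sample standard deviation (n - 1) are my assumptions for illustration, and the numbers are made up; they are not the figures from the article.

```python
import statistics

def z_scores(values):
    """Rescale one indicator to z-scores, leaving missing values (None) missing."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    # Assumption: sample standard deviation (n - 1). The population version
    # (statistics.pstdev) would give slightly larger z-scores in absolute value.
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in values]

# Made-up illustrative values for one indicator across four cities.
share_with_subscription = [10.0, 20.0, None, 30.0]
print(z_scores(share_with_subscription))
```

With these made-up values the average is 20 and the standard deviation is 10, so the three observed cities end up one standard deviation below the average, at the average, and one above it, while the missing city stays missing.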

To resolve the problem with the missing values, I have calculated an overall score that measures the "public transport"-friendliness of a city by taking the median over all non-missing z-values.
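The median over the non-missing z-values can be sketched as follows. The city labels, the z-values and the helper name are hypothetical and only serve to show how cities with few or no indicators are handled.

```python
import statistics

def composite_score(z_values):
    """Median of the available z-scores for one city; None if nothing is available."""
    present = [z for z in z_values if z is not None]
    return statistics.median(present) if present else None

# Hypothetical z-scores per city, one entry per indicator (None = missing).
city_z = {
    "City A": [0.5, None, -1.0, 1.5],
    "City B": [None, None, 0.2, None],
    "City C": [None, None, None, None],
}

scores = {city: composite_score(z) for city, z in city_z.items()}
print(scores)
```

A city with a single indicator simply gets that indicator's z-score, and a city without any indicator gets no score at all. The median is used rather than the mean so that one extreme indicator does not dominate the score.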

The correlation between that "public transportation" compound variable and the relative number of road victims is -.21. So it is in the predicted direction, but it is also low. It means that variation in our compound score of public transport only explains about 4.5% of the variation in road safety.
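The step from a correlation to a share of explained variation is just squaring the coefficient. Below is a small sketch with a hand-written Pearson correlation that skips cities without a compound score. The function name and the way missing scores are dropped are my assumptions; only the -.21 comes from the analysis above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(ss_x * ss_y)

# Share of variation explained by the reported (rounded) correlation.
r_reported = -0.21
explained = r_reported ** 2
print(f"{explained:.1%}")
```

Squaring the rounded value of -.21 gives roughly 0.044. The small gap with the 4.5% quoted above would be consistent with the correlation having been rounded to two decimals before being reported, but that is a guess on my part.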

Finally, we've plotted the road safety figures on a map. The surface of the circles represents the relative number of victims. And the color coding represents the compound "public transportation" variable, with red representing a low value and red a high value on public transportation use in a city taken over all indicators. The values in the middle are grey. Cities that did not have any indicators for public transportation, but only a value for road safety, are grey as well.

The map shows that, based on the data shown by the journalist, there is some variation in road safety, but that public transportation plays only a minor role in explaining it.
Again, it's perfectly possible that there is a relationship between use of public transportation and road safety. It's just that the material presented in the article in De Morgen today did not support this conclusion very strongly. Furthermore, the way the statistics were shown created more confusion than support for the journalist's claim.

About Me

Istvan Hajnal is a veteran of more than 20 years in the fields of data analysis, survey methodology and market research. First at the University of Leuven, Belgium and then about 10 years with The Nielsen Company, the world's largest Market Research Company. Istvan is currently Insights Director, Marketing & Data Sciences for GfK, Belgium.
He received a master's degree in computer science (Leuven), a master's degree in quantitative applications in the social sciences (Brussels) and finally a PhD in Social Sciences from the University of Leuven.
He blogs about Data Science but occasionally also on management and leadership in general and the Market Research Industry in particular.