Daily news about using open source R for big data analysis, predictive modeling, data science, and visualization since 2008

December 05, 2011

Vote Compass: visualizing Canadian poll results with R

Vote Compass is an online "electoral literacy application" whose goal is to encourage engagement with and stimulate discussion around the policy platforms of Canada's political parties. In the lead-up to the 2011 Canadian election, Vote Compass collected the results of a 10-minute online survey from more than 2 million participants, and used the results to align voters with the political party that best matched their stated political beliefs. (You can see an overview of the project in this YouTube video.)

The breakdown of the question responses by sub-group is also presented using the ggplot2 package:

It's certainly a visually pleasing and interesting way of exploring the results of this survey. One design point: at first, I wasn't sure what to think about using continuous lines for the sub-group charts above, since there were only five possible survey responses. (It seemed a bit odd to interpolate and smooth between "somewhat disagree" and "neither agree nor disagree", for example.) But having explored the site a bit, I found these charts surprisingly useful for interpreting the results: it's much easier to compare the sub-groups with these overlaid "density" charts than it is with the traditional stacked barchart or similar. And I guess preferences (from "strongly disagree" to "strongly agree") do lie along a continuum, even if only discrete survey responses are allowed.
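To make the smoothing question concrete, here is a minimal sketch of how five discrete responses can be turned into a continuous curve with a Gaussian kernel. The post does not say how the Vote Compass charts were actually computed, so the bandwidth, the 1-to-5 coding, and the counts below are all assumptions made for illustration.

```python
import math

# Hypothetical response counts for one sub-group on a five-point scale
# (1 = strongly disagree ... 5 = strongly agree). Not Vote Compass data.
counts = {1: 120, 2: 200, 3: 150, 4: 330, 5: 200}


def smoothed_density(counts, grid, bandwidth=0.5):
    """Gaussian kernel density estimate over discrete response codes.

    Each response code contributes a bell curve centred on the code,
    weighted by the share of respondents who chose it.
    """
    total = sum(counts.values())
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        value = 0.0
        for code, n in counts.items():
            z = (x - code) / bandwidth
            value += (n / total) * norm * math.exp(-0.5 * z * z)
        density.append(value)
    return density


# Evaluate on a grid that extends well past the 1..5 range so the tails fit.
step = 0.1
grid = [-2.0 + step * i for i in range(101)]  # -2.0 .. 8.0
density = smoothed_density(counts, grid)

area = sum(density) * step                      # should be close to 1
peak = grid[density.index(max(density))]        # location of the highest point
```

A smaller bandwidth keeps five distinct bumps (closer to a bar chart); a larger one blends neighbouring responses into a single curve, which is what makes overlaid sub-groups easy to compare.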

Anyway, it's nice to see such clarity of vision and attention to detail in the presentation of data like this.

The section about the party platforms is a separate research area from that of the data: The party platform positions are backed up with text from the official party texts or speeches from leaders, and there's a backend to the tool to help users navigate these positions.

We understand that the raw data, as collected, are themselves not representative of the population. The scale used to plot the data on the maps, for example, takes the "Average" riding, and then maps other ridings relative to it. The assumption is that the variables that lead users to select into using the tool are relatively constant across ridings, and thus relative differences between ridings should be informative about riding population differences. We're quite aware of selection effects. This is also stated under the maps, in the least technical language we could come up with (we don't assume people think much about selection effects). The same applies to the density plots. Even if we can't say that the values represent population values, under the assumption mentioned above, differences between groups should be meaningful.
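The reasoning above can be sketched in a few lines. This is only an illustration under stated assumptions: the riding names and values are hypothetical, and the "Average" riding is taken here to be the plain mean of the riding-level means, which may not be how the Vote Compass team defined it.

```python
from statistics import mean

# Hypothetical riding-level mean responses (made-up names and values).
riding_means = {"Riding A": 3.4, "Riding B": 2.9, "Riding C": 3.1}


def relative_to_average(riding_means):
    """Express each riding as a deviation from the 'average' riding.

    Assumption: the average riding is the unweighted mean of riding means.
    """
    baseline = mean(riding_means.values())
    return {riding: value - baseline for riding, value in riding_means.items()}


relative = relative_to_average(riding_means)

# If self-selection shifts every riding by the same amount (here +0.5),
# the deviations from the average riding are unchanged.
shifted = {riding: value + 0.5 for riding, value in riding_means.items()}
relative_shifted = relative_to_average(shifted)
```

The comparison only survives selection effects that are constant across ridings; a bias that differs from riding to riding would not cancel out this way.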

Just as an aside, we're also currently looking into techniques to weight the data by census totals, political interest, and by matching the data to a nationally representative sample taken during the 2011 election.
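As a rough idea of what weighting by census totals can look like, here is a minimal post-stratification sketch. It is not the authors' method, which the comment does not describe; the age groups, census shares, sample counts, and group means are all hypothetical.

```python
# Hypothetical census population shares by age group (sum to 1).
census_share = {"18-34": 0.28, "35-54": 0.36, "55+": 0.36}

# Hypothetical number of tool users in each group, and their mean response.
sample_counts = {"18-34": 500, "35-54": 300, "55+": 200}
group_mean = {"18-34": 3.8, "35-54": 3.2, "55+": 2.6}


def poststratification_weights(census_share, sample_counts):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: census_share[g] / (sample_counts[g] / n) for g in sample_counts}


weights = poststratification_weights(census_share, sample_counts)

n = sum(sample_counts.values())
raw_mean = sum(sample_counts[g] * group_mean[g] for g in sample_counts) / n

weighted_total = sum(weights[g] * sample_counts[g] * group_mean[g] for g in sample_counts)
weighted_n = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted_mean = weighted_total / weighted_n
```

Groups that are over-represented among users get weights below 1 and under-represented groups get weights above 1, so the weighted mean moves toward what a sample with census proportions would give.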

It is an attractive presentation, but at the same time, is a standard choropleth map the best type of graphic to use in this case?

I find the large areas of color to be somewhat misleading (or at least hard to interpret), especially since population distribution in Canada is highly uneven. Canada's three territories (about 40% of Canada's land area) account for less than 1% of Canada's population (I think, actually, it's closer to a third of a percent). Yet, in the graphic, they figure very prominently, particularly because our eyes tend to be drawn to the large areas of color.

I am also not sure that I like that there is a possible overlap of the color palette for the choropleth map and the background color of the overall graphic. I would think that more distinct colors would be more effective.

The introduction that a participant sees says this: "Vote Compass is a free educational tool developed by political scientists. It asks you for your opinion on a number of political issues and then shows you your position in the political landscape as well as how your views compare with the platforms put forward by each of the political parties."

I tried it out before the Federal election; I was never informed that my responses were being recorded and would be used later as though a poll had been conducted. So I experimented: I tried to guess how to come up with answers that matched a particular party. Then, after the election, I heard reports in the news that the results were being used as though they were a poll. That's just bogus.

The decisions to create the scale and use the colours as we did were the result of a (really) long discussion. There are some pretty harsh colour constraints: we cannot use any of the party colours to display the data for obvious reasons. Thus we were left with either yellow or something of a pink/purple. Because of the self-selection problems, we specifically wanted to make the scale relative to the average riding, and thus we needed two separate colours with a neutral one in the middle. We chose the "average" riding, because the average is a quantity that is commonly understood by the public. Instead of black to white to yellow, perhaps we could have used pink to white to yellow. My suspicion is that this wouldn't work overly well, because the contrast would be lessened, but I'll be sure to give it a try if we create choropleths in the future.
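For readers curious how a two-sided scale with a neutral middle works, here is a small sketch of a black-to-white-to-yellow mapping centred on the average riding. The function name, the linear blending, and the choice of black for below-average values are assumptions for illustration, not the actual Vote Compass implementation.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
YELLOW = (255, 255, 0)


def _blend(start, end, t):
    """Linearly interpolate between two RGB colours, with t in [0, 1]."""
    return tuple(int(round(s + (e - s) * t)) for s, e in zip(start, end))


def diverging_colour(deviation, limit):
    """Map a deviation from the average riding to a hex colour.

    deviation: riding value minus the average riding's value.
    limit: deviation at which the scale saturates (must be positive).
    Assumption: below average shades toward black, above toward yellow.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    t = max(-1.0, min(1.0, deviation / limit))  # clamp to [-1, 1]
    target = YELLOW if t >= 0 else BLACK
    r, g, b = _blend(WHITE, target, abs(t))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


average_colour = diverging_colour(0.0, 1.0)    # the average riding is white
high_colour = diverging_colour(1.0, 1.0)       # far above average
low_colour = diverging_colour(-1.0, 1.0)       # far below average
```

Swapping `BLACK` for a pink RGB value would give the pink-to-white-to-yellow alternative mentioned above, which makes it easy to compare the contrast of the two palettes.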

I personally find the size of the territories to be a minor concern, but I'm open to suggestions about this. Most of the comparisons we make are between regions, provinces, urban/rural areas, so the usual questions are something to the effect of "How does province/region/city y differ from province/region/city x?" We also created the city maps so that differences within geographically small but population-dense areas were not underrepresented as they are on the large map. If you have any suggestions about how better to display the data to show such differences, please let me know. I'm open to using multiple graphics that display the same information as well. I'm of the belief that with statistical graphics, there's never a "best" way, and that some graphics highlight different sides of the data that other methods miss, and vice versa. A supplementary series of graphics to the choropleths could be quite useful.

Hi Duncan,

We made sure to be explicit about the fact that we would be analyzing respondent data in our disclaimer, in our privacy statement, and in various interviews in the news media. All the data are anonymous, so there's no way for us to know to whom an observation belongs. None of the data are given to the political parties, and they are used for academic purposes. We also have security protocols to prevent our data from being affected by those who test the tool, which I can't go into for obvious reasons. As for coming up with answers that matched a particular party, we designed the site so that users could see all of the official documentation that supported how and why each party was coded as it was. This part was actually quite a large research endeavour, as you can probably imagine.

You mentioned that you "find the size of the territories to be a minor concern." I assume you're Canadian, so you would know somewhat inherently to not let the large area dominate your interpretation. I can guarantee that most of my students (I'm in India) won't know much about Canada's territories and provinces or the population distribution in the country; thus, for those "viewers" in particular, choropleth maps can be misleading.

Incidentally, the day this post was made on the Revolutions blog, another choropleth-related post was made at the Junk Charts blog. (See here: http://www.webcitation.org/63kDd2vhh)

That post showed choropleth maps displayed side-by-side with cartograms, and I found the combination of charts to be an interesting attempt to overcome the main problem I see with choropleths. However, I think that in Canada's case, the cartograms would become severely distorted, which might significantly reduce their effectiveness.

As you mentioned, "there's never a 'best' way" to prepare statistical graphics, and having multiple graphics (and more detailed graphics of more densely populated areas, like you have already done) definitely helps.