The Wrong Answers

Ever since I started looking into the results from Season 4, I’ve been interested in those classifications that are wrong. Now, when I say “wrong,” I really mean the classifications that don’t agree with the majority of volunteers’ classifications. And technically, that doesn’t mean that these classifications are wrong in an absolute sense — it’s possible that two people classified something correctly and ten people classified it wrong, but all happened to classify it wrong the same way. This distinction between disagreement with the majority and wrong in an absolute sense is important, and is something I’m continuing to explore.

But for right now, let’s just talk about those classifications that don’t agree with the majority. For a first look at these “wrong” classifications, I created what’s called a heat map.

This map shows all the classifications made in Season 4 for images with just one species in them. (More details on how it’s made at the end, for those who want to know.) The species across the bottom of the map are the “right” answers for each image, and the species along the left side are all the classifications made. Each square represents the number of votes for the species along the left side in an image where the majority voted for the species across the bottom. Darker squares mean more votes.

So, for example, if you find aardvark on the bottom and look at the squares in the column above it, you’ll see that the darkest square corresponds to where there is also aardvark on the left side. This means that for all images in which the majority vote was for aardvark, the most votes went to aardvark, which isn’t any surprise at all. In fact, it’s the reason we see that strong diagonal line from top left to bottom right. But we can also see that in these majority-aardvark images, some people voted for aardwolf, bat-eared fox, dik-dik, hare, striped hyena, and reedbuck.

If we look at the heat map for dark squares other than the diagonal ones, we can see which animals are most often confused. I’ve circled in red some of the confusions that aren’t too surprising: wildebeest vs. buffalo, Grant’s gazelle vs. Thomson’s gazelle, male lion vs. female lion (probably when only the back part of the animal can be seen), topi vs. hartebeest, hartebeest vs. impala and eland(!), and impala vs. Grant’s and Thomson’s gazelle.

In light blue, I’ve also circled a couple other interesting dark spots: other-birds being confused with buffalo and hartebeest? Unlikely. I think what’s going on here is that there is likely a bird riding along with the large mammal. Not enough people classified the bird for the image to make it into my two-species group, and so we’re left with these extra classifications for a second species.

It’s also interesting to look at the white space. If you look at the column above reptiles, you see all white except for where it matches itself on the diagonal. That means that if the image was of a reptile, everyone got it. There was no confusing reptiles for anything else. Part of this is that there are so few reptile images to get wrong. You can see that wildebeest have been misclassified as everything. I think that has more to do with there being over 17,000 wildebeest images to get wrong, rather than wildebeest being particularly difficult to identify.

What interesting things do you see in this heat map?

(Read on for the nitty gritty or stop here if you’ve had enough.)

Nitty Gritty

I learned last week that some of you enjoy all the little details. So here’s what I did this week: I used last week’s analysis to pull out the 44,471 capture events containing just one species (according to the majority). I then pared this down to just the ones that had gathered enough classifications to have a consensus — something I didn’t do last week that I probably should have. I labeled each classification with the “right” species based on the majority rules I explained last week. I left out any capture events where the answer was “hard to figure out.” That left me with 447,901 classifications from 38,450 capture events. And then I just tallied everything up in a big table, counting how many classifications of species X there were for a capture event that had the majority classification of Y.
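For anyone who wants to try the tallying step themselves, here is a minimal Python sketch of the idea. The capture-event IDs and votes are made up for illustration, and the simple "most votes wins" rule is an assumption standing in for the majority rules from last week's post, which aren't reproduced here.

```python
from collections import Counter

# Hypothetical toy data: capture event ID -> list of species votes.
votes_by_event = {
    "event_1": ["wildebeest", "wildebeest", "buffalo", "wildebeest"],
    "event_2": ["aardvark", "aardvark", "aardwolf"],
    "event_3": ["zebra", "zebra", "zebra"],
}

def tally_confusions(votes_by_event):
    """Count (voted species, majority species) pairs across all events.

    Assumption: the "right" answer is simply the species with the most
    votes; the real analysis used more detailed majority rules.
    """
    table = Counter()
    for votes in votes_by_event.values():
        majority, _ = Counter(votes).most_common(1)[0]
        for vote in votes:
            table[(vote, majority)] += 1
    return table

table = tally_confusions(votes_by_event)
for (voted, majority), count in sorted(table.items()):
    print(f"voted {voted:<10} | majority {majority:<10} | {count}")
```

Each key in the table is one square of the heat map: the first species is the row (what people voted) and the second is the column (what the majority said).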

To make the heat map, I used a spiffy little program called JMP that makes quick data analysis easy. But you could make the same sort of map using open-source software like R. The first time I made the map, I used the raw number of classifications as the gradient from white to black in the heat map. But there are so many wildebeest capture events that they completely swamped everything else; there was just a black square for wildebeest-wildebeest and a gray one for zebra-zebra, and everything else was white. So I did what scientists often do when confronted with this sort of scale problem: I took the natural log of the number of classifications. This has the effect of making large numbers appear closer to one another and smaller numbers appear to spread out more. That did the trick, and the result is the heat map you see above.
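To see why the log helps, here is a small Python sketch of the rescaling. The counts are invented for illustration, the `shade` helper is hypothetical, and leaving zero-count squares white is an assumption made here, since the log of zero is undefined.

```python
import math

# Hypothetical counts for a few squares of the heat map,
# keyed by (voted species, majority species).
counts = {
    ("wildebeest", "wildebeest"): 170000,
    ("zebra", "zebra"): 60000,
    ("buffalo", "wildebeest"): 900,
    ("aardwolf", "aardvark"): 12,
    ("reptiles", "aardvark"): 0,
}

def shade(count, max_count, use_log):
    """Return darkness from 0.0 (white) to 1.0 (black) for one square.

    Assumes max_count > 1. On the log scale a count of 1 also maps
    to 0.0, because log(1) is 0.
    """
    if count == 0:
        return 0.0  # assumption: empty squares stay white
    if use_log:
        return math.log(count) / math.log(max_count)
    return count / max_count

max_count = max(counts.values())
for (voted, majority), count in counts.items():
    raw = shade(count, max_count, use_log=False)
    logged = shade(count, max_count, use_log=True)
    print(f"{voted:>10} / {majority:<10} raw={raw:.3f} log={logged:.3f}")
```

With raw counts, the 900-vote square comes out at about half a percent of full black, which is why squares like it looked white. On the log scale it lands a bit over halfway to black.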

8 responses to “The Wrong Answers”

The warthogs seem to be disproportionately misidentified, given that they only make up 2% of the total. I’m guessing that some of this is down to those difficult close-up shots – trying to identify a screen full of fur is not easy and sometimes came down to a wild guess.

As for the identification of humans as buffalo and elephant, well…the less said about that, the better!!

I suppose this is partly because there are simply more of them (as with wildebeest), and partly because of the disproportionate number of closely related species to choose from (e.g. all those cat-like and bovid animals).

The one weird outlier is baboons. They have relatively few classifications overall, and only one other closely related species available. But look at the mixed bag of animals they are confused with!

A very interesting analysis, and (I suppose) the reason why this is a perfect project for Galaxyzoo.

Most of the mix-ups seem to be between similar-looking species (e.g., gazelles/impalas, buffalo/wildebeest). That is understandable. Not meant as a criticism, but the tutorial could have been a bit clearer on some of these look-alikes (otherwise known as the “I wish I knew then what I know now” syndrome).

Do some people WANT to see some particularly favorite animal? How else would a dik-dik (cute-cute) be confused with so many other species?

As for the “bird-other” category, it may be that it is birds flying in the background. Some people may flag these (I’m guilty), and some may not. There couldn’t be that many unflagged oxpeckers, could there?

With the tutorial we were constantly torn between providing more information and simplicity. If a tutorial is too complicated, it can turn people off right from the start. But yes, that means some information ends up being left out.

The question of how wanting to see a particular animal affects people’s classifications is something I find really interesting. I’ve been thinking of a good way to get at the answer to this question, but it’s going to take some time. I’ll hopefully blog about it in a month or two.

Good thought about the Other Birds. I think you may be correct about people either classifying or ignoring far away flying ones. Definitely something to explore further…