Correlation Analysis of Pokemon Usages

Here is a further statistic that I like to crank out from time to time. This time it regards looking for correlation in Pokemon usages.

What is correlation, though? Two Pokemon's usages are said to correlate if an increase or decrease in the usage of one of them tends to be accompanied by an increase or decrease in the usage of the other. There are two types of correlation: direct correlation and inverse correlation.

Two Pokemon's usages are said to exhibit direct correlation if, when the usage of the first Pokemon increases, the usage of the second one increases as well by roughly the same margin, and vice-versa. Two Pokemon's usages are said to exhibit inverse correlation if, when the usage of the first Pokemon increases, the usage of the second one decreases by roughly the same margin, and vice-versa.

In statistics, there is an important measure of correlation called Pearson's product-moment correlation coefficient. I didn't use this method of correlation, however; I used a simpler method. What I did was simply to subtract the increase or decrease in one Pokemon's usage over the last 6 months from the increase or decrease in the other Pokemon's usage. If this difference is quite small, the two Pokemon exhibit usage correlation.
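
To make the idea concrete, here is a minimal Python sketch. It is not the actual Excel formulae, which are not given here. The function names, the choice to sum absolute differences over the five month-to-month changes, and the usage figures are all assumptions made up for illustration.

```python
def monthly_changes(usage):
    """Month-to-month change in usage, in percentage points."""
    return [later - earlier for earlier, later in zip(usage, usage[1:])]

def direct_score(usage_a, usage_b):
    """Sum of |change_a - change_b| over all months (assumed aggregation).
    Near zero means both usages moved by roughly the same margin."""
    changes_a = monthly_changes(usage_a)
    changes_b = monthly_changes(usage_b)
    return sum(abs(a - b) for a, b in zip(changes_a, changes_b))

def inverse_score(usage_a, usage_b):
    """Sum of |change_a + change_b| over all months (assumed aggregation).
    Near zero means one rose by roughly the margin the other fell."""
    changes_a = monthly_changes(usage_a)
    changes_b = monthly_changes(usage_b)
    return sum(abs(a + b) for a, b in zip(changes_a, changes_b))

# Invented six-month usage percentages (Nov to Apr), not real statistics.
first = [5.0, 5.5, 6.0, 5.8, 6.2, 6.5]
mover = [2.0, 2.5, 3.0, 2.8, 3.2, 3.5]   # moves with `first`
mirror = [9.0, 8.5, 8.0, 8.2, 7.8, 7.5]  # moves against `first`

print(direct_score(first, mover))    # very close to 0: direct correlation
print(inverse_score(first, mirror))  # very close to 0: inverse correlation
print(direct_score(first, mirror))   # clearly larger: no direct correlation
```

In both functions the score is a distance, so smaller means a stronger correlation.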

I analysed the top 100 Pokemon used in the Standard metagame and the top 50 Pokemon used in the Uber metagame. The UU metagame was not included because we don't yet have enough statistics for it to support a good correlation analysis.

Could you show how strong or otherwise the correlations are? This could allow your system to detect more slight correlations without exaggerating them.

Very cool stats. It's interesting to see that, because of the massive amount of sample data, you can see correlations between some rare Pokemon that would only actually encounter each other in a small % of matches (like Umbreon and Zam).

It also seems that a lot of the Pokemon with noticeable correlations are leads. Could you possibly do a lead correlation analysis?

About axis labels: the vertical axis is percentage usage, while the horizontal axis is:

1 - Nov, 2 - Dec, 3 - Jan, 4 - Feb, 5 - Mar, 6 - Apr

The correlation formula indicates how much they correlate as a number. The nearer the number is to zero, the better the correlation is. I decided not to show any numbers so that I don't confuse the new Smogon user. I could provide the formulae though.

I might do a lead correlation analysis, yes - however, not in the near future. It didn't take me long to implement this (in Excel), but it did take me a good deal of time to perfect the formulae.

Then how about a "simple" table for the less mathematically inclined (what you have done so far), as well as one with the correlation displayed as a number for those who want more in-depth info (and more sensitivity to slight correlations)?

eric, I could just as well post the 25,000 numbers (that's twenty five thousand) that signify the correlation of each Pokemon with each other. There will basically be two 100x100 tables for OU, and two 50x50 tables for Ubers. I don't know if people _really_ want to browse through a 100x100 table of numbers, which is why I didn't post it (and why Excel is excellent for doing this).

I guess that if people want to see it, I'd just upload the Excel sheet for people to download if they feel like it. I'll do that after I return home from work, though.

mm, that would be over the top (though I suppose some people may like it) but a more sensitive readout than given in the OP would be nice. I mean, looking at the top 50 Ubers a vast majority of them are:
Direct: None
Inverse: None
And the same can be said for a good portion of OU, when most of them probably have some weaker correlation with at least a few Pokemon that has not shown up. Making the tables more sensitive could mean that statistical noise creates some odd results, but so long as you can see how strong each correlation is, you can judge for yourself which to pay the most attention to.

The problem with just having the entire table could be that most Pokemon have a level of correlation with each other that is not separable from the statistical noise. The ideal situation would be a table of all correlations for which you can set the "threshold" for notable correlation yourself, so that it shows you all the correlations stronger than that value. But having you pick a single (more lenient) value and including the strength of each correlation would be awesome.
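
A rough Python sketch of that kind of adjustable threshold is below. It assumes the scores work the way X-Act described (nearer to zero means a stronger correlation). The function name, the placeholder Pokemon names, and the numbers are all invented for illustration.

```python
def notable_pairs(scores, threshold):
    """Return (pair, score) tuples whose score is at or below `threshold`,
    strongest correlation (lowest score) first.
    Assumes lower scores mean stronger correlation."""
    hits = [(pair, score) for pair, score in scores.items() if score <= threshold]
    return sorted(hits, key=lambda item: item[1])

# Invented scores for placeholder Pokemon, not real statistics.
scores = {
    ("Pokemon A", "Pokemon B"): 0.3,
    ("Pokemon A", "Pokemon C"): 1.2,
    ("Pokemon B", "Pokemon C"): 0.1,
    ("Pokemon C", "Pokemon D"): 0.8,
}

# A strict threshold shows only the strongest pairs...
for pair, score in notable_pairs(scores, 0.5):
    print(pair, score)

# ...while a more lenient one also lets weaker pairs (and more noise) through.
for pair, score in notable_pairs(scores, 1.0):
    print(pair, score)
```

Because the score is printed next to each pair, a lenient threshold still lets the reader judge which results are likely to be noise.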

Also, why so few comments? This seems like a pretty interesting set of data already for OU/Ubers players.

I find it strange that none of these are exhibiting the predator-prey relationships that occur in real-life ecology. If you have the time, X-Act, maybe you could account for some time delay for some of the obvious ones?
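
One way a time delay could be accounted for, sketched in Python: compare one Pokemon's monthly change with the other's change `lag` months later. This is only an illustration built on the difference method from the first post. The function names, the averaging over the overlapping months, and the usage figures are assumptions.

```python
def monthly_changes(usage):
    """Month-to-month change in usage, in percentage points."""
    return [later - earlier for earlier, later in zip(usage, usage[1:])]

def lagged_direct_score(leader, follower, lag):
    """Mean |leader's change in month t - follower's change in month t + lag|.
    Near zero means the follower repeats the leader's moves `lag` months later.
    The mean (rather than the sum) keeps different lags comparable, since a
    longer lag leaves fewer overlapping months."""
    lead = monthly_changes(leader)
    follow = monthly_changes(follower)
    pairs = list(zip(lead, follow[lag:]))
    if not pairs:
        raise ValueError("lag is too long for the amount of data")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Invented six-month usage percentages, not real statistics.
prey = [4.0, 5.0, 5.0, 4.0, 4.0, 5.0]
predator = [3.0, 3.0, 4.0, 4.0, 3.0, 3.0]  # repeats prey's moves a month late

print(lagged_direct_score(prey, predator, 0))  # 1.0: no match without a delay
print(lagged_direct_score(prey, predator, 1))  # 0.0: perfect match one month on
```

With only six months of data, a lag of one month already leaves just four changes to compare, so any result from so few points would be very sensitive to noise.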

The statistics are interesting, but they make me want to form possibly nonexistent reasons for the correlations. The only reason I can come up with right now for the inverse correlations without time delay is that two Pokémon fill the same niche in the metagame, and that explanation only works for a few, e.g. Jolteon and Zapdos, Weezing and Umbreon, Electrode/Aerodactyl/Crobat. And even those only very roughly fill the same niche.

It's all very interesting, although I think that - as every statistician should remember^^ - correlation is not causation. For example, there's little to no conceptual link (counter, team partner and so on) between Blaziken and Rampardos IMO.

Even though correlation is not necessarily causation, a true correlation always has some explanation (at least in the vast majority of cases). It might be the predator-prey relationship I discussed in my previous post, which is a direct causation, or it may just be a third, unknown Pokémon or set of Pokémon causing the correlation. The confounding factor - the third Pokémon - could also explain correlations between two Pokémon that fill the same niche.

X-Act, you say that you did not provide a PMCC value. Obviously I do not want to pile more onto your plate, but would it be too hard to generate a list of "correlation partners" for each Pokemon? If we could find the |PMCC| values of each Pokemon with respect to the top 30 or 40 Pokemon in each tier and order them from largest to smallest (obviously we would want to display the sign of each PMCC value in the list), this would give us an easier way to quantify how well each Pokemon correlates with another, and I believe it would be very useful for analysis writing and suspect nominations. From what I remember, PMCC is just a function of three "variance type" calculations, but I do not know how complicated it would be to write a script that could calculate these and order them for you. It might be worth looking into. At the very least, it would mean that we wouldn't necessarily need this arbitrary threshold value, and the PMCC is easy to interpret (being simply a value where |x| ≤ 1).
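
For reference, the three "variance type" sums are usually written S_xx, S_yy and S_xy, and PMCC = S_xy / sqrt(S_xx * S_yy). A rough Python sketch of the partner list could look like the following. The function names, the placeholder Pokemon names, and the usage figures are invented for illustration.

```python
from math import sqrt

def pmcc(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length series.
    Undefined (division by zero) if either series is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)

def ranked_partners(target, usages):
    """Return (name, r) for every other Pokemon, ordered by |r| from largest
    to smallest. The sign of r is kept so direct and inverse can be told apart."""
    results = [
        (name, pmcc(usages[target], series))
        for name, series in usages.items()
        if name != target
    ]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

# Invented six-month usage percentages for placeholder Pokemon.
usages = {
    "Pokemon A": [5.0, 5.5, 6.0, 5.8, 6.2, 6.5],
    "Pokemon B": [9.0, 8.5, 8.0, 8.2, 7.8, 7.5],  # mirrors A
    "Pokemon C": [3.0, 3.4, 3.1, 3.6, 3.2, 3.5],  # only loosely related to A
}

for name, r in ranked_partners("Pokemon A", usages):
    print(f"{name}: {r:+.3f}")
```

With real data, `usages` would hold one series per Pokemon in the tier, and the same call would give each Pokemon's partner list.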

First of all, PMCC is not simpler to interpret than my method. For PMCC, the nearer the number is to 1 or -1, the stronger the correlation, but where are you going to put the cut-off? Is it +/-0.9? +/-0.8?

Secondly, I did apply PMCC to the Pokemon usages, and found that the correlation generated wasn't as good as I wanted it to be. Basically, my idea of perfect correlation is: if a Pokemon increased by p% usage from one month to the next, the other Pokemon's usage should increase or decrease by that same amount. I found that PMCC doesn't cater for this enough, so I discarded it.
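
A small made-up example shows the difference. PMCC only measures whether two series move along a straight line, so it reports a perfect score even when one Pokemon's usage changes ten times as fast as the other's. The function names, the summed-difference score, and the figures in this Python sketch are assumptions for illustration, not the actual formulae.

```python
from math import sqrt

def pmcc(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)

def margin_gap(x, y):
    """Sum over months of |change in x - change in y| (assumed score).
    Zero only when both series change by the same margin every month."""
    changes_x = [b - a for a, b in zip(x, x[1:])]
    changes_y = [b - a for a, b in zip(y, y[1:])]
    return sum(abs(cx - cy) for cx, cy in zip(changes_x, changes_y))

# Invented usage percentages: both rise steadily, but by very different margins.
fast = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # +1.0 per month
slow = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]        # +0.1 per month

print(pmcc(fast, slow))        # about 1: "perfect" correlation by PMCC
print(margin_gap(fast, slow))  # about 4.5: far from equal margins
```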