While I'm sure you've probably posted explanations before, Antar, I feel that the OP of the topic could use an explanation of what the "real" vs. "raw" percentages and the like mean, and of why they're included.

I know this has almost certainly been addressed before, but it's somewhat difficult to track down exactly where this was explained. Hell, if you can find a post where it was explained satisfactorily you could just link to it in the OP and call it a day. I feel this would help a lot of people (including myself :P) to better understand what all the numbers mean and how to interpret them, and also stop some people from thinking that the fact that the columns don't all match up means there are errors in the data collection.

"Real": Only counts the Pokemon which actually appear in battle (Doubles not supported)

The reason for the name "real" is historic--back when I first took over the stats and then the running of PO, only the Pokemon that appeared in battle were recorded in the logs, so there was no way to actually *get* the full team stats. When I modified PO to generate logs with full team info in them, we were left with a decision regarding which stats to use, and the argument was that counting only Pokemon appearing in battle was somewhat more legit, because that corresponded to actual, or "real" usage (that argument lost out in the end).

Filtering out troll alts, etc. is done at the normal weighting level, FYI (keep in mind--we're using Glicko score, not Elo, where your rating *can* drop below the starting rating).

I'll decide whether to do 1 std. dev or two some time before March 1, once I look at the distribution of ratings on PS.

Thank you for the weighting link; I had not bothered to read it before, and it was rather straightforward. That explains so much about the weighting of a Pokemon's usage in tiering, but this seems to depend on what the "average player" is, and on trolls who perform worse in their battles relative to the average player being given less weight than more skilled players.

A standard score of 1 is a standard score of 1. But what I meant was that something based on one's relative ranking depends on the composition of people in that sample. The presence of trolls certainly does affect the mean skill of the ladder by depressing it (and skill is not something with a hard quantity that possesses a true zero and can be easily measured, like height, but something that can be normalized). For the SAT, there are fewer "trolls" who take it just to score in the 200s on the subtests for the obvious reasons, and it possesses a natural mechanism to exclude "trolls" without any statistical filters. Pokemon battling does not have a strong disincentive to discourage trolling, and trolls can collectively influence the definition of "average".

How does the system prevent trolls from influencing the definition of the average player? Although I doubt trolls have much to do with the highly kurtotic distributions at the high end on the old system.

Edit: I do not think this post belongs here since it does not concern metagame trends or usage. But I still believe it is a legitimate question.

Official Data Miner

Eh, it's fine.

Regarding trolls, it's true that much of what we've built is based on the assumption that players like to win, but I haven't noticed the rating distribution being at all lopsided. I've actually been pleasantly surprised by how Gaussian everything looks.

This is the distribution of Elo* ratings for the OU ladder for most of January, IIRC. It's borked around 1000 because most alts on PS are only associated with 1 or 2 games (I think 80% of all alts only play one game). But other than that, it's pretty good. The right tail is a little heavier, which makes sense, since players are more likely to "reset" a low alt than a high one, but even that's not totally skewed.

So bottom line: I get that trolls *could* be a problem, but given that we don't see any peaks on this distribution around extremely low ratings, I think we're okay.

*This is not the Elo rating currently deployed on PS. This is Elo calculated strictly using the standard Elo formula, with a K factor of 50, no modifications, no hacks, no nothing.
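
For anyone who wants to reproduce that kind of rating, here is a minimal sketch of the textbook Elo update with K = 50. The function names and the 1000 starting rating in the example are placeholders chosen for illustration, not anything taken from the actual PS or stats code.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=50):
    """Return the new (A, B) ratings after one game.

    score_a is 1 for an A win, 0.5 for a tie, 0 for an A loss.
    k=50 matches the K factor mentioned above; everything else is
    the plain formula with no modifications.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two fresh accounts (assumed here to start at 1000), A wins:
print(elo_update(1000, 1000, 1))  # (1025.0, 975.0)
```

Because each game only moves points from one player to the other, an account with one or two games can only land on a handful of values near the starting rating, which is consistent with the spike described above.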

MoxieInfinite, a U-Turn, Volt Switch or Baton Pass counts as a switch-out unless it delivers a KO, in which case it counts as a "U-Turn KO" and isn't counted towards Checks & Counters.

Does this explain the counterintuitive data that Genesect doesn't "check" or "counter" anything in OU, except Pinsir? Obviously Latios seems to be checked by it, since (Scarf) Genesect has two moves it can choose from to KO it.

I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?

Well, I think the most important stat people are interested in is the amount of Speed investment a Pokemon has received.

It would be helpful if there was some data that shows how a given Pokemon's speed is distributed among the players. People may use spreads to speed creep other Pokemon that try to speed creep it.

One, for instance, may run a Landorus-T that creeps 44 Speed Rotom-W (just 8 EVs needed) (and 44 Speed Rotom-W doesn't show up in the stats). And some Rotom-W might try to creep that before it U-Turns out by adding 8 extra EVs. I do not think this type of spread would show up in the usage statistics.
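
To make the arithmetic behind that example concrete, here is a small sketch using the standard (Gen 3 onward) formula for a non-HP stat. It assumes level 100, 31 IVs and a neutral nature, and the function name is just something made up for this post; the base Speeds are 86 for Rotom-W and 91 for Landorus-T.

```python
def speed_stat(base, ev, iv=31, level=100, nature_pct=100):
    """Non-HP stat formula (Gen 3+), using integer math throughout.

    nature_pct is 110 for a boosting nature, 90 for a hindering one,
    and 100 for neutral (the assumption in this example).
    """
    raw = (2 * base + iv + ev // 4) * level // 100 + 5
    return raw * nature_pct // 100


ROTOM_W_BASE = 86
LANDORUS_T_BASE = 91

print(speed_stat(ROTOM_W_BASE, 44))     # 219
print(speed_stat(LANDORUS_T_BASE, 0))   # 218, one point too slow
print(speed_stat(LANDORUS_T_BASE, 8))   # 220, creeps the Rotom-W
print(speed_stat(ROTOM_W_BASE, 52))     # 221, Rotom-W creeping back
```

Each step in this creeping war is a separate EV spread, so each one gets its own small entry in the spread counts instead of adding to a common one.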

It also makes you wonder how many, let's say, (Mega) Scizor are trying to outspeed minimum Speed Heatran (and/or Rotom-W before it is burned) to hit it with Superpower. It seems to be a worthy investment if one wants to get through Heatran (and/or Rotom-W) rather than lose momentum by manually switching out. This doesn't seem to show up in the statistics either.

You are correct that it wouldn't show Speed investment; I will admit I wasn't thinking of that at the time. My ulterior motive is that I breed Pokemon in-game, so it helps to know which natures to give them (since natures can't be adjusted after hatching, while EV spreads can be set later). Obviously the situation is different for people using simulators. I'm not sure how you would want to set up the display of spreads if you wanted to focus on Speed, especially since EVs are much more finely adjustable. Trying to ensure you displayed the majority of levels of investment seems like it might require a lot more slots.

As far as I know, the JSON data has ALL the spreads used if the player meets the 1500 cutoff. This leads to common Pokemon like Rotom-Wash having about 10000 different spreads listed, many of them with a count under 5. If you wanted to, you could parse this data to get a percentage of the natures used.
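
A rough sketch of that parsing, in case it helps. It assumes each spread is keyed by a string of the form "Nature:HP/Atk/Def/SpA/SpD/Spe" and paired with a number; that key format, the function name and the sample numbers are all assumptions made for illustration, so check them against the actual file before relying on this.

```python
from collections import defaultdict


def nature_percentages(spreads):
    """Collapse a {spread_string: count} mapping into {nature: percent}.

    Assumes each key looks like 'Bold:252/0/252/0/4/0' (an assumption
    about the file format, not something confirmed here).
    """
    totals = defaultdict(float)
    for spread, count in spreads.items():
        nature = spread.split(":", 1)[0]
        totals[nature] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {nature: 100.0 * count / grand_total for nature, count in ranked}


# Made-up numbers, purely to show the shape of the input and output.
example = {
    "Bold:252/0/252/0/4/0": 30.0,
    "Bold:248/0/252/0/8/0": 10.0,
    "Modest:0/0/0/252/4/252": 40.0,
    "Calm:252/0/4/0/252/0": 20.0,
}
for nature, pct in nature_percentages(example).items():
    print(f"{nature}: {pct:.1f}%")
```

With the real file you would load it with `json.load` and pass in the spreads entry for one Pokemon; where exactly that entry sits inside the file is something to confirm by looking at it.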

Official Data Miner

ArcFurnace -- yes, the data for spreads is collected by counting the occurrence of each and every individual spread. migetno1, the 1500 cutoff is not a "hard" cutoff. See my Weighting FAQ for more details.

One note: if the Pokemon's spread contains useless EVs (255 EVs in one stat, improperly optimized LC spreads), my scripts round that down and bin it with the equivalent spread that contains no useless EVs.
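
As an illustration of what that binning could look like for the level 100 case only: at level 100 a stat gains one point per 4 EVs, so any remainder is wasted and 255 behaves like 252. The sketch below is not the actual script; the function names and the (nature, EVs) tuple keys are hypothetical, and it does not attempt the LC case, where many more EVs are wasted at level 5.

```python
from collections import defaultdict


def strip_useless_evs(evs):
    """Round each EV down to a multiple of 4, capped at 252 (level 100 only)."""
    return tuple(min(ev, 252) // 4 * 4 for ev in evs)


def bin_spreads(spreads):
    """Merge counts of spreads that are equivalent once wasted EVs are dropped.

    spreads maps (nature, (hp, atk, def, spa, spd, spe)) -> count.
    """
    binned = defaultdict(float)
    for (nature, evs), count in spreads.items():
        binned[(nature, strip_useless_evs(evs))] += count
    return dict(binned)


# Made-up counts: the first two entries are the same spread in practice.
raw = {
    ("Jolly", (0, 255, 0, 0, 0, 255)): 3.0,
    ("Jolly", (0, 252, 0, 0, 0, 252)): 7.0,
    ("Adamant", (0, 252, 0, 0, 6, 252)): 2.0,
}
print(bin_spreads(raw))
```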

Calm_Mind_Latias -- it's at the top of my "to-do" list to start generating "speed tier" info from usage data (throwing in Choice Scarf and speed-boosting moves as well). But I just haven't had the time recently.

Working on analyzing the raw data myself to get the information I want. It's actually going pretty well (hooray for Python), but I have a question about the data format: in the raw data, each unique spread for a Pokemon is paired with a number. What exactly does that number represent? I was assuming it was something along the lines of "number of times this spread appeared", but there has to be something else adjusting it, since it's not necessarily an integer and if you add them all up it doesn't add up to the 'Raw count' variable for that Pokemon. Is it being adjusted by the weighting function intended to reduce the impact of bad players on the stats?

You'll need Python installed (this was created in Python 3.3). Save the code as a .py file, put it in a folder with the .json file you want to analyze, and run it from a command line window. No error handling, though, so make sure you spell things right.

So I'm starting to look a bit deeper into the Checks and Counters data courtesy of the JSON files, and I borrowed some of Antar's source code. The first thing my limited Python knowledge managed to find me was what I'm calling "Average Opposing Success Rate". What this means, to oversimplify a bit, is the chance that something good happens for the opponent when we have Pokemon X out. It's probably a bit easier to explain with actual numbers, so I'll put those right below. These are taken from the top 100 most used Pokemon in OU last month.

AOSR:

Pinsir: 32.6606536624097

Mawile: 32.6844481742917

Manaphy: 34.3630632235175

Heracross: 34.4424965021286

Charizard: 35.0592024325037

Medicham: 35.3023614950572

Volcarona: 35.4067400910503

Lucario: 35.4455291037521

Gyarados: 35.5116863785033

Conkeldurr: 36.516907127894

Dragonite: 36.5545320428452

Aegislash: 36.5681975927158

Kyurem-Black: 37.8309717680315

Bisharp: 38.3649220193829

Kingdra: 38.531349203473

Venusaur: 38.5553733987965

Clefable: 38.8988327548322

Garchomp: 39.1048053132335

Breloom: 39.2098243163858

Cloyster: 39.3413409294272

Talonflame: 39.3922228568415

Gardevoir: 39.6711615064359

Alakazam: 39.9964141451897

Gengar: 40.1997603794041

Reuniclus: 40.2864221128222

Crawdaunt: 40.6154542915553

Salamence: 40.6454368190726

Azumarill: 40.6689531224642

Haxorus: 40.7513280330215

Scizor: 41.2739257364425

Keldeo: 41.4610637679135

Greninja: 41.7516802360431

Weavile: 42.5671333255041

Landorus: 42.8043943463543

Slowbro: 42.8043943463543

Togekiss: 43.5658724202397

Porygon2: 43.9430431099078

Arcanine: 44.1548236761597

Sableye: 44.3135961436258

Gliscor: 44.3913692085438

Metagross: 44.450457754148

Terrakion: 44.4600453571302

Blastoise: 44.5511315724735

Latios: 44.7306280229763

Mamoswine: 44.930485065953

Hydreigon: 44.9459549417092

Chandelure: 45.0101588925653

Nidoking: 45.0129339855979

Ferrothorn: 45.4970226366418

Genesect: 45.6367708095554

Sylveon: 45.7302719438663

Infernape: 45.8951290756273

Diggersby: 46.3248129428848

Absol: 46.3320133131536

Umbreon: 46.3743233324173

Excadrill: 46.3759120080044

Darmanitan: 46.4111232917662

Heatran: 46.5177171463897

Goodra: 46.6600627629875

Zapdos: 46.6995220785531

Thundurus: 46.9785487566663

Thundurus-Therian: 47.4919887437474

Trevenant: 47.498523525822

Florges: 47.9143122421163

Noivern: 47.9300544465411

Quagsire: 48.0237927022329

Whimsicott: 48.0444210975334

Starmie: 48.1152167905302

Aggron: 48.1960393308203

Ditto: 48.3003550165925

Latias: 48.3643931242995

Tyranitar: 48.366004291485

Manectric: 48.3852945812972

Jirachi: 48.6828542642543

Vaporeon: 49.0304566322915

Klefki: 49.2993117538896

Espeon: 49.3014704588863

Mandibuzz: 49.3193762553036

Gastrodon: 49.6502828137244

Jellicent: 49.7605682005731

Ambipom: 49.7643480320047

Chansey: 50.6178222346396

Skarmory: 50.9443778369248

Magnezone: 51.5004764249148

Celebi: 51.807078744494

Ninetales: 52.0344370841584

Blissey: 52.1084252145068

Jolteon: 52.1130142076991

Crobat: 53.1164262429457

Donphan: 53.3124666216968

Landorus-Therian: 53.3148566332438

Tentacruel: 54.225819632441

Rotom-Wash: 54.3354877933639

Politoed: 55.0057811770138

Deoxys-Speed: 55.543091182007

Deoxys-Defense: 57.0743333619611

Galvantula: 57.5611122966849

Scolipede: 58.5615741449645

Forretress: 61.3480515649239

Smeargle: 63.4775826385627

Let's start at the top of the list with Pinsir, with an AOSR of about 32.66. That means that once he got out on the field, the opponent only gained an advantage (Pinsir switched or was KOed) 32.66% of the time. Compare that with Smeargle, who gave the opponent an advantage nearly twice as often.

As for my first impressions, there are a lot of Megas high on the list. About half of the Pokemon with scores below 40 have a Mega Evolution available to them. If Game Freak wanted these guys to be the powerhouses of their teams, they certainly succeeded. You may notice Rotom-W and Landorus-T near the bottom of this list, which is strange for the super-standard bulky momentum core they are. My theory is that this is due to Antar's list counting U-Turn and Volt Switch as a switch out. Still, in theory this would only affect the number of positive outcomes for the U-Turner/Volt Switcher, since the opponent likely checks them if they stay in to tank the moves anyway.

Some caveats/other observations about this data:

These numbers were taken from the Checks and Counters data, which coincidentally looked at what happened when a Pokemon was switched out or KOed. Thus, Pokemon who are often used as suicide leads, like Smeargle, Galvantula, and the Deoxys formes, will have innately worse scores than the rest, regardless of their ability to support the team. If you don't understand anything I'm saying, peruse through here. Antar explains the situations that make up this data quite well.

Generally, stallier Pokemon have worse scores. I would guess it is due to the offensive nature of the meta putting heavy pressure on stall teams, although Venusaur, who seems to be the best wall right now, has a good score for a defensive poke.

Chansey is currently performing about 2% better than Blissey.

I'm only calling it AOSR because it was the first thing that came to mind and I didn't want to keep writing "Average Opposing Success Rate" a bunch of times, so if anyone can think of something shorter/leads to a better acronym, I'll implement that as well.

I'm eventually trying to build up the Python knowledge to have a weighted average success rate for the Pokemon itself rather than its counters. You can approximate the unweighted version by subtracting these numbers from 100, but it will actually be slightly less due to double downs/double switches making up a small amount of these scores.

Most importantly, and perhaps the biggest problem: these stats aren't weighted by usage. A matchup just needs to pass the minimum number of encounters to be counted, and then every matchup counts equally. I chose this because CRE has generally favored Pokemon lower in usage, simply because the higher deviation of their counters means the CREs of their counters will be lower. In addition, while the crap pokes lose more due to high deviation, they don't actually weigh much at all. If anything, they'll contribute more to pokes who aren't matched up a lot. I'll apply the weighting when/if I get good enough at Python to do so.
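
For reference, here is a sketch of how the unweighted number and a possible weighted version could be computed. It assumes each matchup boils down to a pair of (number of encounters, fraction of encounters where the opponent got the KO or forced the switch). The function name, the `min_encounters` value of 20, and the use of encounter counts as a stand-in for usage weighting are all my own assumptions, not how the numbers above were necessarily produced.

```python
def aosr(matchups, min_encounters=20, weighted=False):
    """Average Opposing Success Rate, as a percentage.

    matchups maps opponent name -> (encounters, opponent_success_fraction).
    Matchups below min_encounters (a hypothetical cutoff) are ignored.
    With weighted=True, each matchup counts in proportion to its
    encounters instead of equally.
    """
    kept = [(n, p) for n, p in matchups.values() if n >= min_encounters]
    if not kept:
        return None
    if weighted:
        total = sum(n for n, _ in kept)
        return 100.0 * sum(n * p for n, p in kept) / total
    return 100.0 * sum(p for _, p in kept) / len(kept)


# Made-up matchup data for one Pokemon.
example = {
    "Opponent A": (100, 0.5),
    "Opponent B": (300, 0.3),
    "Opponent C": (5, 0.9),   # dropped: too few encounters
}
print(f"{aosr(example):.2f}")                 # 40.00
print(f"{aosr(example, weighted=True):.2f}")  # 35.00
```

In the made-up example the common matchup pulls the weighted figure down, which is the kind of shift the unweighted list cannot show.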

Official Data Miner

I can give you the "Encounter Matrix" if you want it. That's the comprehensive table of what happens when X faces off with Y and should yield better results.