Sunday, June 24, 2012

On June 20th, no UW Delver cards were banned. This was a horrible decision, but I'm not going to say one word about the specifics of UW Delver in this post, since that has been discussed ad nauseam in other places. This entire post is about the DCI's reasoning for the lack of bans, and how it shows a total lack of understanding of basic statistics and game theory. Let me quote the decision:

The DCI looked at the results of competitive Standard events. We found that while a high percentage of the participants played White-Blue Delver decks, that the win rate of those decks was very close to par. For instance, in a recent MTGO PTQ, the win rate of White-Blue Delver decks against non-Delver decks was a bit under 51%. In general there are decks that the Delver deck is strong against, and decks that it is weak against, but on average the deck tends to get results close to average.

Additionally the number of people playing high level Standard events is the highest ever.
Looking at the Magic 2013 card set, it appears that there may be more tools for other decks than for the White-Blue Delver deck, though time will tell if this bears out. The DCI will continue to observe how this plays out, but is taking no action.

I know nothing of tournament attendance, and as for new M13 cards I'll just quote this as a cautionary tale:

We tried to enable a few specific anti-Jace weapons in Mirrodin Besieged with Phyrexian Revoker, Hero of Oxid Ridge, and Thrun, the Last Troll, but the metagame solved those cards pretty quickly with Squadron Hawks and Swords.

New Phyrexia brought with it Despise and Hex Parasite, but those cards just aren't powerful or versatile enough.

Anyway, their key argument is "on average the deck tends to get results close to average". I'll show you in simple mathematical terms why this is not just wrong, it's completely backwards.

Simplifying assumptions

None of these assumptions are essential to the argument; they just make the math simpler. If you have time, you can redo the proof in a more complicated variant.

Let's assume that each player plays to win (more on that later), knows the meta, and has no budget constraints (a reasonable assumption in Standard, where competitive deck prices are similar; less so in Vintage), and that players of each archetype are equally good on average (this doesn't preclude some players being better, as long as such superstars are few, or not particularly attached to any archetype, or both).

Let's assume there are N potentially tournament-viable archetypes in the format, and disregard issues like minor variants within each archetype, sideboarding strategies, etc.

For each pair of such archetypes I and J, p(I,J) is the probability that the first player (playing I) wins the match. We don't need separate pre-sideboard and post-sideboard win probabilities, since if α and β are the pre- and post-sideboard game win probabilities respectively, then the best-of-3 match win probability is:

p = αβ + α(1 - β)β + (1 - α)ββ

And by similar reasoning, all kinds of on-play vs on-draw chances can be folded into a single number. There will be slight inaccuracies if some matches are best-of-3 and others best-of-5, if some matchups have an unusually high chance of being drawn unintentionally due to timeouts (control mirrors vs aggro mirrors), or if people decide whether to take an intentional draw depending on what their opponent is playing, but we'll fold as much of this as we can into a single "match win %" number, and won't be too concerned with what cannot be simulated this way.
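The folding formula can be checked directly. Here's a minimal sketch, where `match_win_probability` is my own name for it and `alpha`, `beta` are the pre- and post-board game win chances:

```python
# Fold pre-board (alpha) and post-board (beta) game win chances into one
# best-of-3 match win probability (match_win_probability is my own name).
def match_win_probability(alpha, beta):
    # Win games 1 and 2, or win 1, lose 2, win 3, or lose 1, win 2 and 3.
    return alpha * beta + alpha * (1 - beta) * beta + (1 - alpha) * beta * beta

# Two 50/50 players produce a 50/50 match, as expected.
print(match_win_probability(0.5, 0.5))  # 0.5
```

Note that a deck favored post-board benefits twice from β, which is why sideboard plans weigh so heavily in match win rates.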

How players decide what to play

There are many ways to simulate this, and they all lead to very similar outcomes. Since we assumed N archetypes, let's start with each of them holding 1/N of the field. To make all these long lists of numbers look less dreadful, I'll name the decks after randomly assigned two- and three-color combinations (but they're really just labels).

Then players are paired randomly, and each match loss gives a player some low probability of switching to a deck randomly chosen from the current meta (so if 20% of people are playing archetype X, a player who wants to change their deck will pick X with 20% probability; if they already play X, we'll say for simplicity that they pick another decklist within the same archetype).

Now this isn't particularly realistic - someone who lost 5 of 5 games is more than five times as likely to change their deck as someone who lost just 1 of 5 to mana screw - but almost any such procedure for evolving the meta leads to very similar outcomes.
Got it so far? Now let's create a fresh format with 20 archetypes, generate a random matrix of match win probabilities in the 25%..75% range (except mirror matches, which are always 50% by definition), set the deck-change-after-loss probability to 1%, and simulate some rounds.
What will the archetypes' meta shares and average win percentages be? Let's look at one such random meta!
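The whole procedure can be sketched in a few dozen lines of Python. This is my own reconstruction (the original simulation code wasn't published), with assumed names like `SWITCH_PROB`:

```python
import random
from collections import Counter

N = 20              # number of archetypes
PLAYERS = 2000      # total players in the simulated field
ROUNDS = 500        # rounds of random pairings
SWITCH_PROB = 0.01  # chance that a match loss makes a player switch decks

random.seed(42)  # fixed seed so reruns match

# Random matchup matrix: p[i][j] = chance archetype i beats archetype j.
p = [[0.5] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        w = random.uniform(0.25, 0.75)
        p[i][j], p[j][i] = w, 1.0 - w  # mirror matches stay at 50%

# Start with an even field: each archetype holds 1/N of the players.
decks = [k % N for k in range(PLAYERS)]

for _ in range(ROUNDS):
    random.shuffle(decks)  # random pairings: adjacent players fight
    for k in range(0, PLAYERS, 2):
        a, b = decks[k], decks[k + 1]
        loser = k + 1 if random.random() < p[a][b] else k
        # Loser sometimes switches, proportionally to current meta shares.
        if random.random() < SWITCH_PROB:
            decks[loser] = random.choice(decks)

shares = Counter(decks)
for arch, count in shares.most_common(5):
    print(f"archetype {arch}: {100.0 * count / PLAYERS:.1f}% of field")
```

Drawing the loser's replacement deck with `random.choice(decks)` is what makes switching proportional to current meta share, exactly as described above.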

Random Fair Meta

It's time to run some simulations.
At the beginning the meta looks really balanced - no deck has a particularly high or particularly low chance against the field. Each line below shows meta share % followed by win % against the whole field (and it is honestly just a coincidence that UW got on top):

UW - 5.0 (55.4)

Bant - 5.0 (52.7)

WR - 5.0 (52.5)

GB - 5.0 (52.5)

WB - 5.0 (52.3)

UR - 5.0 (51.9)

RUG - 5.0 (51.1)

Grixis - 5.0 (50.8)

BUG - 5.0 (50.6)

Kaalia - 5.0 (50.4)

Jund - 5.0 (49.5)

Junk - 5.0 (48.9)

Naya - 5.0 (48.9)

Esper - 5.0 (48.6)

GU - 5.0 (48.2)

GR - 5.0 (48.1)

BR - 5.0 (47.4)

UB - 5.0 (47.2)

WUR - 5.0 (47.1)

WG - 5.0 (46.0)

After 2000 rounds we start to see a meta forming, but no deck is particularly bad:

UW - 13.7 (54.1)

Bant - 8.6 (52.5)

WB - 7.2 (50.6)

Grixis - 6.9 (52.7)

GB - 6.6 (50.4)

WR - 6.3 (49.5)

UR - 5.5 (48.8)

RUG - 4.9 (48.4)

Esper - 4.5 (50.4)

Kaalia - 4.3 (48.5)

Jund - 4.1 (49.0)

GU - 3.7 (49.3)

BUG - 3.4 (45.5)

Junk - 3.3 (47.2)

Naya - 3.3 (46.9)

UB - 3.1 (48.5)

GR - 3.1 (47.0)

BR - 3.0 (47.6)

WUR - 2.8 (47.4)

WG - 1.8 (43.7)

After 4000 rounds UW is the top deck, but Grixis has the highest win percentage, so it's moving up:

UW - 21.1 (50.0)

Grixis - 14.4 (54.0)

Bant - 11.5 (50.0)

GB - 7.0 (50.7)

WB - 6.1 (48.5)

Esper - 5.0 (49.9)

WR - 4.4 (47.8)

GU - 4.0 (51.4)

Jund - 3.9 (50.7)

Kaalia - 3.6 (50.5)

UR - 3.2 (46.1)

UB - 3.1 (51.7)

RUG - 2.7 (45.7)

Junk - 1.9 (47.8)

BR - 1.9 (47.7)

WUR - 1.7 (46.8)

GR - 1.6 (46.9)

Naya - 1.6 (46.5)

BUG - 1.0 (42.7)

WG - 0.4 (41.4)

After 10000 rounds a lot of decks see no play at all; Grixis was even briefly the top deck, but then fell back in popularity, with a lot of rearrangement in the top 5.

But did you notice something interesting about the win chances (measured against the whole field, mirrors included)? At no point did any deck have a win rate much above 52%, while the worst decks dropped really low - below 40%. Shouldn't the average of the win rates be exactly 50%?

Actually, no! The average win rate against a random field (round 0) is 50% - but once the meta establishes itself, good decks' win chances are calculated against other good decks (so they're close to 50%), while bad decks' chances are also calculated against good decks (so they're much worse than 50%). It doesn't matter how hard any deck crushes the bad decks, since those aren't in the meta in any appreciable amounts, and so they don't get counted.

Let me restate this:

The unweighted average of deck win rates in any established meta is always lower than 50%, since better decks are played more than bad decks. And it is difficult for any deck to have a win rate much higher than 50%.
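A toy two-deck meta makes this concrete (the numbers are entirely my own, not from the simulations above):

```python
# Toy meta, my own numbers: 90% of the field plays A, 10% plays B,
# and A beats B 60% of the time.
share = {"A": 0.9, "B": 0.1}
p = {("A", "A"): 0.5, ("A", "B"): 0.6,
     ("B", "A"): 0.4, ("B", "B"): 0.5}

def win_rate(deck):
    # Expected win chance against a randomly drawn opponent, mirrors included.
    return sum(share[opp] * p[(deck, opp)] for opp in share)

print(win_rate("A"))  # 0.9*0.5 + 0.1*0.6 = 0.51
print(win_rate("B"))  # 0.9*0.4 + 0.1*0.5 = 0.41
# Weighted by meta share the field average is exactly 50%, but the plain
# average across archetypes, (0.51 + 0.41) / 2 = 0.46, sits below 50%.
```

The dominant deck crushes B 60/40, yet its field-wide win rate is a harmless-looking 51%, because it mostly plays mirrors.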

Random Unfair Metas

Now, that was a reasonably healthy meta. What about metas which are totally unbalanced and unfair?
Let's simulate one.

Now let's pick one deck - say UW (for no particular reason; it just happened to be first in my list of labels, honest...) - whose win probabilities are drawn from the 0.45..0.75 range, while all other matchups use the 0.25..0.75 range. That's right - this UW doesn't have particularly high chances against any deck; it simply never has particularly low chances against any deck.
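The only change from the fair simulation is how the matchup matrix is generated. A sketch, again with assumed names:

```python
# Unfair matrix (my own reconstruction): archetype 0 ("UW") draws its
# matchups from 0.45..0.75, everyone else from 0.25..0.75.
import random

N = 20
FAVORED = 0  # must be index 0 for this simple loop, since the favored
             # deck is then always the i of each (i, j) pair with i < j

p = [[0.5] * N for _ in range(N)]  # mirrors stay at 50%
for i in range(N):
    for j in range(i + 1, N):
        lo = 0.45 if i == FAVORED else 0.25
        w = random.uniform(lo, 0.75)
        p[i][j], p[j][i] = w, 1.0 - w
```

Everything downstream (pairings, deck switching) stays the same; only the favored deck's floor is raised.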

This meta is about as degenerate as they ever get: the top deck at 68.7%, a deck designed to beat the top deck (and it barely does so) at 30.8%, and a third deck at 0.5%.
And do you see? The top deck's win percentage is below 50%!

The degeneracy shows only in the huge meta share and the total absence of almost all other decks (in the fair simulation 12 of 20 archetypes saw some play; now it's only 3 of 20). What typically happens in these simulations is that first all decks that lose to the top deck drop to 0%, then the top deck establishes a ridiculous meta share (often >90%), and then sometimes decks which are good against it gain significant share.

This run is even worse - GB has a decent matchup against UW, but since Esper is really brutal against GB (a 70% win rate), it makes GB totally nonviable, and UW reaches 91.8% meta share. So we have a top deck (UW), a meta deck (GB), and a meta-meta deck (Esper).

Conclusions

I could keep rerunning these simulations, and there are some interesting and nonobvious findings there, like having few bad matchups being more important than having any amazing matchups.
But the big thing, which should be obvious in retrospect, is:

In any stable meta, the top deck's win percentage will always be close to 50%. If it were much different than that, people would switch to the top deck or away from it. Degeneracy only shows in meta share %. It never shows in win percentages.

So if win percentage doesn't show which deck is the best, what does it show?

A high average win percentage for certain decks means the meta hasn't adapted to those decks yet. Flat win percentages of top decks near 50% mean the meta is stable, not that it's balanced.

That's right - the statistics show that the current Standard meta is both degenerate (Delver's very high share of the field) and stale (top decks very close to a 50% win rate), without much chance of evolving.
I thought this should be obvious to anyone with a clue about statistics and game theory - it's a basic result from the first chapter of any game theory textbook that in equilibrium all strategies actually played converge to the same average payoff. But judging from both the DCI announcement and the ensuing discussion, very few people understand this, and the awful argument gets repeated over and over again.

Now, this post is concerned only with the statistics. Maybe M13 will fix it. Maybe someone will come up with a brilliant anti-Delver deck out of nowhere. Maybe people enjoy playing Delver mirrors so much they'll keep coming to tournaments regardless of how many Delver decks are played. It's all possible, and I don't have any evidence either way. But arguments from win percentages are wrong, and the mathematics is very clear about that. It's too late to fix it this time, but let's hope that the next time the DCI faces a similarly degenerate meta, they'll look at the numbers that matter (the top deck's meta share), not numbers that show something completely different (the top deck's win rate).

7 comments:

what about the fact that something like 20 cards from the deck will be rotated out in Oct? no Ponder, no Mana leak, no Probe, no Seachromes, no vapor snags, no gut shots...granted im sure they're just use Unsummon for Vaporsnags... but what about the other cheap stuff.... M13 hasnt shown anything as good for Blue to replace them with... might have to see what RtR comes up with.... considering the All Star of Ravnica is Jace im sure Blue will get some nasty stuff.

Sheds a lot of light on the issue. I don't exactly agree with a part of the conclusion however "That's right - statistics show that current Standard meta is both degenerate (Delver's very high % of the field), and stale (top decks very close to 50% win rate) without much chance of evolving." The simulation you ran grouped decks into colors, which is fine for demonstration but it is misleading because all the color combinations are known and there is no room for a new "color" to suddenly pop into the simulation. If you had used something else to distinguish the decks it may not be so misleading, but the reality is that we are talking about decks, with strategies. New ideas and strategies pop up all the time and throw monkey wrenches into the metagame. I don't think the simulation you used predicts that we won't see much evolution. In fact there has been lots recently in my local area. Delver was once a 60+% deck here a few weeks ago, but since the SCG open it has dwindled to lower than 25% in favor of RG aggro, zombies, and pod lists. Delver has a tough time with those decks. Those decks are also great against the rest of the field.

So what you're saying is... That this is way too complicated for most people to understand, and that you could simply be manipulating these statistics in your favor (in the mind of someone who cannot understand this). Therefore, if most people do not see this as a viable resource, you have wasted a lot of time on something that you cannot change anyways. I hope you make at least $100,000 with your knowledge of statistics- it really is very impressive. You really cannot expect the average joe to follow this, you know. Just face it, dude; Delver is great but in all reality I think players will simply get tired of playing it and switch regardless of statisticisms. (That is not a word!) Even if it does stay where it is, what the hell... if you can't beat 'em, join em. Thats what I did so that I could better understand the deck and figure out a strategy to beat it from the inside out. That's besides the point, however. My simple advice is that although you seem to be very intelligent, it is not very bright to waste your time on something like this. But alas, my point is mute because... I read it.

Maybe the reason why delver is dominating because it's an archetype is popular with good players, and players who want to be good?

I used to play caw-go, and there were many decks that beat it, but there were also a shit ton of mirror matches. In all be told the deck's winning relied almost completely on drawing a good hand and hoping they had no answers. Just like every deck.

Your data is wrong because, quite frankly, it was never real. It holds no candle to real life. There are thousands of other factors you can't quantify for, and one of those is preference. People in major tournaments play what they believe the best deck to be. They have certain underlying assumptions about what the best deck is supposed to include. They have a predisposition that blue is the best color. Of course it is, right? It has mana leak.

Now, consider that every player at a PTQ, or other somewhat competitive to highly competitive tournament is either good or wants to be good, they're going to have the same mentality on what to play.

If 50% of players play the same thing, and consider it has a 51% victory chance like Wizards said, it's going to win.

If 5 people play 1 deck, and 5 people play each a different deck, then it's almost guaranteed that the 1 deck 5 are playing will win. It has the highest chance out of anything to win.

So, instead of the deck 'being the beat', it's more like everyone just wants to play the same deck. Removing it will just mean some other deck, probably a blue deck, will become the crowd favorite and win everything.

Creative Commons

Unless otherwise expressly stated, all original material of whatever nature created by Tomasz Węgrzanowski and included in this blog, is licensed under a Creative Commons License. It is also licensed under GFDL (for Wikipedia compatibility).