Monday, February 10, 2014

How likely are you to get 12 wins in the Hearthstone Arena, given your skill level?

Blizzard recently released Hearthstone, a TCG-style video game similar to Magic: The Gathering. Hearthstone has a play mode called the Arena, where the player assembles a deck out of random cards and uses it to play against randomly selected other players. The player keeps playing until they lose 3 times or reach the maximum of 12 wins. The reward depends on the number of wins.

A few weeks ago on reddit, there was a post titled "How hard is the Arena? The answer, with Math (TM)", which showed how likely the different arena outcomes would be if each game was decided randomly. You should read the full post -- there are some great points about how unlikely 8+ wins is, and how outcomes that some players view as devastating, like 1-3, are actually common.
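For reference, the coin-flip baseline can be sketched in a few lines of Python. This is an illustration under the assumption that every game is an independent 50-50 (or, more generally, is won with a fixed probability p); it is not the code from the reddit post, and the function name is made up here.

```python
from math import comb

MAX_WINS, MAX_LOSSES = 12, 3

def fixed_odds_outcomes(p=0.5):
    """Chance of each final record when every game is won with probability p."""
    q = 1.0 - p
    outcomes = {}
    # Runs ending on the third loss: the last game is a loss, and the
    # other two losses fall anywhere among the first w + 2 games.
    for w in range(MAX_WINS):
        outcomes[(w, MAX_LOSSES)] = comb(w + 2, 2) * p**w * q**MAX_LOSSES
    # Runs ending on the twelfth win: the last game is a win, and the
    # l losses fall anywhere among the first 11 + l games.
    for l in range(MAX_LOSSES):
        outcomes[(MAX_WINS, l)] = comb(MAX_WINS - 1 + l, l) * p**MAX_WINS * q**l
    return outcomes

for (wins, losses), chance in fixed_odds_outcomes().items():
    print(f"{wins}-{losses}: {chance:.2%}")
```

The fifteen records printed are the only ways a run can end, so their chances add up to 1.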

However, there is something important missing from that analysis -- some decks are better than others! Here, I'm going to extend that analysis by incorporating the strength of each deck and the player's skill.

Let's represent each player's power using a number between 0 and 1. A player's power is the fraction of players that player is stronger than. The weakest player has power 0. The strongest player has power 1. The average (median) player has power 0.5, and so on. Note that this power incorporates both the player's skill and the strength of their deck. Therefore, whatever your skill level, your power will differ each time you run the Arena, because your deck changes.

We’re going to make three assumptions about the game:

1) In the arena, you always play someone with the same win-loss record as you. Blizzard has said they try to perform matchmaking to make this the case, although it is not true all the time.

2) The advantage a player has is proportional to the difference in their power values. The best player (power 1) has the same advantage over the average player (power 0.5) as the average player has over the worst player (power 0).

3) Prob[A beats B] = Logistic( X * (Pow(A) - Pow(B)) ). The Logistic function, Logistic(Y) = 1 / (1 + exp(-Y)), converts a number into a probability: Logistic(0) = 0.5, and Logistic(Y) gets closer to 1 the larger Y is and closer to 0 the more negative Y is (see plot below). Note that Logistic is symmetric, so Prob[A beats B] = 1 - Prob[B beats A], as we would expect.

The value X determines how important power is in determining the outcome of a game. If we think Hearthstone is totally luck-based (like the card game "War"), we would set X to 0, meaning that the outcome of every game is 50-50, regardless of the players' powers. If we think Hearthstone is very skill-based (like Chess, say), we would set X to a large number, so that if A is even slightly stronger than B, A has a very high chance of winning. My intuition is that X=5 is a reasonable value -- the results below use this value. However, I computed all the arena outcome probabilities for values of X between 0 and 100. Here is a table of win probabilities given different power differences for X=5:

Power difference    Win probability
0                   50.00%
0.01                51.25%
0.1                 62.25%
0.25                77.73%
0.5                 92.41%
0.75                97.70%
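The table above follows directly from assumption 3. A short Python sketch that reproduces it is shown here; the function names are illustrative, and X defaults to the value 5 used in this post.

```python
import math

def logistic(y):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))

def win_probability(power_a, power_b, x=5.0):
    """Chance that A beats B under assumption 3: Logistic(X * (Pow(A) - Pow(B)))."""
    return logistic(x * (power_a - power_b))

# Print the win chance for each power difference listed in the table.
for diff in (0, 0.01, 0.1, 0.25, 0.5, 0.75):
    print(f"{diff:<5} {win_probability(diff, 0.0):.2%}")
```

Only the difference in power matters, so `win_probability(0.6, 0.5)` and `win_probability(0.1, 0.0)` give the same value.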

Given just these assumptions, we can compute exactly how likely each arena outcome is for a deck of a particular power. To do that, we start off with all the players at 0-0, with equal frequencies of players of all powers. (In my calculations, I group the players into 1000 bins). Then, for players of a given power, we compute the chance that player encounters a player of each other power and the player's likelihood to win against them. That gives us the fraction of the players at a particular power that will move to 1-0, and how many will move to 0-1. We repeat that process, calculating how many get to 2-0, 1-1 and 0-2, and so on, all the way up until 12-2.
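A minimal Python sketch of this procedure follows. It is an illustration, not the code behind the numbers in this post: the function and variable names are made up, it uses 100 bins by default rather than 1000 so that it runs quickly in plain Python, and it assumes a player can be matched against someone in their own bin.

```python
import math

MAX_WINS, MAX_LOSSES = 12, 3

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def arena_outcomes(x=5.0, bins=100):
    """Return {final record: [chance for each power bin]}."""
    # Win chance depends only on the bin offset i - j, so tabulate it once.
    win_by_offset = [logistic(x * d / bins) for d in range(1 - bins, bins)]
    # mass[(w, l)][i]: share of all players that sit in bin i and reach w-l.
    mass = {(0, 0): [1.0 / bins] * bins}
    # Process records in order of games played, so each record has received
    # all of its players before its own games are resolved.
    for games in range(MAX_WINS + MAX_LOSSES - 1):
        for w in range(games + 1):
            l = games - w
            if w >= MAX_WINS or l >= MAX_LOSSES:
                continue  # finished runs play no more games
            here = mass[(w, l)]
            total = sum(here)
            up = mass.setdefault((w + 1, l), [0.0] * bins)
            down = mass.setdefault((w, l + 1), [0.0] * bins)
            for i in range(bins):
                # Average win chance against the opponents found at this record.
                p_win = sum(here[j] * win_by_offset[i - j + bins - 1]
                            for j in range(bins)) / total
                up[i] += here[i] * p_win
                down[i] += here[i] * (1.0 - p_win)
    # Keep only finished runs, rescaled to a chance per player in each bin.
    return {rec: [m * bins for m in vals] for rec, vals in mass.items()
            if rec[0] == MAX_WINS or rec[1] == MAX_LOSSES}

outcomes = arena_outcomes(x=5.0, bins=100)
middle = 50  # the bin just above power 0.5
for record in sorted(outcomes, key=lambda r: (r[0], -r[1])):
    print(f"{record[0]}-{record[1]}: {outcomes[record][middle]:.2%}")
```

Setting `x=0` makes every game a coin flip, which is a handy sanity check: every bin then has a 12.5% chance of going 0-3.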

From that calculation, we can see how likely the different arena outcomes are for players of different power levels. First let's look at the outcomes of the average player (power 0.5):

Record    Probability
1-3       16.69%
2-3       28.24%
3-3       27.17%
4-3       15.52%
5-3       5.63%
6-3       1.42%
7-3       0.28%
8-3       0.04%
9-3       0.01%
10-3      0.00%
11-3      0.00%
12-2      0.00%
12-1      0.00%
12-0      0.00%

As we can see, the average player usually gets between 2 and 4 wins (about 71% of runs). It's worth noting that, unlike the case where all games are decided randomly, the average player is unlikely to get 0 wins, and it is virtually impossible for them to get 12 wins. This is because, as the player performs poorly (or well) in the first few games, they get paired with weaker (or stronger) players, which pushes their outcome back toward the average.

Now let's look at the chances of each outcome for a strong player (power 0.9):

Record    Probability
0-3       0.14%
1-3       0.96%
2-3       3.62%
3-3       8.83%
4-3       14.72%
5-3       17.92%
6-3       17.10%
7-3       13.62%
8-3       9.49%
9-3       6.00%
10-3      3.53%
11-3      1.96%
12-2      1.62%
12-1      0.42%
12-0      0.06%

The strong player generally gets between 4 and 9 wins. However, even strong players rarely reach 12 wins (about 2.1% of runs in the table above). This is because virtually all of the decks at 8+ wins are also very strong.

It's worth noting that extreme outcomes (0-3 or 12 wins) are somewhat more common in the real game than in this analysis, because you aren't always matched with someone who has an identical arena record. This probably doesn't make much difference for common records (like 1-1), but it could make a big difference for rare records like 10-0. In those cases, you're likely to be matched with a deck that has a worse record than yours, and therefore have a higher chance of winning and going on to 12 wins. For the same reason, common outcomes (e.g. 3-3) are (very) slightly less likely.

You can view all the results in this spreadsheet. The spreadsheet shows the full outcome probabilities for many different power levels and what power levels you're likely to encounter at different arena records, all for multiple values of X.

6 comments:

First of all, excellent post! Very solid analysis. But I would expect no less of a PhD in CS with a Stanford background. =P

I feel a valid critique of this analysis is the following assumption:

"To do that, we start off with all the players at 0-0, with equal frequencies of players of all powers."

This assumes that each power level is equally likely among arena entrants. A more valid assumption for starting powers might be to sample from a Gaussian distribution with mean 0.5 (and variance small, or perhaps a log-Normal distribution to avoid <0 values). This would make the extreme power levels less likely, and make the middling power levels more likely. The effect that this would likely have on your simulations is to make the extreme results (0 or 1 wins or 10+ wins) more likely due to more disparate power levels between opponents for those with extreme starting power levels being more likely.

How do the results change (if at all) with a different prior distribution? You can use random.gauss(mu, sigma) to sample from a Gaussian.

Thanks for the comment! Actually, the equal frequencies of players follow from the fact that we represent a player's power by the fraction of players they're stronger than. On the other hand, there could be a problem with the assumption that the advantage is given by the difference in the players' powers. If power were Gaussian-distributed, then we would expect the top-most players to have a huge advantage over players just slightly below them. Likewise, we would expect the worst players to have a huge disadvantage against those just slightly better than them. We would need win statistics to see if this is the case, but both of these effects go against my intuition.
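To make the Gaussian point concrete, here is a small Python sketch. It rests on a purely hypothetical assumption, namely that some underlying "raw skill" is standard-normal and that power is its percentile; the function name is illustrative. It compares the raw-skill gap behind the same 0.1 difference in power, once in the middle of the field and once at the top.

```python
from statistics import NormalDist

def raw_skill_gap(high_percentile, low_percentile):
    """Raw-skill difference between two percentile ranks."""
    skill = NormalDist()  # hypothetical: raw skill ~ standard normal
    return skill.inv_cdf(high_percentile) - skill.inv_cdf(low_percentile)

# Same 0.1 gap in power, measured in raw skill.
print("middle of the field:", raw_skill_gap(0.55, 0.45))
print("top of the field:   ", raw_skill_gap(0.999, 0.899))
```

Under that assumption the gap at the top is several times larger than the gap in the middle, which is the "huge advantage" effect described above.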

Touché. The Gaussian prior might have those properties which may be less than realistic.

While using a uniform prior provides a good baseline, I think we could get a more accurate picture. For intuition, I'm not sure a .999 power player has the same likelihood of beating a .9 player as, say, a .299 player has of beating a .2 player.

As you mentioned, the best way to get a hold of this kernel is with some real win statistics. But in their absence we must continue to theory-craft! I wonder how various differing Beta distributions for the prior would affect the final win distributions!?

Anyway, thanks for adding some rigor to the /r/Hearthstone discussions!

Thanks for the comment! The logistic function is a natural way to convert real numbers into probabilities, and it is used often in machine learning and statistics. You can read about it here: http://en.wikipedia.org/wiki/Logistic_function

The logistic function is the inverse of the log-odds function: log-odds(p) = log(p/(1-p)). If we imagine that the difference in power is proportional to the log-odds that the stronger player wins, we should use the logistic function to pick win rates.

Of course, this is ultimately an assumption that can't be verified, so it's possible that the win rates would be better modeled with some other function.