How we confirmed that FUT Seasons uses ELO matchmaking

Is FUT Seasons matchmaking random, or does it attempt to find an opponent with a similar track record? We decided to find out.

Before leaving his position at EA, Chu Boi, EA Sports’ former head of communication, confirmed in a Reddit AMA that FUT Seasons uses ELO matchmaking:

“A while ago we did show skill ratings (which incorporates ELO) in matchmaking for FIFA. However, we found that that people were using this for the wrong reasons. For example: many would deliberately lose in single H2H matches in order to bring their skill rating down. Then they would then use that lower skill rating to matchup against weaker opponents in order to progress in FIFA Seasons. Therefore, we decided it best to not have it displayed anymore.”

Outside EA, little is known about what exactly ELO rankings (or, properly, Elo rankings, after their inventor Arpad Elo) are used for in FIFA. A qualified guess based on Chu Boi’s response is that ELO rankings are used to create more even matches. After all, very few people enjoy playing a match that they either cannot win or cannot lose.
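For readers unfamiliar with Elo, the sketch below shows the standard textbook Elo formulas. It is not EA’s implementation, which is unknown; the 400-point scale and the K-factor of 32 are the conventional chess values and are assumptions here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> float:
    """New rating for A after one match.

    score_a is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    k (assumed 32 here) controls how fast ratings move.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))
```

Two players with the same rating each have an expected score of 0.5, which is why pairing players with similar ratings tends to produce even matches.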

The purpose of this article is to test the hypothesis that players with similar skill levels are matched up more often than they would be if matchmaking were completely random. We test this hypothesis using data from the web app’s game data section, which provides match statistics from the latest 10 matches of every player playing FUT Seasons. The same section also exposes each player’s performance stats, which, all other things being equal, describe that player’s skill level.

If our hypothesis is correct, we should expect to see a bigger percentage of matches where the involved players are even in terms of performance stats than would be the case if matchmaking were random.

Determining the skill level of a player

A precondition for testing whether two players are evenly matched is the ability to measure the skill level of each of them.

When available, the web app’s game data section allows everyone to access stats like Win/Loss ratio, Goal difference per match and Best completed division, i.e. stats which ought to reflect a player’s skill level.

Using the above-mentioned stats, we created a skill ranking for 3,200 players in total. Each player gets a ranking expressed as a number between 0 and 10, where 10 means that the player is the best in the sample.
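The article does not spell out how the three stats are combined, so the following is only one plausible way to build such a 0 to 10 ranking: convert each stat to a percentile rank within the sample, average the three with equal weights, and scale by 10. The field names (win_ratio, goal_diff_per_match, best_division_score) and the equal weighting are assumptions, and best_division_score is assumed to be pre-converted so that higher means better.

```python
from typing import Dict, List

# Hypothetical field names; higher is assumed to mean better for every stat.
STAT_KEYS = ("win_ratio", "goal_diff_per_match", "best_division_score")


def percentile_ranks(values: List[float]) -> List[float]:
    """Rank each value between 0 (lowest in sample) and 1 (highest); ties share a rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        ties = sum(1 for other in values if other == v) - 1  # exclude v itself
        ranks.append((below + 0.5 * ties) / (n - 1))
    return ranks


def skill_rankings(players: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return a 0-10 ranking per player, relative to the rest of the sample."""
    names = list(players)
    per_stat = [percentile_ranks([players[n][k] for n in names]) for k in STAT_KEYS]
    return {
        name: 10.0 * sum(column[i] for column in per_stat) / len(STAT_KEYS)
        for i, name in enumerate(names)
    }
```

Because the ranks are relative, a ranking of 10 only says that a player tops every stat within this sample, not that he is the best player in the game.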

To what extent does our skill ranking reflect the skill level of a player? To test the validity of our skill ranking system, we attempted to predict the outcome of all the matches in our sample using the ranking gap between the players involved. We then compared the predicted outcome to the actual outcome.

The chart below illustrates the prediction accuracy as a function of the ranking gap between the players involved. The blue curve shows the percentage of correct guesses and the yellow curve the percentage of wrong guesses, i.e. matches that the lower-ranked player won. Draws are not considered.

As one would expect, our prediction accuracy increases as the ranking gap gets bigger.
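The validity check can be sketched as follows: predict that the higher-ranked player wins, skip draws, and report the share of correct predictions per ranking-gap interval. The tuple layout of a match and the 0.5-point bin width are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Match = Tuple[float, float, str]  # (ranking of A, ranking of B, winner: "a", "b" or "draw")


def accuracy_by_gap(matches: Iterable[Match], bin_width: float = 0.5) -> Dict[float, float]:
    """Share of decided matches won by the higher-ranked player, per gap interval."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rank_a, rank_b, winner in matches:
        # Draws are ignored; equal rankings give no prediction to score.
        if winner == "draw" or rank_a == rank_b:
            continue
        gap = abs(rank_a - rank_b)
        bin_start = int(gap // bin_width) * bin_width
        predicted = "a" if rank_a > rank_b else "b"
        total[bin_start] += 1
        if predicted == winner:
            correct[bin_start] += 1
    return {b: correct[b] / total[b] for b in sorted(total)}
```

If the ranking carries real information about skill, the returned shares should rise above 0.5 and grow with the gap.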

Does FIFA attempt to match people with equal opponents?

The test of the ranking system’s ability to predict match outcomes indicates that our home-made player rank is a valid measure of a player’s skill level. Consequently, the ranking gap between the two players involved in a match can be considered a measure of how even that match is.

If our hypothesis about ELO matchmaking is correct, we should expect to see that ranking gaps are smaller on average than would be the case if players were matched randomly.

To test this hypothesis, we divided the actual sample of 2,180 matches into intervals 0.5 ranking points wide and counted the number of matches in each interval (green curve below). Next, we created 2,180 random match-ups within the same set of players and counted these match-ups in the same intervals (purple curve below).
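A minimal sketch of this comparison, assuming the rankings are available as a plain list of numbers: one helper counts gaps per 0.5-point interval, the other draws random pairs from the same players. The function names and the fixed random seed are illustrative choices, not part of the original analysis.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def gap_histogram(gaps: Sequence[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Count ranking gaps per interval; keys are the lower edge of each interval."""
    counts = Counter(int(g // bin_width) * bin_width for g in gaps)
    return dict(sorted(counts.items()))


def random_gaps(rankings: List[float], n_matches: int, seed: int = 0) -> List[float]:
    """Ranking gaps of n_matches match-ups between two distinct, randomly drawn players."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_matches):
        a, b = rng.sample(rankings, 2)
        gaps.append(abs(a - b))
    return gaps
```

Calling gap_histogram on the observed gaps and on random_gaps(rankings, 2180) gives the two distributions to compare.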

Then, all that is left is to compare the two curves:

Actual versus expected ranking differences

It is clearly visible to the naked eye that the two curves are separate, and that the green curve (actual ranking gap distribution) contains a bigger share of even matches than the curve based on random matchmaking. The difference between the curves is statistically significant (p ≈ 0.00, far below the 5 % threshold that corresponds to a 95 % confidence level), meaning that the difference is very unlikely to be coincidental.
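The article does not name the statistical test behind this p-value. One simple way to check a claim of this kind is a Monte Carlo test: repeatedly re-pair players at random and count how often random pairing produces an average gap as small as the observed one. The sketch below does that; the choice of the mean gap as test statistic and the number of simulations are assumptions.

```python
import random
from typing import List, Sequence


def random_pairing_p_value(
    observed_gaps: Sequence[float],
    rankings: List[float],
    n_simulations: int = 1000,
    seed: int = 0,
) -> float:
    """One-sided Monte Carlo p-value for 'observed gaps are smaller than random'."""
    rng = random.Random(seed)
    n_matches = len(observed_gaps)
    observed_mean = sum(observed_gaps) / n_matches
    as_small = 0
    for _ in range(n_simulations):
        total = 0.0
        for _ in range(n_matches):
            a, b = rng.sample(rankings, 2)
            total += abs(a - b)
        if total / n_matches <= observed_mean:
            as_small += 1
    # Adding 1 to both counts keeps the estimate away from an impossible exact zero.
    return (as_small + 1) / (n_simulations + 1)
```

With 1,000 simulations the smallest p-value this can report is about 0.001, which would be displayed as roughly 0.00.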

Implications

FIFA’s matchmaking algorithm considerably increases the number of matches where both players have a fair chance of winning. As an example, take matches where the ranking gap is at most 2.5 points. At that gap, the lesser player still has at least a 28 % chance of winning. Even though his opponent’s chance of winning is nearly twice as high, a 28 % winning chance is considerably more than Barcelona’s average La Liga opponents had this season. FIFA’s matchmaking algorithm increases the share of matches where the ranking gap is below 2.5 points from 5 in 10 to 8 in 10.

The chart below is another way to express our findings. Our match history sample contains the “10 latest matches” history of 220 players (we call them our base-players). In the chart below, we plotted every base-player with his own ranking as the X-coordinate and the average ranking of his 10 latest opponents as the Y-coordinate.

Average ranking of the opponents of 2 * 110 base-players.

The blue and orange dots come from two different samples. The blue sample was collected from the “top of the pile”, meaning that we seeded our collection algorithm with a top player and collected match data about his opponents, their opponents, their opponents’ opponents and so forth. The orange sample was created by seeding the collection algorithm with a random player name.
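The seeded collection described above behaves like a breadth-first crawl over the "played against" graph. The sketch below illustrates the idea; get_opponents is a hypothetical callable standing in for whatever retrieves a player's recent opponents, and the breadth-first order is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List


def snowball_sample(
    seed_player: str,
    get_opponents: Callable[[str], Iterable[str]],  # hypothetical data source
    max_players: int,
) -> List[str]:
    """Collect players by following opponents outward from a seed player."""
    seen = {seed_player}
    queue = deque([seed_player])
    collected = []
    while queue and len(collected) < max_players:
        player = queue.popleft()
        collected.append(player)
        for opponent in get_opponents(player):
            if opponent not in seen:
                seen.add(opponent)
                queue.append(opponent)
    return collected
```

A sample gathered this way stays close to the seed's neighbourhood, which is why seeding with a top player and seeding with a random player give different slices of the player base.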

It is clearly visible that the average ranking of a base-player’s opponents depends on his own ranking. Roughly speaking, the higher the base-player’s ranking, the better opponents he gets. Again, the result is statistically significant (regression coefficient of 0.64 ± 0.05, p ≈ 0.00, tested at a 95 % confidence level).
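A regression coefficient like this comes from an ordinary least-squares fit of average opponent ranking (Y) on the base-player's own ranking (X). The sketch below computes the slope, the intercept and the standard error of the slope. The article does not say whether its 0.05 is a standard error or a confidence half-width, so treating it as either is an assumption.

```python
import math
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, standard error of the slope). Needs at least 3 points.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope_se
```

A slope of 1 with an intercept of 0 would mean opponents exactly match the base-player on average, while a slope of 0 would mean opponents are unrelated to the base-player's ranking.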

A notable observation when looking at the chart is that there are exceptions to the general rule that your opponents’ average skill level equals your own. If that rule held exactly, the dots would form a straight line through the origin with a regression coefficient of 1, whereas the fitted coefficient is .64. In other words, the lesser base-players face some tough opposition, whereas the better players usually get opponents slightly below their own level. The reason for that is quite obvious:

If you are close to the absolute top, the chance of picking a better opponent than yourself is smaller, because there are fewer to choose from above you than below you. And of course vice versa.
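This edge effect can be reproduced with a toy simulation. The sketch assumes a matchmaker that picks opponents uniformly at random among all players within a fixed ranking tolerance; that rule, the tolerance of 2.5 points and the 10 matches per player are illustrative assumptions, not EA's known behaviour.

```python
import random
from typing import List, Tuple


def simulate_average_opponent(
    rankings: List[float],
    tolerance: float = 2.5,
    matches_per_player: int = 10,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return (own ranking, average opponent ranking) for every player with a valid pool."""
    rng = random.Random(seed)
    points = []
    for i, own in enumerate(rankings):
        # Eligible opponents: everyone else within the tolerance window.
        pool = [r for j, r in enumerate(rankings) if j != i and abs(r - own) <= tolerance]
        if not pool:
            continue
        opponents = [rng.choice(pool) for _ in range(matches_per_player)]
        points.append((own, sum(opponents) / len(opponents)))
    return points
```

For a player at the very top, the whole window lies below him, so his average opponent is necessarily weaker than he is; the mirror image holds at the bottom. Plotting the returned points gives a line flatter than the diagonal, with scatter that comes purely from the random draw within the window.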

Another thing worth mentioning is that this chart delivers some insight into why FUT feels quite inconsistent: although the dots in the chart above are average opponent rankings, base-players with similar rankings clearly face different levels of opposition on average during their 10-match run. Hence, even though your average opponent may be equal to you, you should still expect to see a considerable degree of variance in the level of difficulty. A likely explanation for this variation is that FUT’s matchmaking algorithm needs to operate with a certain level of tolerance in order to ensure timely matchmaking. As a result, matchmaking still has elements of randomness despite the aspiration to create equal matches.