Fact check: Does bronze benching work?

According to Arlington69 – the author behind this recent Reddit post – using a bronze bench gives you an advantage. Based on a fairly big sample of his own matches, he concludes that bronze benching increases the chances of getting matched up against weaker opponents in terms of squad quality.

The conclusion would be somewhat plausible if it related to online friendlies. But when I saw Weekend League mentioned in Arlington69’s post, my skeptic alarm went off. It just seems counterintuitive that EA would reduce the advantage of improving your team, which would be the effective outcome of matching you up with similarly rated opponents. And last time I checked, they made money from selling packs.

Still, the results appeared convincing, so I decided to take a look at Arlington69’s analysis and the data supporting it.

A few words on bronze benching

Bronze benching is the not-so-noble art of pretending that your team is worse than it really is. Due to the way FIFA calculates the squad overall rating, adding low rated bronze players as subs will lower the overall team rating considerably. As seen below, a full TOTS team with a 91-rated starting 11 becomes 84-rated when adding the lowest rated bronze players as subs. Hence, if Arlington69 is correct, my centre team below is more likely to get matched up against the squad on the right than the much scarier squad on the left.

A notable detail here is that the question Arlington69 raises has in part been answered already. For those of us seasoned enough to remember when FUT online friendlies were a thing, there is little doubt that bronze benching works (worked) in that game mode.

However, FUT online friendlies is nearly dead, and FUT’s game modes do not necessarily use the same matchmaking mechanisms. Therefore, it makes some sense to study the effects of bronze benching with regard to FUT seasons, Weekend League and Daily Knockout Tournaments.

Arlington69’s bronze benching study

Arlington69 recorded his and his opponent’s squad rating over the duration of 670 matches. His sample included FUT seasons, Weekend League and Daily Knockout Tournament matches.

Arlington69’s conclusions are largely based on the charts below, which are supported by a bit of explanatory text in the original Reddit post.

In the chart to the left, “Average Distribution of opponents rating compared to mine”, he plotted the difference in stat points between his squad and the opposing squad. He notes that 75 % of his opponents were within 2 stat points of his own squad.

The most interesting chart is, however, the one on the right, titled “Distribution of opponents based on my rating”. In this chart, he breaks his sample into five graphs, one per squad rating level he has used. The visual representation does indeed seem to suggest that Arlington69 got lower rated opponents when he used his 82-rated squads (light blue) than when he used his 86-rated squads (dark blue).

Based on this visual analysis, Arlington69 concludes that the game will pick lower rated opponent squads if you use a lower rated squad yourself. Although he didn’t run any tests using a bronze bench, he concludes that bronze benching will get you easier opponents because it lowers your squad rating.

There are, however, a couple of problems with this conclusion. First and foremost, we should remember that the mere fact that two variables – [own rating] and [average opponent squad rating] – appear to be correlated doesn’t mean that a causal relationship exists between them.

Reliability and validity

A cornerstone principle in science is that research needs to be reliable and valid. Reliability means that sample(s) must be sufficiently large to ensure that a repetition of the study won’t lead to a different conclusion. Validity means that the produced results, if reliable, should be sufficient to either confirm – or rule out – that matchmaking takes squad rating into account.

I see problems in both respects. With regard to reliability, it’s a problem that no statistical tests were conducted. Especially when Arlington69 divides his sample into subsamples, statistical inaccuracy should be a concern. The biggest concern is, however, the validity of the applied method, as I will explain below.

The results fit with the opposite conclusion as well

In the complete population of FIFA players, few are stupid/masochistic/etc. enough to take a 65-rated squad into action while few will be wealthy enough to run a 95+ squad. Therefore, the huge majority of players use average squads.

And because of that, it’s likely that a survey of all squads put into action in online matches would reveal something similar to a normal distribution centered around said average. Under those circumstances, completely random matchmaking means that a player using an average squad (like most of us) will see his opponents’ ratings normally distributed around the rating of his own squad.

In other words, the chart on the left looks exactly as we would expect in a scenario where the game picks random opponents, meaning that average opponent squad rating isn’t dependent on your own squad rating. It is therefore not possible to conclude anything about whether opponent squad rating depends on own squad rating based on the chart on the left.
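
The point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population mean of 83 and the spread of 2 rating points are made-up illustrative values, and the function name is hypothetical. The matchmaker in the sketch never looks at the player’s own rating, yet the opponents still cluster around it.

```python
import random
import statistics

def simulate_random_matchmaking(own_rating, n_matches, pop_mean=83.0, pop_sd=2.0, seed=0):
    """Draw opponents at random from a normally distributed population.

    The matchmaker never uses own_rating. pop_mean and pop_sd are
    illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    # Squad ratings are whole numbers in the game, so round each draw.
    opponents = [round(rng.gauss(pop_mean, pop_sd)) for _ in range(n_matches)]
    diffs = [opp - own_rating for opp in opponents]
    share_within_2 = sum(abs(d) <= 2 for d in diffs) / n_matches
    return statistics.mean(diffs), share_within_2

# A player with an average squad, over the same number of matches as the study.
mean_diff, share_within_2 = simulate_random_matchmaking(own_rating=83, n_matches=670)
print(f"mean rating difference: {mean_diff:+.2f}")
print(f"share of opponents within 2 points: {share_within_2:.0%}")
```

With these assumed numbers, a clear majority of opponents land within 2 points of the player’s own squad even though matchmaking is purely random, which is why the chart on the left cannot separate the two explanations.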

Time matters

Now, I’m of course aware of the colorful chart on the right. There, we see that Arlington69’s opponents were higher rated when he used an 86-rated squad than when he used his 82-rated squad. And normally, it would be fair to say that the average can’t be both 82 and 86. Or can it?

The complicating factor here is that average grows with time. Over the duration of the FUT year, the average FIFA player improves his squad gradually as new special cards are released.

If we make the very likely assumption that Arlington69 played his 670 matches over several months, the average squad would have improved considerably during that time span. And provided that Arlington69’s own squads improved too, we would expect to see that the average opposing squad was better when he used his 86-rated squad than when he used his 82-rated squad, simply because he used those squads at two different points in time.
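
A minimal sketch of this confounding effect, again with made-up numbers: the average squad in the population is assumed to drift from 82 to 85 over the 670 matches, while the player upgrades his own squad from 82 to 86 over the same period. The function name and all parameters are hypothetical, and opponents are drawn without any reference to the player’s own rating.

```python
import random
import statistics

def mean_opponent_by_own_rating(n_matches=670, seed=1):
    """Random matchmaking while both the population and the player improve over time."""
    rng = random.Random(seed)
    by_own = {}
    for i in range(n_matches):
        progress = i / (n_matches - 1)        # 0.0 at the first match, 1.0 at the last
        pop_mean = 82.0 + 3.0 * progress      # assumed drift of the average squad
        own = round(82.0 + 4.0 * progress)    # assumed own upgrades: 82 up to 86
        opponent = rng.gauss(pop_mean, 1.5)   # the matchmaker ignores `own`
        by_own.setdefault(own, []).append(opponent)
    return {own: statistics.mean(opps) for own, opps in sorted(by_own.items())}

result = mean_opponent_by_own_rating()
for own, avg_opp in result.items():
    print(f"own {own}: average opponent {avg_opp:.1f}")
```

Grouped by own rating, the average opponent rises from level to level, which looks just like rating-based matchmaking although time is the only thing driving both numbers.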

A look at the raw data

The problem I raise here is not purely hypothetical. Arlington69 kindly provided access to his raw data, which allows me to test my suspicion directly.

The sample doesn’t contain the dates of the matches, but it happens to be divided into a pre patch and a post patch section. The patch in question is the kick off glitch patch (title update 6) released January 24th 2018. Therefore, we know for certain that all pre patch matches were played before January 24th whereas post patch matches were played after that date.

And when I compare the average ratings of squads used by Arlington69 and his opponents pre and post patch, I see exactly what I expected above:

                   Pre patch    Post patch
Own rating         83.0         84.5
Opponent rating    82.8         84.7

While Arlington69’s own average squad rating grew from 83.0 to 84.5, his average opponent squad grew from 82.8 to 84.7.

This observation doesn’t rule out that his own squad rating is causally connected with average opponent squad rating, meaning that bronze benching would work. But it does fit very well with the hypothesis that both variables grew because of a third variable, namely the general growth in squad ratings over time as improved items are released. After all, it would be quite strange if the very reasons that allowed Arlington69 to improve his squad weren’t present for his opponents. Given these circumstances, we obviously can’t conclude that bronze benching works based on the analysis carried out by Arlington69.

Does bronze benching work?

Arlington69’s analysis may not be watertight, but his data set is systematic and large and, luckily, his huge effort hasn’t been in vain. Indeed, it’s possible to come up with a solid conclusion regarding the effects of bronze benching based on his data.

In the table below, I have inserted average opponent squad ratings for each rating level used by Arlington69 on both sides of the patch. I also included 95 % confidence intervals.

             Pre patch                     Post patch
Own squad    Avg. opp. squad    Matches    Avg. opp. squad    Matches
81           82.4 +/- 0.8       47         N/A                1
82           82.9 +/- 0.4       110        82.9 +/- 0.5       41
83           82.9 +/- 0.3       61         84.7 +/- 0.9       23
84           82.9 +/- 0.3       148        84.7 +/- 0.5       99
85           83.2 +/- 0.7       32         84.7 +/- 0.5       53
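
For readers who want to reproduce intervals like the ones in the table, here is a minimal sketch of one common way to compute them: the sample mean plus or minus 1.96 standard errors (a normal approximation). That this matches the exact method used for the table is an assumption, and the ratings in the example are made up for illustration; they are not taken from Arlington69’s data.

```python
import math
import statistics

def mean_with_ci95(values):
    """Return (mean, half_width) of a normal-approximation 95 % confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, 1.96 * sem

# Made-up opponent ratings for a single rating level, purely for illustration.
sample = [82, 84, 83, 81, 85, 83, 82, 84, 83, 83]
mean, half_width = mean_with_ci95(sample)
print(f"{mean:.1f} +/- {half_width:.1f}")  # prints "83.0 +/- 0.7"
```

Note how the half-width shrinks with the square root of the number of matches, which is why the rows with few matches in the table have the widest intervals. For very small samples, a t-based interval would be somewhat wider than this approximation.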

If we for a second ignore 82-rated matches, two inevitable conclusions stand out.

First, the data confirms my suspicion above: The average opponent squad rating increases over time. We see that the average opponent squad rating is practically the same across all rating levels within each period, but higher post patch than pre patch.

Second, and most importantly, own squad rating and opponent squad average rating are not correlated. In human language, your average opponents don’t become easier if you lower the rating of your own squad. Bronze benching does not work!!
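
As a rough check of this reading, the intervals from the table above can be compared directly. The sketch below copies the table values and asks whether all intervals within a period share a common range. This is an informal overlap check, not a formal significance test, and the helper name is hypothetical. The post patch rows for 81 (a single match) and 82 (the exception set aside above) are left out.

```python
# own rating -> (average opponent rating, 95 % half-width), copied from the table.
pre_patch = {81: (82.4, 0.8), 82: (82.9, 0.4), 83: (82.9, 0.3),
             84: (82.9, 0.3), 85: (83.2, 0.7)}
post_patch = {83: (84.7, 0.9), 84: (84.7, 0.5), 85: (84.7, 0.5)}

def common_range(levels):
    """Return the range shared by all intervals, or None if they do not all overlap."""
    lower = max(mean - hw for mean, hw in levels.values())
    upper = min(mean + hw for mean, hw in levels.values())
    return (round(lower, 1), round(upper, 1)) if lower <= upper else None

print("pre patch: ", common_range(pre_patch))   # (82.6, 83.2)
print("post patch:", common_range(post_patch))  # (84.2, 85.2)
```

Within each period, every rating level is compatible with one and the same average opponent rating, which is what we would expect if own squad rating plays no role in matchmaking.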

The exception – 82-rated squads – has a natural explanation: A closer look at the sample sheets reveals that 82-rated matches pre patch are found last in that pre patch data set, whereas 82-rated matches played after the patch are found first in the post patch dataset. Hence, it would appear that all matches played with 82-rated squads were played close to each other time-wise. Therefore, the average opponent rating didn’t differ between those two measurements.

Conclusion

Although Arlington69 ends up concluding that bronze benching works, a more thorough analysis of his data leads to the exact opposite conclusion: Bronze benching has no impact on what opponents you are matched up against in the game modes included in his sample.

The graphic representation of the entire sample below is perhaps the simplest way to illustrate why I arrive at that conclusion.

Especially when we look at the pre patch section, we see that Arlington69’s use of different squads had no impact on the ratings of the opposing squads. We also see that later in the year (after the patch on Jan 24th), there is a gradual improvement in the squads, but this applies both to Arlington69 and his opponents. And the most likely reason is the release of TOTS and other special items. This is perhaps what (mis)led him to his conclusion, but aside from the fact that both variables grow, they clearly aren’t causally connected.

On a last note, I need to state that the sample mixes different game modes. Different game modes mean different matchmaking methods. However, it is likely that we would have seen an effect if, for example, Weekend League used squad rating based matchmaking, even though Seasons most likely doesn’t. Thus, I consider it most likely that bronze benching doesn’t work in any of the game modes included here.