Fact check: Does the keeper make more saves against higher rated opponents?

In a recent post, Arlington69 presents what is allegedly evidence that the keeper makes more saves when matched up against a higher rated squad. If it holds water, this evidence would be consistent with the handicapping theory. But do the numbers really add up? We crunched them.

The claim

The claim in dispute rests on a statistic presented in this post, which forms part of his extensive series of momentum-related posts. In this specific post, Arlington69 presents a statistical comparison of the keeper’s save ratio against opposing squads rated lower and higher, respectively, than his own squad.

Part of the statistic can be seen below. Pay attention to the column in bold, which allegedly shows that the keeper (with some exceptions) makes more saves when the opposing team is rated +4 than when it’s rated -3.

Screendump of part of the data presented in the original post.

In his post “Why I believe in momentum (SHM)”, which summarizes all his earlier posts on momentum and handicapping, Arlington69 makes the following statement with reference to the statistic above:

“When I played a team with a rating higher than mine their keeper saved 50% of my shots on target whilst when I played a team of lower rated than mine their keeper saved 60% of my shots on target. You would expect player with higher stats to find it easier to score goals than lower rated players. If handicapping exists the result is what you would expect to see.”
(– Post on Reddit)

Although Arlington69 doesn’t use the term correlation directly, the essence of his claim appears to be that the lower (more negative) the rating difference, the lower the save percentage, and vice versa. It would indeed be a decisive discovery with regard to handicapping if this turned out to be true.

But does Arlington69 really have evidence to support such a drastic claim?

Criticism

Although we could point to many other serious methodological problems here, we took the fast track and went directly for the sample size.

Needless to say, it takes a considerable number of observations to conclude with a reasonable degree of confidence that something is correlated with something else. Very often, you can find even the strangest hints of correlation if the samples are small enough. Thus, the first thing Arlington69 should have done is to test whether his sample size is sufficient. He didn’t do that, so we decided to do it for him.
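The point about small samples can be illustrated with a short simulation. The sketch below is our own illustration, not part of Arlington69's analysis: it draws two completely unrelated series of random numbers and counts how often they nonetheless show a "strong" correlation. The sample sizes (8 and 200), the threshold of 0.5 and the function names are arbitrary assumptions chosen for the demonstration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def share_of_strong_correlations(n, trials=2000, threshold=0.5, seed=1):
    """Share of trials where two INDEPENDENT random series of length n
    still end up with |r| >= threshold (all defaults are assumptions)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


small = share_of_strong_correlations(n=8)
large = share_of_strong_correlations(n=200)
print(f"n=8:   share of |r| >= 0.5 is {small:.3f}")
print(f"n=200: share of |r| >= 0.5 is {large:.3f}")
```

With only a handful of observations, pure noise regularly produces correlations that look meaningful; with a couple of hundred observations it practically never does.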

In this case, however, determining the sample size is a problem in its own right. On the one hand, we are dealing with several hundred shots. On the other hand, these shots aren’t truly independent observations, because they are taken by the same players, some of whom make better finishing decisions than others. If we assume that the sample size is the number of players, it’s fairly small. If we assume that it’s the number of shots on target, we can multiply by a factor of 10.

But even if we give Arlington69 the benefit of the doubt and calculate a confidence interval using the larger number (shots on target), we still get overlapping confidence intervals as seen in the table below.

Overlapping confidence intervals simply mean that we don’t have a sufficiently large sample to conclude that the save percentages for the various rating differences actually differ. And when you can’t conclude that the save percentage for >=4 is bigger than for <=-3, you obviously can’t conclude that the save percentage is correlated with the rating difference either.
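For readers who want to run this kind of check themselves, here is a minimal sketch of an overlap test for two save percentages. It uses the Wilson score interval, which is our choice for this illustration and not necessarily the method behind the figures above. The counts (50 and 60 saves out of 100 shots on target) are hypothetical placeholders that merely echo the 50% and 60% in the quote; they are not Arlington69's actual data.

```python
import math


def wilson_interval(saves, shots, z=1.96):
    """Approximate 95% Wilson score interval for a save proportion."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    p = saves / shots
    denom = 1 + z * z / shots
    centre = (p + z * z / (2 * shots)) / denom
    half = z * math.sqrt(p * (1 - p) / shots + z * z / (4 * shots * shots)) / denom
    return centre - half, centre + half


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# HYPOTHETICAL counts for illustration only, not the original data.
group_a = wilson_interval(saves=50, shots=100)
group_b = wilson_interval(saves=60, shots=100)

print(f"50/100 saved: {group_a[0]:.3f} to {group_a[1]:.3f}")
print(f"60/100 saved: {group_b[0]:.3f} to {group_b[1]:.3f}")
print("Intervals overlap:", intervals_overlap(group_a, group_b))
```

With these placeholder counts the two intervals overlap, so a gap of ten percentage points on a hundred shots per group would not be enough to claim a real difference. Treating every shot as an independent observation is itself the generous assumption discussed above; counting players instead would make the intervals wider still.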

Conclusion

To quote Arlington69, you would expect players with higher stats to find it easier to score goals than lower rated players. His statistic doesn’t prove the opposite, and it would in truth have come as a surprise if it did.

In earlier articles, we demonstrated that higher rated versions of a player score more goals than the lower rated editions, and the difference is statistically significant! Given that higher rated strikers, all other things being equal, are part of higher rated teams (they push the OVR upwards), this strongly suggests that Arlington69’s conclusion is not only unsupported but also incorrect.

If we had the capability to build a sufficiently large sample of independent observations, we most likely would see keepers making fewer saves against better opponents.