RandBall: When is a small sample size no longer a small sample size?

Blog Post by: Michael Rand

June 11, 2014 - 2:05 PM

Through seven innings in Wednesday afternoon's game against the Blue Jays, Joe Mauer was 2-for-3 with a walk. That's a .750 on-base percentage for the slumping Mauer, and if he could keep that up for the rest of the season he would be the MVP.

But most of us understand those numbers are what we call a "small sample size" -- a sometimes relevant set of data, but numbers that nonetheless can't be extrapolated to inform us of a trend.

In the larger context, Mauer is having a poor season. But his diminished output represents only about 5 percent of his career at-bats. Are these two-plus months of Mauer still a small sample size?

We asked the honest question on Twitter: when does a small sample size for a hitter magically become an adequate sample size? Because while most of us like to toss around the "small sample size" phrase these days, very few of us are actually well-versed in what it means.

In essence, the size of a relevant sample is relative to what you're measuring. With someone like Josmil Pinto, with limited career at-bats, this is fairly cut and dried. With Mauer, though, it's more complicated. Do we choose to believe the greater sample -- more than 5,000 career plate appearances, which suggest Mauer is a very good hitter -- or the smaller but still relevant sample size from this season?
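
For readers who want to put rough numbers on that question, here is a minimal sketch, not a definitive method. It assumes every plate appearance is an independent coin flip with a fixed chance of reaching base, which real baseball only approximates. The .400 "true" on-base percentage is a made-up figure for illustration, not Mauer's actual number, and the 250 plate appearances is simply about 5 percent of 5,000.

```python
import math

def obp_standard_error(true_obp: float, plate_appearances: int) -> float:
    """Binomial standard error of an observed on-base percentage.

    Assumes each plate appearance is an independent trial with the
    same probability of reaching base (a simplification).
    """
    return math.sqrt(true_obp * (1.0 - true_obp) / plate_appearances)

# Hypothetical "true" on-base percentage, chosen only for illustration.
ASSUMED_TRUE_OBP = 0.400

# Sample sizes loosely based on the post: one game, roughly 5 percent
# of a 5,000-plate-appearance career, and the full career.
samples = [("one game", 4), ("two-plus months", 250), ("career", 5000)]

for label, pa in samples:
    se = obp_standard_error(ASSUMED_TRUE_OBP, pa)
    # Roughly 95 percent of observed results fall within two standard errors.
    low = max(0.0, ASSUMED_TRUE_OBP - 2 * se)
    high = min(1.0, ASSUMED_TRUE_OBP + 2 * se)
    print(f"{label:>16} ({pa:>4} PA): plausible OBP range {low:.3f} to {high:.3f}")
```

Under these assumptions the plausible range shrinks with the square root of the sample size, so four times as many plate appearances cuts the noise only in half. A single game tells you almost nothing, two-plus months narrows things considerably, and a full career narrows them much further.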