The Red Sox are averaging 5.5 runs per game, the second-best figure in baseball after the Cardinals, who, however, play in a division that is 9 games below .500, in the weaker league. The AL East is already 20 games above .500.

Yet the Sox have been held to 3 runs or fewer in half of their 24 games. And they've scored 9 runs or more 8 times. They've scored 4-8 runs just 4 times. (See below for the full breakdown.)

So, what the F is going on?

Is this a small sample size fluke, or is it because the current lineup isn't really MLB-best versus quality pitching, but absolutely feasts on bad pitchers?

One way to look at this: you could estimate expected RS for each game by taking the RA/9 of each starting pitcher they've faced plus the RA/9 of the team's bullpen and weighting them 2:1. And then graph this versus actual RS to see if the slope looks to be significantly steeper than 1.
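The weighting idea above can be sketched quickly. This is a minimal illustration, not real 2012 data: the game tuples below are invented placeholders, and the slope is a plain ordinary-least-squares fit of actual runs on the blended expectation.

```python
# Sketch of the suggested check: expected runs scored per game from the
# opposing starter's RA/9 and the opposing bullpen's RA/9, weighted 2:1,
# then the least-squares slope of actual vs. expected runs.
# All numbers below are made-up placeholders, not real 2012 data.

def expected_rs(starter_ra9, bullpen_ra9):
    """Blend starter and bullpen RA/9 with a 2:1 weight toward the starter."""
    return (2 * starter_ra9 + bullpen_ra9) / 3

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical games: (starter RA/9, bullpen RA/9, actual runs scored)
games = [(3.2, 3.8, 1), (5.5, 4.9, 9), (4.1, 4.0, 3), (6.0, 5.2, 12)]
exp = [expected_rs(s, b) for s, b, _ in games]
act = [r for _, _, r in games]
print(slope(exp, act))  # a slope well above 1 would suggest feast-or-famine
```

With the real game logs in place of the placeholder tuples, a slope significantly steeper than 1 would be the signal described above.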

I'm sure there are other ways to address the question. And I hope that someone has the time to compile the suggested stats, or look at the question in neat and novel ways. (I may have a chance to do the correlation study next week some time.)

Has anyone done any work on the relationship between beating up on bad teams versus eking out wins against good teams and so forth?

Obviously baseball is a different game, but it seems like a similar sort of analysis could be done as the one FootballOutsiders did for football, showing that stomping weak opponents had a stronger correlation with success than grittier wins against good teams, which may be more satisfying but, by the numbers at least, say less about the team.

The key here would be to consider how the research design ought to be set up for baseball. Multiple (especially starting) pitchers for any given team is the most obvious variable that jumps to mind (as per the variable RA/9 EV mentions above). But the general thrust across sports is the idea of taking advantage of opportunities presented (e.g. unforced errors in tennis), so this seems like a promising area for investigation if we can figure out the best parameters to investigate for the case of baseball.

I mean really, 5 or fewer starts is a rather small sample size to start reliably classifying guys like Milone/Drabek/Parker as examples of one or the other. Then you have the other 3 games versus Tampa to weigh against the one where Shields shut us down, which included a beating of the arguable #2 proven/quality starter we've faced behind Verlander (Price). Peavy's game versus the other 2 games where we hammered Danks/Humber, etc., etc.

Another month or so would obviously help in clearing out some of the noise.

This Red Sox team seems a lot less patient than previous editions. They don't walk much at all; even guys like Sweeney and Gonzalez have BB rates well below where you'd expect them to be. Perhaps this is a factor? Or, they have an offense where a lot of production is coming from depth guys (Aviles, Sweeney, Ross) who are neutralized by stronger pitching.

Everybody is neutralized by stronger pitching - hitting is the art of waiting for a mistake, recognizing a mistake as such, and knowing what to do with it. If the pitcher throws the pitch he's trying to, he's very likely to get you out.

Given the sample size, no, this isn't statistically significant. Which does not mean there is no trend, of course, just that the data aren't sufficient evidence to prove one. If this is real, it's most likely coming from two types of managerial decisions:

1) A tendency to play for one run in low-scoring games but for multiple runs in high-scoring games. Valentine seems much more apt to play for one run in tight games than his predecessors.

2) There seems to be a growing quality gap between the high-leverage and low-leverage relievers on typical MLB rosters; there are more high-quality arms than I can recall in the past, but fewer that are decent. Might just be my impression -- I haven't tried to run numbers -- but I wonder if this comes from a shift in the thinking of general managers across the league. If real, such a gap would provide the positive feedback that EV noticed: games where a starter is knocked out early tend to become even higher-scoring as the low-leverage guys get used, while the good starts become even lower scoring because the top end of most pens is very strong.

Would a simple standard deviation measure help here? My stats classes were 25 years ago, but wouldn't measuring the volatility of the Sox' offense by using standard deviation and then comparing it to other teams give us one perspective on the subject? Maybe I'll take a quick look at it over the weekend, unless somebody wants to beat me to it. I've thought for the past few years that the Sox' distribution of runs has seemed rather inconsistent, but that's a completely subjective observation of mine.
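A minimal version of that volatility check is straightforward with the standard library. The game logs below are invented for illustration, just to show the shape of the comparison:

```python
# Compare run-scoring volatility across teams via the standard deviation
# of game-by-game runs scored. The game logs here are invented placeholders.
import statistics

team_runs = {
    "BOS": [1, 0, 12, 2, 13, 1, 11, 3],   # hypothetical feast-or-famine shape
    "NYY": [4, 5, 3, 6, 5, 4, 6, 5],      # hypothetical steadier shape
}
for team, runs in team_runs.items():
    mean = statistics.mean(runs)
    sd = statistics.pstdev(runs)   # population SD of the game log
    print(team, round(mean, 2), round(sd, 2))
```

Two teams with similar means can have very different standard deviations, which is exactly the "inconsistent offense" impression the SD would quantify.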

It's not your imagination. The Sox' walk rate was first or second in the league every year from 2005 through 2009, then dipped slightly to #3 in 2010 and 2011. During that period it has always been over 9%, often over 10%. So far this year, we're #7 at 7.6%.

On the O-Swing% side, after being one of the bottom three teams in the league from 2005 through 2010, they rose to #9 last year and are #8 so far this year. They are increasingly prone to chase.

This has been a pretty obvious trend in their acquisitions. If you look at the position players on the roster now who weren't on it two years ago--Gonzalez, Crawford, Aviles, Ross, Sweeney, Salty, Shoppach, Punto, Byrd--only Gonzalez, Punto and Shoppach have a career BB% above this year's league average of 8.1% (Sweeney's BB% is exactly 8.1, so I guess you could count him too). They're not prioritizing plate discipline in acquiring players.

The bench is filled with guys who can't get on base either, to the point where Punto as PH (please, a walk!) is a defensible decision.

Sweeney's gotten on base at a nice clip, but let me know if you think his .450 average on balls in play is sustainable.

Largely a function of all the injuries, but guys like Ross, Aviles, Byrd, Salty, etc. seem like players with significant weaknesses who are potentially easier to pitch to; they consistently have crappy at-bats, which is more apparent against good pitching.

In short, there are a lot of players starting on this team who wouldn't be starting on Red Sox teams of the recent past. The days of wearing down pitchers with a relentless attack, with a guy like Bill Mueller batting ninth, are gone.

Why not just do a Kolmogorov-Smirnov test? It wouldn't account for the possibility that the distribution of opposing pitcher quality the 2012 Red Sox have faced differs significantly from average, but it'd be a quick and easy (and possibly informative) test. If someone has runs scored data for the Red Sox and the AL handy but isn't set up to do a KS test, PM me.
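For anyone curious what the two-sample K-S test actually does: it measures the maximum vertical gap between the two empirical CDFs. A hand-rolled sketch (in practice `scipy.stats.ks_2samp` does this plus the p-value); both run samples below are invented placeholders:

```python
# Two-sample Kolmogorov-Smirnov statistic: the maximum vertical distance
# between the empirical CDFs of two samples. Run totals are placeholders.

def ks_statistic(a, b):
    """Max |F_a(x) - F_b(x)| over all observed values x."""
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        fa = sum(v <= x for v in a) / len(a)   # empirical CDF of sample a at x
        fb = sum(v <= x for v in b) / len(b)   # empirical CDF of sample b at x
        d = max(d, abs(fa - fb))
    return d

sox = [0, 1, 1, 2, 3, 9, 10, 12, 13]   # hypothetical feast-or-famine game log
league = [2, 3, 4, 4, 5, 5, 6, 7, 8]   # hypothetical typical team
print(ks_statistic(sox, league))
```

A large statistic relative to the critical value for the two sample sizes would flag the distributions as genuinely different.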

So, this discussion brings up the question of when the Red Sox organization decided that high OBP was not that important anymore... and why?

OBP does not seem to have played a prominent role in Theo's overvaluation of Crawford. Signing a hacker to a $140m contract was a conspicuous departure from the prior norm.

Replacing Alex Gonzalez with Marco Scutaro fit the OBP model. Exchanging Reddick for Sweeney was another case of trading power for patience, so I doubt that the approach has changed entirely -- but if a lineup is to grind down the opposing starter and raise his pitch count, then the entire lineup needs to work toward the same goal. The patience of Pedroia and Ortiz can be undone by first-pitch-swinging scrubs batting 6th through 9th.

K-S shouldn't be testing for normality - runs scored should be Poisson distributed, although I just checked and the current Sox' run-scoring record is significantly different from a random Poisson draw with the same average number of runs scored. I also just looked at the '07 Sox and the '11 Yankees (who scored pretty much the same R/G as this year's Red Sox). Both of those teams ALSO aren't Poisson distributed (what's going on with that?***), but AREN'T significantly different from the current Sox' run distribution. The feast/famine is just random noise so far.

***I'll tell you what's up with that: when teams don't score, it's because the starter is doing well, will go deep into the game and, more often than not, be replaced by the top relievers in the bullpen. When the starter sucks, he's pulled early, the garbage-time relievers come in, and the offense scores a lot of runs. All three teams I've looked at have had more low-scoring games than you'd expect from a Poisson distribution.
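The low-scoring excess described above can be checked directly: compare the observed share of games at each run total with a Poisson pmf having the same mean. The game log below is invented for illustration, but it shows the shape of the comparison:

```python
# Compare the observed fraction of low-scoring games with what a Poisson
# distribution of the same mean would predict. Game log is a placeholder.
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

runs = [0, 1, 1, 2, 9, 10, 3, 12, 1, 0, 11, 2]   # hypothetical game log
lam = sum(runs) / len(runs)                       # matched Poisson mean
observed_low = sum(r <= 2 for r in runs) / len(runs)
expected_low = sum(poisson_pmf(k, lam) for k in range(3))   # P(X <= 2)
print(round(observed_low, 2), round(expected_low, 2))
```

For a genuinely feast-or-famine log like this one, the observed share of games at 2 runs or fewer comes out far above the Poisson prediction, which is the pattern the post describes.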

Excellent point; this matches what we'd expect, right? Quality of pitching after the start is endogenous to how the game is going; St. Bayes taught us not to forget what we know just because we were doing math, yes?

I pulled all pitching lines for all games for 2012 to date. I separated out games where BOS was the opponent and recalculated each pitcher's RA/inning. When using this non-Red Sox expected RA with the innings pitched by each pitcher who faced BOS, I found that the expected RS/9 was 3.93.

A similar exercise using NYY data shows an actual RS/9 of 4.96 vs. an expected RS/9 of 4.39.
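For anyone who wants to replicate this, the adjustment described above looks roughly like the sketch below. The pitching lines are invented placeholders; the idea is to strip each pitcher's Red Sox innings out of his rate before weighting by the innings he actually threw against Boston:

```python
# Opponent-adjusted expected runs: recompute each opposing pitcher's
# runs-per-inning with his Red Sox games removed, then weight by the
# innings he threw against Boston. Pitching lines are placeholders.

# (pitcher, total_runs, total_ip, runs_vs_bos, ip_vs_bos)
lines = [
    ("A", 30, 60.0, 8, 6.0),
    ("B", 12, 40.0, 1, 7.0),
    ("C", 25, 35.0, 9, 5.0),
]

exp_runs = 0.0
bos_ip = 0.0
for _, r, ip, r_bos, ip_bos in lines:
    non_bos_rate = (r - r_bos) / (ip - ip_bos)   # RA/inning excluding BOS games
    exp_runs += non_bos_rate * ip_bos            # expected runs in BOS innings
    bos_ip += ip_bos

print(round(9 * exp_runs / bos_ip, 2))   # expected RS/9 against these pitchers
```

Comparing that expected RS/9 against the team's actual RS/9 gives the over/underperformance figure quoted in the post.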

For the teams I've looked at, this is very typical. Of course the mean is higher than the median, which is why, at this point, this is still nothing more than random noise.

Yes, of course the mean is higher than the median, but 81% higher? (It's now 5.42 mean, 3.00 median).

The median RS/9 of the 2007 Sox was 5.00 and the mean was 5.60, which is 12% higher. I'm guessing that's very typical. We're scoring 97% of the total runs we did in 2007, but on the average night, we score 60% of what we did.

I must have missed the stats class where they taught us to not actually look at the data that's presented and just declare it to be random variation even though the observed effect is six or seven times as large as usual. My bad.

Yeah, exactly above - run the K-S test on the current Red Sox team and compare the distribution with any team that has a similar R/G. There is no significant difference for the teams I've tried.

I'm not sure you'd expect there to be. You don't need the Sox to be significantly different from the reference distribution for something to be happening on the baseball field that is worth paying attention to, because the reference distribution may or may not already include a sensitivity to the opposing pitcher.

The working hypothesis here is that some teams may inherently have a wider SD of RS because their offense is more sensitive to the quality of the opposition pitching, while other teams are much less sensitive and hence have a narrower SD.

You would need a very sophisticated model, or run a lot of simulated trials, to predict what the range of observed SDs of RS for a bunch of teams with a given offense ought to be given the null hypothesis of no sensitivity to quality of opposition pitching. I don't believe anyone has done that and compared the resulting, predicted range of SDs to the observed range.

So when we take all the teams with similar RS/G, look at their RS distribution by game, and plot them, you get some with fairly peaked distributions and some with much more spread-out ones. The question is, if you re-played all those seasons, would there be any tendency for teams to repeat their distribution? You can't assume that the observed range in kurtosis is just random.

So when I point out that the observed effect is 6 or 7 times larger than normal, and you reply that in fact that's not statistically significant given the sample size, when you look at what other teams have done historically, that doesn't answer the question. Now we want to know if the fact that such effects can be that large in this small a sample is what you would expect from sheer randomness, or whether it requires some real component to make it that extreme. IOW, if our hypothesis that there is some real component to the kurtosis is correct, the observed wide SD so far may be significant versus the null hypothesis that the Sox are a team with no such component.

One way to test this would be to look at a team with a huge SD and see if, in fact, they are feasting on bad pitching and struggling against good pitching, both to an unusual degree. Which is what I really want to know about the Sox so far.

And, BTW, even if we were to find out if that were true of the Sox so far, it wouldn't mean that it hadn't itself been ultimately random. You might find that some of the guys showing that pattern had no career tendency to do so. But it would still be good to know. The question is ultimately whether the Sox being 11th in offensive WPA despite being 3rd in wRC+ is entirely luck, something that has just happened without any discernible underlying pattern, or whether it can be described in terms of the team beating up on bad pitching and struggling against good pitching, more so than you'd expect. If the latter is true, you can then begin the investigation of whether that has any predictive value.

(To my knowledge, quality of opposition splits for hitters have never been adequately studied, because no one has controlled for opposing pitcher handedness.)

So when I point out that the observed effect is 6 or 7 times larger than normal, and you reply that in fact that's not statistically significant given the sample size, when you look at what other teams have done historically, that doesn't answer the question. Now we want to know if the fact that such effects can be that large in this small a sample is what you would expect from sheer randomness, or whether it requires some real component to make it that extreme. IOW, if our hypothesis that there is some real component to the kurtosis is correct, the observed wide SD so far may be significant versus the null hypothesis that the Sox are a team with no such component.

Yeah it would be totally sweet to have some kind of statistical test that would compare two distributions' kurtosis (and, if we're going to be really rigorous about it, we should test for skewness too, shouldn't we?). But of course, we know that R/G can't be normal since it's a counting stat, so we can't just do some stupid F-test to compare the variances. Yes, as you have well explained, what we really need is some non-parametric test to compare the kurtosis (and, just to make me feel better, skewness) of two distributions. I wish some really smart Russian had actually put some thought into how to do that, because it could be a really useful test, not just for baseball but for a bunch of different fields, including politics and business. I bet if some really smart Russian - or, actually, this happens a lot, TWO really smart Russians - had come up with a test like that, it would be taught in statistics classes!

Oh well, maybe if you have some free time you can come up with something off the top of your head.

Completely missing the point much?

No, I didn't say we needed to compare the kurtosis of two distributions. I explained in some detail that we don't know jack about the kurtosis (and skewness) of MLB run distributions, in terms of whether their difference from one another is random or meaningful. Is there any tendency for teams with a very peaked distribution to have a peaked distribution the next season? I don't believe anyone's ever studied that.

Yes, your smart Russians tell us that the Sox run distribution was not unusual when compared to other MLB run distributions. That's not the question I was asking.

And, yes, the run distribution has regularized. It also seems as if they've had some good offensive games against good pitchers (Blanton), and some bad offensive games against bad pitchers (Matusz), neither of which I remember seeing earlier in the season, although of course that memory may be highly subjective. Which is why I wanted to actually look at the numbers.

The reason I started this thread is because I was curious as to whether the strange run distribution the Sox had early in the season was because they were consistently beating up on crap pitching while not hitting good pitching, which was the simplest explanation for the wide spread in values. If that is the case, that's an interesting phenomenon whether or not it has predictive value, because we are interested in the game of baseball. (I mean, does that ever happen, over a short period of time, to a degree that looks like it might not be completely random? I don't know.) And it is certainly an interesting phenomenon even if the same distribution of RS per game could have been produced at random without the phenomenon. That the latter is true does not tell you that the phenomenon is not present. And your reference distribution may itself include the phenomenon.

Here's an analogy. A band goes on tour, playing small clubs, and they play to either empty houses or packed houses, nothing in between. Very strange. And we're curious why. You grab some tour attendance figures for some bands of equal stature and tell us that there's absolutely nothing unusual about that sort of distribution, it happens all the time in a tour that short. Tour a bit longer and you're bound to see some half-filled houses.

But this doesn't address any of the questions we're curious about. Why is it that some nights the house is empty, and some nights packed? Is it the amount of college radio airplay? The presence or absence of a preview article in the local paper? Or is it much closer to what you'd call "random," e.g., the weather, whether local sports teams have an important game, or a host of other factors unrelated to the music biz? These are all factors that influence the reference distribution. And we don't know jack about that. We don't know whether there are bands that get consistent airplay in every city and hence don't see many empty or packed houses, and other bands that get ignored in some cities while getting massive airplay in others. We don't know whether some bands are very reliant on press previews while others have fans that are largely illiterate. All interesting questions that are not addressed at all by simply reporting that the observed pattern isn't the least bit unusual in a sample of the given size. Is it not unusual because of the weather? Or because there are lots of bands dependent on college radio airplay and we're looking at one of them, and in a tour this short you might hit three terrible cities and three great ones, just at random?

I want to know the answer to the baseball equivalents of these questions. I really don't give a shit about statistics except as a tool to answer those questions. And I can't think of anything more wrong-headed than running a single stat test and then concluding that we should cease our curiosity about those questions because the one stat test tells us that the questions are uninteresting. Maybe to you they are. My condolences, really.

1) A tendency to play for one run in low-scoring games but for multiple runs in high-scoring games. Valentine seems much more apt to play for one run in tight games than his predecessors.

2) There seems to be a growing quality gap between the high-leverage and low-leverage relievers on typical MLB rosters; there are more high-quality arms than I can recall in the past, but fewer that are decent. Might just be my impression -- I haven't tried to run numbers -- but I wonder if this comes from a shift in the thinking of general managers across the league. If real, such a gap would provide the positive feedback that EV noticed: games where a starter is knocked out early tend to become even higher-scoring as the low-leverage guys get used, while the good starts become even lower scoring because the top end of most pens is very strong.

These, BTW, are exactly the sorts of questions that interest me. If there are managerial differences in the use of one-run strategies, that would absolutely affect the RS distribution, and you might see a very small but significant year-to-year correlation in kurtosis, skew, and/or standard deviation for teams that kept managers but none for teams that changed them. That would totally rock as a finding.
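The manager-continuity check described above boils down to a year-over-year correlation of a distribution statistic. A sketch with invented placeholder pairs (real data would be one SD, skew, or kurtosis value per team-season, split by whether the manager was retained):

```python
# Year-to-year correlation of a run-distribution statistic (here, SD)
# for teams that kept their manager. Pairs are invented placeholders.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = sum((x - mx) ** 2 for x in xs) ** 0.5
    dy = sum((y - my) ** 2 for y in ys) ** 0.5
    return num / (dx * dy)

# (year-N SD, year-N+1 SD) for hypothetical teams that kept their manager
kept = [(2.9, 3.1), (3.4, 3.3), (2.5, 2.7), (3.8, 3.6), (3.0, 3.2)]
r = pearson([a for a, _ in kept], [b for _, b in kept])
print(round(r, 2))
```

A significantly positive correlation for manager-keepers, against roughly zero for manager-changers, would be exactly the finding described above.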

And if there is a growing tendency for the sort of bullpen stratification you speculate about in point 2, you would see a recent tendency for RS distributions to widen. That would be really interesting, too.

No one has studied these distributions and what produces them. There's certainly a suspicion that there are consistent offensive teams and inconsistent ones (the '77 Sox had that reputation), but why? Is it better to be one way or the other for post-season success? There are a ton of unasked questions.

You can make all the studies you want, and produce all the post facto statistics and explanations that you want. Knock yourself out. But if your goal is to find strategies which can improve a team's performance, post facto explanations won't cut it. Make a testable prediction, and then see what happens.

When I write testable, I mean "robust". If you have some idea about managerial strategies in late-and-tight situations, don't just tell us what should happen to the Red Sox if they do X; tell us what should happen to every team in the majors if they do X. We can then follow all the teams over the rest of the season. Some teams might stop doing X in the middle of the season due to injuries or trades. The larger the set of teams in the sample, the more likely it will be that after the season ends, we can see if your prediction is correct.

The reason I started this thread is because I was curious as to whether the strange run distribution the Sox had early in the season was because they were consistently beating up on crap pitching while not hitting good pitching, which was the simplest explanation for the wide spread in values. If that is the case, that's an interesting phenomenon whether or not it has predictive value, because we are interested in the game of baseball.

A team that hits well against bad pitching and badly against good pitching is not an interesting phenomenon. You took a few games and said the Red Sox were unusually good at hitting bad pitching and unusually bad at hitting good pitching. I said it was not interesting because it was just a few games. The question was, two weeks ago: are the Red Sox going to continue to have a bimodal run-scoring distribution? The smart bet was that their scoring would "regularize" (as you put it). Any person reasonably familiar with the basics of statistics would make that bet because there was nothing to suggest that they were anything but a good, but typical offense. You lost that bet.