The worst of the best (part II)

Hi there, and welcome back. Last week, as you may recall, we asked one of the classic questions of baseball debate: what’s the worst team ever to win the World Series? In Part I, we presented the teams and explained how we’ll answer it. In this column, we get to the good stuff—the answer!

First, to back up a second, this enterprise depends on the help of SG from the Replacement Level Yankee Weblog. He has a computer program set up to run 1,000 simulated seasons for whatever 28 teams across baseball history he puts into it, and he’s willing to plug in the 28 teams I asked him for.

What 28 teams to choose? I looked at a combination of actual record and Pythagorean record to pick the 28 title winners. Or, to be more accurate, I gave him two groups of 14 title winners. After all, the rise of divisional play has, by its very nature, opened up more avenues to teams with less regular season success to win it all. So the 28 teams are divided into two divisions: 14 pre-1969 teams (1969 being the first year of the League Championship Series), and 14 since 1969.

OK—fine, great, wonderful. Can we hurry past the recap already? You’ve been waiting a week, so here it is—the fun part: results.

Pre-divisional play teams

Let’s start with the 14 squads that predate 1969. All of these teams had the best record in their league in their flag-winning season. Yet that does not mean they were all created equal.

Of the 65 teams that won it all back when the World Series was the entire postseason, which were the worst of them all? They are listed from worst winning percentage in SG’s sims to best, with average simulated wins to the nearest tenth of a win. Also included are their average runs scored and allowed in the sim, plus their real-life W-L record and actual Pythagorean record. Here they are:
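For readers who don’t remember it, the Pythagorean record used above comes from Bill James’ run-based win estimator. A minimal sketch, using the classic exponent of 2 (other exponents such as 1.83 are also common, and SG’s sim may well use a different variant):

```python
def pythag_wins(runs_scored: float, runs_allowed: float,
                games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from runs scored/allowed (Bill James' Pythagorean formula)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)

# Hypothetical team: outscores its opponents 800-700 over a 162-game season
print(round(pythag_wins(800, 700), 1))  # → 91.8
```

A team whose actual record beats its Pythagorean record by several games, like the 1964 Cardinals below, usually got there through strong (or lucky) performance in close games.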

So, the worst world champion from the pre-LCS days was the 1933 Giants. That’s a bit of a surprise. My money was on the 1959 Dodgers. The Giants are a forgotten champion that no one talks about, but it’s worth noting it was the first of five consecutive 90-win seasons for the club, a span during which they won three pennants (but only this one World Series). It was also their first full year after John McGraw retired.

The poor placement for the 1933 Giants is especially interesting to me because a few years ago I did another sim courtesy of SG looking at teams that lost the World Series. I was especially interested in nine clubs that lost the World Series they played in and didn’t win in any surrounding seasons.

One of those nine was the 1933 Senators, who lost to the Giants in that World Series in five games. Those Senators finished eighth of the nine teams I was most interested in for that sim. According to the computer, 1933 really wasn’t much of a Series.

The Giants didn’t have much of an offense in 1933. It wasn’t bad, but it also wasn’t too good—which makes it inferior for a world champion. They finished fourth in runs scored that year, and only three of their starting position players had an OPS+ over 100. (That said, in a sign of how times have changed, New York’s 82 homers were the most in the NL).

Yet that’s not what cost them in this sim: their pitching did them in. That’s weird because in real life New York won with pitching. Carl Hubbell had the season of his life, posting a minuscule 1.66 ERA in over 300 innings. Hubbell’s longtime rotation companion, Hal Schumacher, also had the best season of his career, with a 2.16 ERA and a league-low 6.92 hits per nine innings.

A possible explanation: New York’s pitching was great, but entirely dependent on six arms to carry them. Those guys did 80 percent of the pitching on the year, but the rest of the staff was pretty bad. My hunch is that the computer simulations gave too much playing time to New York’s dregs. It’s worth noting that the 1933 Giants had the largest standard deviation for runs scored of any team in this sim exercise.

That said, regardless of differences in run scoring, there was little change in their results. Among the 28 teams in the sim, the Giants had one of the four best records only four times. All the other pre-1969 teams made the top four at least seven times. The 1933 Giants came in 24th or worse 521 times in 1,000 sims. They were in the bottom quartile almost two-thirds of the time. Maybe the computer just didn’t like their main arms that much.

The 1964 Cards were, like the 1933 Giants, the first pennant winner for a core that claimed three pennants in five years. St. Louis had a winning percentage that was fairly unimpressive for a world champion. In fact, their 93-69 real-life mark was the second-worst of an NL pennant winner from 1903-1972. St. Louis’ Pythag record was five games worse than their real record, too.

The Cards claimed the pennant when the Phillies infamously flopped in the home stretch of the 1964 pennant race, and the Redbirds beat the Yankees in seven games in October. In the sims, only the 1933 Giants allowed more runs than the 1964 Cards. That’s one problem with a computer simulation: you won’t get Bob Gibson‘s seemingly superhuman level of achievement in October baseball.

The next-worst teams are two expected to appear on the bottom: the 1945 Tigers and 1959 Dodgers. The Tigers had the least talent of any champion—hey, it was WWII, after all, and players didn’t all get back right away when the war ended. The Dodgers were a weird team in transition: the Boys of Summer hitters had gotten old and the Don Drysdale-Sandy Koufax 1-2 punch hadn’t come of age.

Remember how the 1964 Cards had the second-worst winning percentage of any NL pennant winner from 1903-72? The 1959 Dodgers were the worst.

At the other end, it’s nice to see the 1919 Reds do so well. They’re the one team that didn’t deserve to be in here based solely on their record. They’re in because their opponent threw the World Series, forever tainting Cincinnati’s claim to greatness. Nothing will ever erase that stain, but the Reds had enough talent to legitimately win one.

The 1926 Cards coming in first is a surprise. They are one of only six world champions to win fewer than 90 games (not including teams from shortened seasons). One factor: they had a very good offense—both in real life, and especially here, where they averaged over a half-run more per game than most other teams.

Actually, what’s striking is how down the overall offensive numbers are. The 1926 Cards have easily the best offense above, but they still scored barely four runs per game. Even adjusting for lower scoring when teams with good pitching meet up, that’s still low. Maybe the 1926 Cards just had matchups work to their advantage: they were an offense-first club while most competitors were pitching-first. Perhaps. That’s just a guess.

If you add it up, the teams above finished 1,148-1,120, meaning the recent champions performed worse overall. Let’s see how they did.

Playoff-era champions

It makes sense these guys would do worse in a simulation—after all, you no longer have to have the league’s best regular-season record to win a world title. Here they are, formatted just like the last bunch:

The 1987 Twins. Worst here and worst overall. Yeah. I can argue against the 1933 Giants or 1964 Cards, but not the 1987 Twins. Folks, this was a team that allowed more runs than it scored in the regular season. That’s bad. Going by runs scored and allowed, the 1987 Twins weren’t as good as the 1978 Twins, who finished 73-89 on the year.

Getting away from runs scored and allowed, their 85-77 record is second-worst for any champion ever. They had the fifth-best record in the AL that year, but the four better teams were all in the other division.

There’s an X-factor in real life that doesn’t carry into the simulations: home-field advantage. The Twins weren’t a good team, but the noisy Metrodome gave them the best home record in the league (offsetting their miserable 29-52 road record). True to form, in the postseason, they went 6-0 at home and 2-4 on the road. The 1987 Fall Classic was the first World Series in which the home team won every game.

The 1987 Twins doing worst overall isn’t surprising. What is surprising is that the next-worst team in the divisional era did better than the worst pre-divisional champs: the 1933 Giants, 1964 Cardinals, and 1945 Tigers.

That said, the right team came next-to-last in the divisional era. The 2006 Cards are famous as the team that barely finished .500 but managed to skate through three rounds of the postseason to claim the world title.

They were actually a better team than their record. They had a terrific start to the season, jumping out to a 42-26 record before some players suffered through injuries and others became ineffective. They worked their way through their problems in time for the postseason and received improved performances from the back of their rotation. That said, a lot of teams would be better if not for injuries.

Really, the St. Louis Cardinals dominate the bottom of these lists. One of their teams comes in next-to-last in both groups. Also, the 1987 Twins beat the Cardinals in the World Series, as did the third-worst post-1969 champion: the 1985 Royals.

Like 1987, the 1985 Series went seven games. This time, the Royals had to rally from a three-games-to-one deficit to win it all. Famously, the Cards were one inning away from the championship when Don Denkinger blew a call at first base; the Cards folded, and the Royals won it all from there.

KC’s problem was hitting. They’re the worst offensive club ever to win it all, finishing next-to-last in the 14-team AL in runs scored. Only three batters on the entire Royals team had an OPS+ over 100. Yes, their pitching could bail them out, but that’s a lot to ask of any staff. While their staff was really good, it wasn’t truly great; it was “only” second in the league in ERA. You’d expect only an all-time great staff could carry such a bad offense to the title.

The Royals did have a great postseason. Not only did they win both rounds of the playoffs, but they came back from three-games-to-one deficits in both. Give them credit for that—but really, all of these teams I’m describing were at their best in the postseason. That goes without saying, or they wouldn’t be here.

The biggest surprise among the recent teams is how well the 2000 Yankees do. They had one of the worst records of any champion, but finish third best among the recent teams. Huh? I think there’s a quirk in the sims that explains this.

A player performed excellently for the Yankees in a part-time role: Glenallen Hill hit an out-of-his-mind .333/.378/.735 (not a typo: a .735 slugging average). From my experience with the sims, sometimes they don’t regress a player’s performance to the mean when he gets more at-bats. Similarly, David Justice had a 145 OPS+ in a bench role for the Yanks, well over his normal production. This probably explains why those Yanks do so well.

Of course, this means the exercise is flawed and shouldn’t be taken as the last word. It was never meant to be, though. There will never be a final word to a question like this, which is why it is such a fun question to ask.

This simulation series does do a few things, though. While it’s not perfect, it does put some teams’ performances in perspective. Those 1987 Twins really are the worst, for example. The 1933 Giants may not be as bad as this lets on, but they are one of the worst title winners from their generation. And the 1919 Reds can hold their heads up high.

Oh yeah, one other nice thing about these sims: they’re fun. And that’s never a bad thing.

Comments

You’re right about Hill, but I wouldn’t really call what David Justice had a “bench role” for the 2000 Yankees. He played his first game for the Yankees on June 30th, which was the team’s 74th. He then started 75 of the last 88 games.

If Diamond Mind Baseball was used for the simulation, it could explain some of the quirks.

First, DMB has an option to Limit Bench Playing Time, which prevents bench players from playing more than they did in real life. If that option was not on (it should have been), bench players with gaudy (or at least better) stats are going to play more than they should.

Second, DMB can have issues with older style pitching staffs. The pitcher “rest” system is designed for 4 and 5 man rotations. Smaller rotations usually result in the computer manager starting tired pitchers because no one else is available, resulting in less than optimal results. This usually hurts dead ball era and earlier teams the worst, but the 1933 Giants could have suffered from it. While they effectively have a 5 man rotation (4 main starters, and a cast of 7 other guys to fill out the fifth slot), Hubbell was also used in relief 12 times in real life. DMB probably used him as a reliever too and he was probably quite tired by the end of the season.

The other problem with assessing the 2000 Yankees is that, after being up by nine games on Sept. 14th, they proceeded to go 3-15 and were outscored 148-59 in those games. Absent that, they projected to be a 97-win Pythag team.

I’m not sure if you’re still looking at the BBTF thread for this post. If not, here’s a little bit to think about before the Best of the Best project (and sorry for the stuff that repeats Paul G’s post):

I’m still not convinced a Greatest Team league can be pulled off without discriminating against teams with flimsy pitching staffs. If you went through and manually managed each game, making sure that starters get the right number of CG and cutting pitcher fatigue short, I suppose you could even the playing field somewhat. Otherwise, you’re stuck with a computer manager with a quick hook and a fatigue system that punishes small staffs. DMB’s pitch-by-pitch system, which is a strength for most post-WWII seasons, is a huge weakness for older seasons, particularly deadball seasons. I strongly doubt anybody can manage the 1934 Cardinals with real life lineups and transactions and still win the National League; after all, Dizzy will be pitching tired every other day at the end of the year. That’s why I don’t buy this “the 1933 Giants are the weakest World Series winner of that era” conclusion.

Anybody who saw the results of the original 28 Great Teams league knows that SG couldn’t have had LBPT turned on, not with the way Jim Dyck and Joe Ginsberg (1954 Indians) were being overused. There is also a question as to which seasons these teams originated from: did some have L/R splits added? Were some from poorly created homebrew disks (I’m not sure if the 1922 Browns were in there, but if they were, that’s a prime suspect)? Throwing in those extra off-days can help the old time teams somewhat, but can also give an unfair advantage to modern teams, who still benefit from having a stronger bench.

If you want to do this right, you’re going to need to have a strict limit on the roster size (perhaps even under 25 men to help out those old time teams), and you’ll probably need to manually create transactions to make sure all players are used realistically (no keeping Dyck and Ginsberg on the roster to tempt the computer manager). Without real life (or user created) transactions, DMB’s computer manager is extremely poor at managing player usage, and that’s without diving into the realms of cross-era play.

We haven’t even begun to discuss normalization. DMB’s HR normalization system works great if you’re using a post-1927 season, but gives really strange results for anything earlier. The decision to use an additive system for home runs while using a percentage based normalization system for everything else (I think the process is documented in the help files; if not, it’s on the 2006AGP disk) still strikes me as bizarre. Basically, 2001 Barry Bonds is going to hit at least 40 HR in 1908, no matter what you do to the park effects and no matter what kind of pitching he faces.

DMB’s ballpark rating system works great for individual seasons, but really falls apart when you start mixing and matching stuff. The ballpark effect system really is nothing more than a hits distributor: you’ll notice that all ballparks in a specific league average out to 100 for 1B, 2B, 3B and HR (taking into account, of course, the distribution between LHB and RHB). When you take those parks out of their league context and mix them together in a fictional league, the balance is gone. It’s not easy to readjust ballpark effects, either: how do we know that 1908 Exposition Park really was more of a pitchers’ park than the 1980 Astrodome, and that it wasn’t due to the Pirates’ generally inept offense?

Cross-era normalization in general can lead to some really strange results if you’re not careful. I personally prefer Bill Staffa’s approach over at SkeeterSoft, which does involve some statistic modification. You run into unrealistic results at the extremes: the 2001 Barry Bonds hitting 40 or 50 HR in 1908, for example, or the 1999 Pedro Martinez having an ERA approaching zero in 1968. You’ve got to cap it off at some point in time. There’s a good thread about this over at the APBA: Between the Lines Delphi Forum, in the “Context and Normalization” topic, if you’re interested. Basically, you’ve got to figure out a way to cap off those extreme performances. Bonds is still Bonds in 1908, but a lot of those HRs turn into 2B and 3B: there’s no way he’s going to hit 4 times as many home runs as the real life league leader, unless you really believe baseball has improved that much over time (in which case you’ve got a big problem with the 1908 pitchers).
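The capping idea described here can be sketched in a few lines. To be clear, this is not SkeeterSoft’s or DMB’s actual algorithm: the cap ratio, the league-leader figure, and the two-thirds/one-third split of excess homers into doubles and triples are purely illustrative assumptions.

```python
def cap_home_runs(player_hr: int, league_leader_hr: int,
                  cap_ratio: float = 1.5) -> tuple[int, int, int]:
    """Cap a translated HR total at a multiple of the target league's leader;
    downgrade the excess homers to doubles and triples (hypothetical split).
    Returns (capped_hr, extra_doubles, extra_triples)."""
    cap = round(league_leader_hr * cap_ratio)
    if player_hr <= cap:
        return player_hr, 0, 0
    excess = player_hr - cap
    doubles = excess * 2 // 3   # two-thirds of the excess become doubles
    triples = excess - doubles  # the rest become triples
    return cap, doubles, triples

# Suppose a naive translation gives 2001 Bonds 50 HR in a league
# whose real-life leader hit 12:
print(cap_home_runs(50, 12))  # → (18, 21, 11)
```

Bonds stays a monster in this scheme; he just stops lapping the league leader four times over, which is the commenter’s point.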

There’s still the question of the pitching – hitting balance league wide as well. Should you adjust for the fact that hitters will face better pitching than they did in real life, or that pitchers will face better hitting on average than they did in real life? Would it just balance out?

As a side note, I’d like to see if somebody could set up a Best of the Best league without having any repeat players. That would mean more than choosing between the 1975 Reds and 1976 Reds: it would mean choosing between the 1975 Reds and 1986 Mets. Maybe I’m the only one who wants to play that sort of league.

Honestly—I like talking about baseball normalization in the abstract more than actually playing the season. After all, when you do get it set up and the 1927 Yankees end up 20 games out in front, it’s not quite all that satisfying.

Enjoyed the article, Chris, as I have done various simulations myself (albeit on smaller scales than 1,000 seasons).

My hunch was with those 1987 Twins also (though the 1981 Dodgers were certainly not as good as their counterparts who lost to the Yankees in 1977-78, and probably not as good as the ’82 Cardinals either). I was surprised, though, by the 1933 Giants. I too thought the 1959 Dodgers would fare worse.

Scheduling, too, may be a factor in anomalies. More frequent rainouts and darkness-shortened games prior to stadium lighting, Astroturf and domes helped rest pitchers. On the other hand, more frequent doubleheaders might also be a factor with fatigue.

One problem with doing a best-of-the-best replay (Chris touches on this somewhat with regards to player performance and usage) is that a manager (real life or a DMB computer manager) is going to manage the 1933 Giants differently when playing all their games against opponents of similar quality (i.e. pennant winners or the 2nd place club) than against their own league’s cellar-dwellers for that year.

It would be a much more massive simulation project, but I wonder if even better results might be gathered with a BOTB replay that uses not only the 28 pennant winners themselves, but all the teams in the league for those given seasons (i.e. the 33 Giants, 33 Dodgers, 33 Cubs, 33 Pirates, 33 Braves, 33 Phillies, 33 Cardinals, 33 Reds; 64 Cardinals, 64 Phillies, etc.)

A 10-season league (clubs from nine pre-1961/62 seasons with eight teams per league, plus one 10-club season, would give 9 × 8 + 10 = 82 participating teams, perfect for a two-game home-and-home set against each of 81 opponents: a 162-game schedule.)

That way, the pennant winners would feast on the lower division clubs from different eras and the pitching rotations don’t get as easily worn out from having to face top hitting clubs game after game.

I’ve recently begun using DMB to run some simulations in the 1970s. I’ve used BBW in the past and while I enjoy that game too, I found its inability to have realistic park factor adjustments disappointing. As someone who enjoys studying ‘What-Ifs’, park factor usage is a must-have if you’re going to see how a team would have done had a certain trade not taken place, etc.

As for ballpark factors crossing eras, I’ve looked at this and do find it is possible to make certain adjustments, though it involves some work. For example, how would Roy Campanella have hit had he played in 1958 – with the Dodgers back in Brooklyn playing in Ebbets Field? You first compare Ebbets Field’s factors to the other 1957 N.L. parks that had the same field dimensions as they would in 1958 (a few had changed). With that, you can approximate what PFs Ebbets would have had in 1958 by applying a similar ratio to those same parks using 1957 data. All N.L. left-handed batters would benefit (by eliminating the L.A. Coliseum and replacing it with Ebbets), Duke Snider especially.
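The ratio trick described above can be sketched briefly. All park-factor values below are made up for illustration; they are not actual DMB or 1957 N.L. data:

```python
def estimate_missing_park_factor(target_park_old_pf: float,
                                 comparison_parks: list[tuple[float, float]]) -> float:
    """Approximate a park factor for a season the park wasn't used in.

    comparison_parks: (old_season_pf, new_season_pf) pairs for parks whose
    dimensions were unchanged between the two seasons. The target park's
    old-season PF is scaled by the average old-to-new ratio of those parks.
    """
    ratios = [new / old for old, new in comparison_parks]
    avg_ratio = sum(ratios) / len(ratios)
    return target_park_old_pf * avg_ratio

# Hypothetical: the target park's 1957 LHB home-run factor was 118, and
# three unchanged parks moved as follows from 1957 to 1958:
unchanged = [(105, 107), (96, 94), (110, 112)]
print(round(estimate_missing_park_factor(118, unchanged), 1))  # → 118.6
```

The scaling captures any league-wide drift between the two seasons, while the comparison set (only parks with unchanged dimensions) keeps genuine park changes from contaminating the ratio.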

Another project I’m (slowly) working on – SG might be interested in this one – would have the Yankees playing in their remodeled stadium but with its 1973 dimensions. Graig Nettles should pop a few more home runs with that. Of course being a Boston fan, I’m nixing the Rivers, Randolph and Gamble trades. And since NYY doesn’t get Ken Brett, they can’t trade him for Carlos May. (Also taking Reggie off the O’s and putting him back in Oakland.) To be fair (I know, why be fair to the Yankees?), Bobby Bonds and Elliot Maddox will be healthy. The real question will be: does Steinbrenner fire Billy Martin after the 1976 season ends with the Yankees finishing 3rd behind the repeat champion Red Sox and 2nd place Baltimore?

As for normalization and how many home runs Barry Bonds might have hit in 1908, one certainly must adjust for the steroid factor – since they weren’t available back then.