Doctoring The Numbers

Hot Starts (a.k.a. Should Royals Fans Get Excited Yet?)


Hi. For those of you who have just started reading Baseball Prospectus in the last six months, allow me to introduce myself. I'm Rany Jazayerli. I used to write a lot around here. I hope to write a lot again at some point in the future.

In the meantime, I regret to say that my appearances at BP will be sporadic at best. The American Board of Dermatology has managed to do something no one else--not my wife, not my family, not thousands of exasperated readers--has been able to do: Convince me to stop writing about baseball.

On August 10th and 11th, barring unforeseen circumstances, I plan to sit for my board-certification exam in dermatology. As this represents the most difficult, most important, and--thank God--last examination in my medical training, I have been persuaded/threatened/coerced to devote the time ordinarily spent writing about baseball to studying for said exam. (Something about the best interests of my patients...I guess. Everyone knows dermatologists are just aloe pushers anyway. Pimple Poppers MD. I mean, skin doesn't need a doctor!)

For now, enjoy this irregularly scheduled edition of Doctoring the Numbers, and I ask for your patience if it's August before I get the chance to write another one.

One more thing: I'd get that mole looked at if I were you--it might be skin cancer, and an early diagnosis could save your life. Hey, I'm a dermatologist. Saving lives is what I do.

Today, I want to look at the relevance of a hot start to a team's final record. (I know--where do I get these ideas?)

As I write this, the aliens who have collectively taken over the Kansas City Royals' entire roster are 14-3, the best start in team history. Not to be outdone, the Yankees are 15-3 and have outhomered their opponents this year by the minuscule margin of 35 to 4, which is a stat that deserves its own DTN article, if not its own episode of The X-Files. And both teams are trying to keep up with the Giants, who after Sunday's loss are 15-3 despite outscoring their opponents by the downright-reasonable margin of just 107 to 81.

The topic of the meaningfulness of hot starts has intrigued analysts since the Tigers' remarkable 35-5 start in 1984 persuaded Bill James to look at the subject in his 1985 Abstract. One of the major problems with this sort of data analysis is just getting the data for the day-by-day standings for every day in baseball history. James, working by hand, only had data from 1965 to 1984, but then he did not have the services of the incomparable, indispensable David W. Smith (the W. stands for "Support Project Retrosheet!"), who graciously provided me with just the data I needed.

The sheer quantity of data--day-by-day records for every team, in every season--became a problem; for the studies below, I only looked at data from 1930 to 1999, because that was the most data I could fit into Excel's 65-some-odd thousand rows. (I eliminated seasons with fewer than 140 games played, i.e. 1981 and 1994.) And keep in mind that because I only have day-by-day standings, a team's record after the first game of a doubleheader is missing in most cases, so the data may not add up precisely. Given the sample sizes, I am confident that the missing data will not affect the conclusions in any significant way.

The following charts show, at each level of games played, how teams with each possible record at that point fared for the season as a whole. I also included a chart to show how teams played for the remainder of the season, so that the overall records of teams that started, say, 5-0 aren't biased by the fact that they've got five games in the bag--we knew that already.

At first blush, it seems fairly remarkable that three games can have so much bearing on a team's entire season, particularly at the extremes: Teams that started 3-0 played .542 ball all season, averaging 88 wins in a 162-game season, while teams that started 0-3 had an average record of 77-85. Nearly 30% (56 of 189) of the 3-0 teams made the playoffs, while just 8% (15 of 187) of the 0-3 teams bounced back to reach the postseason.
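For readers who want to check the arithmetic, here is a minimal Python sketch that scales a winning percentage to a 162-game schedule and turns the playoff counts into rates. The helper names are made up for illustration; the inputs are the figures quoted in the paragraph above.

```python
def wins_per_162(win_pct: float) -> float:
    """Scale a winning percentage to a 162-game season (hypothetical helper)."""
    return win_pct * 162

def share(count: int, total: int) -> float:
    """Fraction of a group of teams that met some condition (hypothetical helper)."""
    return count / total

# Figures quoted in the text for 3-0 and 0-3 starts.
print(f"3-0 teams: {wins_per_162(0.542):.1f} wins per 162")  # 87.8
print(f"3-0 playoff rate: {share(56, 189):.1%}")              # 29.6%
print(f"0-3 playoff rate: {share(15, 187):.1%}")              # 8.0%
```

The 87.8 rounds to the 88 wins cited in the text.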

When you consider the reasons more thoroughly, though, it's not that surprising. A 3-0 start doesn't make a team a contender; it's simply that contending teams are, by definition, more likely to win three games in a row--at any juncture of the season--than a second-division club. Using simple probability theory, a team that wins 55% of its games has a (.55*.55*.55=) 16.6% chance of starting the season 3-0, while a team that wins 45% of its games has just a (.45*.45*.45=) 9.1% chance. Small differences in the caliber of a team add up (or, speaking mathematically, multiply down) when you look at their ability to put together a sustained winning streak.
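The "simple probability theory" above can be sketched in a few lines of Python. This assumes, as the calculation in the text does, that each game is an independent coin flip weighted by the team's true winning percentage; the function name is hypothetical.

```python
def prob_perfect_start(win_pct: float, games: int) -> float:
    """Chance of winning each of the first `games` games, assuming every game
    is independent and won with probability `win_pct`."""
    return win_pct ** games

for p in (0.55, 0.45):
    print(f"{p:.2f} team: {prob_perfect_start(p, 3):.1%} chance of 3-0")
# 0.55 team: 16.6% chance of 3-0
# 0.45 team: 9.1% chance of 3-0

# The gap between the two teams widens as the required streak gets longer.
for n in (3, 5, 10):
    ratio = prob_perfect_start(0.55, n) / prob_perfect_start(0.45, n)
    print(n, round(ratio, 2))
```

Because the ratio is (.55/.45) raised to the length of the streak, the better team's edge compounds with every additional game.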

No team is likely to start 3-0, but the odds are so much greater against bad teams that without knowing anything about a team except that they started 3-0, you can conclude that the team is more likely to be a first-division team than not. And sure enough, of the 189 teams that started 3-0 in the study, 137 won more than they lost, four finished at .500, and just 48 had losing records.
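The backward inference here, from "started 3-0" to "probably a good team," is Bayes' rule at work. The sketch below uses a deliberately crude toy model that is an assumption, not something fit to the study's data: every team is either a .550 club or a .450 club, the two types are equally common, and games are independent.

```python
def posterior_good(prior_good: float, p_good: float, p_bad: float, games: int) -> float:
    """P(team is the stronger type | it started games-0), via Bayes' rule,
    in a hypothetical two-type model of the league."""
    like_good = p_good ** games   # chance a strong team starts games-0
    like_bad = p_bad ** games     # chance a weak team starts games-0
    evidence = prior_good * like_good + (1 - prior_good) * like_bad
    return prior_good * like_good / evidence

# Hypothetical league: half .550 teams, half .450 teams.
print(f"{posterior_good(0.5, 0.55, 0.45, 3):.1%}")  # 64.6%
```

Even in this coarse model, a 3-0 start moves the odds from a coin flip to roughly two-in-three that the team is the stronger type, which is the direction the study's 137-of-189 result points.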

As we increase the start size a little more, the differences become more pronounced, particularly when looking at the team's final destination. I've included another set of columns in this chart, showing just how likely teams in each row were to finish the season with 1) a winning record and 2) a playoff berth:

Incidentally, the only two teams in the study to reach the playoffs after an 0-5 start were the 1974 Pirates and the 1995 Reds. The Reds were able to make up the deficit despite the abbreviated 144-game schedule.

As we increase the number of games, the sample sizes at the periphery become vanishingly small, so in the following columns I have combined the rows at the top and bottom of the chart to make for meaningful samples. In each case, I have placed the average number of wins in parentheses, so for instance in the following chart, of the 21 teams listed under "9+ wins", five had 10 wins and 16 had nine wins, so the 21 teams as a whole averaged 9.24 wins.
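The average shown for a combined row is just a weighted mean. A small sketch, with a hypothetical helper name, using the "9+ wins" example from the paragraph above:

```python
def average_wins(counts: dict[int, int]) -> float:
    """Mean wins for a combined row, given {wins: number_of_teams}."""
    total_teams = sum(counts.values())
    total_wins = sum(w * n for w, n in counts.items())
    return total_wins / total_teams

# The "9+ wins" row at 10 games: five 10-win teams and sixteen 9-win teams.
print(f"{average_wins({10: 5, 9: 16}):.2f}")  # 9.24
```

That is (5 x 10 + 16 x 9) / 21 = 194 / 21.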

Pretty amazing symmetry there for those 5-5 teams, huh? The 286 teams that started 5-5 finished with 22668 wins, 22669 losses, which is about as close to .500 as humanly possible without actually breaking even. They were also just one team shy of having exactly 50% of their teams finish over .500. Statistical studies almost never yield such clean results, so I wouldn't blame some of you for taking the name of my column a little too seriously.
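To put a number on how close that is, here is a quick sketch computing the combined winning percentage of the 5-5 teams from the totals quoted above (the helper name is hypothetical):

```python
def winning_pct(wins: int, losses: int) -> float:
    return wins / (wins + losses)

wins, losses = 22668, 22669    # combined final record of the 286 teams that started 5-5
pct = winning_pct(wins, losses)
print(f"{pct:.3f}")             # 0.500
print(f"{pct:.6f}")             # 0.499989
print((losses - wins) / 2)      # 0.5 games below .500, collectively
```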

More importantly, there is a smooth, stepwise progression of the data here. With the exception of the peculiar number of 2-8 teams that made the playoffs, each additional win at the 10-game mark led to a better end-of-season record, a better chance of finishing over .500, and a better chance at reaching the playoffs.

The only team in the study that started with 9+ wins and failed to finish over .500 (the same outlier team in the 7-0 group) was the 1966 Indians, who started 10-0 before settling in to finish 81-81. While they are not in the study group, another Indians team of more recent extraction--last year's--started 11-1 but finished just 74-88. Royals fans should take heart, though: The 2002 Indians are the only team since 1901 to finish under .500 after an 11-1 start.

If there's one thing to take home from this study, it's this: A hot start by the Indians means precisely nothing. In addition to the 1966 and 2002 versions we've already discussed, the 1941 and 1988 teams both started 16-4, and both finished under .500. Teams that don't don the ugly visage of Chief Wahoo are a perfect 10-for-10 in converting a 16-4 start into a winning record, and seven of those 10 teams made the playoffs.

The only 4-16 team to right the ship back to a winning record by season's end was the 1996 Red Sox, who finished 85-77. The 1973 Cardinals managed to get back to .500 despite their 4-16 start, which no doubt led to some consternation in the Gateway City, given that they finished just 1.5 games behind the NL East "champion" Mets.

No team between 1930 and 1999 so much as made the playoffs after starting 6-14 or worse. A more recent study, however, reveals that teams employing a stuffed primate as a promotional tool have a better than 0% chance of turning a 6-14 start into October glory. (Between the 2002 Angels, who started 6-14, and the 2001 Athletics, who started 8-18 before finishing 102-60, I'm inclined to say that any and all conclusions of this study are declared null and void in the 21st-century American League West.)

Exceptions aside, we see the same stepwise progression here even though the individual sample sizes have shrunk somewhat. It is interesting to note that there appear to be "threshold wins" within the data that, once a team has crossed those given levels, appear to be more meaningful than other wins. The difference between a 5-15 start and a 6-14 start, for instance, is worth 31 points of winning percentage, or more than four wins, over the course of the remainder of the season. Similar thresholds are seen between nine and 10 wins, and between 12 and 13 wins. Beyond 13 wins, though, additional wins appear to have a minuscule impact on the rest of the season. It's reasonable to conclude that there is no meaningful difference between the quality of a 16-4 team and a 14-6 team. Especially if that team is the Indians.
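The conversion from "points of winning percentage" to wins depends on how many games are left. A minimal sketch, assuming a 162-game season and a hypothetical helper name:

```python
def pct_gap_to_wins(pct_points: int, games_played: int, season_length: int = 162) -> float:
    """Convert a gap in rest-of-season winning percentage, in 'points'
    (thousandths), into wins over the games still to be played."""
    remaining = season_length - games_played
    return pct_points / 1000 * remaining

# 31 points over the 142 games left after a 20-game start.
print(f"{pct_gap_to_wins(31, 20):.1f}")  # 4.4
```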

I think we've found the magic threshold. Here are the 11 teams that started 23-7 or better (10 are listed here; the lone outlier is discussed below):

1939 New York Yankees: Won World Series

1946 Boston Red Sox: Won AL Pennant

1952 Brooklyn Dodgers: Won NL Pennant

1955 Brooklyn Dodgers: Won World Series

1977 LA Dodgers: Won NL Pennant

1984 Detroit Tigers: Won World Series

1988 Oakland A's: Won AL Pennant

1990 Cincinnati Reds: Won World Series

1993 Philadelphia Phillies: Won NL Pennant

1998 New York Yankees: Won World Series

Eleven teams, 10 of whom played for the World Championship, five of whom won it. If that's not a dead-lock, sure-fire, in-the-bag recipe for success, I don't know what is.

The only outlier in the group is the 1945 New York Giants, who finished 75-79. The Giants' collapse can be discounted somewhat because it came in a war year. More to the point, as the war ended, many former stars returned to their teams late in the season, skewing the final results: teams that got more of their players back beat up on teams still stocked with wartime-caliber players. Besides, the Cubs won the NL pennant that year, which just proves that things were messed up.
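Tallying the 11 teams confirms the "10 of 11 played for the title, five won it" count. The data below are transcribed from the list and the outlier paragraph above; the dictionary layout is just one convenient way to hold them.

```python
from collections import Counter

# Teams that started 23-7 or better, 1930-1999, as given in the text.
outcomes = {
    "1939 Yankees": "Won World Series",
    "1946 Red Sox": "Won AL Pennant",
    "1952 Dodgers": "Won NL Pennant",
    "1955 Dodgers": "Won World Series",
    "1977 Dodgers": "Won NL Pennant",
    "1984 Tigers": "Won World Series",
    "1988 Athletics": "Won AL Pennant",
    "1990 Reds": "Won World Series",
    "1993 Phillies": "Won NL Pennant",
    "1998 Yankees": "Won World Series",
    "1945 Giants": "Missed postseason",
}

tally = Counter(outcomes.values())
reached_series = sum(n for result, n in tally.items() if result != "Missed postseason")
print(len(outcomes), reached_series, tally["Won World Series"])  # 11 10 5
```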

The lone outlier among the 22-8 teams was the 1995 Phillies, who finished 69-75 after their torrid start. On the other side of the chart, the two teams that rebounded from a 10-20 beginning to reach the playoffs were the 1974 Pirates and the 1989 Blue Jays, who jettisoned Jimy Williams at the 36-game mark and replaced him with Cito Gaston, surely one of the most successful in-season managerial changes of all time.

At the 30-game point, the data--at least at the extremes, which is what's most pertinent to our discussion--become truly meaningful. Of the 115 teams that started 20-10 or better, just seven failed to finish with a winning record; of the 113 teams that started 10-20 or worse, only eight finished above .500. Combining the data, just 15 of the 228 teams on the extremes, or 6.6%, changed course by the end of the season. Looking at the 20-game data, 25 of the 215 teams--11.6%--on the extremes (14 or more wins or losses) changed course. And out of the 59 teams that started 22-8 or better, or 8-22 or worse, just one--the 1995 Phillies--made a complete about-face by season's end.
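The reversal rates quoted above follow directly from the counts. A short sketch (the helper name is hypothetical):

```python
def reversal_rate(flipped: int, total: int) -> float:
    """Share of teams at the extremes whose final record went the other way."""
    return flipped / total

# 30 games: 7 of 115 hot starters and 8 of 113 cold starters changed course.
at_30 = reversal_rate(7 + 8, 115 + 113)
# 20 games: 25 of the 215 teams with 14+ wins or 14+ losses changed course.
at_20 = reversal_rate(25, 215)
print(f"{at_30:.1%} {at_20:.1%}")  # 6.6% 11.6%
```

Ten extra games roughly halve the chance that an extreme start turns out to be misleading.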

We'll stop here for now. In my next article, I plan to use linear regression analysis to put all the data above into a single, elegant formula. Failing that, an inelegant, downright clumsy formula will do. I also plan to look more closely at the subset of teams whose early-season start runs completely counter to their previous season's performance. It's one thing to say that a 14-3 team is likely to finish in the money, but what about a 14-3 team that lost 100 games the previous year? (I know--where do I get these ideas?)

We'll explore those topics next time. If there is a next time. I have this exam to study for, see.

Rany Jazayerli is an author of Baseball Prospectus.