Today my column on Baseball Prospectus (subscription required) is a sort of part II building on last week's offering (no subscription required). It delves a bit more deeply into the double steal and discusses both the limits of play by play event codes like those found in Retrosheet and double steal attempts from a Win Expectancy (WX) standpoint. The thumbnail version of the discussion on the underlying strategy (in the article I also discuss individual managers' use of the double steal and the runners who were most prolific in this regard) can be summarized in the following table, which includes data from 1970 through 2006, excluding 1999. The table shows the base situation, number of outs, successes, double steal attempts, the percentage of attempts that were successful (defined as both runners being safe but not necessarily awarded stolen bases) and the average break even percentage for such attempts. The break even rates were calculated using a model that assumes the most likely costly failure, in which the lead runner is thrown out and the other runners advance.

Attempts with two outs are not included for reasons discussed in the article related to the incompleteness of the scoring system. Considering only attempts with runners on first and second it appears that managers generally do pretty well in their decision making process and are successful 66% of the time with a break even percentage of 64%.

However, with runners on first and third the overall success rate is under 31% while the break even percentage is over twice that, indicating that delayed double steals are very risky and typically not a good idea. That said, it should be remembered that the definition of a successful delayed double steal also typically includes the scenario where the runner from first is put out at second with the runner on third scoring. When those are included the success rate rises to 41.2%. To determine whether the strategy is then a good idea we would also need to recalibrate the break even model so that successes of these types were also included. Doing so, however, would actually raise the break even rate. For example, with runners at the corners with nobody out in a tie game in the top of the fifth inning, the visiting team has a 58% chance of winning in a run environment of 5.0 runs per game. If the double steal fails with the runner on third getting thrown out at the plate and the runner on first advancing, that probability shrinks to 48%. If the attempt is successful under the model used in the table above (both runners advancing) the probability shoots up to 65%, but if the play is successful with the runner on third scoring and the runner from first being thrown out at second the probability rises to just 59%. In the former case the break even percentage would be 60% while in the latter it is 93%. This is so since the loss of the trail runner and the additional out will always lower the offensive team's probability of winning. Keep in mind that your mileage will vary depending on the inning, score, and run environment but the general rule will hold for all such scenarios.
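The break even arithmetic in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `break_even` is hypothetical, and the inputs are the rounded win probabilities quoted above. It solves for the success rate p at which p times the WX after a success plus (1 - p) times the WX after a failure equals the WX of standing pat. With the rounded inputs it returns roughly 59% and 91%, close to but not exactly the 60% and 93% cited, presumably because those were computed from unrounded values.

```python
def break_even(wx_before, wx_success, wx_failure):
    """Success rate p where p*wx_success + (1 - p)*wx_failure == wx_before.

    Hypothetical helper for illustration; inputs are win probabilities in [0, 1].
    """
    gain = wx_success - wx_before   # WX added by a successful attempt
    loss = wx_before - wx_failure   # WX given up by a failed attempt
    if gain <= 0:
        raise ValueError("a success must raise win expectancy")
    return loss / (loss + gain)


# Rounded figures from the example: tie game, top of the fifth,
# runners at the corners, nobody out, 5.0 runs per game.
WX_BEFORE = 0.58          # before the attempt
WX_FAILURE = 0.48         # runner out at the plate, trail runner advances
WX_BOTH_ADVANCE = 0.65    # both runners safe
WX_RUN_TRAIL_OUT = 0.59   # run scores, trail runner thrown out

both_advance = break_even(WX_BEFORE, WX_BOTH_ADVANCE, WX_FAILURE)
run_trail_out = break_even(WX_BEFORE, WX_RUN_TRAIL_OUT, WX_FAILURE)

print(f"both runners advance:      {both_advance:.1%}")
print(f"run scores, trail out:     {run_trail_out:.1%}")
```

The smaller the gain from a "success" relative to the cost of a failure, the closer the break even rate gets to 100%, which is why counting the run-scores-but-trail-runner-out outcome as a success pushes the required rate up rather than down.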

Of course some percentage of these attempts cannot be teased out of the play by play data since the runner on third may retreat to the bag while the runner from first is charged with a caught stealing. There is no way to differentiate between that scenario and one where the runner on third is simply a spectator in a normal stolen base attempt by the runner on first. In order to do so the codes would have to reflect which runners were breaking with the pitch. That is one of the limitations we have to live with at present. My guess is that if successes and failures in that scenario could be found, they would generally cancel each other out - but that's just a guess.

The situations with runners on second and third and with the bases loaded are more interesting since unsuccessful attempts are characterized by one or more runners being caught stealing while other runners advance. With runners on second and third that means that either the runner on second was thrown out at third and the runner on third scored, or the runner on third was caught at the plate and the runner on second advanced. But here successful attempts would not include delayed steals, and so the overall success rate of just 9% and the extreme rarity of the play are evidence that it is a poor percentage play.

Monday, March 26, 2007

With less than a week to go before the season gets underway there are plenty of rookies and prospects who have made their mark this spring. Baseball America tracks these with its daily Prospect Report email for subscribers (and you know you should really subscribe). In today's email we saw the following players who are doing very well along with their results from yesterday:

Particularly, I found this passage to ring true to what I read previously.

Finally, there is religion. Rome is saturated with it — there are prayers and oaths, offerings made to deities known and unknown, and religious processions and priestly orders. A pagan world, in other words, is not one in which we control the gods, as trendy leftists suppose, but in which we are ever at risk of offending some god for failure to make the right offering or sacrifice. Moreover, these gods rarely provide a guide to conduct or right behavior — they are inscrutable.

Too Much of a Good Thing. The novelty of interleague play, which began in 1997, has begun to wear off. While Major League Baseball touts the fact that attendance at interleague games is 13% greater than at other games, an excellent piece in SABR's Outside the Lines summer newsletter more correctly puts that figure at 5% after making adjustments for the time of year when interleague games are played and the fact that most games take place on the weekends.

My wish is that interleague play would be cut back from the average of 15 to 17 games per team played in recent years to something more like nine (or fewer for the NL Central) and further restrict those to one alternating "natural rivalry" (itself a problematic concept for many teams) with two other series, one home and one away, thrown in the mix. This would not only lessen the need for teams to adjust their rosters for play in the other league but also help to minimize the contact between the leagues--a strength of the game when it comes to All-Star and World Series time.

And while we're at it, we may as well rebalance the schedule a little. While I've been a fan of the unbalanced schedule, agreeing with the logical necessity of intra-divisional play being primary if you're going to have divisions and the fact that it is necessary if you're going to keep interleague play, the current system is a little out of whack. Teams often play 19 games against opponents in their own division but just six against opponents in other divisions. While this allows teams who are behind in the division the opportunity to take matters into their own hands, it's not so good for fans who see the same set of teams week in and week out. Yes, there are complexities to the schedule, but playing a few fewer games against division rivals and reintroducing the four-game weekend series and the occasional doubleheader, in conjunction with a reduction of interleague games, would likely free up enough of the schedule to play other league rivals nine to 11 times per season. Barring radical realignment and the demise of interleague play, this is probably about the best we can hope for.

Like master chefs, the best science writers pare away all but the most succulent material, trimming details essential to the researcher that would be only a distraction to the reader. And by carefully crafting narrative and using expository devices that showcase the drama of scientific exploration and discovery, popular works can maintain a high level of scientific integrity while making difficult and technical subjects not only accessible but moving and compelling. Good science writing can humanize the abstraction of scientific research by establishing visceral, meaningful connections to questions and issues we care about and by humanizing the scientific process itself. In Einstein's words, scientific research consists of "years of anxious searching in the dark for a truth one feels but cannot find, until a final emergence into the light." A reader who is led to envisage this search, I believe, will start to bridge the gulf between the science and the humanities. The best science writing can have that effect.

I purchased this book for my wife, the erstwhile nurse, for Christmas after hearing an interview with Max on NPR's Science Friday. After she finished and recommended the book, it sat on my night stand amongst the collection of books that are either in the queue or in various stages of completion. After finally finishing Team of Rivals by Doris Kearns Goodwin and Diamond Dollars by Vince Gennaro (I told myself I wouldn't start another until those were complete and I'll write more about the latter in the future) I picked it up and found it even more interesting than I had first imagined.

In short Max tells the story of the research that led to the discovery of prions (pronounced "pree"-on by most but "pry-on" by the British) and their role (such as the state of current research can figure out) in a variety of neurological diseases in humans and other species from the kuru of the Fore people of Papua New Guinea to scrapie in sheep, mad cow disease or BSE (bovine spongiform encephalopathy) in cows and CWD (chronic wasting disease) in deer, to GSS (Gerstmann-Straussler-Scheinker) to CJD (Creutzfeldt-Jakob) and finally to FFI (fatal familial insomnia) which forms the core of the story around which the book is written.

Prions, so the theory goes, are infectious bits of protein that are abnormally structured and cause other proteins with which they come in contact to fold incorrectly as well through a process akin to crystallization. Prion diseases are especially fascinating since they appear to be the only kind of disease that takes three distinct forms. Some are inheritable like FFI (termed genetic), others are infectious like Mad Cow, and others (although Max admits in the Afterword that he doesn't really believe this is possible) are accidental (termed sporadic) as is the case with CJD. Prion diseases are also important because they're akin, although not the same, much to the disappointment of researchers, to diseases like Alzheimer's, Parkinson's and Huntington's where defective proteins play a role. Beyond the obvious benefit of helping those who suffer from prion diseases, the hope is that understanding even those diseases that affect relatively few (BSE has killed fewer than 200 people worldwide despite widespread fear that it would kill millions) will lead to breakthroughs in these diseases that do affect millions.

Just as in Greene's description of what the best popular science writers do, Max humanizes this complex and still not well understood field through the use of two very human stories. The first centers on an Italian family (one of 40 such families in the world where a child of a person with FFI has a 50% chance of contracting the disease) whose various branches have likely suffered from FFI since at least 1765. It was then that, ironically enough, a Venetian doctor probably contracted FFI, a disease which typically strikes around 50 years of age and causes its sufferers to sweat profusely, their pupils to contract to pinpricks, and eventually to lose sleep to the point where they often hallucinate and finally die of exhaustion. In the end their brains, especially that part that controls autonomic (not under conscious control) impulses like sweating and sleep, are almost entirely eaten away as misfolded proteins ravage the brain. Max traces the disease to the present day, recounting the lives and deaths of many of the family members and the family's struggle to come to grips with the implications of their disease in the modern world where research holds out some hope of progress. Having some understanding that the disease is brought on by stress, most of the family had lived under the unspoken rule that, in the words of one family member, "the best way to prevent the disease was not to mention it." Much of that has changed though, and now the family has even created an association to raise money and reach out to other FFI families.

Interspersed with this often heart wrenching account is the story of the scientists who first encountered the prion disease kuru in humans (spread through the funerary feasts of the tribe where body parts of dead relatives were consumed) after "first contact" with the Fore people in Papua New Guinea in the 1940s. The most interesting character in this vein of the story is Carlton Gajdusek, the scientist with pedophilic tendencies who worked among the Fore tribe and who received a Nobel Prize for his work on prions. Max then chronicles the various lines of research and accompanying animosity in the scientific community as well as perhaps the most polarizing figure, Stanley Prusiner, who also won a Nobel Prize and coined the word "prion". This part of the account takes on the air of a mystery while discussing the various approaches and theories that came and went in the years since kuru was discovered. From that perspective the book appears to be fair in its attribution of the contributions made not only by Gajdusek and Prusiner but by many others as well who often did not receive the credit due them. And most importantly Max is up front in acknowledging what we don't know and gives the reader a sense of the uncertainty, hence the mystery, still to be solved. That may not always be satisfying since we love our stories to have endings, but it is intellectually honest and I think that's all we can ask of any writer.

Woven into both these stories is a good deal of the political and social history of the kuru epidemic, scrapie in sheep going back to the late 1700s, Mad Cow in England in the 1980s and 1990s, and finally CWD and the hint of Mad Cow in America in the present, even detailing the conspiracy theories of the "Creutzfeldt Jakobins", as Max calls them, who believe there is a massive cover up underway. In the end, Max is hard on the British and American governments and uses the infectious transmission of prions as an object lesson in the dangers of human arrogance.

An excellent book and one that exemplifies what is best in popular science writing.

Thursday, March 22, 2007

SI.com: Are you familiar with the statistical-minded Internet sites like Baseball Prospectus?

Schilling: Yeah, I love those guys. I don't always agree with them but I think those are some incredibly smart guys. I've actually worked in the past with some guys there on statistical stuff I do for preparation. Will Carroll is the guy I've exchanged some information with.

SI.com: Do you think that Internet-based baseball analysts and writers should be available for BBWAA awards and Hall of Fame voting?

Schilling: Oh, it'll come full-circle at some point. Why wouldn't it? They already have a much larger impact than the Murray Chass' of the world would like to believe. I mean, you've got guys who are putting out what I know to be legitimately valuable statistical information and its relevance to a game in a win or a loss at Baseball Prospectus. Then you have guys that I'm not too fond of, like Murray Chass, who says, "What is VORP and who cares?" It was a stupid article. The only thing it did was show his ignorance to me in modern day baseball. Because those numbers do matter, those numbers do have value. Do they have value to me in getting a player out? No. But I would tell you that there are a lot of front offices that use those numbers for a lot of important decision making

My column today on Baseball Prospectus revolved around the double steal and tackled questions like how often it's been employed, how successfully, and whether that tracks with what is optimal from a strategic perspective. While there are some issues with the underlying play by play data that make it difficult to count all such attempts, we can get pretty close. One of the tables I left out of that article was the career numbers for the current crop of managers. The following table lists the 30 managers from 2006 and their career numbers on double steals.

Note: A successful double steal is one in which both runners are credited with a stolen base or neither runner is put out.

In this current crop of managers Mike Scioscia is by far the leader in terms of success and in 2006 was 13 for 13. Both Mike Hargrove and Willie Randolph have called for 8 or more per season and Tony LaRussa is close at 7.8. Lou Piniella wasn't included in this list since he didn't manage in 2006 but historically he is 111 for 160 in his 18 years for a .694 percentage, averaging 8.9 attempts per season. Cubs fans should expect a few double steals in 2007. Let's just hope they come with a few wins as well.

Tuesday, March 20, 2007

Last week Neal Williams and I published two articles on Baseball Analysts related to quantifying third base coaching here and here. In those articles we showed the results for 2006 as well as the aggregates for all coaches from 2000 through 2006. In the interest of full disclosure here are the results for all individual coach seasons from 2000 through 2006 ordered by the Ratio of the "coach influenced" Rate to the "non-coach influenced" Rate. So Tom Foley, under this version of the measure, recorded the top two seasons.

In 2003 and 2004 he performed about the same (and negatively) but in 2003 the team in non-coach opportunities performed even more poorly. The same holds true for his 2000 season with the Red Sox. This points to the fact that perhaps we should be weighting the coach opportunities differently from the non-coach and as pointed out by MGL perhaps all advancements to home plate should be considered coach opportunities.

Friday, March 16, 2007

Today we'll take a quick look at the baserunning of the Atlanta Braves. From a team perspective they ranked 27th in the majors at -12.23 theoretical runs and 24th when EqSBR is taken out of the picture at -4.10. Overall they scored poorly in all four categories. Marcus Giles did well overall at +3.17 runs and Andruw Jones was at +2.34 runs. Chipper Jones did fairly poorly at -0.74 but it was Brian McCann who really dragged them down at -4.37 runs. Adam LaRoche was also at -1.39. In any case, it doesn't appear they'll improve appreciably in 2007 with the loss of Giles and much of the lineup being the same.

Wednesday, March 14, 2007

Today and tomorrow a two part article on quantifying the impact of third base coaches will run on Baseball Analysts. This is an offshoot of the work I've done on baserunning in the past. I was greatly assisted by Neal Williams of Rocky Mountain SABR.

"Coachers" was the term originally applied to coaches, derived from the resemblance of coaches to stagecoach drivers.

Tuesday, March 13, 2007

Last week in my column on BP I took a look at the various aspects of Leverage (using Keith Woolner's definition) and broke it down by lineup position, inning, and score differential as well as taking a quick look at pinch hitting.

In that discussion I provided a table that broke down each of the high level events in 2006 by its average WX value. Reader feedback prompted me to amend that table by breaking out sacrifice hits and flies on the Unfiltered site.

In addition, several readers have asked about the breakdown of Leverage by half inning. So here it is graphically broken down into two graphs for easier reading.

As you can see in innings one through eight the Leverage is lower in the bottom half of the inning than in the top. That's the case since the score is more likely to be tied at the beginning of each inning and in fact the average score differential in innings one through eight is lower in the top half than in the bottom half of each inning. In innings 9 and higher the situation is more interesting. In inning 9 the Leverage goes way up in the bottom half of the inning since when it is played the home team is typically not being blown out (and getting a runner or two on raises the Leverage dramatically since a single run can often win or tie the game) whereas the Leverage in the top half continues to decline in the ninth inning. In the 10th inning and beyond (aggregated in the second graph as "10") the Leverage for the bottom half of the inning remains higher despite the score differential actually being higher since a single run often allows the home team to win.

As mentioned in my previous post I spent the weekend in the Valley of the Sun taking in both the warm weather and several spring training games and camps. An overview from the pure baseball side is now up on Baseball Prospectus (no subscription required).

Monday, March 12, 2007

Last weekend I had the chance to make a quick trip to Spring Training sites in Arizona attending four games in three days. I'll have more to report on Baseball Prospectus and here but one of the interesting little diversions was found at Surprise Stadium on Saturday afternoon. About 45 minutes prior to game time a panel was held that included eight women who played in the All-American Girls Professional Baseball League. The league was in existence from 1943-1954 and was of course made famous by the 1992 film A League of Their Own. I didn't realize that at first the game they played was actually softball but that the rules were changed over time to make them identical to baseball. Too bad that trend didn't take hold.

In any case the ladies answered questions from the audience and each shared which teams they played for. For the most part the women were paid between $50 and $65 a week with $3.50 per day meal money, which many of them insisted was good money for the time.

In addition to the ladies, Fergie Jenkins, who lives in the area, was on the panel and alternated taking questions from the bevy of Cubs fans in attendance (the Cubs were playing the Royals that day).

Jenkins was asked the inevitable "why isn't Ron Santo in the Hall of Fame" question and simply stated that the process is broken and that he would like to know the five members who didn't vote for him. In a conversation afterwards as I procured a signed baseball, he said he thought that when Bruce Sutter was inducted it would have been fitting for Rich Gossage and Lee Smith to go in at the same time. He also talked about his charitable work a bit and when questioned about pitch counts was adamant that there is a causal link between the money pitchers make today and their use of the disabled list (he said he was on the DL once and only because of a ruptured Achilles). He also said that in his final spring start of the year he would go 9 innings to ensure that he could do so once the season started. He completed 267 games in his career.

As Fergie was finishing out his career with the Cubs in 1982-83 I can recall tuning into as many of his starts as possible and scoring the games when I could (the game prior to the one where he recorded his 3,000th strikeout sticks in my mind particularly). I was amazed by his control and his pitching patterns and even purchased his book on pitching written with David Fischer, Inside Pitching, and tried to apply what I could to my high school pitching career, such as it was. In any case it was a great pleasure meeting him.

Thursday, March 08, 2007

Today we'll take a look at the Florida Marlins running game to complement the Hope and Faith piece Christina Kahrl wrote for BP today.

From an overall perspective the Marlins finished 16th in baseball at -7.09 runs. When EqSBR is taken out of the equation they finish second behind the Angels at +8.88 runs since they finished dead last at -15.98 in EqSBR. As you can see from the table below the reason for their poor showing was the combination of Wes Helms, Dan Uggla, Reggie Abercrombie, Miguel Cabrera, and Alfredo Amezaga who accounted for -12.76 and recorded 37 stolen bases, 33 caught stealing, and were picked off 4 times for good measure.

On the other hand Hanley Ramirez (pictured left) placed second overall at +8.89 runs just behind Chone Figgins of the Angels. I've mentioned before that in the games I personally saw Ramirez play I saw him twice stretch singles into doubles as outfielders lazily retrieved ground balls. Meanwhile Josh Willingham took the distinction as the worst baserunner of 2006 in The 2007 Bill James Handbook but in these metrics placed 659th out of 666 players at -4.72 runs. I published a critique of the James methodology in a column late last year. Essentially the James system doesn't take into account as much context as these metrics do by considering the fielder who fielded the ball and the number of outs as well as other baserunners. However, James does include some additional categories that I had not considered. Advancing on wild pitches and passed balls and perhaps even balks are categories that should be included in a total measure of baserunning. This is the flip side of the excellent work done by Sean Forman on catchers. They were omitted from the metrics I developed but will be included in the future. However, I'm a little hesitant to include advancing on defensive indifference. While it probably is correlated with speed and therefore baserunning, the value of advancing in those scenarios is so questionable, and the opportunities to do so are so difficult to identify, that including those occasions makes little sense.