Friday, July 23, 2004

This is a collection of 30 plus essays on baseball written by the late Harvard paleontologist Stephen Jay Gould over the last 20 to 30 years. Among them are two original pieces on forms of streetball in New York City when Gould was growing up and a piece discussing the lure of baseball for intellectuals like himself (which he views as purely contingent on time and place).

There is also a long tribute to Gould (he died in May of 2002) in the foreword written by David Halberstam (author of Summer of '49). Many of the remaining essays appeared in various places such as the New York Times Review of Books; I had read some of them before but had not seen many others. They range from book reviews to short eulogies (of Mickey Mantle, for example) to essays. One of his most famous is his essay on the disappearance of the .400 hitter, originally written in 1986 for Discover magazine. He, like George Will in perhaps my favorite baseball book, Men at Work, views it simply as the natural consequence of an increasing level of play that comes closer to the "right-wall" of human ability coupled with the increasing maturity of the game. This increasing level of play tends to decrease the differences between average and stellar performers. As a result, since the mean batting average has remained roughly .260 since the 1940s, there are fewer players at both the left and right ends of the spectrum. This also tracks very well with a book that came out a couple of years ago that rated the greatest hitters (for batting average) of all time through a series of statistics and determined that Tony Gwynn was the greatest.

Gould's writing is always interesting, and even though he was a lifelong Yankees fan, he rightly despised the DH and aluminum bats. Baseball fans will find plenty to like here.

Sunday, July 18, 2004

My contention in a previous post was that OPS (on base + slug) is a useful measure of offensive production because of its simplicity, comparative ability, and correlative value. However, when ranking the greatest single seasons in OPS only three players, Barry Bonds, Babe Ruth, and Ted Williams, made the top 10 seasons. Could this be the result of some bias in favor of these particular players? For example, one obvious thought is that since homeruns have been flying out of the park at an increased rate since 1993 (“chicks dig the long ball”), it is easier for a player like Barry Bonds, playing in an expanded run environment, to amass a large OPS by increasing his slugging percentage. The average number of plate appearances per homerun in the period 1960-1992 was about 47; from 1993-2003 it was 35. In other words, a player with 600 plate appearances would hit about 12 and a half homeruns in the 1960-1992 period and almost 17 in the 1993-2003 period. Another thought is that perhaps Ted Williams was inordinately helped by Fenway Park with its Green Monster and short right field line and Babe Ruth by playing in a park that was, after all, the house that he built.

Correcting for League and Year
To see if there is some hidden bias here we can first correct for the context by calculating the league average OPS for the 10 seasons in question and then normalizing the individual’s OPS against the league average, a concept first introduced in The Hidden Game of Baseball. For example, in 2001 the National League OPS was 756, a very high number historically. By taking Barry Bonds’ OPS of 1379 and dividing it by the league average (1379/756) we can calculate a Normalized OPS (NOPS) of 1.82, or simply 182 for short. By performing the same calculation with Babe Ruth’s 1920 season (the same raw OPS of 1379 in a league where the average was 730) his NOPS comes out to 189, a little ahead of Bonds. Here are the before and after rankings.
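The normalization described above is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic, not the query I actually ran; the function name `normalized_ops` is made up for this example, and OPS values are passed in the three-digit form used in this post (1379 rather than 1.379).

```python
def normalized_ops(player_ops: float, league_ops: float) -> int:
    """Return a player's OPS relative to the league, scaled so 100 is league average.

    Both arguments are assumed to be on the same scale (e.g. 1379 and 756).
    """
    if league_ops <= 0:
        raise ValueError("league_ops must be positive")
    return round(100 * player_ops / league_ops)


# Figures quoted in the post
bonds_2001 = normalized_ops(1379, 756)    # NL average OPS of 756
ruth_1920 = normalized_ops(1379, 730)     # AL average OPS of 730
mccovey_1969 = normalized_ops(1108, 686)  # NL average OPS of 686

print(bonds_2001, ruth_1920, mccovey_1969)
```

Run against the numbers quoted here, it gives 182 for Bonds in 2001, 189 for Ruth in 1920, and 162 for McCovey in 1969.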

As you can see, the same three hitters still dominate the list; however, the distribution has changed somewhat, with Ted Williams garnering another spot for his 1946 season in a league with a low OPS of 690 and Babe Ruth losing his 1927 season when the league put up a fairly high OPS of 747. Williams’ 1957 season now also looks better in this light, moving from 9th to 5th place. And most obviously Babe Ruth’s 1920 season now tops the list with an NOPS of 189. So in answer to part of our question we can fairly confidently say that these three hitters’ accomplishments were not inordinately helped by playing in leagues that were hitters’ paradises. In fact, the first player not of this ruling triumvirate to make the list is Mickey Mantle with his 1957 season (NOPS of 167). The only other contemporary player to make the top 20 is Mark McGwire with his famous 1998 season and an NOPS of 166, tied for 15th.

However, hitters who played in extremely low scoring run environments should be greatly helped by normalizing OPS. For example, consider Willie McCovey’s 1969 season and Carl Yastrzemski’s 1967 season.

Before normalization McCovey in 1969 ranked as tied for 70th all-time with a raw OPS of 1108. After correcting for a league in which pitchers dominated with an OPS of 686 he jumps to tied for 23rd with an NOPS of 162. Even more dramatically Yaz in 1967 moves from tied for 166th to tied for 27th place.

Correcting for Ballpark
But adjusting for the run environment of the league in which a player plays is only part of the context. The park in which the player plays his home games is another significant aspect. Intuitively this makes sense. It seems obvious that Larry Walker gets a boost from playing in Coors Field while Willie McCovey was hurt by playing in Candlestick Park. To take this into account sabermetricians have devoted themselves to calculating “park factors” or “park effects” for each of the major league parks. Historically this has been done by calculating a Batter Park Effect or BPF and a Pitcher Park Effect or PPF for each team. The calculation of these effects as documented in The Hidden Game of Baseball involves not only comparing the scoring in each park with the scoring at other parks but also taking into account that there is a "home cooking" bias where players naturally hit and pitch better at home (a fact well documented in Curve Ball). In addition, the calculation allows for the fact that a team's hitters do not have to face its pitchers and vice versa. The BPF and PPF are expressed relative to the league average, in other words a BPF of 1.06 would mean that the batter's home park gives him a 6% advantage over the league and a PPF of .95 means that the park helps pitchers to the tune of 5%. Although factors can be and are calculated for different offensive events (homeruns, doubles, triples, etc.), the overall BPF is calculated based on runs scored. Here are the BPF and PPF as calculated for 2003 sorted by BPF.

Fortunately, the BPF and PPF have been calculated and are present in the Lahman database (stored on a scale where 100 is neutral), and so in order to take into account the home park we simply need to divide the NOPS by the BPF divided by 100. Here are the single season NOPS leaders shown previously, re-sorted with a new column normalized for park effects.
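As a rough sketch of the park adjustment, the Python below divides NOPS by the park factor. It rests on two assumptions of mine: that the BPF is on a scale where 100 is a neutral park (as with the BPFs of 91 quoted for Bonds), and that the adjustment is a division, which is what the NOPS/PF label suggests. The function name `park_adjusted_nops` is hypothetical.

```python
def park_adjusted_nops(nops: float, bpf: float) -> int:
    """Adjust a normalized OPS for home park.

    Assumes bpf is on a scale where 100 is a neutral park, so a hitter in a
    pitcher's park (bpf below 100) is adjusted upward and vice versa.
    """
    if bpf <= 0:
        raise ValueError("bpf must be positive")
    return round(100 * nops / bpf)


# Bonds in 2001: NOPS of 182 in a park with a BPF of 91
print(park_adjusted_nops(182, 91))
```

For Bonds in 2001 this gives 200, an 18 point increase over his unadjusted NOPS of 182, while a hitter in a neutral park (BPF of 100) is left unchanged.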

So Bonds, who has played in a relatively poor park for hitters (BPFs of 91 in 2001 and 2002), gets helped, while Williams is hurt by the high BPFs of Fenway Park, which are consistently over 100. And so it is once again appropriate to recreate the top 10 list with park effects.

Since Williams is hurt so much by Fenway Park he almost slips off the list entirely with only his 1941 season remaining. Ruth, however, now adds his 1927 and 1931 seasons when Yankee Stadium held a slight advantage for the pitcher.

Incidentally, Willie McCovey in 1969 moves up to tied for 11th with 165 when considering the tough hitting environment of Candlestick Park while Yaz in 1967 moves down to tied for 102nd at 148. So who is hurt most by taking into account park effects? As you might have guessed it is those who have played for the Colorado Rockies. In fact, Rockies take the top 35 spots when calculating the difference between NOPS and NOPS/PF with Todd Helton’s 2000 season taking top honors when his NOPS was 150 and NOPS/PF was 115. Conversely, Barry Bonds’ 2001 and 2002 seasons are most helped when park is taken into account, raising his score by 18. For Cubs fans like me it’s interesting to note that Sammy Sosa’s 2000 season was tied for 2nd with a 15 point bump up to 149 once park effects were taken into account. Those who follow the Cubs know that weather patterns are the largest variable in whether or not Wrigley Field is a hitter’s delight or a pitcher’s best friend. Those who aren’t Bonds fans might take issue with assuming that the Giants’ home park hurts Bonds since it was built with a short right field porch with Bonds specifically in mind. Certainly Bonds, being a left-handed hitter, is hurt less by the park than are right handers and so I have a degree of sympathy for that argument. However, I don’t have any data that supports or contradicts the argument at this point. A similar argument could be made against Ruth.

So does any of this change our perceptions of who had the greatest single seasons in history? Not really. Bonds, Ruth, and Williams still dominate the top spots and by virtue of Ruth taking 6 of the 10 a strong argument can be made that he was indeed the greatest hitter of them all.

On the other end of the spectrum Neifi Perez has somehow managed to grab two of the worst nine seasons in history with NOPS/PFs of 64 in 2002 and 71 in 1999.

Final Thoughts
Three additional thoughts might come to your mind when considering whether these were the greatest seasons in baseball history.

* Where's the defense? This ranking does not include defense and so can only be used as a ranking of the greatest offensive seasons in history. Although sabermetricians have tried for many years to develop defensive measures that quantify how many runs an individual saves for his team, in the end most of these schemes have difficulty. This is primarily because defense is a much more complex concept in baseball (more akin to defensive backs in football) than offense and doesn’t lend itself to quantification very easily. As Branch Rickey once famously said “There is nothing on earth anybody can do with fielding.” That said, there are sabermetric measures such as Defensive Efficiency Rating (DER) and Zone Rating (ZR) that attempt to measure defense more accurately than the traditional counting stats that include put outs, assists, and errors. Bill James, in his book Win Shares, also tries to assign value to defense through a more holistic approach that takes into consideration run prevention at the team level.

* What about opportunities? As mentioned in the previous post one of the strengths of OPS is its simplicity. One of the costs of that simplicity is that OPS has nothing to say about the opportunity a player had to garner his OPS. In other words, which player is more valuable, one with an OPS of 850 who had 600 plate appearances or one with the same OPS who had 200 plate appearances? Obviously, the former since an 850 OPS is pretty good and so finding four players with an 850 OPS over 200 at bats will likely be difficult. In the rankings presented in this post this problem is largely ignored by selecting only those players with 502 or more plate appearances in a season, in other words by only selecting those players who played every day. To address this problem Runs Created per Game (RC/G) takes into account opportunities by considering how many outs a player has consumed – the most valuable resource a team has – while amassing their offensive numbers.

* What about Ty Cobb? Many readers will have noted that Ty Cobb is conspicuously absent from this list and that Cobb is often talked about in the same context with Ruth and Williams. In fact, Cobb first appears on the list tied for 33rd at 160 for his 1917 season right behind Sosa’s 161 in 2001. There are two reasons why this is the case. First, some of Cobb’s perceived value was his foot speed and base stealing ability, neither of which are particularly visible in OPS. Second, OPS is largely a measure of extra-base hitting and Cobb only hit as many as 12 homeruns twice. In his 1917 season, however, he hit 44 doubles, 24 triples, and 6 homeruns. The fact that OPS is correlated so strongly with run scoring indicates that players like Cobb who focused on hitting for average at the expense of power (assuming they could do either of course) did and continue to do a disservice to their teams by forsaking power. In short, if the often told story is true of Cobb hitting three homeruns in a game only to prove to writers that power hitting was not that difficult, then Cobb was mistaken in going back to his former style.

Finally, let's recalculate the 2003 leaders by applying both the correction for league and for park.

It should be noted that the historical rankings take into consideration seasons since 1900 for players with 502 or more plate appearances and that for seasons with HBP and SF recorded they were taken into account. In addition, others have calculated similar adjusted values, some much more complicated, including the Adjusted OPS or OPS+ on baseball-reference.com and PRO+ in Total Baseball.

Saturday, July 17, 2004

Baseball Tonight's Harold Reynolds has been known to ask this question in a derisive tone complete with hand waving when discussing statistics. You may have noticed that I've periodically used OPS when discussing the relative merits of players or for comparison. For the sabermetrically challenged OPS is defined as:

OPS = On Base Average + Slugging Percentage

It couldn't be simpler. Since OPS is simply the addition of these two the long formula would be:

OPS = ((H+BB+HBP)/(AB+BB+HBP+SF))+(TB/AB)

Of course, since the denominators of the two values being added are not the same it's not really mathematically correct to show OPS as .892 and so in many places you'll see it simply listed as 892.
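For anyone who wants to compute it themselves, here is a minimal Python sketch of the long formula above. The `BattingLine` class and the sample stat line are made up purely for illustration; they do not belong to any real player.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats needed for OPS (field names are illustrative)."""
    ab: int   # at bats
    h: int    # hits
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases


def on_base_average(s: BattingLine) -> float:
    return (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)


def slugging(s: BattingLine) -> float:
    return s.tb / s.ab


def ops(s: BattingLine) -> float:
    return on_base_average(s) + slugging(s)


def ops_display(s: BattingLine) -> int:
    """OPS in the three-digit form used in this post (e.g. 892 rather than .892)."""
    return round(1000 * ops(s))


# A hypothetical stat line, not a real player's season
line = BattingLine(ab=500, h=150, bb=60, hbp=5, sf=5, tb=250)
print(ops_display(line))
```

With this made-up line the on base average is 215/570 (about .377) and the slugging percentage is .500, so the sketch prints 877.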

So why use OPS when slugging percentage and on base percentage are so readily available? I'll give three reasons:

Simplicity. People like things simple and, especially where quantitative analysis is concerned, feel the urge to reduce the essence of the thing being analyzed to a single number. In my own field of software development this reductionism leads people to fix precise dollar amounts on projects when an estimate based on a range of amounts is really as precise as you can be given the nature of software development. Given that, however, I'm not arguing that OPS by itself is somehow better than the values from which it is derived or even that given both it conveys more information. For example, which of the following carries more information, an 800 OPS or a .325 OBA/.475 SLUG? Obviously, the latter since the separation of OPS into its component parts tells you something more - namely that the player has pretty good power but not good on base - than the aggregate number. I am arguing that in shorthand venues it's easier to give a single number than to list 2 or 3.

Comparative Ability. The strength of reductionism is the ability to then apply the single number in order to make comparisons. Given OPS we can now rank players (as all baseball statistics junkies will do) to see who was tops in the majors. Using the Lahman database and SQL Server I calculated the following leader board for 2003.

The list is dominated by just three players. But could it be that the conditions under which these three played helped them dominate? To discover if this is the case in a future post I'll talk about how we can "relativise" OPS to take into account the context in which these hitters performed.

Correlative Value. Finally, I like OPS because while it is simple to calculate it correlates well with run scoring. This means that unlike other single measures, batting average being the classic example but also slugging percentage and on base percentage taken by themselves, it is a measure of how valuable a player is in creating runs and therefore wins for his team. And so while having both slugging percentage and on base percentage at hand conveys more information about the player's attributes, combining them fairly accurately gives a measure of his value to the team. In fact, in Curve Ball Albert and Bennett note that using OPS "the number of runs scored by a team per game can be predicted within about .15 Runs per Game for two-thirds of the teams." To get a better correlation you have to resort to more complicated formulas such as Runs Created Per Game (RC/G) developed by Bill James, Linear Weights (LWTS) developed by Pete Palmer, Batter Runs Average (BRA, defined as the OBA multiplied by the SLUG) developed by Richard Cramer and Pete Palmer, or Total Average (TA) developed by Thomas Boswell. And the differences between these other techniques and OPS are not nearly as great as the difference between OPS and AVG, SLUG, or OBA. In short, OPS is simple enough to calculate on the fly and yet is a useful indicator of performance.

For these reasons you'll see me continue to use OPS. However, in order to get a baseline here are the average OPS numbers for 2003 (for non-pitchers):

AL 762
NL 771

As a result, an OPS below 700 is bad, above 775 is pretty good, and above 875 is very good. These numbers have been pretty consistent since 1993.

A week or so ago I blogged about batting order and quoted The Hidden Game of Baseball as pooh-poohing the idea that monkeying around with the batting order actually produced more runs. One thought that occurred to me after posting was that Palmer and Thorn had used the average statistics at each position in the majors from 1969-1971 along with simulation in order to come to their conclusions. In fact, here are the averages for the nine positions they used in their simulation:

What should be immediately obvious, and what I hadn't thought of when I wrote my first post, is that the variability in an individual team's lineup is almost always going to be greater than that in the league as a whole. And so in the case of the Cubs, regularly batting their worst hitter (Ordonez) second and their best on base percentage hitter (Lee) 7th has a greater impact than a simulation using the average at those positions would show. How big the difference is could be determined with another simulation, but I'd be surprised if it wasn't on the order of a couple of wins (around 20 runs).

Friday, July 16, 2004

Here's an interesting article published on Baseball Prospectus by Rany Jazayerli. The article basically explains how BP calculates Pitcher Abuse Points (PAP), something the Cubs and Dusty Baker might keep in mind given the high pitch counts Mark Prior (113.3 pitches per start, tops in the NL in 2003 and now perhaps headed to the DL for the second time with elbow trouble), Carlos Zambrano (106.4 pitches per start in 2003), and Kerry Wood (110.7 pitches per start in 2003) regularly incur.

More interesting though are the recommendations that come from their analysis of PAP. Basically, they advocate going back to the historical four-man rotation but monitoring pitch counts more carefully and then using the starters in relief on their regular throwing days. In all this would give starters more innings pitched (280-290) while not adversely affecting their arms and since your starters are generally your best pitchers, would lead to more victories.

I've seen this theory proved out anecdotally in baseball simulation games which my brother enjoys. He regularly uses elements of this approach and still wins with average teams. However, in the real world I don't hold out much hope that any team will actually try it in the near future. Baseball is by its nature a conservative institution.

For another view on the conservative nature of baseball check out Alan Schwarz's excerpt of his new book The Numbers Game, which I'm going to run out and pick up very soon. Here he recounts how the new sabermetric knowledge first widely published in the 1964 book Percentage Baseball was shunned by almost everyone in the industry and of course is still largely ignored by those in the inner-circle.

Wednesday, July 14, 2004

So during the All-Star game respite I thought I'd give an assessment of both the Royals and Cubs.

Royals (31-54, 15.5 games out in the Central)
After starting 4-2 it's all been downhill... The poor performances of Anderson and May in the starting rotation and the lackluster hitting by everyone except Ken Harvey and Carlos Beltran quickly sank any hope the Royals had to contend in a weak AL Central. The only bright spot has been rookie Zack Greinke, who will start the 2nd half tomorrow night against the Twins and has a 3.86 ERA (56 IP, 49 H, 12 BB) in 9 starts. Lack of run support has led to his 1-6 record. His only weakness thus far has been a tendency to give up the long ball (9 so far). Once it was clear the Royals had no hope they unloaded Curtis Leskanic, Jason Grimsley, and Carlos Beltran. I would look for Joe Randa and Matt Stairs to be next and perhaps Scott Sullivan.

Injuries have also played a big part in the disaster - 10 players are currently on the disabled list including Juan Gonzalez, Benito Santiago (60 day DL), Jeremy Affeldt, Aaron Guiel, and Joe Randa - all players the Royals had hoped would contribute. This isn't to mention the pitchers Kyle Snyder, Kevin Appier, Miguel Asencio, Mike MacDougal, and Runelvys Hernandez, all of whom it was thought would contribute this year. In fact, other than Sweeney, Berroa, and Harvey, none of the projected starters will likely contribute much more this season. It doesn't make sense to play Gonzalez or Santiago anymore this season either, even though Santiago has another year on his contract. Hopefully he'll retire after the season.

Things to look forward to in the second half are seeing if David DeJesus can take over the centerfield job, seeing if newly acquired John Buck can actually hit at the major league level (he's looked overmatched thus far), and hoping that the Royals can finally make an assessment as to whether Dee Brown will actually have a major league career. They're also going to give former prospect Ruben Mateo some extended time to see what he can do, although when Aaron Guiel comes back soon they'll have to split time amongst the four. On the pitching side Mike Wood (also obtained in the Beltran deal) so far doesn't look too promising and everyone else other than Greinke and Affeldt is just a spare part.

As I've said before I think the Royals front office did a fine job last winter. What's gone wrong (with the exception of trying to make Affeldt a starter) has largely been out of their control. Recently David Glass said that he expected the payroll to remain in the $50M range. And so with Beltran's salary freed up as well as that of Gonzalez, Stairs, Randa, Grimsley, and Leskanic, the Royals should have some money to work with to try and sign a young pitcher perhaps or a corner outfielder.

Cubs (47-40, 7 games out in the NL Central, 1 game out in the Wild Card)
The story of the first half for the Cubs has been injuries and unexpected contributions. In terms of injuries Mark Prior, Kerry Wood, Mark Grudzielanek, Mike Remlinger, Sammy Sosa, Alex Gonzalez, and Joe Borowski have all been or are still on the disabled list. Aramis Ramirez is also now injured although not on the DL. Where this has hurt the Cubs the most is at shortstop where they've been forced to play Rey Ordonez and Ramon Martinez. Rumors are flying that the Cubs are working a deal for Nomar Garciaparra, which would certainly help the struggling offense.

However, they've gotten great contributions from Todd Hollandsworth, Tom Goodwin, Jose Macias, Glendon Rusch (4-1 4.06) and particularly Todd Walker (.863 OPS) off the bench. Walker's contribution was so vital that I question putting Grudzielanek back in the lineup. At least they should be platooned with Walker taking most of the at bats and perhaps playing third when Ramirez sits. Other starters including Moises Alou (.839 OPS) and Ramirez (.924 OPS) particularly have picked up the slack as well. Derrek Lee (.891 OPS) and Corey Patterson (.772 OPS), after slow starts are now hitting as well.

Overall the outlook for the second half is good. With the big 5 in the rotation the Cubs should thankfully not have to score that many runs to be competitive night in and night out. I don't hold out great hopes for winning the NL Central although Cardinals pitchers Suppan and Carpenter are vulnerable to letdowns. A key for the Cubs in the 2nd half is being able to beat the Brewers and based on the 3 game sweep they endured last week it might be an uphill battle. With their lineup dominated by right-handed hitters they are vulnerable to hard throwing right-handers like Ben Sheets. The Cubs start the second half in Chicago tomorrow night with Prior against Doug Davis. Can't wait.

I've added a new baseball blog to my list of links on the right. Wild Pitch is written by fellow SABR member Ray Flowers who, in addition to his blog, writes primarily for Drafthelp.com and athomeplate.com. He has a nice piece on Carlos Beltran and park effects here as well.

This famous epigram translated "he snatched lightning from the sky and the scepter from tyrants" neatly summarizes the two sides of Franklin (1706-1790) that predominate in this recent biography of Franklin by Walter Isaacson. I had wanted to read this book when it first came out largely because of the relatively negative view of Franklin I had gotten from reading David McCullough's John Adams.

Although not as well written as the McCullough book, Isaacson's does a good job of covering the bases, detailing Franklin's life in a strictly chronological portrait beginning with his family's background (and the interesting derivation of the Franklin name) in England before his father immigrated to America in 1683. He then traces Franklin from his earliest apprenticeship at his brother's print shop at the age of 10 to his decisive move at 17 from Boston to Philadelphia and his somewhat obsessive drive to make something of himself through industriousness. From there his life as an entrepreneur and "leather-apron" (a middle class merchant) in the printing industry takes Franklin to London and back as he grows in both wealth and influence through the publication of his yearly Poor Richard's Almanack starting in 1733 and his often satirical essays and letters. Isaacson details many of the pithy sayings Franklin became famous for and notes how many of them (more than 90% in Franklin's own estimation) were ancient sayings that Franklin was able to recast in his homespun manner. Isaacson also recounts how, during this time, Franklin tried to bring organization to all his activities and how he acted out his personal theology that "serving your fellow man is the best service to God" by organizing public service projects. These started with his own philosophical and business club called the "Junto" and later extended to starting a fire brigade, paving and lighting streets, creating a lending library, and founding a hospital and the University of Pennsylvania.

Franklin retired at the age of 42 to pursue his intellectual and scientific pursuits (he never had a formal education being denied attendance at Harvard by his frugal father) and there is a great chapter on his experiments with electricity and lightning, musical instruments, stoves, and other inventions. Of course Franklin's greatest fame came from his theories on electricity which resulted in his proposal for the invention of the lightning rod in 1750. One of the interesting aspects of the story well recounted in the book is that Franklin's proposal was translated into French after being published in England and was verified on May 10, 1752 a month before Franklin's own famous verification using a kite and key. Isaacson also does a good job explaining the significance of an invention that we all take for granted that saved large amounts of property and lives and even had theological implications. Although Franklin's scientific endeavors had theoretical impact, Isaacson stresses that Franklin was not a theorist and was best at discovering practical uses for his discoveries.

From Franklin's interest in public service projects (and his own self-interest in the circulation of his newspaper and almanacs) he became involved in the Pennsylvania Assembly and postal service and eventually became postmaster (a job that would expose him, more than any of the founding fathers, to the expanse of the colonies and give him a perspective on their unification). From here his political career was launched and he was first elected to the assembly in 1751. While in the assembly his natural distaste for authority led him into conflicts with the proprietors of Pennsylvania, the Penn family, over taxes and defense from Indians. This led eventually to his being appointed by the assembly as their spokesperson in London where he traveled in 1757. From that point on Franklin would only return to America twice, once in 1763-1764 and again at the crucial juncture 1775-1776, living in London and then in Paris until finally returning home for good in 1785. His dealings in London included his work for Pennsylvania and later Georgia with the court and parliament and his role in repealing the Stamp Act before his final humiliation before parliament in 1774. His role at the Continental Congress in 1776 is well told as well as his interaction with Jefferson and Adams in drafting The Declaration. His return to Paris in 1776 as one of the agents of the new Continental Congress was filled with intrigue and dissension with his fellow American commissioners as he was tasked with extracting money from the French court and eventually in 1783 helping broker the peace between Britain and the new republic. After returning home to great fanfare he played the wise sage and helped create the compromises necessary in forming the Constitution during the convention of 1787. Through it all, Isaacson portrays Franklin as a pragmatist who stood on a few solid principles but who was willing to negotiate.

A few of the other interesting tidbits in the book were:

Isaacson describes Franklin's often flirtatious and long-running relationships with younger women such as Catherine Ray and juxtaposes these with his contentious and short-lived relationships with men including his own illegitimate son William who was a Tory and royal governor of New Jersey

Isaacson does not act as an apologist for Franklin's rather shabby treatment of his wife, Deborah. Although Deborah refused to travel to Europe Franklin needlessly delayed his return to America until after Deborah had died despite her constant pleadings and reports of her poor health. He paints their relationship as one of mutual help but not intimacy. He also seemed to treat his faithful daughter Sally in a similar fashion

Isaacson also shows how Franklin set up for himself in London and again in Paris "surrogate families" complete with a doting maternal figure, a young lady on whom Franklin could dote, and several small children including his own grandsons Temple and Benny

Another interesting thread that runs through the book is Franklin's philosophy of conversation (a "velvet-tongued and sweetly passive style"), which included asking questions rather than confronting directly, an indirect style of persuasion, and using silence wisely. Isaacson argues that this style, while effective, sometimes garnered Franklin a reputation for being duplicitous in his dealings

Throughout the book Franklin's religious feelings and actions are brought out. He was a Deist but believed that God does work in the affairs of men (per his famous quote at the Constitutional Convention). Yet he seemed to think very little about the issues of salvation and redemption, instead preferring his pragmatic creed of helping others. He developed these beliefs very early in life (by the mid 1720s) and appears to have never wavered from them. This is the same undogmatic approach that identified him with the Enlightenment thinker Voltaire, whom he famously met in France, and that served him well in his negotiating

The discussion of Franklin's role at the constitutional convention is interesting and it was enlightening to note that several of Franklin's preferred ideas including a unicameral legislature, the idea that office holders should not be paid (strangely undemocratic but one which Isaacson argues Franklin made to avoid the corrupting influence of money in politics), and an executive council instead of a President he gave up in order to reach consensus

As I expected the book contains a much more positive portrayal of Franklin's conduct in Paris than John Adams reported and was consequently reflected in McCullough's book. Adams viewed Franklin as lazy and loose with money (contrary to the ideals he preached in his almanacs), which Isaacson sidesteps instead giving the impression that Franklin was simply endearing himself to the court and French ambassadors. Certainly, Franklin's popularity in France (his face was put on everything from medallions to snuff boxes and in one instance on the bottom of a chamber pot presented to a courtier by Louis XVI) is what assisted America the most and without him the French monetary and military assistance and the peace treaty with Britain would not have been successful. Franklin though seemed to have a bit of blindspot for France but when it came time to work a deal for peace he deftly sidestepped the French and dealt directly with Britain to secure better terms

One of the final chapters interestingly reviews Franklin's image over the past 200 years and how it has changed with the times

I'd certainly recommend this book to anyone who wants to get a mental portrait of Franklin. Although certainly positive in its overall take, it appears to be a fair portrayal. It has a good mix of his career, his personal relationships, and his intellectual pursuits.

Saturday, July 10, 2004

From time to time I've criticized Dusty Baker for his lineup construction and he's now switching things up in an attempt to perk up the struggling offense (5 total runs during the current 5 game losing streak). In fact, in yesterday's 6-1 loss to the Cardinals, weak-hitting Rey Ordonez batted second. While I still think that's a mistake, I was interested to read some lineup analysis done by Pete Palmer and John Thorn in their 1984 book The Hidden Game of Baseball. I bought a used copy on Amazon this week, having never read the book.

In the chapter "The Book...and the Computer" Thorn and Palmer analyze the average production at various lineup positions in both leagues from 1969-1971, which produced 4.141 runs per game. They then discuss how they used a computer simulation to test various lineups to see which produced the most runs. What they found was that a maximal number of runs could be scored, 4.154 per game, when the traditional order was changed to 1-3-4-5-6-2-7-8-9. Since this equates to only 2 runs over the course of a season it seems to have little effect. In fact, the worst order (9-8-7-2-1-6-5-4-3), which no manager would ever employ, produced 4.003 runs, a difference of 24.5 runs over the season. As a result Thorn and Palmer conclude that "All the time managers put into masterminding a winning lineup is so much thumb twiddling, and they are hereby granted an additional hour's sleep a night."

While I don't disagree with their analysis it still seems reasonable that a manager construct his lineup in descending order with "table setters" first, "all around hitters" second, and "runner advancers" last. The other factor to consider is that each spot higher in the order will garner an additional 18 plate appearances over the course of a season. As a result, hitting someone like Ordonez 2nd instead of 8th over the course of a full season gives him 108 more plate appearances. But in the end this analysis probably shows that rather than lineup construction, having a productive hitter at each position is vastly more important in scoring runs - especially in the National League where one or two weak hitters in addition to the pitchers create a situation where over a third of your offensive innings are poisoned with weak hitters.
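To make the arithmetic behind those figures explicit, here is a small Python sketch. It assumes a 162-game season (my assumption, though it matches the 2 and 24.5 run figures quoted above) and uses only the per-game numbers cited from Thorn and Palmer; the variable names are just illustrative.

```python
# Rough check of the season-long impact of batting order.
GAMES_PER_SEASON = 162  # assumption: a full 162-game schedule

traditional_rpg = 4.141  # runs per game, traditional order
best_rpg = 4.154         # runs per game, best simulated order
worst_rpg = 4.003        # runs per game, worst simulated order

# Runs gained over a season by switching from the traditional to the best order
best_gain = (best_rpg - traditional_rpg) * GAMES_PER_SEASON

# Runs separating the best and worst orders over a season
best_vs_worst = (best_rpg - worst_rpg) * GAMES_PER_SEASON

# Each spot higher in the order is worth about 18 extra plate appearances,
# so moving a hitter from 8th to 2nd moves him up 6 spots.
PA_PER_SPOT = 18
extra_pa = (8 - 2) * PA_PER_SPOT

print(f"Best vs. traditional: {best_gain:.1f} runs per season")
print(f"Best vs. worst:       {best_vs_worst:.1f} runs per season")
print(f"Extra PA batting 2nd instead of 8th: {extra_pa}")
```

The small size of the first number next to the second is the whole of Thorn and Palmer's point: any sane order sits very close to the best one.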

And speaking of weak hitters, the Cubs' offensive woes continued today in a 5-2 loss to the Cardinals. One of the things that is particularly frustrating about watching the Cubs is how frequently they fail to score runners that reach base. In a discussion brought to my attention by my father-in-law you can rank teams by their efficiency in bringing runners around to score. A quick and dirty way to tabulate this "Run Scoring Efficiency" is:

RSE = (R-HR)/(H+BB-HR)

In other words this formula gives a rough indication of how effective a team is in being able to bring runners in to score. What this formula lacks is how often a team can score without the aid of a homerun since even though homeruns are subtracted from both the numerator and the denominator the runs component still includes runners who reach base and then subsequently score on homeruns. As of last Thursday here were the top 10 worst teams in the majors.

If data were available to calculate the more correct version I'm sure the Cubs would rank at or near the top because of these teams they've hit the most homeruns (107). Another rough indication is what percentage of the team's runs are the direct result of a homerun calculated by dividing homeruns by runs scored. Here are the top 10 in that list as well.

Here the Cubs rank at the top and once again a better statistic would account for runners on base when a homerun was hit (I recently heard that the Cubs ranked first with 47% of their offense coming from homeruns but I don't have a complete list). Of course it helps if your team hits a lot of homeruns, which the Cubs do and the Expos don't, but still it shows how one-dimensional the Cubs offense is right now since they rank 22nd in on base percentage, 28th in stolen bases, and 10th in strikeouts.
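For anyone who wants to run these two quick-and-dirty measures on their own numbers, here is a minimal Python sketch. The function names are mine, and the team totals in the example are made up purely for illustration; they are not any real team's statistics.

```python
def run_scoring_efficiency(runs, hits, walks, homeruns):
    """RSE = (R - HR) / (H + BB - HR), the rough formula given above."""
    return (runs - homeruns) / (hits + walks - homeruns)


def homerun_share_of_runs(runs, homeruns):
    """Rough share of runs that are the batter himself scoring on a homerun.

    This understates the true homerun contribution because it ignores
    any runners who were on base when the homerun was hit.
    """
    return homeruns / runs


# Hypothetical team totals, for illustration only
example = {"runs": 400, "hits": 750, "walks": 250, "homeruns": 100}

rse = run_scoring_efficiency(**example)
hr_share = homerun_share_of_runs(example["runs"], example["homeruns"])

print(f"RSE: {rse:.3f}")
print(f"HR share of runs: {hr_share:.1%}")
```

Both functions carry the same limitation noted above: without play-by-play data there is no way to separate the runners who scored on someone else's homerun.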

Thursday, July 08, 2004

This should be required reading for anyone sucked into seeing Fahrenheit 911. I've not seen the movie, only clips and an interview or two with Moore. I put the movie in the same intellectual category as alien abductions, the grassy knoll shooter, the Clinton Files, the CIA marketing drugs in the inner city, and the belief that O.J. was innocent.

A week or so ago I was watching an Astros game on ESPN when David Justice, former player and the analyst on the broadcast, began talking about Roger Clemens. In particular he noted that Clemens had recently said that 1) the inclusion of the pitcher in the lineup made pitching much easier in the National League and that 2) pitching to the 7 and 8 hitters in the National League was much easier than pitching to the 8 and 9 hitters in the American League.

Although I wasn't surprised by the former statement I was by the latter. I was skeptical of Justice's comment since my reasoning had always been that with the inclusion of the pitcher National League teams would generally prefer better hitting over defense in the remaining 8 positions. I then assumed that Clemens likely perceived the 7 and 8 hitters as weaker since he was able to pitch around them more frequently in order to get to the pitcher. Of course, the other side of the coin is that NL teams play in a more restrictive run environment and so are more prone to choose good defense over good offense at shortstop, second base, and center field.

To try and put together some numbers to answer the question I turned to Retrosheet. In less than an hour over lunch I did the following:

1. Downloaded the event files (play-by-play data files) for the AL and NL for 1992 (the most recent year available)
2. Generated comma-delimited files using the BEVENT.exe utility, also on the site
3. Loaded the CSV files into a SQL Server database on my laptop
4. Ran a few simple queries to see how various positions in the batting order performed

What I found is that from this data (again, only for 1992) it appears that Justice may be on to something although the differences are not monumental. In fact, it could be argued that since the NL numbers likely include many instances for the 8 hitter in particular of being pitched around, the NL numbers are at least on a par with the AL. On the other hand, the number of walks drawn is not significantly different. It should be noted that on base average (OA) was calculated without the intentional walks (IBB) added in.
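To give a flavor of the kind of query involved, here is a rough Python sketch using the built-in sqlite3 module rather than SQL Server. The table layout, column names, and sample rows are all hypothetical stand-ins, not the actual BEVENT output fields or my real queries, and the on base average is simplified to hits plus non-intentional walks over at bats plus non-intentional walks.

```python
import sqlite3

# Hypothetical, simplified table: one row per plate appearance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE plate_appearances (
        league     TEXT,     -- 'AL' or 'NL'
        lineup_pos INTEGER,  -- batting order position, 1 through 9
        is_ab      INTEGER,  -- 1 if the appearance counts as an at bat
        is_hit     INTEGER,  -- 1 if the batter got a hit
        is_bb      INTEGER,  -- 1 for any walk, intentional or not
        is_ibb     INTEGER   -- 1 if the walk was intentional
    )
""")

# Made-up sample rows, for illustration only
rows = [
    ("NL", 8, 1, 1, 0, 0),  # hit
    ("NL", 8, 1, 0, 0, 0),  # out
    ("NL", 8, 0, 0, 1, 1),  # intentional walk (excluded from OBA)
    ("NL", 8, 0, 0, 1, 0),  # ordinary walk
    ("AL", 9, 1, 0, 0, 0),  # out
    ("AL", 9, 1, 1, 0, 0),  # hit
    ("AL", 9, 1, 0, 0, 0),  # out
    ("AL", 9, 0, 0, 1, 0),  # ordinary walk
]
conn.executemany(
    "INSERT INTO plate_appearances VALUES (?, ?, ?, ?, ?, ?)", rows
)

# Aggregate by league and lineup position, leaving intentional walks
# out of both the numerator and the denominator of on base average.
query = """
    SELECT league,
           lineup_pos,
           SUM(is_ab)                 AS ab,
           SUM(is_hit)                AS h,
           SUM(is_bb) - SUM(is_ibb)   AS ubb,
           1.0 * (SUM(is_hit) + SUM(is_bb) - SUM(is_ibb))
               / (SUM(is_ab) + SUM(is_bb) - SUM(is_ibb)) AS oba
    FROM plate_appearances
    GROUP BY league, lineup_pos
    ORDER BY league, lineup_pos
"""
results = conn.execute(query).fetchall()

for league, pos, ab, h, ubb, oba in results:
    print(f"{league} #{pos}: AB={ab} H={h} BB(non-IBB)={ubb} OBA={oba:.3f}")
```

The same GROUP BY shape works for any lineup position, which is how a query like this can be tweaked to cover the whole batting order.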

This then led me to consider how other positions fared and I was able to tweak my queries a bit to get the second data set. Once again nothing earth shattering here, although the NL 3 and 4 hitters outperformed the AL 3 and 4 hitters, which may indicate that the AL lineups have more balance.

Wednesday, July 07, 2004

The lack of offense the Royals have exhibited in recent weeks is now bordering on the unbelievable. After being shut out tonight 12-0 by the Twins for the third straight night the Royals are now on a 1-12 streak during which they've scored 27 runs (2.08 per game) and given up 91 (7 runs per game). The Royals have also now scored in only 2 of their last 45 innings. Overall, the Royals are now second to last in the AL in offense with 340 runs scored in 82 games (4.15 runs per game) leading only the Mariners at 336 runs.

Speaking of woeful offenses, the Cubs are also now scuffling having scored 2, 0, 2, and 0 runs in their previous 4 games. All their hitters look terribly impatient and are not hitting the few strikes they do swing at. Meanwhile they entered tonight's game 1 game in front in the Wild Card race and 5 games behind St. Louis. As I type this St. Louis is about to win again and the Dodgers are beating up on the Diamondbacks, which will pull the Dodgers even with the Cubs in the Wild Card and put the Cubs 6 games behind the Cards.

One of the interesting features of the Pocket PC is its Notification API. Unfortunately for .NET Compact Framework developers, for now this API is available in native (unmanaged) code only. As a result, a developer could use the OpenNETCF Smart Device Framework, which contains a Notification class. However, if you don't wish to download and reference the entire SDF you could create a wrapper in the Compact Framework to add notification bubbles to your applications. Just such a class was written in C# by Seth Dempsey of the CF team and then posted by Gokhan Altinoren on the web. Well, here is the class rewritten in VB .NET.

'For SHNP_INFORM priority and above, don't display the notification
'bubble when it's initially added; the icon will display for the
'duration, then it will go straight into the tray. The user can
'view the icon / see the bubble by opening the tray.
Private Const SHNF_STRAIGHTTOTRAY As Integer = &H1
'Critical information - highlights the border and title of the bubble.
Private Const SHNF_CRITICAL As Integer = &H2
'Force the message (bubble) to display even if settings says not to.
Private Const SHNF_FORCEMESSAGE As Integer = &H8

<StructLayout(LayoutKind.Sequential)> _
Private Structure SHNOTIFICATIONDATA
Public cbStruct As UInt32 'For verification and versioning
Public dwID As UInt32 'Identifier for this particular notification
Public npPriority As UInt32 'Priority
Public csDuration As UInt32 'Duration of the notification
Public hicon As IntPtr 'Icon for the notification
Public grfFlags As UInt32 'Flags
Public clsid As Guid 'Unique identifier for the notification class
Public hwndSink As IntPtr 'Window to receive command choices
Public pszHTML As StringPtr 'HTML content for the bubble
Public pszTitle As StringPtr 'Optional title for bubble
Public lParam As UInt32 'User-defined parameter
End Structure

' This function removes a notification.
' clsid: The class of the notification to remove.
' ID: The unique identifier of the notification to remove.
' Do not set the value of this parameter to 0; that option is not implemented.
' If the value of this parameter is set to 0, the function will return an
' error.
Public Sub Remove(ByVal clsid As String, ByVal ID As Integer)
NotificationRemove(New Guid(clsid), ID)
End Sub

' This function removes a notification by ID. The value of the guid property
' will be used as the class of the notification.
' ID: The unique identifier of the notification to remove.
' Do not set the value of this parameter to 0; that option is not implemented.
' If the value of this parameter is set to 0, the function will return an
' error.
Public Sub Remove(ByVal ID As Integer)
NotificationRemove(m_clsid, ID)
End Sub

' This function removes a notification by class. The value of the ID property
' will be used as the ID of the notification.
' clsid: The class of the notification to remove.
Public Sub Remove(ByVal clsid As String)
NotificationRemove(New Guid(clsid), 0)
End Sub

#End Region

#Region "Others"

' This function removes the last notification added, if any.
Public Sub RemoveLast()
NotificationRemove(m_clsid, 0)
End Sub

#End Region

#End Region

#Region "Public Properties"

' A number used when tagging notifications with an ID. If no ID is
' specified during an Add(), the ID property is incremented by one, used as the ID,
' and saved back into the ID property.
Public Property ID() As Integer
Get
Return m_baseID
End Get
Set(ByVal Value As Integer)
m_baseID = Value
End Set
End Property

' A String that contains a GUID.
' If this property is not set,
' {A6EF1E8E-5BC0-49d9-B03F-08BC163DADC8} is used as the GUID.
Public Property clsid() As String
Get
Return m_clsid.ToString()
End Get
Set(ByVal Value As String)
m_clsid = New Guid(Value)
End Set
End Property
#End Region

End Class
End Namespace

In rewriting this code in VB .NET one of the major differences is that the C# code
used the unsafe and fixed keywords to handle string pointers within the notification
structure. These keywords are not available in VB and so a managed string pointer
class was created (as documented in our whitepaper on MSDN) as shown here.

' Allocates a block of memory using LocalAlloc
Public Shared Function AllocHLocal(ByVal cb As Integer) As IntPtr
Return LocalAlloc(LPTR, cb)
End Function

' Frees memory allocated by AllocHLocal. hlocal is passed ByRef so that
' resetting it to IntPtr.Zero below is visible to the caller.
Public Shared Sub FreeHLocal(ByRef hlocal As IntPtr)
If Not hlocal.Equals(IntPtr.Zero) Then
If Not IntPtr.Zero.Equals(LocalFree(hlocal)) Then
Throw New Win32Exception(Marshal.GetLastWin32Error())
End If
hlocal = IntPtr.Zero
End If
End Sub

Tuesday, July 06, 2004

As I've mentioned before in this space, one of the benefits of being a member of SABR is receiving The National Pastime: A Review of Baseball History, which is published each year by SABR and includes articles written by SABR members. I've just received my copy and was delighted to read an article by Richard Peurzer titled "The Kansas City Royals Baseball Academy". In this post I'll summarize the article and then provide a little analysis from a sabermetric perspective.

In the article Peurzer recounts the history of the academy, the brainchild of late Royals owner Ewing Kauffman, who established the franchise in Kansas City in 1969 and opened the academy in Sarasota, Florida in 1970. The 121-acre facility included five fields, dorms, offices, classrooms, and a swimming pool and was staffed with a combination of ex-big leaguers, college coaches, and experts outside of baseball in fitness and weight lifting, track, ophthalmology, and psychology. Syd Thrift, who was already employed by the Royals scouting department, was tasked with running the academy.

Kauffman established the academy because "he was disenchanted with the conservative nature and the resistance to change found in the baseball establishment." He wanted to apply innovation to baseball in the same way he applied it in his business ventures with Marion Labs where organizations used technology to continually refine their practices. Indeed, the establishment of the academy was quite revolutionary as Peurzer compares it to other improvement efforts in the history of the major leagues, none of which were so extensive or required so much investment.

What most interested me was the recruiting methodology used. It was the goal of the academy not primarily to refine baseball skills but to reach untapped talent. In fact, it was Kauffman's belief that "an athlete did not necessarily have to play baseball all his life in order to be a good baseball player" and that there were lots of athletes who were not scouted or that never played much baseball. And in that vein the tryouts that were held across the country selected 43 out of 7,682 candidates to be the first class at the academy. In particular, four physical attributes (and several psychological ones) were used to make the selections:

Excellent running speed (specifically run a 60 yd dash in less than 6.9 seconds)

Exceptional eyesight

Fast reflexes

Superb body balance

These attributes were apparently arrived at by testing 150 players, many of whom were already within the Royals organization. These 43 athletes from 23 states were then sent to the academy for 10 months during which they would play 150 games against other pro teams in exhibitions as well as the Gulf Coast League. The players would receive detailed instruction (and attend classes at a local junior college) every day. Some of the innovations created by the academy included the use of pitching machines for both hitting and fielding, weight and strength training using resistance techniques such as rubber bands, stopwatches, and visualization.

These innovations seemed to pay off in wins and losses as the first class from the academy went 40-13 in the Gulf Coast League, easily winning the championship. In particular the team stole 103 bases in 119 tries, 48 more stolen bases than the second best team. Despite the apparent success the second academy class was scaled down to 20 players and supplemented with Royals minor leaguers for short stints. Still, the team went 41-22 and stole 161 bases. The third class included only 14 players and the team went 27-28, though it still led the league in steals by a large margin. The academy was shut down probably both because of its economic impact (it cost $1.5M to build and $700,000 annually to maintain) and the fact that the old school baseball people running the Royals, such as manager Charlie Metro, just didn't believe in the concept and coveted the money "which could have been used in the traditional player-development programs."

Analysis

In the end it seems to me the academy was a success in two primary ways. First, it graduated 14 players to the majors out of the 77 players, most of whom would never have even been drafted. That seems a pretty high percentage when compared with the draft and includes Frank White, Rodney Scott, U.L. Washington, and Ron Washington.

Second, the academy was clearly successful in concentrating on advanced training techniques such as weightlifting and some innovative techniques that are still used today. For example, it can be argued as Peurzer does that the academy's focus on base stealing as a skill helped start the revolution in base running in the late 1970s and 1980s. At the academy they determined that an average runner could take a 12-foot lead off first and a 27-foot lead off second and also used stopwatches to time the delivery of pitchers and release times of catchers to calculate the probability of successfully stealing bases. This clearly contributed to the academy teams' outstanding base stealing success. Kauffman's analysis that technology was not being used effectively in baseball was largely borne out.

On a higher level, though, the academy failed in two ways as well. First and most importantly, although the academy graduated 14 players to the majors in just 3 years, the methodology of selection, concentrating on raw athleticism and untapped talent in inner cities, made it a foregone conclusion that those graduates would likely be light-hitting middle infielders who were good base stealers and not offensive producers. This is the case since the academy was (perhaps unwittingly, Peurzer doesn't address this) selecting for the positions that required the most athleticism and the fewest refined skills. So while Kauffman was correct that applying technology could make a difference in baseball, it was applied at the academy without the critical information as to what actually wins baseball games. The sabermetric revolution started to pick up speed at roughly this same time with the publication of The Baseball Encyclopedia, although it would be almost a decade before the publication of The Hidden Game of Baseball by Pete Palmer and John Thorn and wide distribution of Bill James' Baseball Abstracts, and a further twenty years before big league GMs would start applying some of the principles as documented in Moneyball. As a result, the academy teams were basically one-dimensional teams (although still successful overall). And so lacking this information, the academy probably would not have been successful in producing top-level major league players.

Second, the academy failed because it didn't convince the entrenched baseball establishment to adopt its approach directly. Kauffman remained a believer and later stated "that he believed that the Royals would have been better off keeping the Academy alive." The failure to make believers out of the front office might be linked to the fact that no super prospect was immediately found, background racism based on the makeup of the classes, or simply to what Charlie Metro called the "crazy instruction". In any case this really can't be laid at the feet of the academy since baseball has historically been and continues to be an extremely conservative industry.

Oh boy! Since Carlos Beltran last appeared on the field for the Royals on June 24th, the Royals are 1-10 having scored just 27 runs (2.45 per game) while their opponents have scored 75 (6.82 runs per game). During this period Joe Randa, David DeJesus and Mike Sweeney have all been sidelined for part of the time as well (they now have 10 players on the disabled list and have used 47 players this season - a new major league record before the All-Star break!). As a result the lineup is packed with weak hitters with no extra base power who don't get on base. Yikes. In an earlier post I said that the Royals would probably come back to win around 75 games. For them to do that they'll need to go 46-37 the rest of the way. With this offense combined with the poor pitching and defense that is unbelievably unlikely.

With Sweeney now out for 4 days and counting it would have been a good time to call up Calvin Pickering. However, the Royals have made clear that they don't feel Pickering is a bona-fide prospect any longer at 29 years of age. Pity - because he's a better hitter than anyone else on the team and he's earned the right to be in KC with his performance in Omaha.

So what the Royals did do this week was trade for former prospect Ruben Mateo from the Pirates for cash and send Byron Gettis back to the minors. Mateo is a "tools guy" but unfortunately he's never used his tools at the major league level. He had a good stint at Louisville (AAA) last year hitting .327 with a .408 OBP in half a season but has never shown any patience at the big league level, drawing only 46 walks in over 800 plate appearances. If he can find some patience the Royals are the one team he could stick with. As for Gettis (24), I'm glad he's being sent back down since he wasn't getting any playing time (only 39 at bats since being called up on May 27th).

Friday, July 02, 2004

I haven't written anything recently about the Cubs. Initially it was because I was discouraged with how the team was performing and all the injuries and recently I've just been too busy. Anyway, heading into the July 4th weekend series against the White Sox (which I'll be enjoying with family in Iowa) the Cubs are holding their own, second place in the division 3 games behind the Cardinals.

The Cubs had a good stretch in June winning 9 of 10 against their stiffest competition of the season (Angels, A's, Astros) and going 15-12 for the month. Add to that the fact that both Mark Prior and Sammy Sosa returned from injuries and there is good reason to be optimistic. What is most pleasing for Cubs fans, however, is:

That Derrek Lee got hot (as Steve Stone kept saying he would) in June with an OPS of 1.117 and really hit the ball well until the end of last week. It now puzzles me even more as to why Lee is hitting 7th much of the time instead of 2nd. On the other end of the spectrum Michael Barrett had an OPS of just .544 in June after getting hot early. Dusty is flip-flopping them around, apparently divining the batting order based on their biorhythms or something.

That Corey Patterson appears to have regained his form from before his injury last season and now has his average up to .287. More importantly he has walked 25 times and so his OBP is .347, not grand but a far cry better than what he was doing early. In watching his at bats he is being much more patient and seeing more pitches. This was brought out in yesterday's WGN broadcast, which showed how in June he saw 413 pitches in 112 plate appearances, good for 3.69 p/pa, while in April and May combined he saw 660 pitches in 204 plate appearances, good for 3.24 p/pa. He's also hitting .294 against lefties and so bringing in a lefthander to face him does not often have the desired effect. Add to that the fact that he's not committed an error yet (although a few days ago he got a real gift from the official scorer when he clearly dropped a ball) and it's all upside for the Cubs right now.

Kerry Wood will make a start at Iowa next week and then they may be back to full strength. I'd still like to see the Cubs find a shortstop who can hit a little so they don't have to play Rey Ordonez. They could also move Grudz to short since Walker's bat is missed when he's out of the lineup.

One of the interesting aspects of this season is that the Cubs will be finished playing the Cardinals before the end of July and the Astros by the end of August. That means that games against Milwaukee, Pittsburgh, Cincinnati, Florida, and the Mets in August and September will tell the tale for the Cubs.

Well, since the season is a bust it's time to look at 2005. Here's what I think the Royals should do now to prepare:

Bench Desi Relaford. He's a great utility player and can hit lefties off the bench but will likely never be a regular. Stop wasting his at bats and give them to Jose Bautista at third to see what he can do.

Bench Matt Stairs. I like Matt but again, he's on a 1-year deal and so he's gone after this season. Instead give all of those at bats to Dee Brown and Byron Gettis.

Move Ken Harvey to left field. It would be interesting to see if Harvey can play some outfield if it looks like the Royals won't be able to move Sweeney. This would give the Royals some flexibility in the offseason as well.

Bring up Calvin Pickering. He's earned some PT and should be put at first base with Sweeney DH'ing.

And then Gettis would be your 4th outfielder. While this lineup won't score that many runs and it's not any better defensively, it can't really be worse than what we've seen since the Beltran deal.

On another subject, I scored last night's 3-2 loss to the Orioles. I was quite impressed with Zack Greinke as it was the first time I'd seen him live. He really changed speeds well, located his fastball, and in one at bat threw a 93mph fastball and a 63mph curveball. He's the real deal. Not sure if he simply got tired but his location was bad in the 7th when Palmeiro took him deep and then he gave up back to back doubles.

Thursday, July 01, 2004

Great post by Ron Hostetter on the Royals where he points to The Stat Guy columns in the KC Star. As noted in the column the Royals defense has been poor as evidenced by their Defensive Efficiency Rating (DER) (the percentage of balls put in play that were turned into outs) and Zone Rating (the percentage of balls hit into certain zones that were made into outs). To pile it on it should be noted that they're also third from the bottom in giving up unearned runs with 50 (twice as many as the Devil Rays - the team being compared to the Royals in the Stat Guy column) behind only Arizona (51) and Boston (60) and have committed 65 errors. This was brought to my attention vividly yesterday while scoring the game as they gave up 4 unearned runs with three errors.

As of today the runs scored (RS), runs allowed (RA), and earned runs allowed (ER) look as follows:

        RS   RA   ER
Royals  328  418  368
Rays    344  365  340

This gives the Royals a Pythagorean winning pct. of .381 and the Rays .470 when in actuality the Rays are at .507 and the Royals at .387. So the Devil Rays have also gotten more out of less than the Royals have. All other things being equal (which they're not since luck plays a role) it seems that team speed is the X-factor in this case, allowing a team to take better advantage of misplays and errors and squeak out a game every now and then, something the Royals are definitely lacking but that the Rays have. However, even taking into account only earned runs the Royals still have a deficit of 44 runs with the Rays (16 on offense and 28 on defense), which makes a bigger difference than you might think since winning percentage varies with the square of runs scored per the Pythagorean formula. So while I agree that the Royals defense is bad (particularly in the outfield corners and the right side of the infield) the hitting and pitching have been equally bad, ranking near the bottom in most categories, and just as a good team is more than the sum of its parts (e.g. the 1998 Yankees), a team that is bad in all aspects is less.
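For reference, the Pythagorean numbers above can be reproduced with a few lines of Python. This sketch uses the classic exponent of 2 and the runs scored and allowed from the table; the function name is just my own label.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2):
    """Expected winning percentage: RS^x / (RS^x + RA^x), with x = 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Runs scored and runs allowed from the table above
teams = {"Royals": (328, 418), "Rays": (344, 365)}

expected = {name: pythagorean_pct(rs, ra) for name, (rs, ra) in teams.items()}

for name, pct in expected.items():
    print(f"{name}: {pct:.3f}")

# Run deficits mentioned above: offense (RS) and earned runs allowed (ER)
offense_gap = 344 - 328
earned_run_gap = 368 - 340
print(f"Deficit: {offense_gap} on offense + {earned_run_gap} in earned runs "
      f"= {offense_gap + earned_run_gap}")
```

Because the runs are squared, a modest gap in either column shows up as a bigger gap in expected winning percentage than the raw totals suggest.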