Monday, February 22, 2010

* I should tell you upfront that what follows is offered tongue-in-cheek and is beyond lame.

WAR has really exploded in popularity thanks to its implementation at Fangraphs and Chone's site. The acronym is becoming fairly ubiquitous, and as such becomes somewhat stale. WAR sounded really cool at first; now it's overplayed.

Of course there are some minor differences out there--you have WARP, which adds "player" to the end, and VORP, which switches the metric name from what it is in baseball terms (wins) to what it is in asset terms (value). One could come up with any number of different twists on the same basic name.

How about getting away from the use of "replacement" and "average" though? You could use "freely available player" or "marginal" in place of "replacement", and of course "mean" can replace "average". The latter gives you Runs Above Mean (RAM) and Wins Above Mean (WAM), which are both great. RAM! WAM! Thank you ma'am.

One could also substitute in one of my favorite old baseball words for "replacement". I have been enthralled with this word ever since I first read it in Bill James' original Historical Baseball Abstract nearly fifteen years ago--yannigan. As Jim Baker explains:

The term "yannigan" was popular as the name for any rookie, replacement or second-line player. It has a certain negative connotation to it, like the modern "scrubeenie"; it just *sounds* derogatory.

I see that someone else on the net was a big fan of this term as well. I never used it for a blog, but I've considered it as a user name for message boards and the like, and I named a team in my OOTP league the Miami Yannigans.

But now I've finally found the place to use it--in a sabermetric stat name. Runs Above Yannigan (RAY). Wins Above Yannigan (WAY). The latter is great, as it can take on a cult-like aura if you use it properly, which will drive the critics of sabermetrics crazy..."According to the WAY...", "I am the WAY, the truth..." It's as good as the names that Barry Codell comes up with for his stats.

Okay, I'm going to stop now before I either get struck down by lightning or taken away by the men in white coats.

* I have started posting guest scoresheet submissions on my scorekeeping blog, Weekly Scoresheet. So far, there have been two posted, and a third will follow next week. I realized shortly after I made my appeal for scoresheets that I should have waited until the season was in swing, so I'm hoping I'll get some more submissions at that time. The first three submitted scorecards have all had some interesting feature, and I can't wait to see what else is out there.

* Lee Panas of Tiger Tales has published a book called Beyond Batting Average. I had the opportunity to review the section on run estimators and offer feedback, so I can tell you that it is an intelligent and fairly comprehensive primer on sabermetric concepts and metrics. As such, its target audience is probably not most readers of this blog, but Lee's straightforward approach and knowledge will make it a good resource for those who are just getting into sabermetrics. The link to purchase the book is here.

Sunday, February 14, 2010

There has been a lot of chatter in the sabermetric community about SIERA, the new DIPS-style ERA estimator using batted ball data developed by Eric Seidman and Matt Swartz for Baseball Prospectus. Before even commenting on the metric itself, I think it is worthwhile to applaud BP for their recent hires (including Swartz and Seidman, but also Tommy Bennett and Colin Wyers) that have reinvigorated the sabermetric aspect of their operation--and for their openness in sharing the development of SIERA in a five-part series.

SIERA is innovative in that it does not consider line drives (except to the extent that they also represent a plate appearance), citing the low correlation in year-to-year line drive frequencies; it considers groundball rate with a denominator of plate appearances rather than balls in play; it treats flyballs and pop-ups equally and as offsets to grounders in a (GB - FB - PU)/PA term; and it is based on a regression equation with quadratic and interactive terms.

It is the last property of the metric which makes someone who approaches sabermetrics from my particular viewpoint and biases blanch a bit. My personal reaction to any sort of "ugly" regression equation is to wonder if there is a way to accomplish the same objective with similar accuracy while using a more intuitive (to me) model. Naturally, whenever run estimation is involved, that leads me to Base Runs, which is the most intuitive simple model of team run scoring (IMO--of course, this blog is always just my opinion and nothing more, but I'm being extra careful today).

So I quickly threw together a Base Run estimator, using only batted ball types (GB, FB, LD, and PU), walks, and strikeouts. I further divided batted balls into just two categories--groundballs (G) and non-grounders (nG = FB + PU + LD). This is not exactly the same as what SIERA does by using (GB - FB - PU)/PA, but it does embrace the not-so-fine breakdown of batted ball types that SIERA introduced.
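The grouping described above is simple enough to sketch in a few lines. This is only an illustration: the function name and the pitcher line are made up, and the SIERA-style term is just the (GB - FB - PU)/PA expression quoted earlier, not the full regression.

```python
def batted_ball_summary(gb, fb, pu, ld, pa):
    """Collapse batted balls into the two buckets used in this post."""
    g = gb                  # groundballs (G)
    ng = fb + pu + ld       # non-grounders (nG = FB + PU + LD)
    # SIERA-style net groundball term, per plate appearance
    net_gb_rate = (gb - fb - pu) / pa
    return g, ng, net_gb_rate


# Hypothetical pitcher line, invented purely for illustration
g, ng, net_gb_rate = batted_ball_summary(gb=250, fb=180, pu=40, ld=110, pa=800)
print(g, ng, round(net_gb_rate, 4))
```

Note that line drives count toward nG here but drop out of the SIERA-style term entirely, which is the difference the paragraph above is pointing at.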

I then estimated singles, doubles, triples, homers, and outs from G, nG, and K. To do this, I used the data published by Colin Wyers here on event rates by batted ball types, and figured a weighted average for non-grounders by using the GB, FB, PU, and LD data published by BP. This posed a bit of a problem, as Colin used Retrosheet definitions to derive his figures while BP uses a different data source; such is the nature of working with batted ball data without a DIY-approach.

From there, I ensured that estimated singles equaled actual singles for the 2009 major leagues (and doubles, etc.) and simply plugged everything into BsR:

NOTE: As commenter Bryan points out, this formula is riddled with errors. I'll leave it here so as not to hide my stupidity, but please see comment #4 for a corrected set of equations.

It is imperative to point out that what I did was clearly cheating from the standpoint of testing a formula--I calibrated it against the very dataset against which I am going to test it, rendering the test essentially useless. I did this simply for ease--my point in this entire exercise is not to supplant SIERA but just to raise the question of whether a different model can incorporate Seidman and Swartz's findings.

There is also a great deal of phony precision (i.e. ten-thousandths place decimals) on display, and a great deal of simplification that could be done to the terms (everything could be written in just four variables--G, nG, W, and K, but unless one is actually intending to use the formula on a regular basis doing so is a waste of time). Also, a better BsR B factor could have been used--I stuck with one I've used forever despite being less than optimal.
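For anyone unfamiliar with the shape of a Base Runs model, here is a minimal sketch of the general A*B/(B + C) + D form. The A, B, C, and D factors below are a commonly cited simple version built from standard batting lines; they are an assumption for illustration and are NOT the batted-ball factors used in this post. The team totals are invented.

```python
def base_runs(a, b, c, d):
    """General Base Runs form: baserunners times score rate, plus homers."""
    return a * b / (b + c) + d


def simple_bsr(ab, h, tb, hr, bb):
    """Base Runs with a commonly cited simple set of factors.

    These factors are an illustrative assumption, not the ones from this post.
    """
    a = h + bb - hr                                     # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement
    c = ab - h                                          # outs
    d = hr                                              # automatic runs
    return base_runs(a, b, c, d)


# Hypothetical team season totals, invented for illustration
estimate = simple_bsr(ab=5500, h=1450, tb=2300, hr=160, bb=550)
print(round(estimate, 1))
```

The B factor is the piece that gets tuned from one version to the next; everything else in the structure stays put.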

I then applied Pseudo-SIERA to all pitchers with >= 200 PA in 2009. The RMSE in estimating ERA for the real SIERA was 1.00; for Pseudo-SIERA, 1.06. Of course, as I already made clear, Pseudo-SIERA had the advantage of being calibrated specifically on this dataset.
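For reference, the RMSE comparison is nothing fancier than the following sketch. The function name and the sample numbers are made up; the actual pitcher-level data is not reproduced here.

```python
import math


def rmse(estimates, actuals):
    """Root mean square error between paired estimates and actual values."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical estimated and actual ERAs for three pitchers
print(rmse([3.50, 4.20, 4.80], [3.10, 4.90, 4.60]))
```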

Of course, the more telling test of a SIERA-type metric is how it does at predicting future ERA, something that I have obviously not tested here. There's really no need, at least with this implementation of the pseudo formula--I have no expectation that it would outperform SIERA. My goal here was just to incorporate a couple of ideas from Seidman and Swartz's work into a BsR model, and demonstrate that such a model has potential to be used in conjunction with those ideas. Nothing more.

On a final, unrelated note, I posted the first guest scoresheet contribution to Weekly Scoresheet yesterday. I have two more ready to go, and hopefully there will be more to come. I realized shortly after sending out my initial request for scoresheets that I should have waited until the season was underway and people would be more likely to have scoring on the brain (and scoresheets sitting around where they could be easily procured). Oh well.

Saturday, February 13, 2010

In 2009, Ohio State won the Big Ten regular season title and finished second at the Tallahassee regional. However, OSU's 37-6 loss to Florida State in regional final game one exposed the team's glaring weakness--one which remains the only significant question mark about its potential to repeat as champs in 2010. That weakness is pitching depth.

Last year, with the Big Ten's return to three nine-inning games per weekend series, OSU was able to ride a handful of pitchers (Jake Hale, Drew Rucinski, Alex Wimmers, and Dean Wolosiansky) for the key conference innings. Beyond them, the staff was shaky at best, and it showed in an unusually poor showing in mid-week non-conference games.

Hale, the 2009 closer, has exhausted his eligibility, leaving the aforementioned three hurlers to lead the way this season. Wimmers is the unquestioned ace of the staff--he took home Big Ten Pitcher of the Year honors in 2009 and is a possible first-rounder in the June draft. Behind him, Wolosiansky should be the #2 starter. His numbers are not impressive (7.02 RA), but he was solid as a freshman in 2008 and should be a reliable innings-eater in his junior campaign.

It remains to be seen whether fellow junior Rucinski will be used as a starter or as Hale's replacement at the back end of the pen. It seems as if the latter is more likely. Rucinski should be one of the staff's key assets in either role.

Senior lefty Eric Best could also be the #3 pitcher or closer, but he underwent arm surgery and may not be ready out of the gate. Junior lefty Andrew Armstrong, who would have been a starter last season, is still hampered by injuries and may not pitch at all in 2010.

That leaves sophomore Ross Oltorik and freshman Brent McKinney as the likely combatants for #3 starter and middle relief duty. Oltorik walked 26 batters in 31 innings in his freshman campaign and will need to sharpen his command to take on a key role. McKinney was impressive in fall practice and my guess would be that he will get the first crack at starting.

Juniors Jared Strayer and Theron Minium figure to be the extra pitchers, getting work primarily in mid-week games and hoping to impress their way into more important roles. Minium, a left-hander, could get a crack at starting. Junior Eric Shinn, true freshman Cole Brown, and walk-ons Brian Bobinski, Paul Guey, and Drew Malley don't figure to see much action.

While the pitching staff may be thin, the Bucks may have a hard time finding enough playing time for all of the worthy position players. Junior Dan Burkhart will start behind the plate; he was the team's best offensive player last year (.354/.438/.589) and was named Big Ten Player of the Year. Senior Shawn Forsythe will be his backup, with true freshman Steele Russell (son of Pirate manager John Russell) an intriguing possibility to inherit the position down the line.

Junior Ryan Dew is a low-K, low-W, high BA guy with a very un-Ichiro body type. While he is unlikely to hit .388 again, he'll move in from the outfield to play at first base, vacated by last year's senior captain, Justin Miller. Junior Matt Streng, who showed surprising power when given the opportunity to play last year (8 homers in around 200 PA), will share time at first and DH with Dew.

Senior on-base machine Cory Kovanda will start at second, while junior Tyler Engle returns at shortstop. The two made a fine defensive combination, and Engle showed much improved plate discipline (28 walks in 158 PA) to become an offensive contributor in 2009. He bumped '10-senior Cory Rupert off short; Rupert now figures to start at third, as he did in the latter part of the '09 season. He had a disappointing year at the plate (.279/.329/.388) and could consequently lose time to redshirt freshman Brad Hallberg. Redshirt freshman Ryan Cypret (son of assistant coach Greg) will be a general purpose infield backup.

The outfield will feature senior Zach Hurley in left, senior Michael Stephens in center, and junior Brian DeLucia in right. Hurley was a late pick of the Marlins in the June draft and will attempt to move up draft boards for 2010 by following up on a very good season as OSU's leadoff batter (.346/.421/.510). Hurley has the fielding chops to handle center, but those duties fall to Stephens, who provided much-needed power (leading the team with 14 homers, 63 RBI, and a .608 SLG). His only weakness is drawing walks (just 11 in 248 PA).

DeLucia missed much of last season with a broken thumb, and moving from third base to the outfield should take pressure off that digit. He hit two longballs in just 11 AB prior to his injury and will hopefully be another power source in the lineup. Previous right fielder Michael Arp, now graduated, was the offense's weakest link in 2009 (.295/.345/.405), and so DeLucia is a good bet to match the Bucks' previous production from the position.

Senior Chris Griffin will serve as a reserve outfielder; his value lies in fielding and baserunning and not in his bat. Other outfield reserves are sophomore David Corna, redshirt freshman Joe Ciamacco, and true freshman Hunter Mayfield of Tallahassee, Florida. I expect to see Mayfield get much of the available playing time for reserves as he will be expected to fill one of the two vacated outfield spots in 2011.

The Buckeyes will open the season next Friday (February 19) with a weekend in Jacksonville against North Florida, Florida A&M, and Richmond. The following weekend is the Big Ten/Big East challenge, in which the Buckeyes will face South Florida, Notre Dame, and Cincinnati at various ballparks in the Tampa Bay area. The first weekend of March will see the team hosting a tournament in Port Charlotte against Duquesne, St. Louis, and Fairleigh Dickinson. Then OSU travels to Tennessee's tournament; in addition to the host, the Buckeyes will face Marshall and UConn.

From March 19-March 25, the team will be on its annual spring break trip, facing Bucknell, Eastern Illinois, Army, Cornell, Bethune-Cookman, Dartmouth, South Florida, and Webber International. On March 31, OSU will finally be at home, once again facing Toledo in the home opener. The other mid-week opponents will be Xavier, Akron, Marshall, Louisville, Ball State, and Pittsburgh, all of which will be single game engagements except for a pair against Louisville.

In conference play, OSU will alternate between road and home series: @Northwestern, Indiana, @Michigan St., Penn St., @the heart of darkness, Illinois, @Iowa, Minnesota. The final series with Minnesota looms large on paper as the two nines are the consensus conference favorites. The only conference foe the Buckeyes will not play is Purdue.

The Buckeyes are a veteran team (of the nine projected positional starters, five are juniors and four are seniors) that figures to have a good offense. With any semblance of pitching depth, they should contend for another Big Ten title. If everything breaks their way, they could contend for a lot more.

But there's no need to get ahead of oneself, and start setting expectations (specifically talking about going to a certain mid-sized Midwestern city) such that the successes that are well within reach are made trivial. With its tradition and athletic budget, OSU should endeavor to be a perennial Big Ten championship contender, and that is exactly what the program has been during Bob Todd's tenure. Another Big Ten title, be it regular season or tournament, would make the season a success. It would be foolish to explicitly predict one, given the nature of baseball and the inherent uncertainty of life, but it would also be foolish to not give this team your attention.

Tuesday, February 09, 2010

It seems as if this is a post that I write in one form or another every year. As such it may seem as if it is a topic that captivates me, but it really doesn't: I don't put the resulting figures to much if any use in my own comparisons of pitchers. Still, there are some points that I think are simple, yet oft-ignored, and as such they bear occasional repeating.

Mainstream fans and writers remain invested in the notion that win-loss records for individual pitchers are important; they certainly don't value them as much as their fathers did twenty years ago, but W-L records have yet to be completely discarded as a comparative tool and likely will not be for some time. Accepting that this is the case, how can one go about using them in the most effective way?

It should be obvious to everyone, not just sabermetricians, that W-L records include a lot of noise from factors other than the pitcher's performance, most importantly the performance of their team's offense when they happen to pitch. Performance by relievers and fielders certainly plays a role as well, but data on those aspects of the game is scarcer, particularly as one moves back in time.

Historical run support data is available, but it is often ignored in favor of just looking at the team's overall winning percentage (usually the team's W% when the given pitcher does not get the decision). The pros of this approach are:

1) it is simple
2) it does (but only to some limited extent, largely drowned out by other noise) capture the bullpen and fielder effects that are ignored by run support alone
3) it allows one to dispense with league and park adjustments since W% is always anchored at .500

The cons are (beyond the issues that W-L record already brings to the table):

1) it does not isolate performance when the pitcher actually pitches; some will receive lousy run support despite pitching for good offensive teams
2) it allows the performance of the team's other pitchers to greatly affect the comparison; the classic example is that it was difficult for a Steve Avery to exceed the performance of Greg Maddux, Tom Glavine, and John Smoltz. In fact, comparing a pitcher's W% to that of his teammates implicitly assumes that all of the deviation from .500 observed in teammate W% is attributable to factors that benefited the pitcher in question to the same degree.

On the career level, some of these factors will wash out to some extent. A pitcher with a long career is more likely than not to pitch for teams whose deviations from .500 are somewhat balanced between being caused by runs scored and runs allowed. Over the course of a pitcher's career, it's likely that his run support will be about the same as his team's average runs scored.

So using teammate W% (which I'll call Mate, for kicks if nothing else) should be a reasonable approximation for a more in-depth examination of run support (and bullpen/fielding support), and should give us a better read on pitcher value than just looking at W-L without any adjustments whatsoever.

At this point, the natural inclination of most people is to simply subtract Mate from W%. This is what Ted Oliver did in his Weighted Rating System, and it's what Neft and Cohen did in early editions of the late, great Sports Encyclopedia: Baseball. However, given the fact that the average team is going to deviate from .500 in equal parts due to its offense and defense (defense being defined as pitching + fielding), doesn't it make sense to remove the defensive part of the deviation?

Of course, some of the team's defensive performance does move the baseline expectation for an individual pitcher from .500. However, since any metric based on W-L is going to be somewhat crude by its nature, let's just assume that the half of the team deviation from .500 that is attributable to its defense should be removed altogether for the purpose of setting the baseline W% for the pitcher in question. Let's also keep it simple and assume that team W% is a linear function of runs and runs allowed. Then we can say that:

Expected W% for pitcher = (Mate - .5)/2 + .5

What this does is take half of the team's deviation from .500 (the half that we are crediting to the offense) to estimate the W% of an average pitcher placed on this team. We can simplify that equation to Mate/2 + .25. Thus, an average pitcher on a .500 team is expected to go .5/2 + .25 = .500, of course. On a .600 team, he should have a .550 W%.
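As a quick sanity check on that arithmetic, here is a small sketch (the function name is mine, nothing official):

```python
def expected_w_pct(mate):
    """Expected W% for an average pitcher, crediting half of the team's
    deviation from .500 to the offense."""
    return (mate - 0.5) / 2 + 0.5


for mate in (0.400, 0.500, 0.600):
    # The long form and the simplified Mate/2 + .25 form should agree
    assert abs(expected_w_pct(mate) - (mate / 2 + 0.25)) < 1e-12
    print(f"Mate {mate:.3f} -> expected W% {expected_w_pct(mate):.3f}")
```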

This approach will actually lessen the strength of the adjustment for pitchers on non-.500 teams; if you simply compare W% to Mate with no adjustment, you will give a greater boost to pitchers on losing teams and take a larger bite out of the records of pitchers on winning teams. However, I believe this adjustment (originally proposed by Rob Wood in By the Numbers) is more useful for the reasons discussed above.

Since fielding and relievers do play a part in determining the W% of an individual pitcher, we could make the amount of regression something less than 50% to attempt to capture that--we could make it 40% towards .500, or 45%, or whatever value you'd like to make a case for. 50% is simple and easy to explain, though, and chasing precision in a metric built on W-L record is a fool's errand.

With this in place, we can figure Neutral W% (the W% we expect for the pitcher given that he was on a .500 team) as:

NW% = W% - Mate/2 - .25 + .5 = W% - Mate/2 + .25

A .600 pitcher on a .600 team will be given a NW% of .550, rather than .500 under the Oliver approach.
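The comparison between the two approaches can be sketched as follows. The function names are mine, and the straight-subtraction version is re-centered on .500 only so that the two results are on the same scale.

```python
def neutral_w_pct(w_pct, mate):
    """NW%: pitcher W% adjusted for half of the teammates' deviation from .500."""
    return w_pct - mate / 2 + 0.25


def straight_subtraction(w_pct, mate):
    """W% minus Mate with no halving, re-centered on .500 for comparison."""
    return w_pct - mate + 0.5


print(round(neutral_w_pct(0.600, 0.600), 3))         # halved adjustment
print(round(straight_subtraction(0.600, 0.600), 3))  # full adjustment
```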

With NW% in place, it is a snap to figure wins above a baseline. Wins Above Team, used by Thorn and Palmer to denote wins above a .500 pitcher placed on the team, is:

WAT = (NW% - .5)*(W + L)

One could use IP/9 or some other method for neutralizing decisions, but I have decided to assume that the pitcher would receive the same number of decisions regardless of the quality of team he pitched for. This may not be true or completely fair, but I think it's close enough and it preserves simplicity.

Of course, any baseline can be substituted for .5 (average); I personally use .390 for replacement level, so it is very easy to figure Wins Compared to Replacement as:

WCR = (NW% - .39)*(W + L)
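Putting the pieces together, both WAT and WCR fall out of one small function with the baseline as a parameter. This is a sketch with invented names and a hypothetical 18-12 pitcher whose teammates played .540 ball.

```python
def neutral_w_pct(wins, losses, mate):
    """NW% = W% - Mate/2 + .25"""
    return wins / (wins + losses) - mate / 2 + 0.25


def wins_above(wins, losses, mate, baseline):
    """Wins above a baseline W%, holding the pitcher's decisions fixed."""
    return (neutral_w_pct(wins, losses, mate) - baseline) * (wins + losses)


# Hypothetical season: 18-12 with a .540 Mate
wat = wins_above(18, 12, 0.540, baseline=0.5)    # versus average
wcr = wins_above(18, 12, 0.540, baseline=0.39)   # versus replacement
print(round(wat, 2), round(wcr, 2))
```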

Now let's actually put this to use and talk about everybody's favorite pitchers, Bert Blyleven and Jack Morris. If we consider their records without any sort of adjustment for Mate, we have:

Morris' W% is higher, and he looks a lot better when compared to a .500 baseline. If you compare to a replacement baseline, it's pretty even--Blyleven's extra 97 decisions enable him to make up the gap in percentage.

When we consider the performance of their teammates, things will get a little bit closer. Morris' teams had a .538 W% when he did not get a decision (Mate); Blyleven's .495. That results in these neutral records:

Morris still has a higher NW%, but Blyleven is closer when compared to .500 and has the lead in WCR. Of course one can argue about the proper baseline for Hall of Fame comparisons, but I think it's fair to say that, when viewed in this light, neither pitcher clearly distinguishes himself from the other on the basis of W-L record.
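A rough career-level version of this comparison can be sketched as below. Two assumptions to flag loudly: the career W-L totals (254-186 for Morris, 287-250 for Blyleven) come from the public record rather than from this post, and the career Mate is applied in one lump rather than season by season, so the results will not exactly match figures computed year by year.

```python
def neutral_record(wins, losses, mate, replacement=0.39):
    """Return (NW%, WAT, WCR) from career totals and a single career Mate."""
    decisions = wins + losses
    nw_pct = wins / decisions - mate / 2 + 0.25
    wat = (nw_pct - 0.5) * decisions
    wcr = (nw_pct - replacement) * decisions
    return nw_pct, wat, wcr


# Career W-L totals are assumptions taken from the public record
morris = neutral_record(254, 186, mate=0.538)
blyleven = neutral_record(287, 250, mate=0.495)

for name, (nw_pct, wat, wcr) in (("Morris", morris), ("Blyleven", blyleven)):
    print(f"{name}: NW% {nw_pct:.3f}, WAT {wat:.1f}, WCR {wcr:.1f}")
```

Under those simplifications Morris comes out at about +26 WAT and +74 WCR, Blyleven at about +20 WAT and +79 WCR, which lines up with the qualitative picture described above.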

I have posted a spreadsheet with career records for predominantly post-1900 pitchers who did not pitch in 2009 (with the exception of Randy Johnson, who announced his retirement). I believe that the list includes every 150 game winner during that period, in addition to a number of other pitchers who won 100 or more games. A technical note: Mate, NW%, WAT, and WCR are figured on a year-by-year basis, weighted by the pitcher's decisions in that season.
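The year-by-year weighting mentioned in that technical note might look something like this sketch. The function name and the three seasons are hypothetical, and summing seasonal WAT is my reading of "weighted by the pitcher's decisions", not a transcription of the spreadsheet.

```python
def career_summary(seasons):
    """seasons: list of (wins, losses, mate) tuples, one per year.

    Returns (career Mate, career NW%, career WAT), with each season
    weighted by the pitcher's decisions in that season.
    """
    # Ignore seasons with no decisions to avoid dividing by zero
    seasons = [(w, l, m) for w, l, m in seasons if w + l > 0]
    decisions = sum(w + l for w, l, _ in seasons)
    mate = sum((w + l) * m for w, l, m in seasons) / decisions
    wat = sum((w / (w + l) - m / 2 + 0.25 - 0.5) * (w + l)
              for w, l, m in seasons)
    nw_pct = 0.5 + wat / decisions
    return mate, nw_pct, wat


# Hypothetical three-year career, invented for illustration
mate, nw_pct, wat = career_summary([(15, 10, 0.520), (20, 8, 0.580), (9, 13, 0.450)])
print(round(mate, 3), round(nw_pct, 3), round(wat, 2))
```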

The 150 win pitchers with the highest Mate:

I'm sure someone will leave an anonymous comment complaining that I called him Miner Brown, like I did when I referred to "Hans" Wagner. I could have used Mordecai; Three Finger would force the first name column to be wider. Digression aside, all of these guys pitched significant portions of their careers for dynastic teams. Gomez had the best teammates by far, and is also a poster child for why the Wood approach is superior to the Oliver approach. Gomez beat his teammates' W% by just thirteen percentage points. Figured traditionally, he is only +4 WAT.

The 150 win pitchers with the lowest Mate:

As you know, Rick Reuschel is a pitcher who has been very much overlooked; Walter Johnson won over 400 games despite pitching for .460 teams without him; and Jack Powell is a good litmus test for just how strong your preference for value compared to replacement is.

Sunday, February 07, 2010

NOTE: This was originally posted at my scorekeeping blog, which will explain a number of statements that otherwise make very little sense.

I realize that no one visits this site, but in case anyone is out there and reads this, I would like to appeal to you to send me one of your scoresheets. This site is currently all about my scorekeeping, and I'd like to expand its horizon a little bit to showcase other people's scoring.

Don't worry about your scoresheet not being good enough or interesting enough or unique enough or whatever other excuse you might offer to be shy. There's no such thing as a wrong way to keep score, and I'd like this site to help display the myriad of ways in which baseball fans record the game.

All I ask is that the file be around 500 KB or smaller, in GIF, BMP, or JPG format, and that you also write a little bit about it (if it's larger, I might edit it a little to make it more manageable). It can be as little as a sentence or as long as a page or two. You can write about your method of keeping score, scorekeeping in general, your memory of the game in question--anything you like. Also, if you kept the game on a commercial scoresheet, please give the name of the company/designer so that they can get a little bit of a plug (and hopefully not send me cease and desist letters). If you have your own designed sheet that you'd like to offer for others to download, I'd be happy to post it on my Tripod scoresheets site.

I don't expect anyone to actually take me up on this, but I had to try. There's a dearth of scorekeeping information on the internet--the sites I link in the sidebar form a fairly comprehensive list. I'd like to make this site a place for the diversity of scoring systems and forms to be on display, and it can't be that as long as it's just my own sheets.

Monday, February 01, 2010

I'm going to present this with little comment; you can read this thread at Fangraphs and see what spurred this (there is also some similar data there from "rotofan" and "Eric R").

I found the player who appeared in the most games as a DH for each AL team, 1973-2009, and then found the average games for the team leader and the percentage of teams whose leader DHed in 75, 100, or 125 games. Remember that games played include

This chart follows the average over time:

This chart shows the percentage of teams with a DH who played in 75, 100, and 125 games:

Personally, I think the average is the most interesting figure here, and it's hard to identify any sort of DH death throes from it. The percentage of teams with DHs appearing in 100 and 125 games has declined a bit in the last few seasons, but the percentage appearing in 75 has increased. Also, remember that with just fourteen data points each season, something like Travis Hafner getting injured can cause a big nose dive, despite the fact that Cleveland fully intended to use him as an everyday DH. Of course, the strike-shortened seasons of 1981, 1994, and 1995 also throw things off--the 1981 Brewers' leader in appearances at DH was in just 24 games in that capacity, the lowest ever.
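For anyone who wants to reproduce this sort of summary, here is a minimal sketch of the method. The row layout, function name, and sample data are all hypothetical, and treating the 75/100/125 cutoffs as "at least that many games" is my assumption.

```python
from collections import defaultdict


def dh_leader_summary(rows, thresholds=(75, 100, 125)):
    """rows: iterable of (year, team, player, games_at_dh).

    For each year, returns (average games by each team's DH leader,
    {threshold: share of teams whose leader reached it}).
    """
    leaders = {}
    for year, team, _player, games in rows:
        key = (year, team)
        if key not in leaders or games > leaders[key]:
            leaders[key] = games   # keep the team's top DH total

    by_year = defaultdict(list)
    for (year, _team), games in leaders.items():
        by_year[year].append(games)

    summary = {}
    for year, games_list in sorted(by_year.items()):
        average = sum(games_list) / len(games_list)
        shares = {t: sum(g >= t for g in games_list) / len(games_list)
                  for t in thresholds}
        summary[year] = (average, shares)
    return summary


# Hypothetical two-team league, invented for illustration
sample_rows = [
    (2009, "AAA", "player 1", 130),
    (2009, "AAA", "player 2", 20),
    (2009, "BBB", "player 3", 80),
    (2009, "BBB", "player 4", 60),
]
print(dh_leader_summary(sample_rows))
```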