Chat: Nate Silver

Nate Silver: Greetings from Chicago, where it's colder than Paul Konerko in April. Looks like most of the questions in the queue are about PECOTA, which is fine and dandy, but feel free to fire away with other stuff too. I'll begin chatting in a few minutes.

Ben (Boston): Why haven't my Giants made any moves this offseason? Don't they realize (a) that Grissom will never repeat his numbers from last year, (b) that Neifi Perez is not a major league hitter outside of Colorado, and (c) that Michael Tucker is going to have a lot of trouble in PacBell? Related question: what's Dustin Mohr's potential? He's a Giants fan's last hope in RF.

Nate Silver: Good question. The Giants played well above their Pythagorean projection last year; I think they've gotten spoiled by Barry Bonds and an improbable series of breakout performances by thirtysomethings and failed to realize how much gravity is likely to send them spiraling down toward .500.

The NL West looks like a crapshoot between San Fran, LA, Arizona, and the Padres; they're deeply flawed teams, all, and it's possible that it will only take 87 wins or so to win the division.

As for Mohr ... I like the guy on a team that's getting plenty of offense from other spots in the lineup and needs a cheap alternative in the corner, but I don't know if the Giants are that team. You take Bonds away, and the offense is inferior to the Tigers'. Seriously.
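For readers unfamiliar with the Pythagorean projection mentioned above: in Bill James's classic form it estimates winning percentage from runs scored and runs allowed. A minimal sketch follows. The exponent of 2 is the textbook version (BP may well use a refined exponent), and the run totals and win total are made-up illustrative numbers, not the Giants' actual figures.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from run totals using the classic Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# Hypothetical team: 750 runs scored, 650 allowed (illustrative numbers only).
expected = pythagorean_wins(750, 650)
actual_wins = 100  # hypothetical actual win total
print(f"Expected wins: {expected:.1f}, actual: {actual_wins}, "
      f"difference: {actual_wins - expected:+.1f}")
```

A team that finishes well above its expected total, as in this made-up case, is the kind of club that tends to drift back toward its run differential the following year.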

TB (Grafton, ND): Pecota Questions:
1. There are projections for Joe Mays and Billy Traber, and yet, there is little chance either will pitch. Why even project them?
2. Most of the higher breakout percentages (above 33%) were relievers. Any reason why as compared to breakout starters?
3. Most of the SP's number of starts is short of the number of games in which they are projected to appear. Do you really expect Clemens to toss two games in relief?? I saw only a few pitchers whose games projection equalled their starts projection.
4. Jesse Orosco! Projected at a 55% breakout as well as a 55% improve. Obviously now that he's retired we'll never know, but why would his numbers be so high in those areas?
5. Also Jamie Moyer...can we really expect a 59% chance to Improve at his age? Also, looking at the projections, his numbers are not looking like an improvement at all. Help?!
I may not understand the methodology in all of this, and I greatly appreciate your answers.
Thanks and keep up all the great work at BP.
TB

Nate Silver: 1. We project just about everybody in the first run of the PECOTAs. We'll purge or amend the projections for guys like Mays once we're closer to the start of the season and have a better read on injuries and playing time issues.

2. The relievers have higher breakout rates because they pitch fewer innings and thus have higher variance in their ERAs.

3. PECOTA has probably picked up on the fact that a couple of Clemens' comparables converted to relief late in their careers; that said, I'm not sure if we have the G and GS numbers just right. The IP projections I can vouch for.

4. Orosco has only a couple of comparables; dude is too old. We included his projection for the sake of completeness, but I wouldn't put much stock in it.

5. Pitcher improvements and collapses are not very much related to age at all; it's nothing like what you get with offensive players. Old pitchers often have high improvement rates since they tend to retire once their skills decline, unless they're David Cone.
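Point 2 above can be illustrated with a quick simulation. This is a rough sketch, not PECOTA's method: it assumes every pitcher has the same true talent (a 4.50 ERA), that runs allowed per inning follow a Poisson process, and that a reliever throws 60 innings to a starter's 200. All three are simplifying assumptions made here for illustration.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson-distributed integer (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_eras(innings, true_era=4.50, seasons=2000, seed=0):
    """Simulate season ERAs for a pitcher of fixed talent over `innings` IP."""
    rng = random.Random(seed)
    runs_per_inning = true_era / 9
    eras = []
    for _ in range(seasons):
        runs = sum(poisson(runs_per_inning, rng) for _ in range(innings))
        eras.append(9 * runs / innings)
    return eras

reliever = simulate_eras(innings=60)   # assumed reliever workload
starter = simulate_eras(innings=200)   # assumed starter workload
print("reliever ERA std dev:", round(statistics.stdev(reliever), 2))
print("starter ERA std dev:", round(statistics.stdev(starter), 2))
```

With identical talent, the smaller innings sample produces a visibly wider spread of ERAs, which is why a reliever is more likely to clear any fixed "breakout" bar by chance alone.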

Remangiii (Maryland): Looking over the Pecota spreadsheets only four players are projected to reach 600 AB's and AB totals seem low across the board. 32 Players had 600 AB's last year. Why the big difference?

Nate Silver: Yeah. I think it's easy to underestimate just how routine it is for guys to get hurt and miss playing time, even if they haven't had an injury history before. But part of this is an artifact of the way that PECOTA constructs its projections.

Say for example that a guy had 600 AB the previous season. The system will identify other players who had around the same amount of playing time (and who are also similar in other important ways), and project playing time based on how much those comparables played in their subsequent seasons. If, for example, a player had six comparables, and they had the following number of ABs:

600
600
600
600
600
100 (catastrophic injury)

The system will take the average of those numbers, which is like 517 or something.

Is that the best way to represent a guy's playing time? I'm not certain. It's a tricky issue, since playing time doesn't have anything resembling a normal distribution.
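The arithmetic in that example is easy to check, and it shows why the choice of summary matters for a skewed distribution. The sketch below uses the six comparables listed above; the median is shown only as a contrast and is not something PECOTA is described as using.

```python
import statistics

# At-bats from the six hypothetical comparables in the example above.
comparable_abs = [600, 600, 600, 600, 600, 100]

mean_ab = statistics.mean(comparable_abs)
median_ab = statistics.median(comparable_abs)

print(f"mean:   {mean_ab:.0f}")
print(f"median: {median_ab:.0f}")
```

One catastrophic injury among six comparables pulls the mean down by more than 80 AB while leaving the median untouched, so a mean-based projection will rarely show anyone reaching 600 AB.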

Jorens (Brooklyn): The PECOTA projections seem pretty lofty for Kaz Matsui, Hideki didn't reach his projections last year, do you expect Kaz to be Tejada-like?

Nate Silver: Clay Davenport did a lot of work on the Japanese translations over the winter, the end result of which was to make them substantially more conservative for players coming from Japan --> US. (Clay's got a really interesting article on translating Japanese statistics in this year's book, BTW, yet another reason to pre-order if you haven't yet!).

Kaz still ends up with a really nice projection. He's been a fine player for a number of years, and has a good distribution of skills. I think it would be easy to overcompensate for the disappointing year that Hideki had.

A.J. Morris (Houston, Texas): How much stock should we put in the PECOTA projections for 2003 draftees?
For example, as a Ranger fan, I'd like to be fired up about how optimistic PECOTA is about Wes Littleton and Jeremy Cleveland.
But given that you are projecting off of, basically, a half-season's worth of data, is PECOTA really any more reliable than a Ouija board with these guys?

Nate Silver: I wouldn't pay much attention to projections for guys with less than half a season under their belts. Once we get the PECOTA cards up and running, you'll see that guys like Littleton and Cleveland have enormous distributions on their forecasts; their weighted means might look pretty good, but PECOTA really has no idea where on the spectrum they're going to end up.

Hambone (Pawtucket, MA): There's a rumor floating around that BP is going to launch a rotisserie-specific product based on PECOTA. Is this true? When will it launch? PECOTA doesn't forecast RBI, Wins, Saves, and other roto stuff, does it?

Nate Silver: This has about as much chance of coming true as Pete Rose does of showing up at the Mirage sportsbook on Sunday evening. Which is to say, keep your eyes peeled, dude.

dangor (new york): I loved getting the PECOTA numbers this early. I have to ask about one number that just jumped out at me - Vlad Guerrero, 21 home runs. What gives? Are you projecting that his back acts up again? In a snake draft, would you not even draft him in the first round if it was you?

Nate Silver: Two things about Vlad:

1. Montreal (er... Montreal / San Juan) has played as an insanely good offensive park for the past couple of seasons, with park factors on the scale of Coors. So while Vlad is simply an outstanding player, it would be easy to overstate his offensive contributions somewhat.

2. As a general rule ... if a guy missed a substantial amount of playing time in the previous season, odds are that he'll miss a substantial amount of playing time in the upcoming season; that's what PECOTA is picking up on. That said, if you have *specific* information about a guy's injury status, that should override the sort of actuarial bent that PECOTA has. That's what we keep Will Carroll around for.

Dave 'Mediot' Kirsch (Chicago): Although PECOTA's great, it seems conservative for some younger as well as established players. It has Miguel Cabrera below last year's (rate) numbers and still is skeptical with Roy Halladay. Care to comment on this?
Also, who do you feel are strong candidates to surprise? I see that Edwin Jackson (LAD) is a young pitcher about whom PECOTA is confident ...
Finally, get your Texas Hold-em hat ready for this weekend :-)

Nate Silver: Hi Dave! Looking forward to our poker game this weekend.

PECOTA has an interesting projection for Cabrera. If you look at his 5-year forecast - I know we don't have those up and available just yet - he shoots through the roof in 2006 or so, becoming a HOF-caliber player. His top comparables include guys like Hank Aaron, Ken Griffey, A-Rod, Ron Santo ... just fantastic players. But it also expects Cabrera to take a little bit of time to get there, and stagnate a bit this year.

Halladay ... I'm pissed off that the system didn't give Halladay a better projection, but generally speaking PECOTA is smarter than I am. As it happened, he drew a lot of relatively unflattering comparables, guys like Chris Bosio and so forth, but there are also plenty of favorable names (everyone from Don Drysdale to Greg Maddux) further down his list. I'd certainly take the under on his ERA projection.

Randy Brown (Ann Arbor, MI): Nate,
The 2004 Pecota spreadsheet is seriously disrupting my ability to stay current with episodes of The Apprentice.
Some of the figures surprise me a bit, but one shocks me: Wily Mo Pena, .271/.340/.509 projection. Did Jeff Kent's projection get copied to the wrong line?

Nate Silver: Wily Mo Pena's PECOTA has already become a running joke on the BP mailing list. But to fend off the obvious question - no, it's not a glitch or anything - that's what the system spit out. Pena is very young, and PECOTA tends to be very impressed with very young guys with good isolated power.

That said, I still think it must have been hungover or something when it ran that projection.

Tom Gisriel (Baltimore): How much does a breakthrough (or fluke) year affect a player's future PECOTA forecasts? For example, in Baltimore several players (Luis Matos, Larry Bigbie, Melvin Mora) significantly outperformed their PECOTA forecasts (or perhaps more accurately performed above 90%) Does this cause the following PECOTA forecast to increase significantly? Is there a greater increase for Matos and Bigbie (players who are just establishing themselves) than for Mora?

Nate Silver: PECOTA doesn't put much emphasis on fluke years or breakout years as such. It develops its baseline forecast by looking at the totality of a player's experience during the three previous seasons. When developing the system, I did some research on whether one-year breakouts like the one that Mora had have any predictive value above and beyond what you'd get by including those favorable numbers in the player's weighted, three-year average. It turned out that they did not for position players (pitchers are a different story).
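To make the "weighted, three-year average" idea concrete, here is a small sketch. The 3/2/1 recency weights, the playing-time weighting, and the stat line are all assumptions for illustration; the chat does not say what weights PECOTA actually uses.

```python
def weighted_baseline(seasons, recency_weights=(3, 2, 1)):
    """Weighted average rate over up to three seasons, most recent first.

    Each season is a (rate, plate_appearances) pair. The rate is weighted
    by recency and by playing time. Both choices are illustrative
    assumptions, not PECOTA's published method.
    """
    num = 0.0
    den = 0.0
    for (rate, pa), w in zip(seasons, recency_weights):
        num += w * pa * rate
        den += w * pa
    return num / den

# Hypothetical hitter with a breakout in the most recent season: (OBP, PA).
seasons = [(0.400, 400), (0.330, 500), (0.320, 450)]
baseline = weighted_baseline(seasons)
print(f"most recent: {seasons[0][0]:.3f}, weighted baseline: {baseline:.3f}")
```

Under these assumed weights the breakout year lifts the baseline, but only partway toward the breakout level, which matches the point that the big season counts as one input rather than as a special signal.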

One thing that it is worth examining, though, is the nature of the breakout season. Was the breakout batting average driven? (If so, regression to the mean is more likely.) If a player improved his power, did his plate discipline improve as well? (If not, he's likely to give something back).

Dave (Chicago): Which of the rookie eligibles does PECOTA really love this year? On the flip side, anyone who looks spectacularly overrated? Thanks!

Nate Silver: It likes Joe Mauer down the road a lot, but thinks he'll take a couple of years to fill out as a player. I think J.J. Hardy might be a bit overrated.

Kris Arthur (Downers Grove, IL): What are the chances that C. Zambrano, Prior, and K. Wood all stay healthy this season? It seems the high workload on these guys would be devastating long-term to their careers. Which one in your opinion is the most likely to break down due to so many IP?

Nate Silver: Now's neither the time nor the place to b*tch about pitch counts, but suffice it to say that I think the Cubs could have stood to be a bit more careful. I'd love to see them sign Maddux simply because they could stick to a 5-man rotation without giving much back that way, getting the young guys some extra days off over the course of the year.

Of the three, Zambrano seems like the biggest injury risk to me, since he's younger than Wood, and doesn't have the uncanny mechanics that Prior does. PECOTA seems to agree.

Chuck (Washington, DC): After reading 'PECOTA takes on the field' and a recent Rob Neyer article on the Houston Astros, I'm wondering which is more predictive of future record: previous years Pythagorean record or some form of Runs Scored less Runs Allowed through PECOTA? It seems to me that the latter would be more accurate as it would incorporate roster changes, but I'm wondering if the difference is too marginal to bother.

Nate Silver: I don't think the differences are too marginal to account for at all. You think the Braves are going to win 96 games again this year?

jdouge (CA): I am trying to wrap my head around the relationship between "collapse" and "attrition." My primary difficulty lies in understanding how attrition could be higher than collapse (see, for example, Erasmo Ramirez in the new PECOTA projections file)-- shouldn't suffering attrition be a collapse? Unless collapse is quality of performance-focused and attrition is quantity of performance focused -- in the sense that a player could perform well, hence not collapse, but still get hurt or sent down and therefore suffer attrition?
And attrition for a player such as Noah Lowry, who hardly played in the majors last year -- does attrition for him mean an absence of major-league playing time or also a loss of playing time compared to his (mostly minor-league) "baseline?"
Maybe all I really need is a clarifying definition of attrition . . . :)

Nate Silver: You have it exactly right; collapse is based on QUALITY of performance and attrition is based on QUANTITY of performance.

And you're also correct that attrition doesn't necessarily mean an injury - but rather, any substantial reduction in playing time. That can be the result of injury, demotion (except for minor leaguers like Lowry; PECOTA makes an exception for those guys), suckiness, etc.

Michael J (Solvang, CA): Nate -- What's the best way to use the pecotas for my roto auction? If I figure out the dollar value based on the weighted mean, our categories, and inflation, do I use that, or do I assign different dollar values to each percentile grouping, and kind of diversify and mitigate my risk as we go through the auction?

Nate Silver: When we used PECOTA for Tout Wars last year, we created dollar values based on the weighted mean projections, but also kept Breakout and Collapse percentages in our profile for each player. Since the last four or five guys on a roto roster get turned over as frequently as Paris Hilton's bedsheets, I think it makes all the sense in the world to bid on guys with higher upside potential. On the converse, I'm very risk-averse when it comes to selecting my $30 players.

Paul (Winnipeg, CA): Hi. I was hoping you would do an article breaking down how many people broke into each 'percentile bucket' of the Pecotas. Any chance of getting to see the results of such a study at baseballprospectus.com?

Nate Silver: We've had a ton of requests for this, and this is something that we should have up and running within a couple of weeks.

Some might argue that a strong bullpen or "a good team chemistry" might directly influence a team's win-loss record in one-run games. After some reflection, I found myself trying to come up with an equation that could somehow measure a team's success in close games. Perhaps a team's success in one-run games is a function of a good starting rotation (as measured by cumulative VORP), a strong bullpen (as measured by cumulative VORP), and a good manager (as measured by the team outperforming its predicted run differential).

Any thoughts on what it takes to win the close games? Besides luck.

Nate Silver: There is some evidence that teams with good bullpens do better in one-run games. That's the only thing that has been demonstrated to bear a statistically significant relationship, so far as I am aware. I'm also very skeptical of the notion that a W-L record in one-run games is a good proxy for managerial aptitude.

beausox (Illinois College): Why haven't the White Sox done more this offseason? They threw money at Colon, and when he went elsewhere they chose not to spend it. Their biggest offseason acquisitions have been Cliff Politte, a Japanese reliever, and the leftover remains of Marvin Benard's never-useful bloody carcass. How on Earth do they expect to win?

Nate Silver: Yeah, that was strange, wasn't it? I mean, there are a lot of ways to spend $12 million, a lot of which would be better investments than Colon.

I think that the White Sox:

1. Are frustrated because they've thrown money at free agents before and it doesn't seem to have "worked".

2. Really, honestly believe that replacing Manuel with Ozzie Guillen is going to solve a lot of their problems.

3. Recognize on some deeper level that their division is freakin' terrible and putting a .500 club on the field should be enough to keep them involved in a pennant race.

Scott (Seattle, WA): Nate dogg!
Love the PECOTA numbers. Last year a few guys blew away their midpoint projections - Javy Lopez springs to mind. Who would be your one dark horse your gut says could do the same in '04, even though the numbers don't back it up? Who could tank to the 10-20% level, on the other end?

Nate Silver: I still think Carlos Beltran is due for an MVP type season one of these days, though he's not a dark horse by any means. As for a truly random breakout ... I'll predict that Russ Branyan and Mark DeRosa both have banner seasons, and the Braves have one of the most productive 3B positions in the league.

Dan (Japan): I really like the PECOTA system, but I have a basic question about it. I was wondering why for some players the percentages for "breakout," "improve," "collapse," and attrition total more than 100%? And why for others the total is less than 100%?
For example, Bobby Abreu adds to a combined 69% and Rocco Baldelli is at a combined 120%

Nate Silver: Check out the glossary online. You shouldn't be adding them up, since Breakout is really a subset of Improve, and since a player doesn't have to fall into any category.
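A toy sketch of why the percentages need not sum to 100%. The thresholds used here (a 20% gain for Breakout, any gain for Improve, a 25% loss for Collapse) and the ten comparables' numbers are assumptions for illustration; consult the glossary for the real definitions. Attrition, being a playing-time measure, is left out.

```python
# Hypothetical year-over-year changes in production rate for ten comparables
# (+0.25 means a 25% improvement over the baseline).
changes = [0.30, 0.22, 0.10, 0.05, 0.02, -0.03, -0.08, -0.12, -0.15, -0.40]

def rate(predicate):
    """Share of comparables satisfying the predicate, as a percentage."""
    return 100 * sum(1 for c in changes if predicate(c)) / len(changes)

breakout = rate(lambda c: c >= 0.20)   # assumed threshold
improve = rate(lambda c: c > 0)        # includes every breakout
collapse = rate(lambda c: c <= -0.25)  # assumed threshold

print(breakout, improve, collapse)
print("sum:", breakout + improve + collapse)
```

The two breakout comparables are counted again under Improve, while the four who declined only modestly fall into no category at all, so the total can land above or below 100 depending on the mix.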

bobbailey (Montreal): Just curious on your take on walks. It's considered a major plus by the stat orientated crowd while at the same time a huge percentage of walks to players like Bonds are intentional (even if not officially). In a nutshell, opposing managers are walking batters to reduce runs scored against their team, while at the same time walks in the individual statline are considered a plus in run production. A contradiction? Or Not?

Nate Silver: It's a valid question. For the past several seasons, the Giants have scored substantially fewer runs than would be predicted by Runs Created, EQR, and so forth, and I've no doubt a lot of that has to do with inefficiency created by having Barry Bonds hitting immediately in front of Benito Santiago. At the very least, we ought to be considering regular walks and IBB separately.

But I think there's some more serious work to be done in this area. I know that the "myth" of lineup protection was supposed to have been debunked a long time ago. But sabermetrics hasn't revisited that question in a long time, even though:

1. We now have much better data to work with, e.g. easy access to play-by-play databases.

2. From the standpoint of run prevention, a rational pitcher would adjust his pitching style based on the hitter(s) due up next in the order, if he could do so without sacrificing efficiency.

As you might guess, this is something that I intend to take a look at in LDL this season.

Speaking of which ... yes, yes, my column should be back up and running very soon.

Nate Silver: I could really go for a burrito, so that's it for now. Thanks for dropping by, and remember to buy the book!