Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

What is sad is that the author rips PECOTA for being a computer (not accounting for the human element), and then gives ZERO evidence about what the human element would show that PECOTA does not; in fact, he even admits the human staff at Sports Illustrated was far more incorrect in prognosticating the White Sox record last season than PECOTA was. It appears he is unhappy about the system's estimation that his team will decline by 8 games in 2013. Which, considering they improved by 13 in 2012, is not all that unlikely. Unless you blame 2011 on the departed Ozzie "sho ain't no compooter" Guillen.

People make comments about not linking to bloggers for cheap "look at the idiot" threads, but the linked article was written by a professional sports media person. It is worse than seventeen Bleacher Report slideshows littered with fratty asides about Will Ferrell movies. It makes tina look literate and coherent. It is the worst thing you will read for quite a while, if you click the link. So don't.

On the one hand, this guy is an idiot. On the other hand, doesn't BP have a history of being horribly wrong by under-predicting the White Sox?

It does.

An interesting article would have been "here's where PECOTA gets it wrong" that goes beyond "stat geeks are poopy heads."

I always find it interesting when people get bothered by predictions, particularly those spat out by computer models. It's not like PECOTA has a "IF White Sox THEN wins=predicted wins-10" line in it (though really, how awesome would that be?).

The White Sox being underprojected by PECOTA is intriguing and worth some consideration. It would be interesting to me to know what if anything was systematically happening to make that occur. What jumps at me is that the ChiSox have alternated winning and losing seasons since 2006. I've always assumed that the computer models aren't going to handle inconsistency as well as consistency. If a player hits .300-.300-.300 over three years I know what I'd expect for year four. If a different player hits .300 over three years but gets there hitting .270-.370-.260 well I'd be less certain about year four. Presumably the same thing is going to be a problem on a teamwide basis.

Why would the White Sox have these variances? Well, during most of this period they've been managed by a pretty volatile dude. Whether or not a manager can have that much impact is a question but I would expect an Ozzie-led team to be a bit more unpredictable.

Health is probably the biggest thing on their side. Year in and year out, if they don't lead the league in fewest days lost to the DL, they're usually in the top three. Screws with playing-time projections.

Well, it's certainly possible (or probable) that PECOTA continually underrates the White Sox because of their injury avoidance, but isn't it also true that over a span of several years, random chance would cause SOME team's projections to be repeatedly missed in the same direction? Maybe it's like the Angels, where it's possible (or probable) that they have an ability to outperform their Pythag record, but you would expect at least one team out of thirty to do so repeatedly simply by chance. Does anyone have the statistical expertise to go through PECOTA's forecasts and search for evidence of consistent bias for or against certain teams?

Edit: Come to think of it, I remember this piece at BP from 2011 that looked back at each team's PECOTA-projected versus actual records over the years. The data was striking: almost without exception, the teams that PECOTA had tended to overrate were the same ones that we consider "stat-savvy," and the teams it had tended to underrate were the ones we tended to dismiss as backwards and incompetent. I've never seen a clearer demonstration that the stathead approach is missing something important.

It shouldn't be that hard to see how PECOTA is missing them. If it's the same thing every year - if it's under-predicting the innings pitched of their top starters consistently - then you'd probably want to build that into your model. If it's missing for different reasons - one year all the pitchers stay healthy, one year the White Sox make awesome deadline deals, one year they hit amazingly in the clutch, one year they go 30-5 in one-run games - then maybe you put it down to chance.

I always thought the team projections for PECOTA were always just done for a bit of fun. In other words, PECOTA was in no way designed to project team records, but one day they figured "hey, why not publish this for the f--k of it?" Then everyone started taking the team projections seriously for some reason. Like this d-bag, apparently.

That's just it. They are a lot of fun to read and mess around with but that's all I've ever done. I've never read them as gospel and I doubt anyone really does. There's a common misconception that Baseball Prospectus touts their PECOTA projections all the time, but really they don't. I think they post it, shrug and occasionally bring it back up (like Austin's link).

I think the White Sox could actually win 90+ games this year if Danks and Peavy and Sale hold up. Yeah, that's a big if, but with Cooper and Herm on their side it's as good a bet as there is health-wise.

Viciedo takes a step forward and puts up a 290/330/500 line while playing run neutral defense in LF. De Aza continues to be a 3 WAR player. Dunn and Konerko give 4 WAR between them. The pen adds another 2.

The Sox just need to not have their usual flaming meteorite crater of suck from 3B, DH and 2B. I'm not optimistic about Beckham, but whatever, just hit 260/300/350 and play good defense.

I'm optimistic about the Sox this year but I could be wrong. Median case scenario is still probably 82-84 wins.

The White Sox being underprojected by PECOTA is intriguing and worth some consideration.

It is, and what are the chances this would be true of one team, and it simply happens to be the White Sox? In short, what are the chances that a projection system will miss one team by x games over y years, especially given the volatility of team win projections?

My guess is there's an excellent chance of this story being written, if not about the White Sox then the Dodgers or Astros, and if not about underprojection, then overprojection.

Reminds me a little of The Fermi Paradox, where what is not found is taken as evidence that a thing does not exist.

I might not have described that well. I only meant that over eight years, given the volatility of team win projections, it's not all that unlikely one team will have over- or undershot its projection by something like seven wins. That team just happens to be the White Sox in this branch of the multiverse, and it's not as unusual an event as some writers are making it out to be.

If the story would be written about any team re Pecota, it's 30 times as likely as it seems. If the story would be written if the projection was over or under, it's 60 times as likely as it seems. If the story would be written if the margin was five games or more, the story is something like 200 times as likely as it seems.

I'm also fairly sanguine about the White Sox. Yes, ~.500 is a more likely outcome, but they have a realistic shot at the postseason. CAIRO at RLYW has them at an average of 77 wins and gives them a 15.1% shot at the division and an 8.5% shot at the wild card game. CAIRO also has the Royals and Indians projected ahead of the Sox, which I don't buy. So my fanboy prognosticating gets them up to .500 as a median expectation and a ~30% shot at some postseason slot or other. That's pretty good for a team that's seemed like it's needed a rebuild for ages now.

The Sox just need to not have their usual flaming meteorite crater of suck from 3B, DH and 2B. I'm not optimistic about Beckham, but whatever, just hit 260/300/350 and play good defense.

I think that, assuming his broken leg is healed, Keppinger is fully capable of putting up a ~100 OPS+ with averagish defense at 3B. That'll be vastly better than most of 2012.

Beckham can be a median 2B this year. Last year in the AL there were only 5 or 6 2B* who were significantly better than him. A few others were a little better, but not hugely. Sure, Ackley and Kelly Johnson are likely to separate themselves from Beckham a little more in 2013. But the point is that Beckham is pretty close to being a league-median 2B, and if he ever develops then he'll be the very definition of cromulence.

* Cano, Pedroia, Kendrick, Kipnis, Kinsler, and probably Carroll -- what is it with good 2Bs with the hard C sound at the start of their names?

@14: Yeah--the problem with black boxes is you can reverse engineer a lot of stuff that makes it look like you've nailed the right variables, but then it turns out to be something on the order of 'if the Redskins win, Democrats will win the White House.'

I thought an interesting scam would be to set up 256 distinct websites where you'd pick every win-loss or yes-no combination of eight very big things. One of your websites would be a huge winner (and you'd have some seven and one sites), and you could milk that brilliant prognostication for whatever you could get. You'd want enough events a year or two in the future for your second set of picks that you'd have time to monetize your brilliant forecasting skills.

Of course, the time it would take to establish 256 sites different enough so that your scam wouldn't be too obvious suggests honest work to duller minds...
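To put numbers on the 256-site idea, here is a minimal sketch that enumerates every yes-no combination of eight events and counts how many sites land on each number of correct picks. The function name and the assumption that every real outcome is "yes" are purely illustrative; the counts come out the same whatever actually happens.

```python
from itertools import product

def count_sites_by_hits(n_events, actual=None):
    """Count how many pick-every-combination sites get 0..n_events picks right."""
    if actual is None:
        # Hypothetical real-world outcomes; any choice gives the same counts.
        actual = (True,) * n_events
    counts = {}
    for picks in product((True, False), repeat=n_events):
        hits = sum(p == a for p, a in zip(picks, actual))
        counts[hits] = counts.get(hits, 0) + 1
    return counts

counts = count_sites_by_hits(8)
print(sum(counts.values()), "sites in total")
print(counts[8], "perfect site,", counts[7], "seven-and-one sites")
```

So the scheme guarantees exactly one perfect record and eight near misses, with no forecasting skill involved.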

The White Sox being underprojected by PECOTA is intriguing and worth some consideration.

It is, and what are the chances this would be true of one team, and it simply happens to be the White Sox?

Let's say that Pecota is "right" on 10 teams each year, over projects 10, and under projects 10. (and let's assume this is random)

Let's take the 10 teams that are under projected- each has a 33.3% chance of being under projected the next year making it two years in a row- meaning that likely 3 or 4 of the 10 will be under projected again...

I get roughly a 1 in 20,000 chance that a specific team will be under projected 10 years in a row- roughly 1 in 2000 that any of 30 teams will be under projected 10 years in a row (assuming projection errors are random)

Of course I doubt that projection errors are entirely random - if a system is going to underproject a certain type of team 40% of the time, the odds of that team being underprojected 10 years in a row go from 1 in 20,000 to 1 in 4,000.
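For anyone who wants to check the streak arithmetic, here is a rough sketch. It assumes the figures above are read as "a team already underprojected in year one needs nine more misses in the same direction" and that the pool is the ten teams underprojected in year one; both readings are my interpretation, and the function name is made up for illustration. Counting all ten years from scratch would instead give (1/3)^10, or about 1 in 59,000.

```python
def streak_odds(p_under, extra_years, teams_in_pool):
    """Return 'one in N' odds for a single team and for any team in the pool."""
    # Chance that one team is underprojected in each of the extra years.
    p_single = p_under ** extra_years
    # Chance that at least one team in the pool keeps the streak alive.
    p_any = 1 - (1 - p_single) ** teams_in_pool
    return 1 / p_single, 1 / p_any

one_team, any_team = streak_odds(1 / 3, 9, 10)
biased_team, _ = streak_odds(0.4, 9, 10)

print(f"random errors, one team: 1 in {one_team:,.0f}")
print(f"random errors, any of ten: 1 in {any_team:,.0f}")
print(f"40% underprojection rate, one team: 1 in {biased_team:,.0f}")
```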

Last year 8 of their 9 position players qualified for the batting title (min 502 PA). The only position where they didn't was 3B where they picked up Youk mid-season and he gave them 344 PA. To put that in perspective, there were only 82 qualifying players in the AL last year.

The Sox did have some injuries (and some suckitude) in the rotation last year but no more so than most teams. They got 99 starts out of their top 4 starters (because Danks got hurt and gave them only 9).

2011 was similar. They only had 5 qualifying batters (Pierzynski missed by 2 PA) but their top 9 all had 444 or more PA. Rotation health was excellent with only 11 starts from guys not intended for the rotation (despite Peavy missing half the season) ... and one of those extra guys did quite well.

2010 -- 6 qualifying guys and Beckham missed by just 4 PA. Only 10 starts made by guys not intended for the rotation again despite only a half-season from Peavy.

2009 -- this one was more of a grab-bag. Pretty typical playing time distribution on offense and a lot of spot starters. But if you look at the guys who made starts for the Sox that year, we might spot one key to their success. They ALWAYS manage to get veteran starting pitchers to fill in. The 2009 team had Buehrle, Danks and Floyd fully healthy (95 starts) and got 45 starts from Contreras, Colon, Garcia and Peavy. They'll get caught off-guard now and then but they simply won't throw a Casey Coleman out there for 10-15 starts.

2007 -- 150 starts from Vazquez, Garland, Buehrle, Contreras and Danks plus 10 from Floyd. If I did it right, those guys provided 136 runs just in replacement value.

2006 -- 159 starts from their starting 5. And again with excellent position player health with 7 qualifying, Uribe 7 PA short and Brian Anderson with 405 PA.

2005 -- 130 starts from their top 4, 22 from El Duque, 10 from Brandon McCarthy.

For the AL 2005-12, there have been 307 seasons of 28+ GS, the Sox have had 29 of them. Detroit has had 28 but the Indians just 20. The Red Sox 23, Anaheim 26, Yanks 22, A's 20, Rangers 18, Rays 25. I'd be interested in seeing how well PECOTA has done for those teams, especially Detroit and especially in years where they had very durable rotations.

That 29 may undercount by a bit. When Peavy got hurt in 2010, the Sox didn't stick in some crappy AAA pitcher, they traded for Edwin Jackson. Humber held Peavy's spot at the start of 2011, then Peavy came back in May, then the Sox dealt Jackson at the deadline. The White Sox had Jackson for slightly less than a calendar year and he made 30 starts for them. Of course this sort of thing will have been true of other teams as well.

For the AL 2005-12, there have been 205 pitcher-seasons of at least 10 starts and an overall ERA+ of 85 or less; only three of those were for the White Sox (Liriano and Humber last year, Contreras in 2007).

If I counted right, in the AL last year there were 738 starts from pitchers with an overall ERA+ of 85 or less. That's nearly 53 per team. That's higher than most years where it's around 45 (it was 626 in 2011). From 2005-11, the Sox had only 60 such starts (in 2012, they had 45). So for most of this period, the Sox were getting 36 fewer crappy starts per year than the average AL team. That's about 7 wins right there.
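The per-team arithmetic behind those start counts can be laid out as a quick sketch. It assumes a 14-team AL for 2005-12 and takes the raw counts (738, 626, 60, and the "around 45" typical figure) straight from the comment; the helper name is hypothetical. It only reproduces the gap in starts and makes no claim about how many wins that gap is worth.

```python
AL_TEAMS = 14  # assumption: number of AL teams in each season from 2005 to 2012

def bad_start_gap(league_typical_per_team, sox_total, sox_seasons):
    """Typical per-team count of poor starts minus the Sox yearly average."""
    return league_typical_per_team - sox_total / sox_seasons

per_team_2012 = 738 / AL_TEAMS   # poor starts per AL team in 2012
per_team_2011 = 626 / AL_TEAMS   # poor starts per AL team in 2011
gap = bad_start_gap(45, 60, 7)   # Sox had 60 such starts over 2005-11

print(f"2012: {per_team_2012:.1f} per team, 2011: {per_team_2011:.1f} per team")
print(f"Sox gap: about {gap:.0f} fewer poor starts per year")
```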

Somehow, Kenny Williams almost never misfired on the rotation -- they were almost always quite durable (Peavy being the exception) and they almost never had truly crappy seasons.

Obviously PECOTA (or ZiPS or any of them) should ideally pick up on true quality differences -- if they are consistently underestimating the quality of Sox pitchers, that would suggest a flaw somewhere (type of pitcher, bad park effect, under-rating Sox defense); if they are consistently underestimating Sox pitcher durability ... well, ZiPS puts in the "not a playing time estimator" disclaimer for a reason.

I don't know how much they vary playing time assumptions when they run sims. I suspect if you looked at the Sox, you'd find they regularly run near the top of their playing time estimates, especially in the rotation but maybe elsewhere too.

@19: why are you going 10 years out? Otherwise the math looks good. I think I would factor in the possibility that a system missing on a team consistently is doing so because of a flaw in the system, the same way a coin that's handed to you, whose provenance you don't know, is more likely to be gaffed if it comes up heads the first 10 times you flip it.

I think I would factor in the possibility that a system missing on a team consistently is doing so because of a flaw in the system, the same way a coin that's handed to you, whose provenance you don't know, is more likely to be gaffed if it comes up heads the first 10 times you flip it.

1 in 1,024; with those odds there have been times where an unbiased coin has come up heads ten times in a row - but yes, if a random coin goes 10 for 10, the odds are pretty good that the coin is biased.

Do you remember offhand how the math works for figuring the possibility of the coin's bias in that case? It's been years since I figured that kind of thing, and of course you'd have a range of possibility rather than a single number.
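One standard way to do that math is Bayes' rule: start with a prior guess at how likely the coin is to be gaffed, then weigh how well "gaffed" and "fair" each explain the run of heads. The sketch below assumes the simplest case, a gaffed coin that always lands heads, and the priors (1 in 1,000 and 1 in 100) are invented for illustration. Different priors give the range of answers mentioned above.

```python
def posterior_gaffed(prior_gaffed, heads_in_a_row, p_heads_if_gaffed=1.0):
    """Probability the coin is gaffed after seeing an unbroken run of heads."""
    like_gaffed = p_heads_if_gaffed ** heads_in_a_row  # run explained by a gaffed coin
    like_fair = 0.5 ** heads_in_a_row                  # run explained by a fair coin
    numerator = prior_gaffed * like_gaffed
    return numerator / (numerator + (1 - prior_gaffed) * like_fair)

skeptical = posterior_gaffed(0.001, 10)   # hypothetical prior: 1 gaffed coin in 1,000
suspicious = posterior_gaffed(0.01, 10)   # hypothetical prior: 1 gaffed coin in 100

print(f"prior 0.1%: posterior {skeptical:.1%}")
print(f"prior 1%:   posterior {suspicious:.1%}")
```

The same logic carries over to a projection system: how much a long one-sided miss should shift your belief depends on how likely you thought a systematic flaw was to begin with.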

There's a common misconception that Baseball Prospectus touts their PECOTA projections all the time, but really they don't. I think they post it, shrug and occasionally bring it back up

Well, they do make a big deal of their playoff odds report, which does project team records.

Why pick on PECOTA? I mean, it wasn't that long ago Marcel beat it, and I don't know that it's proven its reliability since Silver left.

Yes, it's true that PECOTA has kind of sucked since Silver left. But the thing with projection systems is that the gap between the best and the worst (and even more so between average and the worst) is pretty small. It's small enough that we can dismiss it for our purposes in this thread. Personally, the reason I've been using it is that it's one of only two projection systems I know of that's been used to project 2013 team records so far. (Plus, it's the best-known projection system to regular-Joe baseball fans, which leads to articles like this one being written.)

I might not have described that well. I only meant that over eight years, given the volatility of team win projections, it's not all that unlikely one team will have over- or undershot its projection by something like seven wins. That team just happens to be the White Sox in this branch of the multiverse, and it's not as unusual an event as some writers are making it out to be.

I'd even say quite common!

If you knew for an absolute fact that every team was exactly of league-average quality, you'd still expect their final record to be off an 81-win mark by 7 games or more about 30% of the time over a 162-game season. 10 games or more, it's still 12%.
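Those percentages can be checked with an exact binomial sum. The sketch below treats every game as an independent coin flip for a true .500 team, which is the assumption stated above; the function name is just for illustration. It should land near the quoted 30% for a 7-game miss and a bit above the quoted 12% for a 10-game miss.

```python
from math import comb

def prob_off_by(games, margin, p_win=0.5):
    """Chance a team's win total ends up at least `margin` wins from its expectation."""
    expected = games * p_win
    total = 0.0
    for wins in range(games + 1):
        if abs(wins - expected) >= margin:
            # Binomial probability of exactly this many wins.
            total += comb(games, wins) * p_win ** wins * (1 - p_win) ** (games - wins)
    return total

off_by_7 = prob_off_by(162, 7)
off_by_10 = prob_off_by(162, 10)

print(f"off by 7+ wins:  {off_by_7:.1%}")
print(f"off by 10+ wins: {off_by_10:.1%}")
```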

Oh no it's not! Somehow I turned 36 average starts into 72 fewer runs than a replacement level start. 'Tis a bit much! Must be on the order of 2.5 to 3 wins though.

If you knew for an absolute fact that every team was exactly of league-average quality, you'd still expect their final record to be off an 81-win mark by 7 games or more about 30% of the time over a 162-game season.

Yes, within tweaking. But to do it 7 years for a specific team is 2 in 10,000. The chances of it happening to at least one team in 30 is only 5 out of 1,000.

Now, are they always under-predicting? Always missing by 7 would be strange but if you're close to spot on across 7 seasons, especially for an inconsistent team, then you have a better argument for randomness.

Anyway, what's always hilarious about these articles is that nobody looks at how well the human "experts" do at stuff like this. Sportswriters generally do miserably and all they usually bother to predict is the order of finish, not the number of wins. I'd assume that somebody like Hawk Harrelson has a tendency to overestimate Sox wins on a regular basis.

In some sense the Cubs kinda look like those under-projected Sox teams. The Cubs starting pitching could be pretty good especially if everybody is healthy (and they have a bit of SP depth this year). The offense looks pretty horrible but Sox offense often looked pretty horrible on paper -- again the Cubs need good luck with health. Emulating the Sox, the Cubs have made sure they have offensive sinkholes at 3B and 2B. I could see the Cubs surprising with 75 wins.

The Cubs starting pitching could be pretty good especially if everybody is healthy (and they have a bit of SP depth this year).

The importance of that parenthetical cannot be overstated. The Cubs used the following starting pitchers in a total of 50 games last year: Chris Volstad, Justin Germano, Chris Rusin, Jason Berken, Brooks Raley, and Casey Coleman. The team's record in those games was 13-37. In games started by actual major league starting pitchers, they were 48-64, which is a 70-win pace. This year's team has roughly 7 actual major league starting pitchers (Jackson, Garza, Samardzija, Wood, Baker, Feldman, and Villanueva, plus possibly Vizcaino later in the year).

Of course, some of those guys might not work out, and some of them may be traded during the season. But that's why the depth is promising. I still don't think the Cubs will be an especially good team this year, but I have hope that they'll be less embarrassing.

"I thought an interesting scam would be to set up 256 distinct websites where you'd pick every win-loss or yes-no combination of eight very big things. One of your websites would be a huge winner (and you'd have some seven and one sites), and you could milk that brilliant prognostication for whatever you could get"

This is how sports betting scammers still operate, I'm sure.
You offer a "free" play to everyone.
You tell half the people to try one team, and half the other, against the spread.
The half who win will include a good pct of people who may pay for the next pick, plus others who will need more "proof." And some of the losers are dumb enough to be enticed by another free play.
The segment that wins again, whether paying or not, off the scammer are liable to pay for the next pick. And some of the loser/winner crowd will buy in as well.
The product you're offering - the prediction - costs nothing to make, beyond advertising dollars to find the marks.

So by BINOM (again, lazy), you expect them to be off by that much in a 10-year period 1-in-550ish by random chance. And a 5% chance that *one* of the 30 teams would play at their 0.2th percentile or worse.

Yes, and it used to be effective in e-mail blasts as well, before spam filters advanced. Get an e-mail list with 16 million e-mails on it (pretty easy to do), and send 8 million e-mails with one prediction and 8 million with the other. Whichever list got the right prediction sent to them gets a second prediction, with 4 million one way and 4 million the other. After 6 weeks you've got 250,000 people that you've sent 6 straight winning predictions to. That will sucker a lot of people in.
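The halving in that e-mail scheme is easy to trace. This sketch simply follows the list of people who have seen nothing but winning picks, week by week, assuming each round splits the surviving list evenly; the function name is hypothetical.

```python
def halving_funnel(list_size, rounds):
    """Sizes of the 'only seen winners' list after each round of split predictions."""
    sizes = []
    remaining = list_size
    for _ in range(rounds):
        remaining //= 2  # half the list was sent the losing pick and drops out
        sizes.append(remaining)
    return sizes

funnel = halving_funnel(16_000_000, 6)
for week, size in enumerate(funnel, start=1):
    print(f"after week {week}: {size:,} people have seen only winners")
```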

Isn't PECOTA making team predictions based on player projections? As the player projections themselves are kind of iffy, multiplying that by 25 makes this not very robust. No wonder BPro does not really take Team PECOTA projections to heart. Player projections, on the other hand, we can hold them more accountable for, as this is what they are advertising.