Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified.

A quick look at the team projections shows no "superteam" in 2014. For example, the projections have nine American League teams winning between 83 and 91 games - and no team winning more than 91. It also has no team (including Houston!) winning fewer than 70 games. A 21-win range between the best and worst team in the entire league would be quite a bit of parity!

Unless I'm missing something, and I may very well be so please correct me, these seem overly regressed for established players. For example, Miguel Cabrera is projected at .396/.535. His last three years are (most recent first):

442/636
393/606
448/586

Some other projections:

Steamer: 418/594
Oliver: 413/592
ZiPS: 404/581

Maybe all of those are wrong and this is right, but that seems a significant outlier forecast and I'd be interested in hearing why. Rinse and repeat for the Vottos, Tulos, etc of the world.

A 21-win range between the best and worst team in the entire league would be quite a bit of parity!

These projections are regressed to the mean. The average disparity between the best team and the worst team for each simulated season is going to be greater than the disparity between the best average projection and the worst average projection.
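For anyone who wants to see this in numbers, here is a rough Python sketch. Everything in it is an assumption for illustration: the 30 "true talent" win probabilities are made up (evenly spaced from .430 to .570), every game is treated as an independent weighted coin flip, and the name `compare_spreads` is mine, not anything from the projection system being discussed.

```python
import random
import statistics

def compare_spreads(n_seasons=300, n_games=162, seed=1):
    """Compare the spread of averaged projections with the spread of single seasons."""
    rng = random.Random(seed)
    # Hypothetical true-talent win probabilities (an assumption, not real data).
    talents = [0.43 + 0.14 * i / 29 for i in range(30)]
    totals = [0] * len(talents)
    season_spreads = []
    for _ in range(n_seasons):
        # One simulated season: each game is an independent weighted coin flip.
        wins = [sum(rng.random() < p for _ in range(n_games)) for p in talents]
        season_spreads.append(max(wins) - min(wins))
        totals = [t + w for t, w in zip(totals, wins)]
    # Averaging over all seasons gives each team's mean projection.
    mean_wins = [t / n_seasons for t in totals]
    projection_spread = max(mean_wins) - min(mean_wins)
    return projection_spread, statistics.mean(season_spreads)

projection_spread, average_season_spread = compare_spreads()
print(f"best-to-worst gap in mean projections: {projection_spread:.1f} wins")
print(f"average best-to-worst gap in a season: {average_season_spread:.1f} wins")
```

The gap between the best and worst averaged projection should come out noticeably smaller than the typical gap inside any one simulated season, which is the point being made above.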

A quick look at the team projections shows no "superteam" in 2014. For example, the projections have nine American League teams winning between 83 and 91 games - and no team winning more than 91. It also has no team (including Houston!) winning fewer than 70 games.

Projections will always have a tighter range because they are mean projections. We expect 3 teams, on average, to perform to their 90th percentile, 6 to 80th or better, etc, but we don't know which ones they are yet.

Sweet, the White Sox fared pretty damn well, which speaks volumes about the work Hahn has done overhauling the roster.

In fairness, Keith Law says they were never good to begin with.

Loath as I am to agree with Law, I kind of agree with your simplified take on Law's take. They were never that good, maybe 85 wins good. You can't win 80% of your one-run games very often; ask the 2005 White Sox. Doesn't mean we shouldn't appreciate it when it happens, but it's not repeatable.

Rather disappointing and horrifying that the Cubs are projected to be the worst team in baseball while the Marlins and Astros still exist.

As big a mess as the Cubs were when Thed took over, and as much as 90 losses seems near certain... It's awfully hard for me to see how this regime gets anything more than one more 90 loss season. I think I'm being more patient than most - I like the farm system, I think we're starting to see some real depth, and if Rizzo/Castro can hopefully rebound, the MLB cupboard isn't wholly bare - but really, you can't have more than 4 years of complete futility.

Rather disappointing and horrifying that the Cubs are projected to be the worst team in baseball while the Marlins and Astros still exist.

The Marlins had 4 starters last year who made 17 or more starts with ERA+ of better than 100 who are younger than 25 years old. They also have Giancarlo Stanton. You can only project so badly when you should have a solid rotation and a young superstar hitter.

The team with the most wins has 91 if I read it correctly. While the 67 for the worst team could be accurate, I might have put something in the model that made sure a team won at least 95 games. I can't think of a season in which there wasn't a team with at least 95 wins. Also, I wonder how much the remaining free agents would change things. I also wonder how much of this is based on macro data and how much is on player projections.

The team with the most wins has 91 if I read it correctly. While the 67 for the worst team could be accurate, I might have put something in the model that made sure a team won at least 95 games. I can't think of a season in which there wasn't a team with at least 95 wins. Also, I wonder how much the remaining free agents would change things. I also wonder how much of this is based on macro data and how much is on player projections.

That's not how projections (or basic probability) work. They're supposed to have a tighter spread because they represent the mean projection for each team. Those projections aren't saying that 91 wins will lead MLB, only that there's no team that has an *average expectation* of more than 91 wins. Obviously, some teams will perform to levels they only have a 10% or 20% chance of reaching (or falling to).

If teams were coin flips, the mean projection for every team would be 81 wins. But that's not the same as saying that 81 wins will lead the league because on average, you'd expect around 92 wins to be the average league-best in a league of 162 coin flips and 30 teams.
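The coin-flip league is easy to try at home. A minimal sketch in Python, assuming 30 teams that each flip 162 fair coins independently (so it ignores the fact that real teams play each other and every game has exactly one winner); the function name `average_league_best` is just something I made up for this:

```python
import random
import statistics

def average_league_best(n_teams=30, n_games=162, n_seasons=2000, seed=0):
    """Average league-leading win total when every team is a fair coin."""
    rng = random.Random(seed)
    league_bests = []
    for _ in range(n_seasons):
        # Counting the 1-bits in n_games random bits is one season of fair coin flips.
        wins = [bin(rng.getrandbits(n_games)).count("1") for _ in range(n_teams)]
        league_bests.append(max(wins))
    return statistics.mean(league_bests)

print(f"average league-best win total: {average_league_best():.1f}")
```

Every team has a mean projection of 81 here, yet the league leader should land well above that, somewhere in the low-to-mid 90s under these assumptions.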

With Tanaka off the board the Cubs are going to have to find some pitching if they want to avoid losing lots of games. After 2015 Jeff Samardzija is gone and all that will be left is the carcass of Edwin Jackson for another year and hopefully the resurrected Travis Wood.

That seems crazy to me. I'll be stunned if they fall below 70 and really think 85 is as likely as 75.

Well, I like your guess better than my own, but I have this hunch that the Jays are due for a bad year after two years of consistent performance. I think some of the stuff that's been working for them is going to stop working.

I didn't do it for the National League teams, but if you total up the runs scored by the American League teams it comes out to exactly the same 10,525 runs that were scored by AL teams in 2013. It doesn't look like it's a reduced environment, just a more even environment. The same reasons for the more smoothed out win/loss projections that Dan lays out in #26 probably explain that.

Someone from the top 3-4 teams is going to score over 800 runs and someone from the group of Minnesota, Houston and Kansas City is probably going to score around 625.

I didn't do it for the National League teams, but if you total up the runs scored by the American League teams it comes out to exactly the same 10,525 runs that were scored by AL teams in 2013. It doesn't look like it's a reduced environment, just a more even environment. The same reasons for the more smoothed out win/loss projections that Dan lays out in #26 probably explain that.

Someone from the top 3-4 teams is going to score over 800 runs and someone from the group of Minnesota, Houston and Kansas City is probably going to score around 625.

Right, you have to remember that every year ~33% of teams will exceed or fall short of expectations by 1 SD or more.

Projections will always have a tighter range because they are mean projections. We expect 3 teams, on average, to perform to their 90th percentile, 6 to 80th or better, etc, but we don't know which ones they are yet.

The only stats class I took was Intro to Stats, so bear with me, but if you ran enough simulations, would the results eventually be that every team finished 81-81? Or do they not regress like that?

Well, if you ran enough simulations you might eventually stumble across one where every team was 81-81. But it's not a process that moves in that direction. And the odds of 30 teams all hitting that mark are probably so small that you'd never see it happen in your lifetime.

If you start with all teams being equal 81-81 teams, then just by chance some of them will win 90 games, others will win 70 games.

I didn't do it for the National League teams, but if you total up the runs scored by the American League teams it comes out to exactly the same 10,525 runs that were scored by AL teams in 2013. It doesn't look like it's a reduced environment, just a more even environment. The same reasons for the more smoothed out win/loss projections that Dan lays out in #26 probably explain that.

Someone from the top 3-4 teams is going to score over 800 runs and someone from the group of Minnesota, Houston and Kansas City is probably going to score around 625.

While acknowledging this is right, why do ZiPS and the other projection systems in #2 spit out consistently higher results? Obviously Dan is well respected around here, and justifiably so, yet it seems there is a materially different overriding approach in Clay's* projections.

*The systems agree on some players and have some normal variation. That said, the number of players that are projected lower in Clay's far outweighs the number projected higher in Clay's, particularly for established players. This isn't 'Clay doesn't like Miguel Cabrera/Tulo/Votto/etc', it's 'Clay is regressing Miguel Cabrera/Tulo/Votto/etc. more than anyone else'. I find the outlier approach interesting and worth exploring, especially for someone with Clay's track record, but barring a better explanation I'd rather just use other sources. Of course, a Dan/Clay debate would be the best outcome. Get on it, boys.

The only stats class I took was Intro to Stats, so bear with me, but if you ran enough simulations, would the results eventually be that every team finished 81-81? Or do they not regress like that?

As you run more simulations, the average for each team will get closer to its true mean (in the case of a coin that would be 81).

That's basically the process by which these projections are arrived at. They take the average of thousands of simulations to get the most likely outcome. But you have to remember that the most likely outcome isn't very likely at all. If you looked at each individual simulation, however, you would probably find at least one 95+ win team in most. It's just evened out by the time that team finishes with 85 in another sim.

Looking at it Sean's way, there is about a 6.26% chance of any individual "team" finishing at exactly 81. Not accounting for interdependency of results, the odds of that happening 30 times in a row would be 0.000079%, or a bit less than one in a million.

PECOTA usually includes a category called Average Win Total or something like that that I find very interesting. Basically it would rank the AL East teams (like with Clay's list) but it would show that while the Rays are projected to win 90 games right now the average AL East winner finishes with 95 wins in the various runs. Those numbers usually look a lot more like the real standings than the projected versions.

Yeah, I think so. Looking at it another way, the more games you play in your sim, the less spread in the results you'll have. Was that what was meant in the original question?

With 162 games, an average team will have a +1 SD result of 87 wins, or .540 winning percentage.

Play 1 million games, then +1 SD will be a percentage of .5005. With 1 million trials, a team playing .506 ball (the equivalent of 82 wins in 162) will be 12 SD from the mean, which means it pretty much doesn't happen. (an average team playing 12 SD above the mean in 162 games would be 157-5)
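The arithmetic behind those numbers, as a short Python sketch. It assumes each game is an independent 50/50 coin flip, so the standard deviation of winning percentage over n games is sqrt(p(1-p)/n); the function and variable names are mine.

```python
import math

def win_pct_sd(n_games, p=0.5):
    """Standard deviation of winning percentage over n_games independent games."""
    return math.sqrt(p * (1 - p) / n_games)

# 162-game season: +1 SD is a bit over 6 wins above 81.
sd_162 = win_pct_sd(162)
plus_one_sd_wins = 81 + sd_162 * 162

# One million games: the SD of winning percentage shrinks to 0.0005.
sd_million = win_pct_sd(1_000_000)
z_for_506 = (0.506 - 0.5) / sd_million

# What 12 SD above average would look like in a 162-game season.
wins_at_12_sd = 81 + 12 * sd_162 * 162

print(f"+1 SD over 162 games: {plus_one_sd_wins:.1f} wins ({0.5 + sd_162:.3f})")
print(f"+1 SD over 1,000,000 games: {0.5 + sd_million:.4f}")
print(f".506 over 1,000,000 games is {z_for_506:.0f} SD above the mean")
print(f"12 SD over 162 games: about {wins_at_12_sd:.0f} wins")
```

The spread in winning percentage shrinks with the square root of the number of games, which is why a million-game sample pins a .500 team so tightly.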

I think it's worth noting (and correct me if I say something wrong here) but there's also a difference between our best estimate of team quality and the true value of team quality. Our best estimate for the league as a whole is made by regressing every team to the mean, and while that will improve the estimate for some teams it will not do so for others. We have to do it for every team regardless because we don't know which ones should be regressed the full amount (or greater) and which ones shouldn't.

That was probably really unclear, so here's a thought experiment (and again, those who know this better can correct me). If you played the season a thousand times (actually playing the games, not calculations) without changing the teams at all (impossible, sure, but bear with me) then you wouldn't have to regress much at all because the observed value would be quite accurate. At that point the best team would have a lower average win total than the best team has in a given season, but it would have a greater average win total than even a very good projection.

Looking at it Sean's way, there is about a 6.26% chance of any individual "team" finishing at exactly 81. Not accounting for interdependency of results, the odds of that happening 30 times in a row would be 0.000079%, or a bit less than one in a million.

Just to be obnoxious, 6.26% is binomial. There are a set number of wins in a 2430-game season, so if you're not flipping the coin 2430 times, you want hypergeometric. So 6.37%.
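The binomial figure is easy to check. A quick sketch in Python, assuming 162 independent fair coin flips for a single team (the function name `prob_exact_wins` is made up for this post; I haven't tried to reproduce the hypergeometric number here):

```python
import math

def prob_exact_wins(wins, n_games=162, p=0.5):
    """Binomial probability of exactly `wins` wins in n_games independent games."""
    return math.comb(n_games, wins) * p**wins * (1 - p) ** (n_games - wins)

p_81 = prob_exact_wins(81)
print(f"chance a coin-flip team finishes exactly 81-81: {p_81:.2%}")
```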

PECOTA usually includes a category called Average Win Total or something like that that I find very interesting. Basically it would rank the AL East teams (like with Clay's list) but it would show that while the Rays are projected to win 90 games right now the average AL East winner finishes with 95 wins in the various runs. Those numbers usually look a lot more like the real standings than the projected versions.

I think SG does that in the RLYW blowouts.

Edit: Yes, he does. Looking back at last year's blowout is fun. Blue Jays at 29% for the division (Red Sox at 15), Angels at 40%, Nationals at 45%, Giants at 28%. Whoops!

Actually, there's a bigger nit to pick. The chance of something that is 6.26% (or 6.37%) likely happening 30 times in a row is .0626^30, which is a number so small I'm not sure what to call it, something like 1 in a 1 followed by 36 zeros.

I see the error: you got .000079% by taking .626 and raising it to the 30th power. That's the equivalent of taking a likely event (a .626 winning percentage is about 100 wins), such as the best team in baseball winning a single game. The best team in baseball winning 30 in a row? Very unlikely, less than one in a million. Take an unlikely event and make it happen 30 times in a row and the numbers get silly.
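To put the two calculations side by side, here's a tiny Python sketch. It uses the .0626 and .626 figures from the posts above and treats the 30 teams as independent, which they aren't quite; the variable names are mine.

```python
import math

p_exactly_81 = 0.0626   # chance one coin-flip team finishes exactly 81-81
p_misplaced = 0.626     # same digits with the decimal point shifted one place

all_30_teams = p_exactly_81 ** 30
thirty_straight = p_misplaced ** 30

# log10 makes the sizes easier to read than the raw floats.
print(f".0626^30 = {all_30_teams:.2e} (10^{math.log10(all_30_teams):.1f})")
print(f".626^30  = {thirty_straight:.2e} (10^{math.log10(thirty_straight):.1f})")
```

The first number is the one that matters for "every team goes 81-81", and it is about 30 orders of magnitude smaller than the second.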
