Comparing 2014 Projections – ERA and WHIP

Yesterday I ran comparisons of several projection systems for an all-inclusive batting statistic, wOBA. Today I’m running the same tests, computing root mean square error (RMSE) and mean absolute error (MAE), for two commonly used fantasy statistics, ERA and WHIP. These tests are bias-adjusted, so what matters is a player’s ERA or WHIP relative to the overall average of that system, compared with the player’s actual statistic relative to the actual overall average. The lower the RMSE or MAE, the better a projection system predicted the actual data.
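Here’s a minimal sketch of how such bias-adjusted errors can be computed. It assumes plain per-pitcher ERA (or WHIP) lists and simple unweighted means, whereas the averages reported below are weighted by 2014 innings pitched:

```python
import math

def bias_adjusted_errors(projected, actual):
    """MAE and RMSE after removing each data set's overall bias.

    Each pitcher's projection is measured relative to the system's own
    average and compared with his actual stat relative to the actual
    average, so a system isn't penalized merely for projecting a
    different overall run environment.
    """
    proj_avg = sum(projected) / len(projected)
    act_avg = sum(actual) / len(actual)
    diffs = [(p - proj_avg) - (a - act_avg) for p, a in zip(projected, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```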

Not surprisingly the errors, even as a percentage of the average, are much higher here than for wOBA, as pitching performance is more volatile than batting performance. Will Larson’s projections did best here, followed by the Consensus, CBS Sportsline, and Steamer. All the models handily beat using 2013 data, albeit not as decisively as with wOBA, but seven lagged behind Tango’s Marcel system. The other notable thing to me is that every system’s average ERA for these players is higher than the actual average. Indeed, the 2013 actual average was slightly lower still, which shows how dependent these systems are on older historical data. Aggregate ERA has fallen sharply since 2012, and the projection models are still reflecting that higher run environment to a large degree.

When you’re preparing for a fantasy auction, you care much more about how a system rates players relative to each other than how it rates them compared to the actual data. For these 75 pitchers, Steamer projected an aggregate 3.81 ERA, while they actually had a 3.43 ERA (weighted by actual 2014 innings pitched). But Steamer still posts lower errors than Fangraphs Fans’ projections, which, at 3.44, came closest to the actual average. This is the impact of bias adjustment: while the Fans did better at projecting the actual league average ERA, Steamer tended to be closer on more individual players once adjusted for its own league average.
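For reference, those aggregate figures are innings-weighted averages. A quick sketch, with made-up numbers purely for illustration:

```python
def weighted_avg(values, innings):
    """Average a rate stat (ERA, WHIP) weighted by innings pitched."""
    return sum(v * ip for v, ip in zip(values, innings)) / sum(innings)

# Made-up example: the pitcher with more innings counts more heavily.
print(weighted_avg([3.00, 4.50], [200, 100]))  # 3.50
```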

To underscore that point, here’s the same ERA table as above, but this time using raw errors, i.e. without any bias adjustment at all:

| Source | Num | Avg ERA | MAE | RMSE |
|---|---:|---:|---:|---:|
| Actual | 75 | 3.4261 | 0.0000 | 0.0000 |
| CBS | 75 | 3.5056 | 0.5900 | 0.7739 |
| Zips | 75 | 3.5525 | 0.6152 | 0.7820 |
| Fans | 75 | 3.4440 | 0.5901 | 0.7823 |
| Razzball | 75 | 3.4998 | 0.5887 | 0.7834 |
| RotoChamp | 75 | 3.5329 | 0.6043 | 0.7923 |
| All Consensus | 75 | 3.6435 | 0.6250 | 0.7993 |
| Marcel | 75 | 3.5315 | 0.6184 | 0.8069 |
| Oliver | 75 | 3.5351 | 0.6375 | 0.8271 |
| AggPro | 75 | 3.7284 | 0.6699 | 0.8273 |
| ESPN | 75 | 3.7014 | 0.6477 | 0.8278 |
| RV Pre-Australia | 75 | 3.6518 | 0.6385 | 0.8314 |
| RotoValue | 75 | 3.6512 | 0.6384 | 0.8317 |
| Larson | 75 | 3.7608 | 0.6769 | 0.8336 |
| Davenport | 75 | 3.5964 | 0.6460 | 0.8394 |
| CAIRO | 75 | 3.6321 | 0.6812 | 0.8546 |
| Steamer/Razzball | 75 | 3.8057 | 0.7031 | 0.8577 |
| Steamer | 75 | 3.8116 | 0.7054 | 0.8592 |
| MORPS | 75 | 3.6513 | 0.6995 | 0.8711 |
| RotoGuru | 75 | 3.8089 | 0.7216 | 0.8758 |
| y2013 | 75 | 3.3790 | 0.7054 | 0.9537 |
| Bayesball | 75 | 3.9120 | 0.8315 | 0.9704 |

Now Fans ranks well above Steamer, which is near the bottom in this test. The lower errors here tend to come from systems that come closer to matching the actual league ERA. Even naive 2013 data used as a forecast is no longer dead last by a wide margin, while CBS, Zips, and Fans are the best-performing systems here. Comparing these two tables shows the impact of bias adjustment. What matters most for fantasy valuation is how players compare relative to each other, not how well a system predicts the actual run environment they play in. So the first table is a better comparison of projections for fantasy purposes.

Both these tables are “apples to apples”, comparing only those players that each system projected. And it’s a small set of generally better-than-average pitchers, a group which is easier to project than a deeper pool of MLB pitchers.

But of course if you’re in a fantasy league, it doesn’t help when a system doesn’t project someone you might care about. So this next table assigns an ERA 0.50 worse than the system’s league average to anybody not projected, and compares against a set of almost 700 pitchers:

| Source | MLB | ERA | StdDev | MAE | RMSE | Missing |
|---|---:|---:|---:|---:|---:|---:|
| Actual | 672 | 3.7395 | 1.4707 | 0.0000 | 0.0000 | 0 |
| Steamer/Razzball | 672 | 4.0077 | 0.5245 | 0.8759 | 1.4168 | 215 |
| All Consensus | 672 | 3.9776 | 0.5452 | 0.8773 | 1.4251 | 4 |
| Davenport | 672 | 3.8822 | 0.4985 | 0.8856 | 1.4270 | 198 |
| Steamer | 672 | 4.0087 | 0.5731 | 0.8883 | 1.4280 | 37 |
| Oliver | 672 | 3.9382 | 0.6424 | 0.9090 | 1.4418 | 31 |
| Bayesball | 672 | 4.0215 | 0.5219 | 0.9252 | 1.4427 | 222 |
| Fans | 672 | 3.7886 | 0.4485 | 0.9017 | 1.4473 | 465 |
| AggPro | 672 | 4.0252 | 0.3698 | 0.9071 | 1.4483 | 557 |
| Razzball | 672 | 3.8558 | 0.3791 | 0.9045 | 1.4485 | 524 |
| RotoValue | 672 | 3.8888 | 0.4810 | 0.9093 | 1.4488 | 32 |
| RV Pre-Australia | 672 | 3.9020 | 0.4852 | 0.9080 | 1.4505 | 42 |
| Larson | 672 | 4.2586 | 0.4348 | 0.9232 | 1.4540 | 463 |
| ESPN | 672 | 4.0331 | 0.5404 | 0.9056 | 1.4563 | 336 |
| Marcel | 672 | 3.7990 | 0.4901 | 0.9103 | 1.4587 | 136 |
| Zips | 672 | 4.0703 | 0.7450 | 0.9207 | 1.4640 | 69 |
| CAIRO | 672 | 4.0489 | 0.6991 | 0.9369 | 1.4758 | 94 |
| CBS | 672 | 4.0622 | 0.4768 | 0.9404 | 1.4799 | 462 |
| MORPS | 672 | 3.8850 | 0.6547 | 0.9492 | 1.4907 | 143 |
| RotoGuru | 672 | 4.3407 | 0.6741 | 0.9486 | 1.5056 | 150 |
| RotoChamp | 672 | 3.8623 | 0.6814 | 0.9398 | 1.5199 | 249 |
| y2013 | 672 | 3.8355 | 1.7932 | 1.2610 | 2.2525 | 169 |

Here systems get more credit for projecting more players, so long as those projections are better than the default of 0.50 worse than average. This shakes up the order quite a bit. Now Steamer/Razzball does best, followed by the Consensus, Clay Davenport, and Steamer. Will Larson, the winner of the test of fewer (and overall better) players, drops to the middle of the pack, while CBS Sportsline and Zips now slip behind Marcel. Bayesball, which ranked worst in the earlier test, improves markedly. The overall errors increase quite a bit, as we’re now comparing projections for a much deeper set of players, many of whom have much less of a track record.
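Here’s a sketch of that fill-in rule; the dict-based interface is just for illustration, not my actual code:

```python
def fill_missing(projections, system_avg, pitcher_pool, penalty=0.50):
    """Give every pitcher in the pool a projection for one system.

    projections : dict of pitcher -> projected ERA (or WHIP)
    system_avg  : that system's league-average projection
    penalty     : 0.50 for ERA, 0.10 for WHIP; unprojected pitchers are
                  assumed to be this much worse than the system's average
    """
    return {p: projections.get(p, system_avg + penalty) for p in pitcher_pool}
```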

Finally, let’s take a look at WHIP. First, the “apples-to-apples” table comparing only the 75 pitchers projected by all systems:

| Source | Num | Avg WHIP | MAE | RMSE |
|---|---:|---:|---:|---:|
| Actual | 75 | 1.2036 | 0.0000 | 0.0000 |
| CBS | 75 | 1.2077 | 0.0874 | 0.1175 |
| All Consensus | 75 | 1.2360 | 0.0899 | 0.1205 |
| Zips | 75 | 1.2214 | 0.0943 | 0.1229 |
| ESPN | 75 | 1.2397 | 0.0963 | 0.1231 |
| Larson | 75 | 1.2562 | 0.0954 | 0.1239 |
| Steamer | 75 | 1.2616 | 0.0963 | 0.1246 |
| Marcel | 75 | 1.2238 | 0.0970 | 0.1254 |
| Steamer/Razzball | 75 | 1.2608 | 0.0996 | 0.1271 |
| Fans | 75 | 1.2030 | 0.0993 | 0.1271 |
| RotoGuru | 75 | 1.2364 | 0.0967 | 0.1272 |
| Davenport | 75 | 1.2487 | 0.0981 | 0.1284 |
| RV Pre-Australia | 75 | 1.2276 | 0.0977 | 0.1286 |
| RotoValue | 75 | 1.2275 | 0.0979 | 0.1288 |
| Oliver | 75 | 1.2456 | 0.0978 | 0.1303 |
| RotoChamp | 75 | 1.2090 | 0.1069 | 0.1335 |
| MORPS | 75 | 1.2464 | 0.1047 | 0.1336 |
| Razzball | 75 | 1.2012 | 0.1108 | 0.1388 |
| AggPro | 75 | 1.2412 | 0.1170 | 0.1415 |
| Bayesball | 75 | 1.2794 | 0.1134 | 0.1431 |
| y2013 | 75 | 1.1895 | 0.1097 | 0.1469 |
| CAIRO | 75 | 1.2603 | 0.1203 | 0.1495 |

This time the CBS Sportsline projections wind up with the lowest errors, followed by the consensus and Zips. Marcel actually beats most systems in this test. As a percentage of the projected average, the errors are smaller for WHIP than for ERA, which makes sense since WHIP stabilizes more quickly. But the spread in errors between systems is much wider for WHIP, so systems vary more in how well they project WHIP than ERA for this sample of pitchers.
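A quick back-of-the-envelope check of that relative-error point, using the CBS rows from the two 75-pitcher tables above:

```python
# MAE as a fraction of the projected average, CBS rows from the tables above.
era_mae, era_avg = 0.5900, 3.5056
whip_mae, whip_avg = 0.0874, 1.2077
print(f"ERA:  {100 * era_mae / era_avg:.1f}% of average")    # about 16.8%
print(f"WHIP: {100 * whip_mae / whip_avg:.1f}% of average")  # about 7.2%
```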

Finally, here’s the table using a WHIP of 0.10 worse than the projected league average for missing players:

| Source | MLB | WHIP | StdDev | MAE | RMSE | Missing |
|---|---:|---:|---:|---:|---:|---:|
| Actual | 672 | 1.2746 | 0.2381 | 0.0000 | 0.0000 | 0 |
| Davenport | 672 | 1.3088 | 0.0951 | 0.1457 | 0.2253 | 198 |
| Steamer/Razzball | 672 | 1.3155 | 0.0955 | 0.1472 | 0.2256 | 215 |
| Steamer | 672 | 1.3148 | 0.1027 | 0.1480 | 0.2264 | 37 |
| Fans | 672 | 1.2716 | 0.0829 | 0.1462 | 0.2281 | 465 |
| AggPro | 672 | 1.3036 | 0.0693 | 0.1478 | 0.2286 | 557 |
| ESPN | 672 | 1.3082 | 0.1080 | 0.1467 | 0.2290 | 336 |
| Razzball | 672 | 1.2770 | 0.0846 | 0.1495 | 0.2304 | 524 |
| Marcel | 672 | 1.2838 | 0.0888 | 0.1504 | 0.2311 | 136 |
| Bayesball | 672 | 1.3262 | 0.0928 | 0.1537 | 0.2317 | 222 |
| All Consensus | 672 | 1.3163 | 0.1202 | 0.1487 | 0.2318 | 4 |
| Oliver | 672 | 1.3324 | 0.1364 | 0.1507 | 0.2330 | 31 |
| Larson | 672 | 1.3506 | 0.0857 | 0.1520 | 0.2332 | 463 |
| RotoValue | 672 | 1.2871 | 0.0968 | 0.1525 | 0.2335 | 32 |
| RV Pre-Australia | 672 | 1.2891 | 0.0974 | 0.1523 | 0.2336 | 42 |
| Zips | 672 | 1.3289 | 0.1501 | 0.1562 | 0.2379 | 69 |
| CBS | 672 | 1.3189 | 0.0962 | 0.1574 | 0.2380 | 462 |
| RotoGuru | 672 | 1.3139 | 0.1231 | 0.1576 | 0.2405 | 150 |
| MORPS | 672 | 1.3036 | 0.1243 | 0.1583 | 0.2406 | 143 |
| CAIRO | 672 | 1.3809 | 0.1890 | 0.1805 | 0.2593 | 94 |
| RotoChamp | 672 | 1.2640 | 0.1981 | 0.1757 | 0.2892 | 249 |
| y2013 | 672 | 1.2950 | 0.2538 | 0.2035 | 0.3252 | 169 |

In this test Clay Davenport’s model edges out Steamer/Razzball and Steamer for lowest RMSE, with Fangraphs Fans, AggPro, and ESPN close behind. Marcel beats more than half the models, as it did in the test of the 75 players projected by all, but now for the first time in these tests it also beats the consensus, which is usually among the best-performing systems. But here the wider spread in errors among systems might work against a crowd-sourcing approach that usually does quite well with other stats.

Projection systems vary much more on WHIP than they do on ERA (or wOBA), but in general they all perform much better than 2013 data. Yet while most systems beat the benchmark of Tom Tango’s Marcel system for wOBA, Marcel holds up quite well for these pitching stats. Projecting pitching is harder than projecting hitting. Fantasy veterans know that already, of course, but these numbers support that conclusion as well.