Why Oliver Loves Yu

It looks like Yu broke Oliver. That’s Yu Darvish; Oliver is the engine of The Hardball Times Forecasts. It’s not the first time it’s happened, but when a player so dominates his non-major league competition that his derived major league true talent exceeds generally accepted norms, it offers an opportunity to examine the system and make some changes for the better.

Darvish’s performance against batters in Nippon Professional Baseball, the world’s second-best professional league, is indeed mind-boggling: consistently low hit, home run and walk totals, with more than a strikeout an inning.

Patrick Newman of npbtracker shows pitch type, velocity and usage rate for pitchers in that league. This past year, Darvish’s fastball sat at 94 to 95 mph, with a slider in the low 80s, and a high 80s change-up. He also mixes in a low 90s cut fastball, forkball, shuuto and slow curve.

Still, the question remains: how accurately can that performance be translated into a major league equivalent? The standard process is to find as many players as possible who have played in both leagues and compare their performance, as a group, in both settings.

If, for example, starting pitchers translate differently from relievers, players can be divided into groups that better fit their roles and profiles, but at the risk of basing the comparisons on smaller, and thus less reliable, samples.
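To make the matched-pairs idea concrete, here is a minimal Python sketch, not Oliver’s actual code. It assumes a hypothetical `MatchedPair` record holding a pitcher’s role, his NPB rate for one event (say, strikeouts per batter faced) and his MLB events and batters faced. Each group’s factor is total MLB events divided by the events the NPB rates would predict over the same MLB batters faced, which makes the over- and under-predictions within a group cancel out. All numbers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatchedPair:
    """One pitcher observed in both leagues (hypothetical record layout)."""
    role: str        # group label, e.g. "SP" or "RP"
    npb_rate: float  # event rate per batter faced in NPB, e.g. SO/BF
    mlb_events: int  # events actually recorded in MLB
    mlb_bf: int      # batters faced in MLB


def translation_factors(pairs):
    """Return one NPB-to-MLB factor per role group.

    Each factor is total MLB events divided by the events the NPB rates
    predict over the same MLB batters faced, so within a group the
    positive and negative prediction errors sum to zero.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    for p in pairs:
        actual[p.role] += p.mlb_events
        expected[p.role] += p.npb_rate * p.mlb_bf
    return {role: actual[role] / expected[role] for role in actual}


# Invented example data: two starters and a single reliever.
pairs = [
    MatchedPair("SP", 0.20, 90, 500),
    MatchedPair("SP", 0.25, 100, 400),
    MatchedPair("RP", 0.22, 60, 250),
]
factors = translation_factors(pairs)
print(factors)
```

Note that the "RP" factor here rests on a single pitcher, which is exactly the sample-size risk of splitting the pairs into finer groups.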

Oliver’s Japanese translations are based on the performances of 260 pitchers who have performed on both sides of the Pacific from 1998 to 2011. Of these, 185 have been North American players who have gone to Japan, with 75 Japanese pitchers coming here, but only 28 of those 75 appearing in the major leagues. Since 1998, only five pitchers who were starters in Japan were given starting roles in the majors.

Oliver is rule-based. Given a supply of play-by-play and seasonal data, I write code that describes how different parts of the data relate to one another. If I believe Darvish’s translations are too strong, adjusting the code will also affect every other Japanese pitcher, so changes must be made in a way that balances the performances of everyone in the group. There did appear to be differences depending on whether the pitcher started his career in North America or Japan, and whether he was a starter or a reliever. After adjustments were made, Darvish’s projection hardly budged.

With a projected 2.57 ERA, give or take a few tenths, Oliver is putting Darvish ahead of every current major league starting pitcher. The Texas Rangers were willing to commit $111 million over the next six years to procure his services, but can he realistically be expected to outperform this projected list of 2012’s top 15 starting pitchers?

Let’s look at how Oliver’s past projections for Japanese starting pitchers compare to their actual performance. I should note that the major league performance is a weighted mean of the player’s first three seasons in the majors, with the first season weighted at 1.0, the second at 0.7 and the third at 0.5. This is the reverse of the ordering used to weight past seasons when generating the projections. No minor league data are included. Also, the projected ERA is based on the expected wOBA allowed, while the major league ERA is the actual ERA, not park-adjusted.
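The 1.0/0.7/0.5 weighting of a pitcher’s first three major league seasons can be sketched in a few lines of Python. This is an illustration, not Oliver’s code: the function name is hypothetical, and weighting each season by batters faced as well as by its season weight is an assumption, since the text gives only the season weights.

```python
def weighted_first_three(seasons, weights=(1.0, 0.7, 0.5)):
    """Weighted mean of a rate stat over a pitcher's first three MLB seasons.

    seasons: (rate, batters_faced) tuples in chronological order.
    Assumption: each season counts in proportion to its weight times its
    batters faced, so a 12-start season does not count like a full one.
    """
    num = 0.0
    den = 0.0
    # zip() stops at the shorter sequence, so a pitcher with only one or
    # two MLB seasons just uses the first one or two weights.
    for (rate, bf), w in zip(seasons, weights):
        num += w * bf * rate
        den += w * bf
    if den == 0:
        raise ValueError("no MLB batters faced")
    return num / den


# Hypothetical walk rates for a pitcher's first three MLB seasons.
bb_rate = weighted_first_three([(0.10, 600), (0.08, 500), (0.12, 200)])
print(round(bb_rate, 4))
```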

Igawa was signed by the Yankees in 2007 and was expected to provide an above-average number of strikeouts, albeit with a few extra home runs. Maybe the pressure of working for George Steinbrenner was too much; Igawa allowed far too many walks and long balls and lasted only 12 starts that year and one the next before being sent down to the minors, where he spent the rest of his contract.

Ishii signed with the Dodgers in 2002 and spent three years in their rotation. After one more season with the Mets, he also returned to Japan. Wild in Japan, he walked even more batters here and also fell short of his projected strikeout rate, although the ERA projection was fairly close.

Kawakami joined the Braves in 2009 and had a respectable 3.86 ERA, but suffered through a 1-10, 5.15 year in 2010, then spent the entire 2011 season in the minors. He walked more and struck out fewer than projected (I’m beginning to notice a pattern).

The Japanese import everyone loves to hate, Matsuzaka did have two solid seasons for the Red Sox, in 2007 and 2008, but injuries have kept him sidelined or ineffective for the past three years. Despite showing fine control in his last two years in Japan, he has issued an above-average number of walks in the majors.

I looked at three more pitchers: Hideo Nomo and Hideki Irabu from the 1990s, and Colby Lewis, who, after never experiencing any success in the majors, spent 2008 and 2009 in Japan before returning to pitch for the Rangers the past two years.

Irabu issued fewer walks but also fewer strikeouts than expected, and couldn’t avoid the long ball. Nomo was very wild in Japan but pitched much better than expected in the major leagues. Lewis’ strikeout rates were as expected, but his walks jumped up.

These last four were all primarily starting pitchers in Japan, but did most or all of their major league pitching out of the bullpen. All showed better-than-expected strikeout rates, with Uehara almost doubling his rate after the Orioles removed him from the rotation.

On average, pitchers are known to perform better out of the bullpen. Tango calls it his rule of 15: home runs and walks down 15 percent, strikeouts up 15 percent. I believe I can improve the Japanese translation factors by adjusting each pitcher’s stats as a starter and as a reliever to the same baseline before compiling sets of matched pairs. Where I have play-by-play data from Gameday, I can tabulate how each pitcher has performed as a starter and as a reliever; those splits then need to be regressed toward the standard splits. However, the seasonal stats available from Japan do not offer this breakdown. The innings pitched as a starter and as a reliever can be estimated, but the Japanese leagues have not published games started for the past three seasons.
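Adjusting to a common baseline might look something like the minimal Python sketch below. It applies the rule-of-15 multipliers exactly as stated above to the relief portion of a pitcher’s record, converting it to a starter-equivalent before combining it with his starts. The function name, the dictionary layout and the example counts are hypothetical, the multipliers are a parameter so other estimates can be swapped in, and the step of regressing each pitcher’s observed splits toward the standard splits is left out.

```python
# Starter-to-reliever multipliers from the rule of 15 as stated above:
# relievers allow 15% fewer home runs and walks and strike out 15% more.
RULE_OF_15 = {"hr": 0.85, "bb": 0.85, "so": 1.15}


def starter_baseline(sp_counts, sp_bf, rp_counts, rp_bf, effect=RULE_OF_15):
    """Combine starting and relief splits into starter-equivalent rates.

    sp_counts / rp_counts: event counts keyed like `effect`, e.g. {"hr": 15}.
    Relief events are divided by the bullpen multiplier, estimating what
    the same pitcher would have recorded against those batters as a starter.
    """
    total_bf = sp_bf + rp_bf
    rates = {}
    for stat, mult in effect.items():
        rp_as_starter = rp_counts[stat] / mult
        rates[stat] = (sp_counts[stat] + rp_as_starter) / total_bf
    return rates


# Hypothetical pitcher: 600 batters faced as a starter, 200 in relief.
rates = starter_baseline(
    {"hr": 15, "bb": 50, "so": 120}, 600,
    {"hr": 2, "bb": 17, "so": 46}, 200,
)
print(rates)
```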

The records for the eight starting pitchers above suggest that the translation factors Oliver currently uses are too generous: as a group, the eight pitchers’ observed major league performances relative to their projections were 0.99 for base hits (BABIP), 1.11 for home runs, 1.24 for walks and 0.91 for strikeouts (a ratio above 1.0 means more than projected). But how much more should we trust the record of eight starting pitchers in the majors than that of the 75 Japanese pitchers who have pitched in the minors and majors since 1998? And how different should we expect them to be from the 185 pitchers who have left here for Japan?
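Applying those ratios to a projection is simple multiplication. The Python sketch below uses the four ratios from the text; the `weight` parameter is a hypothetical way to express how much the eight-pitcher record is trusted, moving each ratio only part of the way from 1.0, and the projection it adjusts is invented, not Darvish’s actual line.

```python
# Observed MLB / projected ratios for the eight Japanese starters in the text.
EIGHT_STARTERS = {"babip": 0.99, "hr": 1.11, "bb": 1.24, "so": 0.91}


def adjust_projection(rates, ratios=EIGHT_STARTERS, weight=1.0):
    """Scale projected rates by the observed/projected ratios.

    weight (hypothetical) sets how much the eight-pitcher record is trusted:
    1.0 applies the full ratio, 0.0 leaves the projection unchanged, and
    values in between move each ratio only part of the way from 1.0.
    """
    return {
        stat: rate * (1.0 + weight * (ratios.get(stat, 1.0) - 1.0))
        for stat, rate in rates.items()
    }


# Hypothetical per-batter projection (not Darvish's actual line).
projection = {"babip": 0.280, "hr": 0.015, "bb": 0.060, "so": 0.250}
full = adjust_projection(projection)
half = adjust_projection(projection, weight=0.5)
print(full)
print(half)
```

Picking a sensible weight is the open question the paragraph raises; the sketch only makes the trade-off explicit.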

The first line is Darvish’s current Oliver projection, while the second shows the rate stats adjusted for those eight starters (still very good).

These are Darvish’s top comparables using his current projection. They give a higher ERA than 2.57, but the top five still put him right at the top with Kershaw and Strasburg, while a larger sample of comps still rates high enough to rank him fifth or sixth in the major leagues.

For the final set of comparable projections, I used a defense-independent approach, using only ground ball, walk and strikeout rates. Assuming that major league baseball has a slightly lower ground ball rate than Nippon Professional Baseball, I found Darvish’s top comps using a ground ball rate of 0.55, a walk rate of 0.071 and a strikeout rate of 0.248. There’s no difference between the different-sized groups, each with a composite ERA outside major league baseball’s top 15, but much of the ERA difference between this and the previous sets of comps is in the home run rate, almost 50 percent higher here than in Oliver’s projection.
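Finding comparables this way amounts to a nearest-neighbor search. The Python sketch below is one way to do it under stated assumptions and may differ from the method used here: it takes the three target rates from the text, scales each difference by a hypothetical league spread, and averages the ERAs of the closest pitchers without weighting. The pool, its pitchers and their numbers are invented.

```python
import math

# Darvish's defense-independent inputs as given in the text.
TARGET = {"gb": 0.55, "bb": 0.071, "so": 0.248}

# Hypothetical spread of each rate among MLB starters; dividing by it keeps
# the wider ground ball scale from swamping differences in walk rate.
SCALE = {"gb": 0.06, "bb": 0.02, "so": 0.04}


def distance(a, b, scale=SCALE):
    """Scaled Euclidean distance between two sets of rates."""
    return math.sqrt(sum(((a[k] - b[k]) / s) ** 2 for k, s in scale.items()))


def top_comps(target, pool, n):
    """The n pool pitchers closest to the target, nearest first."""
    return sorted(pool, key=lambda p: distance(target, p["rates"]))[:n]


def composite_era(comps):
    """Unweighted mean ERA of a set of comparables."""
    return sum(p["era"] for p in comps) / len(comps)


# Invented pool; a real run would use every recent MLB starter.
pool = [
    {"name": "Pitcher A", "rates": {"gb": 0.54, "bb": 0.070, "so": 0.240}, "era": 3.10},
    {"name": "Pitcher B", "rates": {"gb": 0.45, "bb": 0.090, "so": 0.170}, "era": 4.20},
    {"name": "Pitcher C", "rates": {"gb": 0.50, "bb": 0.065, "so": 0.230}, "era": 3.40},
]
comps = top_comps(TARGET, pool, n=2)
print([p["name"] for p in comps], round(composite_era(comps), 2))
```

Changing `n` gives the different-sized groups of comps; with a real pool, the composite ERA for each group could then be compared against the projected top 15.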

Yu Darvish is clearly a very talented pitcher, talented enough that the Texas Rangers were willing to put $51 million down and $60 million over the next six years to have him in their starting rotation. Just how well his future major league performance can be projected is as much art as science: different methods are available, and even small changes in the estimated base hits allowed can move the ERA estimate by a few tenths. Oliver has had a good record so far, with Stephen Strasburg and Ian Kennedy, for example. However, players have some amount of natural variance each year, as well as changes in their true talent.

Examining several sets of comparable pitchers gives an expected ERA for Darvish anywhere from 2.78 to 3.40, which ranges from excellent down to merely very good, but no recent major league pitcher has the combination of home run, walk and strikeout rates expected of Darvish. Looking at those comparables and Darvish’s pitch metrics gives me a personal opinion: I would compare him to Felix Hernandez with more strikeouts, or Ubaldo Jimenez with fewer walks.

Meanwhile, as these customized estimates all gave a higher ERA projection than Oliver, I’ll retreat to my office, where the first items on the drawing board are incorporating ground ball rates to set the regression means for base hit and home run rates, and considering pitching as a starter and as a reliever separately.

About Brian Cartwright

In addition to writing for The Hardball Times, Brian has written for FanGraphs, consulted for a Major League Baseball team and invented the Oliver projection system. Follow him on Twitter @blcartwright.

Comments

Great article, Brian! However, Kei did not return to Japan. He still pitched in the Yankees org and is now a FA looking for work. He reportedly does not want to go back to the NPB and wants to play MLB.

Sorry, should have googled for Tango’s rule. 15 was in my head because that’s what my own research found, but I didn’t find as much effect on BABIP, which was -4% for relievers, HR -15%, BB +2%, SO +14%

The problem with Darvish is not his physical skills, but between his ears. One has to live in Japan to see just how entitled the players feel about themselves. Darvish is nowhere near Matsuzaka’s arrogance level, and he seems to be much smarter, but that really isn’t saying that much. He also gets away with a lot of mistake pitches that MLB players should tee off on. If Darvish really takes coaching well, he could be great, sure. The Rangers are a really good fit for him, especially with Nolan Ryan around.

If Darvish’s rate of getting away with mistake pitches is the same as other pitchers in Japan, then it’s part of the translation factors already. The problem in projecting is when a pitcher does something like that consistently differently than others.

It would be very impressive if Darvish can produce a 0.4 HR/9 in Arlington. His season ought to be fascinating to follow.

I love seeing that my NL-only fantasy team contains four of the top 15 projected MLB ERA leaders (Strasburg, J. Johnson, Latos and Wainwright), all for a total of less than $20. I sure hope those numbers come to pass.

Thanks for the great article and groundball info which has been impossible to find.

Here is an issue I see with the numbers, though. Your BB% and SO% for the Darvish projection in the middle of the page seem to be based on a different number of total batters faced when compared against the projection at the top of the page.

In other words you have him listed at 198 K’s and a 9.6 K/9, which would suggest around 729 batters faced to come up with a SO% of .272. However, the 41 BBs and 2.0 BB/9 rate suggest that it is based on a total batter count of about 708 batters to get the BB% of .058. Is this just an odd quirk of Oliver in the way that the percentages are calculated or are they supposed to be based on the same number of total batters and therefore incorrect?
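The commenter’s arithmetic can be checked in a few lines of Python. This sketch divides each count by its three-decimal rate and also allows for rounding of that rate; the helper names are hypothetical. Even at the edges of rounding, the walk line implies at most about 713 batters faced and the strikeout line at least about 727, so rounding alone does not close the gap, consistent with the reply that the percentages are park- and league-adjusted.

```python
def implied_bf(count, rate):
    """Batters faced implied by an event count and its per-batter rate."""
    return count / rate


def implied_bf_range(count, rate, decimals=3):
    """Range of batters faced consistent with a rate rounded to `decimals`."""
    half = 0.5 * 10 ** -decimals
    return count / (rate + half), count / (rate - half)


so_lo, so_hi = implied_bf_range(198, 0.272)  # strikeouts, SO% .272
bb_lo, bb_hi = implied_bf_range(41, 0.058)   # walks, BB% .058
print(round(implied_bf(198, 0.272)), round(implied_bf(41, 0.058)))
print((round(so_lo), round(so_hi)), (round(bb_lo), round(bb_hi)))
```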

The projected BB% and SO% will differ from the raw original data, as they’ve been adjusted for park and league. I then compared the translated data to actual MLB performance to check the accuracy of the translation.

NPB adopted a new ball standard in 2011, which dropped the HR% to 64% of its previous level, so the .009 HR/BC in 2011 is equivalent to .014 in the other seasons.
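The ball adjustment is a single division: 0.009 / 0.64 = 0.0140625, or about .014. The Python sketch below puts a 2011-or-later NPB home run rate back on the old ball’s scale; the function name is hypothetical, and treating every season from 2011 on with the same 0.64 factor is an assumption.

```python
# From the comment: NPB's 2011 ball cut the home run rate to 64% of its
# earlier level.
NEW_BALL_HR_FACTOR = 0.64


def hr_rate_old_ball(hr_per_bc, season, first_new_ball_season=2011):
    """Express an NPB HR/BC rate on the pre-2011 ball's scale.

    Assumption: every season from 2011 on used the new ball and should
    be scaled by the same factor.
    """
    if season >= first_new_ball_season:
        return hr_per_bc / NEW_BALL_HR_FACTOR
    return hr_per_bc


print(round(hr_rate_old_ball(0.009, 2011), 3))
print(hr_rate_old_ball(0.014, 2010))
```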

When I do all the projections, I have a set of 260 pitchers who have pitched in Japan and the US from 1998-2011. The factors are calculated so that the average error is zero, where all the plus errors cancel out the minus errors. Regression helps reduce the total error (does not care whether high or low) by bringing all the projections closer to the center, thus reducing the outliers.
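Regression toward the center is often done by weighting a pitcher’s observed rate by his sample size against a fixed amount of average performance. The Python sketch below shows that general technique, not necessarily Oliver’s exact formula; the function name and the constant `k` (in batters faced) are hypothetical.

```python
def regress(raw, center, bf, k):
    """Shrink a raw rate toward a center value.

    The raw rate keeps weight bf / (bf + k), so small samples are pulled
    harder toward the center. k (in batters faced) is hypothetical.
    """
    w = bf / (bf + k)
    return center + w * (raw - center)


# Hypothetical: a .300 SO rate over 300 batters, center .180, k = 200.
print(round(regress(0.300, 0.180, 300, 200), 3))
```

Because every projection moves toward the same center, the extreme high and low values shrink the most, which is how regression trims the outliers.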