Tuesday, August 30, 2016

The Odds Ratio Method

So the other day I wrote about how I break projections down into eight binary components. I do this for everything: batter projections, pitcher projections, league averages, and all kinds of splits.

Then, for every set of components except the batter's, I'd convert each rate to a factor of the league average, where 1 equals league average.

The first component is $BB, defined as (BB + HBP) / PA.

Jered Weaver has a $BB factor of about .86 (7.3% / 8.6%). Angel Stadium in Anaheim has a $BB factor of .98 (yes, ballparks can affect walk rates, too). Left-handed batters facing right-handed pitchers have a $BB factor of 1.02, and batters on the road have a $BB factor of .96.

Then all those factors are multiplied by the batter's rate to find the probability for the match-up.

Joey Votto has a $BB of 19%. So when he faces Jered Weaver at Angel Stadium tonight, we would expect Votto to walk in about 16% of his plate appearances against Weaver (19% * .86 * .98 * 1.02 * .96 = 16%).
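The factor multiplication above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual spreadsheet; the function name `matchup_rate_by_factors` is made up for the example.

```python
from math import prod

def matchup_rate_by_factors(batter_rate, factors):
    """Multiply the batter's rate by each league-relative factor (1.0 = league average)."""
    return batter_rate * prod(factors)

# Votto's $BB (19%) vs. Weaver (.86), at Angel Stadium (.98),
# lefty batter vs. righty pitcher (1.02), batter on the road (.96).
votto_bb = matchup_rate_by_factors(0.19, [0.86, 0.98, 1.02, 0.96])
print(f"{votto_bb:.1%}")  # 15.7%, which rounds to the 16% quoted above
```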

OK. Let's try another example: the Orioles' Chris Davis vs. the Cubs' Aroldis Chapman. Davis' $BB rate is 14%, and Chapman's $BB factor is 1.29. So the $BB of their matchup would be 14% * 1.29 = 18%. We'll ignore other factors this time and leave it at that. 18% seems pretty reasonable.

Next, $SO. Chris Davis' $SO rate is 38%, one of the highest in the majors for a regular. Chapman has a $SO factor of 1.94 - nearly twice the league average. So the $SO for their match-up would be 38% * 1.94 = 74%. I'm sure a Davis/Chapman match-up would produce a lot of whiffs, but 74% seems a little extreme. And that's without multiplying in the lefty-lefty platoon factor of 1.11, which brings the $SO rate for their match-up up to 82%.

Now imagine a hitter even more prone to strikeouts - a player with a 60% $SO rate against the MLB average. Against Chapman, his $SO rate would be 60% * 1.94 = 116%! Obviously, no batter, no matter how inept, can strike out more times than he has opportunities. Anybody, whether it's me or your grandma going up there against Chapman, would have a less than 100% chance of striking out, even if it was 99.9%.
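The problem is easy to reproduce: straight multiplication has no ceiling, so nothing stops the result from passing 100%. A small sketch using the numbers from this post (the helper name `so_by_factor` is just for illustration):

```python
def so_by_factor(batter_so, pitcher_factor):
    """Old approach: batter's $SO rate times the pitcher's $SO factor."""
    return batter_so * pitcher_factor

CHAPMAN_SO_FACTOR = 1.94

for batter_so in (0.38, 0.60):
    rate = so_by_factor(batter_so, CHAPMAN_SO_FACTOR)
    flag = "  <-- impossible" if rate > 1 else ""
    print(f"{batter_so:.0%} batter vs. Chapman: {rate:.0%}{flag}")
```

The first line comes out at 74% and the second at 116%, flagged as impossible.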

So my old system was breaking down at the extremes. High-strikeout batters facing high-strikeout pitchers were being underrated, because their projected strikeouts were inflated. Power hitters facing homer-prone pitchers were being overrated, because their projected home runs were inflated.

Back in the '80s, Bill James introduced the Log5 method to find the probability that Team A will defeat Team B, based on Team A's and Team B's winning percentages. Later, with the help of a colleague, he expanded the formula to handle a league average other than .500 (the average winning percentage), so it could be used to find the probability of any kind of match-up - batting average, on-base percentage, free throw percentage - where the league average is something other than .500.
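For reference, the basic Log5 formula is usually written as P(A beats B) = (a - a*b) / (a + b - 2*a*b), where a and b are the two teams' winning percentages. A minimal sketch follows; the .600 and .450 teams are made-up inputs, not from Bill's article.

```python
def log5(a, b):
    """Probability that a team with winning percentage a beats one with winning percentage b."""
    return (a - a * b) / (a + b - 2 * a * b)

# Hypothetical example: a .600 team against a .450 team.
print(f"{log5(0.600, 0.450):.3f}")  # 0.647

# Sanity check: two evenly matched teams should come out at .500.
print(f"{log5(0.500, 0.500):.3f}")  # 0.500
```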

Bill recently wrote about the method again on his website (again, subscription required), going into a detailed description (as only Bill James can) of the logic behind the method and the steps involved in figuring it.

"So, you have a hitter with an OBP of .400 in a league of .300 facing a pitcher with an OBP of .250 in a league of .350, and they are both playing in a league (or park) where the OBP is expected to be .380 for the league average player. What’s the resulting OBP?

So if anyone more schooled in math than I am (or who can decipher the above equation) is reading this...help? Tango? (edit: I've since figured this out - Tango just typed the formula in a confusing way. The odds(environment) - the .380/.620 - is not in the denominator, as it looks like in the formula.)

...which is the formula I currently use in my spreadsheet. x = batter, y = pitcher, z = league. Gee, maybe I should test it to make sure it gives the same result as Tango's Odds Ratio Method.

Plugging in the $SO rates for Davis (.38), Chapman (.43), and the league (.22):

((.38*.43)/.22) / (((.38*.43)/.22) + ((1-.38)*(1-.43))/(1-.22)) =

(.1634/.22) / ((.1634/.22) + (.62*.57)/.78) =

.7427 / (.7427 + .4531) = .7427 / 1.1958 = 62%

Ok, good. They're identical.
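Reading the formula straight off the arithmetic above (x = batter, y = pitcher, z = league), it can be written as a small Python function. This is a sketch of that calculation only; the name `odds_ratio_matchup` is invented here.

```python
def odds_ratio_matchup(x, y, z):
    """x = batter rate, y = pitcher rate, z = league rate (each strictly between 0 and 1)."""
    event = (x * y) / z                        # weight for the event happening
    no_event = ((1 - x) * (1 - y)) / (1 - z)   # weight for it not happening
    return event / (event + no_event)

# Davis (.38) vs. Chapman (.43) in a .22 league
davis_vs_chapman = odds_ratio_matchup(0.38, 0.43, 0.22)
print(f"{davis_vs_chapman:.0%}")  # 62%

# The imaginary 60% strikeout hitter from earlier, against Chapman
extreme = odds_ratio_matchup(0.60, 0.43, 0.22)
print(f"{extreme:.0%}")  # 80%
```

Unlike straight multiplication, the result is a ratio of one positive number to itself plus another, so it can never reach 100%. A league-average pitcher (y = z) also hands the batter's own rate back unchanged.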

So from there, I multiply this match-up rate by all the other factors (platoon, park, etc.) like I did before. So with platoon factor multiplied in, the Davis/Chapman $SO is 62% * 1.11 = 69%.

This probably still isn't right. You're probably supposed to include the other factors in the rates for the batter or pitcher. Tango commented on Bill's Log5 article:

"Now, the power of the Odds Ratio form is that you can extend it to include other variables. Say you want to include the Home field advantage. That's a .540 record for the average team, or .54 wins per .46 losses, or 1.17 wins per loss.

"You can also use it for things other than a .500 baseline, and include even more things like batter v pitcher to include home field and platoon advantage, etc. It's a bit more complex to account for the non-.500 baseline, but it flows right in once you see it."

I don't see it...not yet anyway. But this method is more right than my old way of doing it.
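For what it's worth, here is one possible reading of Tango's comment, sketched in Python. Everything is converted to odds (p / (1 - p)), the extra factors are multiplied in as odds, and the result is converted back to a probability. Treating the platoon factor this way is an assumption on my part, not something Tango or Bill spelled out, and the function names are made up.

```python
def to_odds(p):
    return p / (1 - p)

def to_prob(odds):
    return odds / (1 + odds)

def matchup_prob(x, y, z, extra_odds_factors=()):
    """Odds form of the match-up: odds(x) * odds(y) / odds(z), times any extra odds factors."""
    odds = to_odds(x) * to_odds(y) / to_odds(z)
    for factor in extra_odds_factors:
        odds *= factor
    return to_prob(odds)

# Home field from Tango's quote: .54 wins per .46 losses, about 1.17 wins per loss.
home_edge = to_odds(0.54)
print(f"{matchup_prob(0.5, 0.5, 0.5, [home_edge]):.3f}")  # 0.540 for two average teams

# ASSUMPTION: turn the 1.11 lefty-lefty $SO rate factor into an odds factor
# by applying it to the league rate (.22) first.
platoon_edge = to_odds(0.22 * 1.11) / to_odds(0.22)
davis_chapman_platoon = matchup_prob(0.38, 0.43, 0.22, [platoon_edge])
print(f"{davis_chapman_platoon:.0%}")  # about 65%
```

With no extra factors this gives the same 62% as the formula in my spreadsheet, and because the factors are applied to odds rather than to the rate itself, the result stays under 100% no matter how many are stacked on.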