Sleepy51, even if he accounted for defense better, he wouldn't improve his model enough. As I pointed out, the team adjustment brings his correlation coefficient to 0.94; without it, it is 0.71. He justifies his model with that high correlation to winning, and the defensive adjustment is essentially justified via a residual. Berri is not literally using a residual, but that is what it amounts to. It is also not accurate to say he got his whole model via regression, because he only ran two regressions. The first regression determines A, B and C for the following formula:

Win% = A*ORTG+B*DRTG+C

A, B and C are presented on his page as the coefficients next to offensive efficiency, defensive efficiency and the constant term.
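As a sketch, fitting A, B and C by ordinary least squares looks like this. The data here is synthetic and purely illustrative; the actual coefficients come from Berri's real team-season data set:

```python
import numpy as np

# Hypothetical team data (illustrative only; Berri fit this on real seasons).
ortg = np.array([108.0, 104.5, 110.2, 101.3, 106.7, 103.9])  # offensive rating
drtg = np.array([103.2, 106.1, 104.8, 107.5, 102.0, 105.3])  # defensive rating

# Generate win% from known "true" coefficients so the fit is verifiable.
A_true, B_true, C_true = 0.032, -0.030, 0.29
win_pct = A_true * ortg + B_true * drtg + C_true

# Least-squares fit of Win% = A*ORTG + B*DRTG + C.
X = np.column_stack([ortg, drtg, np.ones_like(ortg)])
A, B, C = np.linalg.lstsq(X, win_pct, rcond=None)[0]
print(round(A, 3), round(B, 3), round(C, 3))  # 0.032 -0.03 0.29
```

With real data the fit is not exact, but the signs come out the same way: positive on offensive efficiency, negative on defensive efficiency.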

What he is now proposing are those two formulas for PA and PE. At this point he is not using a regression to determine the marginal values; he is plugging in league-average values. With ORTG = PTS/PE and DRTG = DPTS/PA, the differentiation gives us:

MV(PTS) = A/PE
MV(PE) = -A*PTS/PE^2
MV(DPTS) = B/PA
MV(PA) = -B*DPTS/PA^2

The values for PTS, PE, DPTS and PA are the league average values in his data set. If you use a different data set you will get slightly different values.
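The differentiation can be checked symbolically (a quick sketch; the symbol names are mine, the functional form is Win% = A*PTS/PE + B*DPTS/PA + C as above):

```python
import sympy as sp

PTS, PE, DPTS, PA, A, B, C = sp.symbols('PTS PE DPTS PA A B C')

# Win% = A*ORTG + B*DRTG + C with ORTG = PTS/PE and DRTG = DPTS/PA.
win = A * PTS / PE + B * DPTS / PA + C

mv_pts = sp.diff(win, PTS)    # A/PE
mv_pe = sp.diff(win, PE)      # -A*PTS/PE**2
mv_dpts = sp.diff(win, DPTS)  # B/PA
mv_pa = sp.diff(win, PA)      # -B*DPTS/PA**2
print(mv_pts, mv_pe, mv_dpts, mv_pa)
```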

In the next step he takes the marginal values and assigns them to the boxscore stats related to PTS, PE, PA and DPTS. You can easily see that in the numbers for free throws: (1-0.47)*0.032 = 0.017 and 0.47*(-0.032) = -0.015. By choosing 0.47 as the factor for free throws, he sets the break-even point for free throw shooting at 47 FT%. Nicely done. That means a player shooting 48% on his free throws will help his team win more than a guy shooting 49% on his two-point field goals; in fact, the latter will hurt his team while the former helps.

In the end he arrives at the conclusion that a player has to shoot over 50% on his two-point shots to help his team win games. That is above the league average.
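Here is a quick sketch of that break-even arithmetic, using the marginal values from the post (roughly +0.032 wins per point and -0.032 per possession employed, since ORtg is about one point per possession):

```python
# Marginal values from the post: +0.032 wins per point scored,
# -0.032 per possession employed (PE).
MV_PTS, MV_PE = 0.032, -0.032

# Each free-throw attempt costs 0.47 of a possession in the PE formula,
# so the expected value of one FTA for a player shooting ft_pct is:
def value_per_fta(ft_pct):
    return ft_pct * MV_PTS + 0.47 * MV_PE

# A two-point attempt uses a full possession:
def value_per_2pa(fg_pct):
    return 2 * fg_pct * MV_PTS + 1.0 * MV_PE

print(round(value_per_fta(0.48), 5))  # > 0: a 48% FT shooter "helps"
print(round(value_per_2pa(0.49), 5))  # < 0: a 49% 2P shooter "hurts"
print(round(value_per_fta(0.47), 5))  # 0.0: break-even at 47 FT%
```

The two-point break-even sits at exactly 50%, which is the conclusion criticized above.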

Where does the problem come from? The possession formula employed works for the overall team, but not for the individual player. Adding up FGA + 0.47*FTA + TO - ORB will not give us a good estimate of the possessions a player used. To see that: if we apply this formula to a player like Love, we currently get a 152.8 ORtg for him. He scored 1111 points with 777 FGA, 351 FTA, 120 TO and 247 ORB. According to Oliver he has a 123 ORtg. Now, if we look at a guy like Dirk Nowitzki, who scores at a higher rate and with higher efficiency while turning the ball over less, we get a 114.4 ORtg, while according to Oliver he has 118. Or my new favorite upcoming franchise player Kris Humphries has a 141.7 ORtg if we apply Berri's formula for offensive efficiency.

As you can see, the problem is not only defense; the problem is that the formulas Berri is using can't be applied to players. They will give you absurd numbers for guys who grab a lot of offensive rebounds. Sure, if you add those numbers up for each player on a team, they will give you exactly the team's overall number, but that doesn't mean the distribution among players is correct.
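A minimal sketch of the per-player calculation using the stat line quoted above (the exact value depends on the data snapshot and any scaling, so it won't match the post's figure to the decimal, but it lands wildly above Oliver's 123):

```python
# Per-player "offensive rating" using Berri's team possession formula,
# applied (wrongly, per the argument above) to an individual stat line.
def player_ortg(pts, fga, fta, to, orb):
    pe = fga + 0.47 * fta + to - orb  # possessions employed
    return 100.0 * pts / pe           # points per 100 possessions

# Love's stat line as quoted in the post:
ortg = player_ortg(pts=1111, fga=777, fta=351, to=120, orb=247)
print(round(ortg, 1))  # far above Oliver's 123, because ORB shrinks PE
```

The mechanism is visible in the denominator: every offensive rebound subtracts a full possession from PE, so a big offensive rebounder gets his scoring divided by an artificially small possession count.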

Btw: If we use the same methodology as in the last GSW blog entry, we get 23.0 wins via WS/48 and 22.7 wins via my PRA as the prediction. Both are way closer to reality than WP48. Not surprising at all. Both are also more consistent from year to year, and the same goes for PER. I could calculate the average PER for the teams over time and run a regression to estimate the linear formula Win% = A*PER + B, but I have the feeling it would also be closer to reality anyway.

mysticbb wrote: At this point he is not using a regression to determine the marginal values, he is using league average values. The differentiation gives us:

Well then he is a chowderhead.

Overall, I think I was focusing more on the point that ANY regression model for understanding/predicting individual player impact on team results would benefit from including differential defensive stats rather than boxscore team defensive stats. I would argue there is a qualitative difference between how those two data sets can function in any model, due to "how basketball works" issues.

Well, here is how you can calculate the marginal value of points scored (MVP) using this season's data:

MVP = A/PE

where A is 3.442 and PE = FGA + 0.47* FTA + TO - ORB

The values for FGA, FTA, TO and ORB would normally be per-game numbers for the league average, but we can use the season totals and multiply by the number of games played. Currently an average team has 4160 FGA, 1267 FTA, 738 TO and 561 ORB over 51 games. That gives us:

MVP = A/PE*51

MVP = 3.442 / (4160 + 0.47*1267 + 738 - 561) *51

MVP = 0.036

He gets 0.032 because he is using a different data set. That's all.
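The arithmetic above, as a check:

```python
# Marginal value of a point: MVP = A / (league-average PE per game).
A = 3.442                                 # coefficient on ORTG from the win% regression
games = 51
fga, fta, to, orb = 4160, 1267, 738, 561  # season totals for an average team

pe_total = fga + 0.47 * fta + to - orb    # possessions employed, season total
mvp = A / pe_total * games                # = A / (PE per game)
print(round(mvp, 3))  # 0.036
```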

What Berri NEVER showed is that the formulas can be used to evaluate players. He showed that they do a good job of "predicting" wins in hindsight, but that's all. And they don't do a better job at that than scoring margin, which isn't a surprise at all, because that's what he is trying to reproduce.

Paydro70 wrote: The notion that 5 Tyson Chandlers (or his small-man equivalent) would win 80 games just reflects a complete lack of understanding of how basketball works.

In fairness to WP, this will be true of any stat whose metric is some form of win shares. These win shares are situational.

If you look at the 5 best centers in the league they should be generating tons of wins. That doesn't mean that playing all 5 of them on a team at the same time is a winning strategy, and that fact doesn't in itself make the metric bad. The metric is about how much are these players helping you *in the context of their team, minutes, player combinations, etc*.

If we could somehow fix WP to assign credit more accurately, it would still fail the test you apply above, because that's not really a legitimate test.

Take the box score and strip out all the info except minutes, points and FGA, and then do the same regressions and team adjustments that WP does to tune the results.

How well can this be made to correlate with wins in past seasons?

part II:

Now do the same thing again, but take out FGA as well.

Mostly I'm wondering if you can get an impressive correlation with wins this way. It would no longer amount to counting up possessions and then factoring in defensive and offensive efficiency, because we've stripped too much data out to achieve that. But it might correlate pretty well anyway, which would be interesting if true.

I'm thinking that counting up the points on the team, plus having the "team defensive rating" would basically be like having scoring differential, which is known to correlate well with wins.
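As a sketch of why scoring differential tracks wins so closely (the numbers here are invented for illustration; the real relationship is empirical, but it looks like this):

```python
import numpy as np

# Hypothetical league: win% driven mostly by per-game point differential.
margin = np.array([6.5, 4.2, 2.1, 0.3, -1.8, -3.5, -6.0, 1.1])
win_pct = np.array([0.72, 0.63, 0.55, 0.51, 0.45, 0.39, 0.29, 0.52])

# Correlation between scoring margin and winning percentage.
r = np.corrcoef(margin, win_pct)[0, 1]
print(round(r, 3))  # very close to 1 for data like this
```

So any metric that smuggles team points scored and team points allowed back in, however indirectly, inherits this correlation with wins essentially for free.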

floppymoose wrote:I'm thinking that counting up the points on the team, plus having the "team defensive rating" would basically be like having scoring differential, which is known to correlate well with wins.

Correct.

floppymoose wrote:nice. do you know how that compares to regular WP?

Uh, I had a spreadsheet for WP, but I deleted it because it wasn't worth much anyway. I could probably make a new one and compare the results for players, but I doubt I'll get to it today. If anyone has the desire, the model is there.

floppymoose wrote:And the other part fo this I wonder about is how the DEF_A is calculated. If it was tuned in some way to make WP the best it could be, perhaps it needs to be retuned in both of the above examples.

Well, I took DEF_A = 106.5 - DRTG, which means I just compared the team defensive rating to the average offensive/defensive rating from 1979 to 2010.

floppymoose wrote:And it also doesn't look like you used minutes? I ask because I'm wondering about an apples to apples comparison with WP48.

Well, I didn't see the minutes part, but it won't change much anyway, because I basically control the model via ORtg and DRtg, so minutes and FGA get skipped. And if you want to go by minutes, you calculate the win%, which means it is per game (or basically per 48 minutes) anyway.

Well, it is the ranking of the biggest volume scorers in the NBA. It does a pretty much perfect job at that. :)

But overall it just shows that you can control the correlation via a defensive adjustment and even get a "good" ranking out of it without using more information than minutes, pace, points and DRtg. A 0.97 correlation coefficient to winning, so why should we question "our" model?

mysticbb wrote:But overall it just shows that you can control the correlation via a defensive adjustment and even get a "good" ranking out of it without using more information than minutes, pace, points and DRtg. A 0.97 correlation coefficient to winning, so why should we question "our" model?

Indeed. That was what I suspected, and why I proposed the "thought experiment".

Which you turned into an actual test, and verified my suspicions. You get the awesome award!

Did anyone notice that dberri replied? In his response, he took a shot at floppymoose's username and then said correlation to wins isn't everything Wins Produced is about; in fact, other factors went into producing it (which he doesn't name, but he is referring to the regressions).

He has since deleted his reply, probably because he realized how passive-aggressive it sounded, and also because most of his drones follow his model because, as he likes to repeat every other day, it explains 90-95% of wins. Clearly, FM48 explaining 97% of wins threatens his cred. Better to "take the high road" and ignore the riff-raff so the Wages of Wins myth can continue to be perpetuated.