The single worst way to judge a decision is solely by its outcome.

I’m pretty much stealing this straight from Dan Ariely. So if you want a quality author’s take on the subject I suggest you hit up his blog and come back.

Alright, I know this sounds crazy. When we evaluate a decision we should look at a few things beyond its outcome. For example…

What did you know when you made the decision?

What did you think would happen?

What actually happened?

Why did what happened differ from what you thought?

I’ll give a basketball example. The Memphis Grizzlies took on Zach Randolph’s terrible contract in 2009. The end result is that Zach Randolph has been one of the premier power forwards in the league over the last two seasons, and the Grizzlies have been a power in the playoffs. However, here’s what we could have said in 2009 with all of the data we had on Randolph:

Zach Randolph was 28, meaning he was unlikely to improve past what we’d seen.

Zach Randolph’s performance had been marginal at best in the past.

Zach Randolph was an expensive player.

Zach Randolph had issues on and off the court.

So when we judge the Zach Randolph decision should we rate it highly because things turned out ok? NO! It was a terrible decision. Rewarding it is rewarding luck. Unlike good management, there is no guarantee luck will continue in an organization.

When we evaluate an analyst’s predictions we have to evaluate why they said what they said. Was their information good? For instance, Arturo thought Portland would be very good. Unlike other analysts, he didn’t know Roy’s knees were so bad they could not be insured. Additionally, Arturo thought Przybilla, Camby and Oden could stay at least healthy enough between the three of them to be a force. Oden and Przybilla were lost for the season and Camby had injuries. Anyway, to sum up:

If we judge based on results — without looking at context — we are not doing good work and we may just be rewarding luck!

Make sure your test of a model actually tests the model.

Alright, recap time. The Wins Produced model begins with an individual’s box score stats, and uses this information to statistically measure a player’s production of wins. This model is very good at explaining wins (unlike Player Efficiency Rating and NBA Efficiency) and is fairly consistent year to year at the player level (unlike plus-minus and adjusted plus-minus).

Okay, let’s have a pop quiz!

What should we use to evaluate this model? If you said an individual player’s contribution you are correct! I’ll actually show an example of Wins Produced being used for predictions in a way that I think is fair.

Last season our analysts tried to determine which players would win specific awards (using the Wins Produced metric as a barometer) and guess what? The analysts actually did a good job guessing which individual players would be good in specific categories. In fact as a consensus our analysts were top three for every major award using Wins Produced.

So using a metric that judges individual performances to predict individual performances seems to work.

Now, pop quiz number 2! Tell me all of the things involved in how well a team plays. Here are just a few I could think of.

Individual performances

Trades

Injuries

Breakout seasons for young players

Breakdown seasons for older players

Minute allocation by coaches.

Of this list, Wins Produced (as well as metrics like Win Shares, WARP, etc…) attempts to handle only one of these factors.

Pop quiz number 3 (final one, I promise): does Wins Produced — or any other metric designed to measure player performance — address any other factor on the list? If you answered no, you’re right! Estimating an individual player isn’t too difficult. Estimating 400-500 of them and a bunch of other things is. So when you try to judge how well a metric designed to measure one thing does when applied to many other values, you are really not testing the model. For example, Win Shares is listed. The “prediction” was made based on a bunch of simulations run on the league as it stood at the end of the 2010 season. Comparing it to a different league with many changes is not a good test of whether Win Shares is a good metric.

I want to make it clear that I greatly enjoyed wiLQ’s post. I think it is a lot of fun (or painful when I’m reminded of my guesses) to look back at predictions analysts make. What I want to avoid is the notion of anyone reading that this is somehow a good test of one metric versus another.

Summing up

The Wins Produced metric is very good if you want to:

Look at which players to reward for good performance on your team (e.g. sign players to new salaries)

Determine which players from last year will likely be good this year (e.g. free agents)

Determine which players may or may not be overrated/underrated (e.g. trades)

In short, the Wins Produced metric is an excellent tool if you’re a GM or fan and you want to explain or evaluate parts of basketball. Now, we love this metric around here. It was a life changer for those of us watching Melo and A.I. put up 25+ points per game and still not contend. We’ve also sold it as an easy-to-use single number. That said, I want to make one thing clear in case we haven’t already:

Just because you use the Wins Produced metric does not mean you should ignore other information!

We’re against things like Adjusted Plus-Minus and PER because they’re bad information (i.e. they are inconsistent across time or they do not explain what they purport to explain). That said, we’re fully behind good, useful information. Should you have a crack set of medical trainers who can evaluate players and help them if they get injured? Absolutely! Should you have a psychiatric team (Arturo’s idea, btw) to make sure the players on your team are sound and to help them if they have issues (e.g. Beasley)? You betcha! Should you analyze your coaches to see if they play the right players? Yes! All of this information should be considered.

As long as you evaluate information properly and use it correctly it’s useful. When you don’t do this… well, then what you are doing isn’t quite as useful.

24 Responses to "How to judge predictions"

I think marginally testable wins produced predictions begin like this:

“If players X, Y, Z stay healthy and play N minutes…”
or
“If player A gets lots of playing time..”

I don’t think predicting individual awards is a useful test of the WP model AT ALL. For instance, my pick for player most likely to win “Most Improved” is DeAndre Jordan. This is not, in fact, because I think Jordan will improve a lot, but because I think that a) he’ll improve a little, b) every beat writer in the NBA will be watching, and c) few of them will be watching James Harden play every night.

One could tell similar stories for the other awards. Predicting these accurately says nothing about the model, but does indicate perhaps that you have your finger on the pulse of the mainstream media.

To make it clear, my intention was not to judge or test models, because for me such predictions are a simple version of fantasy games: even though it is just for fun, you really try your best to nail it, and you should be able to improve your results with experience just by eliminating the most frequent mistakes.

I don’t think predicting individual awards is a useful test of the WP model AT ALL.

I’m not sure that was the context. The WoW network prediction wasn’t who would be CHOSEN MVP, but who SHOULD BE MVP, based upon the WP48 stat. Happy to be corrected!

I think that the power of a metric is best tested by its predictive behaviour for in-season trades. A good example why is David Lee – a WP48 star who fell off after a FA signing because of this: http://sports.cbsimg.net/images/nba/photogallery/DavidLeeelbow.png – the most gruesome picture I can imagine. I can’t imagine anything measurable a player can do, from rebounding to shooting to blocks and steals, that isn’t affected by a dodgy arm.

In recent history, WP48 has done very well predicting results after a number of trades:

1. Chauncey Billups for AI.

2. Gortat, VC and Pietrus for Hedo, JRich and Earl Clark.

3. Tyson Chandler to Dallas.

4. Gerald Wallace to Portland.

Season-to-season performance is affected by many things, but in-season trades are very good apples-to-apples comparisons.

Just a quick note (that doesn’t apply to the people who have posted so far on Dre’s column)… back when we were wagesofwins.net or dberri.wordpress, we had a comment policy. We will get that back up in our new home soon. But for now, let me summarize: If we deem your comments to be obnoxious and we think you are a troll, then we throw your comment in the trash. If you do not like this happening to your comments, learn how to behave like a grown-up and it will stop.

One should note… we prefer people posting who can be identified in some way. If you are getting around our crude filter by creating a wordpress blog that you don’t use… well, you are even more likely to find your comments in the trash.

Patrick,
The awards test was “whose Wins Produced will be the highest in each of these categories?” So Motherwell is right there. In essence we said, “Based on their performance last year, can we predict performances this year?” and the answer is yes.

wiLQ,
I loved the post and would love for it to expand. I think keeping track of things analysts predict vs. reality is an awesome endeavor.

motherwell,
Trades and free agency seem to be the most applicable use of the metric. Surprising that a metric developed by a Sports Economist seems to have the most application in the economics of sports huh? And thanks for remembering David Lee!

I have one problem with this kind of thing. It seems to assume player productivity is stable in different conditions. That may be a tendency, but it’s not always true.

Assume a statistical model has player “X” rated at a certain level of productivity.

Then he gets traded, a new coach is brought in that changes the system etc… and he winds up being way more or way less productive than in prior years.

IMO, you can’t give yourself a pass on that. It could very well be that the trade was made specifically because the new team or coach knew that the player was being used incorrectly, was playing with other players that diminished his value etc… or vice versa and that outcome was predictable by sophisticated analysis.

For example, I am predicting that Landry Fields will be less productive on a team with Tyson Chandler and Melo (at C and SF) than one with Turiaf and Gallo at those positions. That is and should be part of my analysis of the Knicks. And if I am wrong, it’s a fail.

The things that are unpredictable are injuries and mid-season trades, and to a lesser extent minute allocation and the rate of development of rookies and other young players, etc… IMO those are things you should get a pass on. Otherwise, a model should be able to project wins fairly well.

ltk,
On the one hand I want to say: avoid letting the exception become the rule. Player productivity is pretty stable year to year, so it would be at the top of my list for analysis as a GM.

That said, you’re absolutely right. When you make a trade for a player and you have a coach whose system won’t use that player, or something makes their skills redundant, things can go differently (e.g. New York has a top rebounder in Fields and they signed another top rebounder. I love the move, but it may not be as great as they expect).

Basically I think you start with WP (is the player good) then factor in the other information and then make your decision. Frankly I’m not 100% sure how GMs make decisions currently.

I do not have a facebook, twitter, or gmail account. Yes, that may sound surprising, but it’s true. I made a wordpress so I could comment. I am not a troll. I have provided facts.

For instance, in the laker prediction post, I mentioned that Ebanks is likely to start and Barnes not get much time. Mike Brown named Ebanks the starter and I was right. Not to mention, he was incorrectly listed as a PF. Therefore, the prediction is invalid. I was trying to HELP the post.

Dre, I believe it was him, asked me to prove my position that Murphy’s defense negates all his offense/rebounding. And I did. It was censored.

I am not trolling. I am trying to help here. If I make actually trollish posts or obnoxious ones, then fine, censor them. But I won’t be doing that. Please allow my comments until that time. And yes, my response to you was harsh originally, but you attacked me first. It is unfair to attack me and then censor the response, which apparently led to this post here.

Metta,
Quick response on Christmas Eve (hope you are having a Merry Christmas)…

You did not “prove” your point. What you are doing is consistently exaggerating the quality of your argument (for example, we have gone through the many problems with plus-minus data in the past). And when you start offering an obnoxious tone to go with your relatively poor arguments… well, it is easy to just send your comments into the trash bin.

I am not going to waste my time reviewing arguments that have been made here over and over again in the past (again, we have covered plus-minus extensively). You are free to disagree. But if your tone is found to be offensive (to anyone charged with reviewing comments in this forum) don’t be surprised if your comments vanish.

I won’t bring up that post again in here, since this is a different one. I am new to these blogs so if I rehash old arguments, I hope people won’t mind undercutting them again. We should all be in good spirits as basketball finally returns!

Hello, I agree with your point completely. I just think Zach Randolph is a bad example to prove it. I am not sure what he did as a Trailblazer, but he was very productive in a WP sense as a Knick, so his production could, to some extent I believe, be expected. He rebounds unbelievably well and scores productively around the basket. As long as he stays away from long jumpers his scoring has been productive. And the Knicks gave him away for nothing. Another bad trade. Okay, I am off the point. Merry Christmas all.

Dan,
Thanks for the reply and an opportunity to brag about the new Wins Produced numbers on the site – http://wagesofwins.com/wins-produced/ – Randolph’s best season prior to 2009 was in 2004, when he was average in limited minutes. Anyway, long story short, Wins Produced thought Randolph was overrated all the way until 2010.

Randolph improved his scoring efficiency by reducing his long jumpers. It’s possible that also accounts for the slightly better rebounding numbers because he hit a peak in OREB per 36 minutes last year also. When you are down in the post you have a better chance to get your own misses than if you are hoisting bricks from outside the 3 point line.

In my econometrics class we practiced forecasting future numbers, and I was wondering: can WoW specify the Wins Produced model in such a way as to do that at the team level? Maybe run the regression with additional lagged variables from seasons before, and use all that this website knows about aging, to make a correct economic prediction—it sounds like a fun idea, maybe I’ll try it!

Punkmoncrief,
When it comes to teams, using efficiency differential (points scored per 100 possessions minus points allowed per 100 possessions) works well. In terms of prediction, the thing is that some stuff is very predictable (player performance year to year) while other stuff is not.

I’d say team-wise it is definitely worth building a model and throwing in as much stuff as you can think of. I’d be very interested in the results. Some of the difficulty may come from mixing very predictable stuff (player performance) with hard-to-predict stuff (player minute allocation, effects of age, etc.).
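A rough sketch of the kind of team-level model described above: regress wins on efficiency differential and use the fit to project. Everything here is toy data invented for illustration (the numbers are not real teams), and `ols_fit` and `predict_wins` are hypothetical helper names, not anything from the WoW site:

```python
# Toy sketch: regress team wins on efficiency differential
# (points scored minus points allowed per 100 possessions).
# All data below are made up for illustration -- not real teams.

def ols_fit(xs, ys):
    """One-variable least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope

# Hypothetical (efficiency differential, wins) pairs.
data = [(-8.0, 20), (-4.0, 30), (0.0, 41), (3.5, 50), (7.5, 60)]
intercept, slope = ols_fit([d[0] for d in data], [d[1] for d in data])

def predict_wins(eff_diff):
    """Project wins for a team with the given efficiency differential."""
    return intercept + slope * eff_diff

print(round(predict_wins(5.0), 1))  # projected wins at a +5.0 differential
```

The lagged variables the commenter mentions (prior-season performance, aging curves) would enter as extra regressors, which needs a multivariate fit rather than this one-variable version.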

Sorry for the late reply, but just getting to this now. The issue I have with pleading that there are extenuating factors (player injuries, minutes distribution, etc.) that could lead to this being a poor test of a model is that every other model has to deal with the same factors. So with the ‘gamechanging’ events that occurred, the higher performing models did better. I suppose one can attribute their success entirely to luck (better alignment with unpredictable events) or more correct assumptions, but in the latter case, and even otherwise, there is an element of expertise that is being rewarded.

And if, for instance, Hollinger’s performance is consistently good, we must ask: is it PER, is he lucky, or is he better at making the necessary corollary assumptions (divvying up minutes played, etc.)? My first take is that the ability to predict how well a team will perform is arguably nearly as important to a GM’s position as player valuation.

Westy,
First I’d say Hollinger’s performance is not that impressive. This year he essentially tied Vegas. In prior years he was close to the top but still lost to Vegas. Now sure, if you set the bar at Vegas (the goal of which is to maximize the betting line) then he’s done alright. Again, though, last season he needed a 13-32 game window for each team to be “accurate”, and two seasons ago he needed a 20-45 game window for wins. Ask yourself about that… Hollinger was “accurate” because he got 95% (29 teams) right within a 45-win window…

Gamblers are happy with a 53% margin. Essentially, if you can predict sports games a little better than chance, you can make money. That’s great if your goal is to bet on all games. If you’re an individual team, that’s not that helpful, and if it were, my advice would be to just use Vegas’s set odds each year. And again, Hollinger is not predicting an individual team (which is what GMs want); he is predicting the league as a whole, and by being a little better than chance he is showing he is about as good as Vegas.
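A quick sketch of the arithmetic behind that 53% figure, assuming the standard -110 odds used for point-spread bets (an assumption; the comment doesn’t specify the odds):

```python
# At -110 odds a bettor risks $110 to win $100.
# The breakeven win rate p solves: p * 100 = (1 - p) * 110.
risk, payout = 110.0, 100.0
breakeven = risk / (risk + payout)  # 110 / 210, about 52.4%
print(round(breakeven, 4))

# Expected profit per bet at a 53% win rate -- barely positive,
# which is why 53% only pays off across a large volume of games.
p = 0.53
ev = p * payout - (1 - p) * risk
print(round(ev, 2))
```

This is why beating chance by a couple of percentage points is enough for a bettor spreading action over the whole league, but tells an individual GM very little about any one team.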

Whether beating Vegas is a high bar or not, didn’t he do better than everyone else? And if Vegas is the best, shouldn’t we be asking what Vegas is doing? If basketball is predictable, we should be able to beat ‘chance’. That nobody does it is an intriguing observation. I would argue that equalling Vegas places you far better than ‘chance’.
