I'm not sure how position would be implemented reliably from the box score, height/weight or what?

It is relatively easy to determine position strictly from the box score. No, height/weight would not be included.

It seems kinda circular at that point. Not to say it wouldn't work. Would you lean towards a small or large number of positions/roles considered? (eg guard/wing/big vs Whitehead's 23 offensive roles - though that's pbp stuff) How many clusters do you get when you group guys by box score stuff? Never looked into it myself.

No, it would be a separate regression, likely set up to produce values between 1 and 5. It would not really be circular for the purposes we would use it here.
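For anyone wondering what a separate position regression could look like, here is a minimal sketch: ordinary least squares from a few box score rates to listed position coded 1 (PG) through 5 (C), with the output clipped to that 1-5 range. The choice of predictors, the toy numbers, and the function names are all assumptions made up for illustration; this is not the actual BPM position model.

```python
import numpy as np

def fit_position_model(X, listed_pos):
    """Least-squares fit of listed position (1=PG ... 5=C) on box score rates."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, listed_pos, rcond=None)
    return coef

def estimate_position(coef, X):
    """Predicted position, clipped to the 1-5 scale."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.clip(A @ coef, 1.0, 5.0)

# Made-up rows for illustration only: [TRB%, AST%, BLK%, 3PA rate]
X = np.array([
    [5.0, 35.0, 0.5, 0.40],
    [6.5, 18.0, 0.8, 0.45],
    [9.0, 12.0, 1.5, 0.35],
    [13.0, 10.0, 3.0, 0.20],
    [18.0, 8.0, 5.0, 0.05],
    [4.5, 30.0, 0.4, 0.50],
    [16.0, 9.0, 4.0, 0.10],
])
listed_pos = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 5.0])

coef = fit_position_model(X, listed_pos)
est = estimate_position(coef, X)
```

Because the estimate is continuous, a combo guard can land at something like 1.5 rather than being forced into a single slot.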

Instead of treating minutes played as a linear variable, what about a step-wise function? Maybe treat players over 25 minutes the same, or closer to the same, than in the linear approach. 15-25 minutes per game and a high % of games played when healthy would be the second step. Then third and fourth steps. The argument here is that the LEVEL of how much you play is determined by relative quality, but the exact number of minutes is also influenced by team need, which is different. I dunno if this makes much difference but I float it as a possibility.
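A rough sketch of the step-wise idea, in case it helps make it concrete. The 25 and 15 minute cut points are the ones floated above; the 8 minute cut point for the third step and the 0.8 availability threshold are placeholders I made up, as is the function name.

```python
def minutes_tier(mpg, availability):
    """Map minutes per game and share of healthy games played to a tier (1 = top).

    The 25 and 15 mpg cut points come from the post above. The 8 mpg cut
    point and the 0.8 availability threshold are assumed for illustration.
    """
    if mpg > 25:
        return 1
    if mpg >= 15 and availability >= 0.8:
        return 2
    if mpg >= 8:
        return 3
    return 4

# A 34 mpg starter and a 27 mpg starter are treated identically.
same_tier = minutes_tier(34, 0.9) == minutes_tier(27, 0.9)
```

The tier could then enter a regression as a categorical variable in place of raw minutes.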

Why not include FT rate in current or future BPM? If 3pt rate can give bonuses, I'd want FT rate to give bonuses and penalties. It affects "space" and overall team scoring rates.

Personal fouls are not included at all? What was the analysis that led to that? Any re-think? What about deductions for technicals and ejections?

Any consideration of when a player shoots in the shot clock and during the game? Adjustments for carrying more or less of the clutch-time and crunch-time shooting?

Any consideration of including on/off data (a la PIPM or metric blends)? Raw or RAPM. If not, what is the rationale or defense?

Should blocks against be treated as worse than regular misses? Or better? What does the data show for recovery rates, opponent points off those that result in a change of possession, and own points on second chances?

Charges taken? You probably aren't going to do it because the data probably isn't available deep in the past. But does BPM 2.0 have to go deep into the past? I mainly care about now and the future.

If versatility is important then it would seem that positional versatility is important too, in general or especially with regard to defense. Would you consider using Knarsu3's data on that?

If minutes are considered, what about age? Both have correlations. Which is stronger?

If minutes are considered, what about salaries? Both have correlations. Which is stronger?

Minutes relative to age and salary?

Instead of straight height, did you ever consider using height relative to league average for main position? Weight is probably shakier and less likely but not including is a choice.

FT% as a proxy of "true" shooting talent?

Any adjustments for "luck" of any kind?

Rewards or penalties for high or low usage beyond the broad mid-range? Linear or step-wise?

Bonuses for major award votes?

Being traded?

Draft pick #? College recruiting rank? I assume some GMs and analytics staffs use them before the draft. How much predictive value do they have for early career? And by that I mean, for this purpose, predictive for things not in the boxscore right now?

I think sticking with the boxscore is fine, and very helpful to those who don't have play-by-play or another tracking system. BPM is still being used in models to make predictions, so I believe it is pretty good.

I totally agree. There are many applications for Box Plus/Minus beyond the NBA that do not have comprehensive play-by-play data. I want Box Plus/Minus to be as accurate as possible for non-NBA applications.

That is one of the reasons I am looking at improving outlier performance, because in other leagues and situations, as evidenced by the NCAA block issue, there may be more outliers than in the NBA where there are such balanced teams and schedules.

Does this mean you are retreating from the use of "advanced" stats like Reb%, Usg%, Ast% ?
These are pretty simple ratios of raw player totals, and team / opponent totals, minutes adjusted.
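To show how simple these ratios are, here is a sketch using the commonly published definitions (the ones in the Basketball-Reference glossary, as far as I know). The function and argument names are mine.

```python
def trb_pct(trb, mp, tm_mp, tm_trb, opp_trb):
    """Share of available rebounds a player grabbed while on the floor."""
    return 100 * (trb * (tm_mp / 5)) / (mp * (tm_trb + opp_trb))

def usg_pct(fga, fta, tov, mp, tm_mp, tm_fga, tm_fta, tm_tov):
    """Share of team plays used by a player while on the floor."""
    player_plays = fga + 0.44 * fta + tov
    team_plays = tm_fga + 0.44 * tm_fta + tm_tov
    return 100 * (player_plays * (tm_mp / 5)) / (mp * team_plays)

def ast_pct(ast, fg, mp, tm_mp, tm_fg):
    """Share of teammate field goals a player assisted while on the floor."""
    return 100 * ast / ((mp / (tm_mp / 5)) * tm_fg - fg)

# A player who never sits (48 of 48 minutes) and grabs 10 of the 100
# total rebounds in the game has a rebound percentage of 10.
example_trb = trb_pct(trb=10, mp=48, tm_mp=240, tm_trb=50, opp_trb=50)
```

Every input is a raw player, team, or opponent total plus minutes, so nothing beyond the boxscore is needed.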

ORtg and DRtg are slightly more complex, just including possession estimates.
Either O/D or O-D can be a basis for boxPM, with refinements from other inputs.

Team/Opp. totals and rates are just as boxscore-basic as players'; without them you lose context, and other terms struggle to compensate.

Does a team full of per36 16-7-4 guys have better BPM (before team adjust) than a bunch of 2-dimensional players with the same team/opp aggregate? Should they?

As you say, those basic advanced statistics are just ratios of box score data. Those are absolutely on the table.

As to the last question, I'm not sure what the correct answer should be. The intent is to optimally split up the credit for what actually occurred. This is not a predictive stat as much as an explanatory stat. So I intend the stat to take what actually happened and divide up the credit amongst the players that were on the court.
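As a rough sketch of what "divide up the credit" can mean in practice: add one constant to every player's raw rating so that the minute-weighted sum matches the team's rating. This is a simplification under my own assumptions (function name included); the real team adjustment may scale or regress the team rating differently.

```python
def team_adjust(raw_bpm, minutes, team_rating):
    """Shift every raw rating by one constant so that the minute-weighted
    sum of player ratings equals the team rating.

    Simplified illustration; any extra scaling of the team rating is omitted.
    """
    team_minutes = sum(minutes)
    # Each share is the fraction of available time played; shares sum to 5.
    shares = [mp / (team_minutes / 5) for mp in minutes]
    weighted_raw = sum(s * r for s, r in zip(shares, raw_bpm))
    adjustment = (team_rating - weighted_raw) / 5
    return [r + adjustment for r in raw_bpm]

# Five players who each play every minute, on a team with a +5 rating.
adjusted = team_adjust([2, 1, 0, -1, -2], [48] * 5, team_rating=5)
```

The ordering of players comes entirely from the boxscore terms; the team result only sets the level.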

Should we adjust for luck? I.e. opposing shooting percentages? I know that PIPM does that. I think that is probably too complex for what I want BPM to be.

Point Guards are systematically undervalued.
Post players are not estimated as accurately (this could be because more of their value is from defense).

The median DRtg for C and FC is 108, vs the league avg 110.4. It should help to incorporate that stat.

If you can estimate the fraction of a player's points which are unassisted, and give more credit for that, you will help the PG and other "shot creators". Catch-and-shoot guys would get relatively less credit.

Also, it's a pretty straightforward boxscore stat, at either the team or player level, to see how many assists they get on the road vs at home. A Clippers player may have gotten 10% more assists this year (per team FG) than he'd have gotten with the Magic, thanks to home scorekeeping.
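A sketch of that home/road check, with made-up totals and my own function names. A ratio above 1 means assists are credited more generously per made field goal at home.

```python
def assist_rate(ast, fg):
    """Assists credited per made field goal."""
    return ast / fg

def home_scorer_bias(home_ast, home_fg, road_ast, road_fg):
    """Ratio of home to road assist rate for a team or a player."""
    return assist_rate(home_ast, home_fg) / assist_rate(road_ast, road_fg)

# Hypothetical team totals: 660 assists on 1000 FG at home,
# 600 assists on 1000 FG on the road.
bias = home_scorer_bias(660, 1000, 600, 1000)
```

Dividing a player's home assists by this ratio would be one crude way to put everyone on a common scorekeeping scale.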

(1) If PGs are undervalued in BPM compared to a six-year TOTAL RPM, what positions are correspondingly overvalued?
(2) How does this undervaluation divide between O and D (for all positions)?
(3) Can you provide summary statistics for average (possession weighted? Or minute weighted?) O, D, and T RPM and BPM, overall and by position?

One thing I would suggest is expressing most of the rate stats relative to the league average for that season. I know in the current formulation you use TS% relative to the team that the player is on. I assume that this is because it's more predictive than using simple rTS% (TS% relative to league average). I don't know that using team-level averages makes sense for anything aside from TS%, but I understand the logic of it in the scoring context specifically. I think Ben Taylor has mentioned having success in his own version of BPM in utilizing numbers relative to league averages.
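For reference, a small sketch of the two baselines using the usual TS% definition, PTS / (2 * (FGA + 0.44 * FTA)). The totals below are invented and the function names are mine.

```python
def ts_pct(pts, fga, fta):
    """True shooting percentage as a fraction."""
    return pts / (2 * (fga + 0.44 * fta))

def relative_ts(player, baseline):
    """Player TS% minus a baseline TS%; each argument is (PTS, FGA, FTA)."""
    return ts_pct(*player) - ts_pct(*baseline)

# Hypothetical season totals as (PTS, FGA, FTA).
player = (1800, 1300, 450)
team = (9000, 7000, 1900)
league = (270000, 215000, 56000)

vs_team = relative_ts(player, team)      # relative to the player's own team
vs_league = relative_ts(player, league)  # rTS%: relative to league average
```

The same subtraction works for any other rate stat once a league-wide value is computed for the season.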

I also wonder if there's a way to cap your interaction terms, like AST*TRB for instance, at the highest values in the sample set, so that a future season that is an outlier beyond anything in the modeling set doesn't break the metric. I think this would have dealt with Westbrook's MVP season being wildly overrated by BPM because his combination of AST% and TRB% was completely unprecedented in the modeling set. Probably something similar should be done for the USG*AST term, as at some point the benefit of "defensive attention" warping the defense sees diminishing returns. You'd have a hard time convincing me that James Harden (40% usg) attracts more attention than Stephen Curry (30% usg), for instance.
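Something like the following is what I have in mind for the cap. It is a minimal sketch: the function names and the numbers are hypothetical, not taken from the actual modeling set.

```python
def fit_cap(training_pairs):
    """Largest interaction value seen in the modeling sample."""
    return max(a * b for a, b in training_pairs)

def capped_interaction(a, b, cap):
    """Interaction term limited to the largest value seen in the sample."""
    return min(a * b, cap)

# Hypothetical (AST%, TRB%) pairs from a modeling sample.
sample = [(10, 5), (30, 8), (20, 10)]
cap = fit_cap(sample)

# A later season far outside the sample gets the capped value,
# not the raw product of 57 * 17.
outlier_term = capped_interaction(57, 17, cap)
```

Seasons inside the sample range are unaffected, so the fitted coefficients would not need to change.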

I might also suggest using Kevin Pelton's method for estimating assisted versus unassisted baskets to separate out an individual's shot usage based on assisted v. unassisted shots.

While data on assisted field goals is now available at the invaluable 82games.com, to rate players from the pre-82games era on a level playing field, what I have done is used assisted field goal data to create a regression to estimate it using the share of his team's assists the player distributes, the team's assisted field goal percentage, the player's usage percentage, his offensive rebound rate and the percentage of his shot attempts that are three-pointers.
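A minimal sketch of a regression with that shape, using the five predictors named in the quote. The data below is synthetic and the coefficients are arbitrary, generated only so the code runs; these are not Pelton's actual coefficients, and the names are mine.

```python
import numpy as np

PREDICTORS = [
    "ast_share",             # share of team assists the player distributes
    "team_assisted_fg_pct",  # team's assisted field goal percentage
    "usg_pct",               # player's usage percentage
    "orb_pct",               # player's offensive rebound rate
    "three_pa_rate",         # share of shot attempts that are threes
]

def fit_assisted_share(X, y):
    """Least-squares fit of assisted FG share on the five predictors."""
    A = np.column_stack([np.ones(X.shape[0]), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_assisted_share(coef, X):
    """Estimated assisted FG share, clipped to the valid 0-1 range."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return np.clip(A @ coef, 0.0, 1.0)

# Synthetic demo data (all predictors scaled 0-1), noise-free so the fit
# should recover the arbitrary coefficients used to generate it.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(50, len(PREDICTORS)))
true_coef = np.array([0.6, -0.5, 0.3, -0.4, 0.1, 0.2])
y_demo = true_coef[0] + X_demo @ true_coef[1:]

coef_demo = fit_assisted_share(X_demo, y_demo)
pred_demo = predict_assisted_share(coef_demo, X_demo)
```

In real use the model would be fit on seasons where assisted field goal data exists and then applied to earlier seasons where it does not.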