Mini-MSFT Suggestions for the Annual Review Process

I’ve been on two of the three sides of the Microsoft review. I’ve been an individual graded by the system, and I’ve been a lead participating in the dreaded “Stack Rank.” I’ve never had the opportunity to see how it works once the review model moves past the leads and into a larger “model,” but I liked Mini-MSFT’s complaints and suggestions for improving the process in his latest post, “Microsoft’s 3.0 (or, How I Learned to Stop Worrying and Love The Curve).”

Mini on the stack ranking process… “A good lead will fight, yell, scream, beg, cajole, and even threaten to get the scores he believes his guys have earned, only to have those scores crapped on by upper management and their curve…Argument against is what if you really have seven 4.0 performers but the model says you can only give three 4.0 review scores? Well, if you are a weanie Mgr. you screw four people over….most get pissed but stay anyways and now join the ranks of disgruntled employees who are no longer passionate about their work. Work product begins to suffer, crappy products get shipped, who cares any more?? If you are a principled Mgr. you take on the system and go to bat for your seven key employees but invariably you will get shut down and most likely commit a career no-no…now you too are inside the bell curve. “

This process clearly does not lend itself to an “all for one” mentality with team leads. 🙂

I really liked his suggestions, particularly the ones that brought back a sense of “team” to the system. I might go so far as to suggest that every review should come back with two scores: one on how you did relative to your peers, and a second on how your team performed relative to its peers. The second score, as Mini suggests, could be used to reward or punish teams as a whole. “Don’t ship? Have a bunch of bugs? Customer concerns not being addressed? Security breaches created by this team? Then maybe 50% of your team gets 3.0s and we want some percentage of 2.5s.” This second score would also help explain to people who earned a lower score why the curve was harsher on their team.

I’ve also seen, as Mini suggests, that the review resolution has been reduced over the years. From the outside it might sound like we have a 1–5 scale. However, most teams only use 3.0, 3.5, and 4.0. Let’s stretch out that resolution. Mini suggests a real 100-point scale…

“Increase the resolution on the curve. Instead of our A / B / C simplistic bucketing, bring on the 100 point scale and have a finer curve with appropriate compensation fitting in. Still lavishly reward your super contributors, yes. But don’t go and bugger someone because they fell just within the 3.0 line. A 79’s compensation should be very close to an 80’s. A 70 would then be a strong message that we think you’re just squeaking by. “

I’m concerned this would lead to too much debate over a few points that are hard to pin down in apples-to-oranges comparisons, but it would be nice if we encouraged use of the full 1–5 scale.
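To make the “a 79 should be close to an 80” idea concrete, here’s a toy sketch of what a smooth compensation curve over a 100-point scale might look like. The function name, the 60-point floor, and the 20% bonus cap are all invented for illustration; nothing here reflects Microsoft’s actual compensation model.

```python
def bonus_pct(score, floor=60, cap=100, max_bonus=20.0):
    """Map a 0-100 review score to a bonus percentage along a smooth
    line instead of a few coarse buckets.

    All parameters are hypothetical: scores at or below `floor` earn
    nothing, `cap` earns `max_bonus`, and everything in between falls
    on a straight line, so neighboring scores get neighboring pay.
    """
    clamped = max(floor, min(cap, score))
    return max_bonus * (clamped - floor) / (cap - floor)

# A 79 and an 80 land within half a point of bonus of each other...
print(bonus_pct(79))  # 9.5
print(bonus_pct(80))  # 10.0
# ...while a 70 sends a clear "just squeaking by" signal.
print(bonus_pct(70))  # 5.0
```

The point of the linear shape is exactly Mini’s: the cliff between a 79 and an 80 disappears, while a genuinely low score still carries an unmistakable message.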

If you aren’t reading Mini-MSFT yet… you probably should be. Does HR know the Mini-MSFT blog exists? Are they listening? How long will it be around? He (or she?) concludes…

“I LOVE this company, but I hate The Curve. This is not how the great teams we do have should be rewarded. I certainly feel that if a morale-busting, brain-dead review system goes on too long, we might find ourselves with barely motivated contributors creating mediocre features that may or may not ship…”

The "review score inflation" problem is endemic to these sorts of things. The way the Army dealt with it, for officer ratings, anyway, was to keep a lifetime score record for officers, and view ratings through the lens of the lifetime scoring trends. In other words, if you were a hardass commander and rated everyone 1-3 out of 5, then your "3" would more or less be viewed the same as a "5" from someone who only used 3-5 on a regular basis.
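The lifetime-trend idea can be sketched as a simple per-rater normalization: read each new score as a z-score against that rater’s own history. This is a toy illustration of the principle, not the Army’s actual method, and all the names and numbers are invented.

```python
from statistics import mean, stdev

def normalize(rater_history, new_score):
    """Interpret a new score relative to this rater's lifetime trend.

    Returns a z-score: how many standard deviations the new score sits
    above or below the rater's historical average, so a stingy rater's
    "3" and a generous rater's "5" can be compared on one footing.
    """
    mu = mean(rater_history)
    sigma = stdev(rater_history)
    return (new_score - mu) / sigma

# A hardass commander who only ever rates 1-3 out of 5...
hardass = [1, 2, 2, 3, 2, 1, 3]
# ...versus a rater who only uses 3-5 on a regular basis.
generous = [4, 5, 3, 4, 5, 4, 5]

# Both a "3" from the first rater and a "5" from the second come out
# roughly one standard deviation above that rater's own average.
print(normalize(hardass, 3))
print(normalize(generous, 5))
```

The same raw number means different things from different raters; normalizing against each rater’s history is what lets the hardass’s “3” be viewed more or less the same as the generous rater’s “5.”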

One thing that helps this work for the Army, however, is the fact that reviewers tend to stick around for the really long term. There aren’t many 3-4 year career hops between different militaries. Of course, the effect of a negative officer review in the military is pretty much "end of your entire life’s career," which makes reviewers pretty reluctant to hand them out.

An enormous flaw with this system is that it has numbers involved at all. I’m fine with reducing complex behaviours to simple metrics — there’s no other way to manage this complexity. But as soon as you make the categories numbers, all hell breaks loose. Making the system more granular by introducing MORE meaningless numbers is a terrible idea.

Sure, ditching the numbers would be another alternative, but in a world with numbers the resolution problem is a bad thing precisely because teams use 3.0 to mean either “you’re doing not so well” or “you’re just average.” We don’t make enough use of the distinction between those two statements.