On “normalizing” data

I was teaching a class today on the visual analysis of data, which included using my favorite teaching aid, the statapult (a miniature catapult, in case you haven’t seen one). We were talking about how to present information effectively, and I went off on a tangent about why simply counting defects is a bad idea. I explained to the class that just counting defects ignores the “opportunity” for a defect. Simply put, it’s common sense that if you deliver a bigger project, you will deliver more defects (all other things being equal). So how you measure the size of the opportunity matters. You can’t be completely arbitrary about it. Your choice matters.

The statapult provides a good chance to show what it means to normalize data. The fine-control adjustment on the statapult is the pullback angle, so I had the class fire at a range of pullback angles and measure the distance of each shot. I then had them divide each distance by its pullback angle. This “normalizes” the data to inches per degree, and you can easily see that the value is consistent across all pullback angles.
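A minimal sketch of that exercise, with made-up numbers (not real class data), shows how dividing by a correlated factor collapses different readings onto one stable value:

```python
# Hypothetical statapult readings: pullback angle (degrees) -> distance (inches).
# The numbers are illustrative, chosen so the relationship is easy to see.
readings = {
    120: 60.0,
    140: 70.0,
    160: 80.0,
    180: 90.0,
}

# Normalize: divide each distance by its pullback angle to get inches per degree.
normalized = {angle: dist / angle for angle, dist in readings.items()}

for angle, rate in normalized.items():
    print(f"{angle} deg -> {rate:.2f} in/deg")  # every rate comes out 0.50
```

The raw distances differ by 30 inches across the runs, but the normalized values are identical, which is exactly the signal that pullback angle is the right divisor.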

But you can’t normalize the distance by dividing by the number of people on the team. That divisor corrects for nothing that has to do with the statapult: one person can fire it, or a team of three, or five, or more. Dividing distance by headcount is arbitrary at best.

And yet, while that makes perfect sense in a classroom, we often fail to be as sensible in the real world. To be a good normalizing factor, the divisor must be correlated with the numerator. Calculating defects per function point makes sense. Calculating defects per developer probably doesn’t (developers can work on something for many months, so counting the team size alone is arbitrary).
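The same contrast can be sketched with fabricated project numbers (purely illustrative, not real data): three releases with stable quality per function point, but varying team sizes.

```python
# Illustrative project data: defects found, size in function points, headcount.
# Quality is deliberately held constant at 0.10 defects per function point.
projects = [
    {"name": "A", "defects": 30, "function_points": 300, "developers": 5},
    {"name": "B", "defects": 60, "function_points": 600, "developers": 5},
    {"name": "C", "defects": 90, "function_points": 900, "developers": 15},
]

for p in projects:
    per_fp = p["defects"] / p["function_points"]  # divisor correlated with defect opportunity
    per_dev = p["defects"] / p["developers"]      # divisor unrelated to opportunity
    print(f'{p["name"]}: {per_fp:.2f} defects/FP, {per_dev:.1f} defects/dev')
```

Defects per function point reads 0.10 for every release, correctly reporting that quality never changed. Defects per developer jumps between 6 and 12, a swing produced entirely by the arbitrary divisor, not by the work.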

Don’t be arbitrary when normalizing data. Dividing one value by another, unrelated value does not produce a sensible metric.