Good metrics vs bad measurement

My former colleague Chris Moran has lots of sensible things to say about what makes a good metric, as do the many people he’s enlisted in that linked post to talk about the characteristics they value in measurement. I wanted to build on one of them: the capacity for people to actually use the metric to change something.

Functionally, plenty of things aren’t easy to measure. Some are almost impossible — much as Chris says, I have lost count of the number of blisteringly smart people working out how to measure things like quality or impact when it comes to journalism. Anything that involves qualitative surveys is probably too high cost for a small project. Anything that requires you to implement completely new analytics software is unlikely to be valuable unless it’s genuinely transformative (and even then, you risk the business equivalent of redesigning your revision timetable rather than actually revising). Anything that relies on people giving an unbiased assessment of the value of their work — like asking editors to assign an “importance” score to a story, say, or Google’s now-defunct “standout” meta tag — is doomed to failure, because an individual can’t accurately assess the relative nature of their work in the context of the whole system. Key point from Chris’s post: if you were going to game your measure, how would you game it? Do you trust everyone involved to act purely in the interests of good data, even when that gets in the way of their own self-interest?

In one team I managed, I once ran an OKR that focused on making sure we were known and appropriately involved as internal experts by the rest of the organisation. We discussed how to measure that, and ended up deciding that we’d know if we were succeeding based on the number of surprises that happened to us in a week. We were accustomed to finding out about projects too late for us to really be helpful — and, to a lesser extent, we were finding that our work sometimes surprised other people who’d benefit from being involved earlier on.

How do you measure surprises? We could have spent weeks working that one out. But for the sake of just getting on with it, we built a Google form with three inputs: what’s the date, who was surprised, who did the surprising. Team leads took the responsibility of filling in the form when it happened. That’s all you really need in order to know roughly what’s going on, and in order to track the trajectory of a metric like that. But because we measured it (really, honestly, mostly because we talked about it every week as a measure of whether we were doing our best work, and that led to thinking about how we could change it, which led to action), it improved.
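For anyone who wants to see how little machinery a metric like that needs, here is a minimal sketch in Python of turning those three form inputs into a weekly count. It isn't what we actually ran (the form's own response sheet did the job); the `Surprise` record, its field names, the function names and the sample rows are all hypothetical and made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Surprise:
    """One form response. Field names are hypothetical."""
    when: date        # what's the date
    surprised: str    # who was surprised
    surpriser: str    # who did the surprising


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def surprises_per_week(responses):
    """Count responses per week, keyed by that week's Monday, in date order."""
    counts = Counter(week_start(r.when) for r in responses)
    return dict(sorted(counts.items()))


# Made-up rows, purely illustrative.
responses = [
    Surprise(date(2024, 1, 2), "our team", "product"),
    Surprise(date(2024, 1, 4), "our team", "marketing"),
    Surprise(date(2024, 1, 10), "product", "our team"),
]

for monday, count in surprises_per_week(responses).items():
    print(monday.isoformat(), count)
```

With the sample rows this prints one line per week that had at least one surprise; weeks with none simply don't appear. That is roughly the level of rigour the metric needs: a number per week that you can put in front of people and talk about.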

Conversely, if you don’t care about something you measure, it’s almost certainly not going to change at all. If you spend enormous organisational energy and effort agreeing and creating a single unified metric for loyalty, say, but then you don’t mention it in any meetings or use it to define success or make any decisions about your products or your output… why bother measuring it at all? Data in isolation is just noise. What matters is what you use it for.

So if you’re going to actually make decisions about quality, or impact, or loyalty, or surprises, the key isn’t to spend ages defining a perfect metric. It’s getting 80% of the way there with as little effort as you can pull off, and then doing the work. It means working out what information you (or your teams, or your editors, or your leaders) don’t have right now that they need in order to make those decisions. Then finding a reasonable, rational, most-of-the-way-there metric you can use that unblocks those decisions. Eventually you might find you need a better measure because the granularity or the complexity of the decisions has changed. But you might equally find that you don’t really need anything other than your first sketch, because the real value is in the conversations it prompts and the change in the output that happens as a result. Precision tends to be what data scientists naturally want to prioritise, but chasing it usually misses the point.