Metrics – A little knowledge can be a dangerous thing (or ‘Why you’re not clever enough to interpret metrics data’)

Published 3 May 2012 3:23 pm

At RedGate Software, I work on a .NET obfuscator called SmartAssembly. Several of its features use a database to store data (exception reports, name-mappings, etc.). The user is given the option of using either a SQL Server database (which requires them to have Microsoft SQL Server) or a Microsoft Access MDB file (which requires nothing). MDB is the default option, but power-users soon switch to a SQL Server database because it offers better performance and data-sharing.

In the fashionable spirit of optimization and metrics, an obvious product-management question is ‘Which is more popular: SQL Server or MDB?’

We’ve collected data about this fact, using our ‘Feature-Usage-Reporting’ technology (available as part of SmartAssembly) and more recently our ‘Application Metrics’ technology:

Parameter     Number of users   % of total users   Number of sessions   Number of usages
SQL Server    28                19.0               8115                 8115
MDB           114               77.6               1449                 1449

(As a disclaimer, please note that SmartAssembly has far more than 132 users. This data is just a selection from one build.)

So, it would appear that SQL Server is used by fewer users, but used more often. Great.

But here’s why these numbers are useless to me:

Only the original developers understand the data

What does a single ‘usage’ of ‘MDB’ mean? Does this happen once per run? Once per option change? On clicking the ‘Obfuscate Now’ button? When running the command-line version, or just from the UI version? Each question could skew the data 10-fold either way, and the answers are known only to the developer who instrumented the application in the first place. In other words, only the original developer can interpret the data; product-managers cannot interpret the data unaided.

Most of the data is from uninterested users

About half of the people who download and run a free trial from the internet quit it almost immediately. Only a small fraction use it sufficiently to make informed choices. Since the MDB option is the default, we don’t know how many of those 114 users were people CHOOSING to use MDB, and how many JUST HAPPENED to be using the MDB default for their 20-second trial.

This is a problem we see across all our metrics: are people using X because it’s the default, or because they actually want to use X? We need to segment the data further, asking what percentage of each group meets our criteria for an ‘established user’ or ‘informed user’. You end up spending hours writing sophisticated and dubious SQL queries to segment the data. Not fun.
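As a sketch of the kind of segmentation query involved (the table name, columns, sample rows, and the 10-session threshold for an ‘established user’ are all hypothetical; the real Feature-Usage-Reporting schema isn’t shown here), using SQLite from Python:

```python
import sqlite3

# Hypothetical schema: one row per opted-in user, per feature.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (user_id INTEGER, feature TEXT, session_count INTEGER);
    INSERT INTO sessions VALUES
        (1, 'MDB', 1), (2, 'MDB', 2), (3, 'MDB', 40),
        (4, 'SQL Server', 120), (5, 'SQL Server', 95);
""")

# Segment out 'established' users: here, arbitrarily, anyone with 10+
# sessions. Changing this threshold changes the story the data tells.
query = """
    SELECT feature,
           COUNT(*) AS all_users,
           SUM(CASE WHEN session_count >= 10 THEN 1 ELSE 0 END) AS established_users
    FROM sessions
    GROUP BY feature
    ORDER BY feature;
"""
for feature, all_users, established in conn.execute(query):
    print(feature, all_users, established)
```

Every query like this embeds a judgment call (what counts as ‘established’?), which is exactly why the answers stay dubious.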

You can’t find out why they used this feature

Metrics can answer the when and the what, but not the why. Why did people use feature X? If you’re anything like me, you often click on random buttons in unfamiliar applications just to explore the feature-set. If we listened uncritically to metrics at RedGate, we would eliminate the most important and most complex features, the ones people actually buy the software for, leaving just big buttons on the main page and the About-Box.

“Ah, that’s interesting!” rather than “Ah, that’s actionable!”

People do love data. Did you know you eat 1201 chickens in a lifetime, but just 4 cows? Interesting, but useless. Often metrics give you a nice number: ‘5.8% of users have 3 or more monitors’. But unless a statistic is both SURPRISING and ACTIONABLE, it’s useless.

Most metrics are collected, reviewed with lots of cooing, and then forgotten. Unless a piece of data could change things, it’s not worth collecting.

People get obsessed with significance levels

The first thing that lots of people do with this data is run a t-test to get a significance level (“Hey! We know with 99.64% confidence that people prefer SQL Server to MDBs!”). Believe me: other sources of error and misinterpretation in your data are FAR more significant than anything your t-test could ever detect.
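For the record, here is roughly what such a test computes: a two-proportion z-test in plain Python (the function and the sample numbers are illustrative, not taken from the data above). Note what it models and what it doesn’t: it quantifies sampling noise only, and knows nothing about instrumentation ambiguity or opt-in bias.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-proportion z-test. Quantifies sampling error only;
    it says nothing about instrumentation or opt-in bias."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Made-up example: 30/100 users vs 15/100 users chose a feature.
z, p = two_proportion_z(30, 100, 15, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A tiny p-value here only rules out chance as the explanation; it cannot rule out any of the systematic problems described in this post.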

Confirmation bias prevents objectivity

If the data appears to match our instinct, we feel satisfied and move on. If it doesn’t, we suspect the data and dig deeper, plummeting down a rabbit-hole of segmentation and filtering until we give up and move on. Data is only useful if it can change our preconceptions. Do you trust this dodgy data more than your own understanding, knowledge and intelligence? I don’t.

There are always multiple plausible ways to interpret/action any data

Let’s say we segment the above data, and get this:

Post-trial users (i.e. those using a paid version after the 14-day free-trial is over):

To choose an interpretation you need to segment again. And again. And again, and again.

Opting-out is correlated with feature-usage

Metrics collection tends to be opt-in, which skews the data even further. Between 5% and 30% of people choose to opt in to metrics (often called a ‘customer improvement program’ or something like that). Casual trial-users who are uninterested in your product or company are less likely to opt in. This group is probably also more likely to be MDB users. How much does this skew your data? Who knows?
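To put a rough number on that last question, here is a toy back-of-the-envelope model (every rate in it is invented for illustration): if two groups of users are actually the same size, but one opts in at 30% and the other at only 5%, the low-opt-in group appears six times smaller than it really is.

```python
# Toy model of opt-in bias; all numbers are invented for illustration.
true_casual, true_engaged = 0.5, 0.5      # actual population split
optin_casual, optin_engaged = 0.05, 0.30  # assumed opt-in rates

observed_casual = true_casual * optin_casual
observed_engaged = true_engaged * optin_engaged
casual_share = observed_casual / (observed_casual + observed_engaged)

print(f"Casual users' observed share: {casual_share:.1%}")  # versus a true share of 50%
```

A skew of that magnitude dwarfs anything a significance test will tell you, and you have no way of measuring the real opt-in rates per group.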

It’s not all doom and gloom.

There are some things metrics can answer well.

Environment facts. How many people have 3 monitors? Have Windows 7? Have .NET 4 installed? Have Japanese Windows?

Minor optimizations. Is the text-box big enough for average user-input?

Performance data. How long does our app take to start? How many databases does the average user have on their server?

As you can see, questions about who-the-user-is rather than what-the-user-does are easier to answer and action.

Conclusion

Use SmartAssembly. If not for the metrics (called ‘Feature-Usage-Reporting’), then at least for the obfuscation/error-reporting.