Numbers Lie: Collection Bias

The most common way numbers will lie to you is through lies of omission. Your metrics might look great, but if there is missing data behind them, they are not telling you the whole story! When your metrics tools don’t collect all the data you need to make informed decisions, I call it Collection Bias.

Lest you think that Collection Bias could never happen to you, consider these common examples:

Website metrics. If you are tracking visitors to your website using popular web analytics tools, chances are you are undercounting your visitors and usage. Why? Many common ad blockers block analytics services as well as ads! One site’s analysis showed that anywhere from 5-25% of visitors might be invisible to hosted web analytics tools due to ad blockers.

Email Opens. If you send emails to your customers, you are likely tracking how many people open them (the email open rate). This tracking is done by inserting a tiny pixel image into the email and recording every time that image is loaded (that is, every time the email is opened). However, many email clients block or inhibit these tracking pixels! Gmail, for example, will only load the pixel once even if the user opens the email many times, which can lead to drastic undercounting.
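To make the website example concrete, here is a rough sketch of the arithmetic in Python. It assumes the only source of undercounting is ad blockers and that the blocked share falls somewhere in the 5-25% range mentioned above. The function name and the figure of 10,000 reported visitors are made up for illustration.

```python
def estimate_true_visitors(observed: int, blocked_fraction: float) -> float:
    """Estimate actual visitors when some fraction is invisible to analytics.

    Assumes the tool sees only (1 - blocked_fraction) of real visitors,
    so the real count is observed / (1 - blocked_fraction).
    """
    if not 0 <= blocked_fraction < 1:
        raise ValueError("blocked_fraction must be in the range [0, 1)")
    return observed / (1 - blocked_fraction)


# Hypothetical number reported by a hosted analytics tool.
observed = 10_000

# Bounds taken from the 5-25% range cited for ad-blocker losses.
low = estimate_true_visitors(observed, 0.05)
high = estimate_true_visitors(observed, 0.25)

print(f"Reported: {observed}, plausible actual: {low:.0f} to {high:.0f}")
```

With these assumed inputs, 10,000 reported visitors could really be anywhere from about 10,500 to about 13,300. That spread is the point: the reported number is a floor, not the truth.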

There are many more examples, ranging from implementation bugs to database integrity problems. The reality is that it’s very likely you have some kind of collection issue in your metrics systems.

Wow, that is scary

Yes, I agree. The good news is that if you assume that Collection Bias exists you can proactively avoid it using a few simple steps:

Understand your tools. Most analytics and metrics tools disclose known collection issues in their documentation. If yours doesn’t, email their support and ask about known data collection issues so that you are aware of them now.

Test your data like your product. Most companies will test new features of their products and services whenever they make changes. But what about your data? You should get into the habit of testing your metrics and analytics just like you test your features, whenever anything changes.

Avoid collecting everything. The more data you collect, the harder it will be for you to be sure it’s correct. While it is tempting to track everything, as soon as you find a collection bias problem in one metric, your team will view all of your metrics with suspicion! This undermines your efforts to be data driven, so limit your metrics to those whose accuracy you can be sure of.
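What might "testing your data like your product" look like in practice? One possible approach, sketched below in Python, is to compare what your analytics tool reports against a count you trust (for example, rows in your own database) and flag the metric when the two drift too far apart. The function names, the 5% tolerance, and the sample numbers are all assumptions for illustration, not a standard.

```python
def relative_gap(reported: int, trusted: int) -> float:
    """Return how far the reported count is from the trusted count, as a fraction."""
    if trusted == 0:
        # Nothing to compare against: any reported activity counts as a full gap.
        return 0.0 if reported == 0 else 1.0
    return abs(trusted - reported) / trusted


def metric_is_trustworthy(reported: int, trusted: int, max_gap: float = 0.05) -> bool:
    """True when the reported metric is within max_gap of the trusted count."""
    return relative_gap(reported, trusted) <= max_gap


# Hypothetical example: signups seen by the analytics tool vs. rows in the database.
analytics_signups = 940
database_signups = 1_000

if not metric_is_trustworthy(analytics_signups, database_signups):
    gap = relative_gap(analytics_signups, database_signups)
    print(f"Signup tracking is off by {gap:.0%}; investigate before trusting it.")
```

Running a check like this whenever your product or tracking changes gives you an early warning before a bad number reaches a dashboard.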