A data mess outduels the pie-chart disaster for our attention

The background: Crittercism, a company that monitors smartphone apps, compiled data on the frequency of app crashes by version of mobile operating system (Android or Apple iOS). The counts were then converted into proportions that add up to 100%.

If we spend our time trying to figure out the logic behind the ordering and placement of the data (e.g. why is iOS split across both sides? why are the pieces not sorted by size?), we will miss the graver problem with this chart: the underlying data.

***

Here is a long list of potential issues:

Crittercism sells app monitoring tools to app developers. Presumably this is how it is able to count app crashes. But who are its customers? Do their apps form a representative sample of the universe of apps? Do we even know the proportion of Android/iOS apps being monitored?

There is reason to believe that the customer set is not representative. One would guess that more crash-prone apps are more likely to have a need for monitoring. Also, is Apple a customer? Given that Apple has many highly popular apps on iOS, omission of these will make the data useless.

The data wasn't adjusted for the popularity of apps. It's very misleading to count app crashes without knowing how many times each app has been opened. This is the same fallacy as drawing conclusions about flight safety based on the list of fatal plane accidents; the millions of flights that complete without incident provide lots of information! (See Chapter 5 of my book for a discussion of this.)

The data has severe survivorship bias. The blog poster even mentions this problem but adopts the attitude that such disclosure somehow suffices to render useless data acceptable. More recent releases are more prone to crashes simply because they are newer. If an OS release is especially prone to app crashes, then we expect a higher proportion of its users to have upgraded to newer releases. Thus, older releases will always look less crash-prone, partly because more bugs have been fixed, and partly because users have chosen to switch away. iOS is the older operating system, and so there are more versions of it in use.

How is a "crash" defined? I don't know anything about Android crashes. But my experience with PC operating systems is that each one has different crash characteristics. I suspect that an Android crash may not be the same as an iOS crash.

How many apps and how many users were included in these statistics? Specifying the sample size is fundamental to any such presentation.

Given the many problems related to timing as described above, one has to be careful when generalizing with data that only span two weeks in December.

There are other smartphone operating systems in use. If those are omitted, then the proportions cannot legitimately add up to 100% unless those other operating systems never have app crashes.
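The last point is simple arithmetic, and a small sketch makes it concrete. The crash counts below are invented purely for illustration (they are not Crittercism's numbers), and `shares` is a hypothetical helper name.

```python
def shares(counts):
    """Convert raw crash counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical crash counts, invented for illustration only.
tracked = {"Android": 300, "iOS": 500}
omitted = {"other OS": 200}

# Shares as a chart limited to the tracked systems would show them.
tracked_only = shares(tracked)

# Shares once the omitted systems' crashes are counted as well.
everything = shares({**tracked, **omitted})

for name in tracked:
    print(f"{name}: {tracked_only[name]:.1%} of tracked crashes, "
          f"{everything[name]:.1%} of all crashes")
```

With these made-up counts, every tracked share is inflated once the omitted systems drop out of the denominator. The two sets of proportions agree only if the omitted count is zero.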

***

How to fix this mess? One should start with the right metric, which is the crash rate, that is, the number of crashes divided by the number of app starts. Then, make sure the set of apps being tracked is representative of the universe of apps out there (in terms of popularity).
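To see why the choice of metric matters, here is a minimal sketch comparing share of crashes (what the chart shows) with crash rate (crashes divided by app starts). The version labels and all counts are hypothetical, made up for illustration, and the function names are my own.

```python
def crash_shares(counts):
    """Each version's share of all crashes (the metric used in the chart)."""
    total = sum(c["crashes"] for c in counts.values())
    return {name: c["crashes"] / total for name, c in counts.items()}


def crash_rates(counts):
    """Crashes per app start for each version (the metric proposed here)."""
    rates = {}
    for name, c in counts.items():
        if c["app_starts"] <= 0:
            raise ValueError(f"no app starts recorded for {name}")
        rates[name] = c["crashes"] / c["app_starts"]
    return rates


# Hypothetical counts: version A is popular, version B is rarely used.
counts = {
    "OS version A": {"crashes": 3000, "app_starts": 1_000_000},
    "OS version B": {"crashes": 1000, "app_starts": 100_000},
}

share = crash_shares(counts)
rate = crash_rates(counts)
for name in counts:
    print(f"{name}: {share[name]:.0%} of crashes, "
          f"{rate[name]:.2%} crash rate")
```

In this invented example, version A accounts for three quarters of all crashes yet has the lower crash rate, because it is opened ten times as often. A chart of shares would point the finger at the wrong version.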

Some sort of time matching is needed. Perhaps trace the change in crash rate over time for each version of each OS. Superimpose these curves, with the time axis measuring time since first release. Most likely, this is the kind of problem that requires building a statistical model because multiple factors are at play.
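One way to set up the time matching is to re-index each observation by days since that version's release, so the curves can be superimposed on a common axis. The sketch below assumes a record layout of my own invention (version, observation date, crashes, app starts); the versions, dates and counts are all hypothetical. It only prepares the aligned curves and is no substitute for the statistical model.

```python
from collections import defaultdict
from datetime import date


def align_by_release(observations, release_dates):
    """Return {version: [(days_since_release, crash_rate), ...]}, sorted by days."""
    curves = defaultdict(list)
    for version, observed_on, crashes, app_starts in observations:
        days = (observed_on - release_dates[version]).days
        if days < 0 or app_starts <= 0:
            continue  # skip records before release or with no usage
        curves[version].append((days, crashes / app_starts))
    return {version: sorted(points) for version, points in curves.items()}


# Hypothetical release dates and observations, invented for illustration.
release_dates = {"v1": date(2011, 1, 1), "v2": date(2011, 10, 1)}
observations = [
    ("v1", date(2011, 1, 8), 50, 10_000),
    ("v1", date(2011, 12, 10), 20, 10_000),
    ("v2", date(2011, 10, 8), 40, 10_000),
    ("v2", date(2011, 12, 10), 30, 10_000),
]

curves = align_by_release(observations, release_dates)
for version, points in curves.items():
    for days, rate in points:
        print(f"{version}: day {days:>3} after release, crash rate {rate:.2%}")
```

On the calendar, both versions were last observed on the same December day, but on the aligned axis one is 343 days old and the other only 70. Comparing them at the same age is what the time matching buys us.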

Finally, I'd argue that the question being posed is better answered using good old-fashioned customer surveys collecting subjective opinion ("how many crashes occurred this past week?" or "rate crash performance"). Yes, this is a shocker: a properly designed small-scale survey will beat a massive-scale observational data set with known and unknown biases. You may agree with me if you agree that we should care about the perception of crash severity by users, not the "true" number of crashes. (That's covered in Chapter 1 of my book.)