Thursday, December 05, 2013

For the love of #Statistics - Get the numbers, right?

In statistics, results depend on the data they are based on. Data can be selected in many ways, but selection starts with two questions: "what is it that we want to know?" and then "once we have this information, what does it tell us?"

Consequently, when the considered opinion is that Wikidata will mainly be used by bots, its page views are not considered relevant. When Commons is considered to be the repository for use in WMF projects, we do not count downloads and other usage from outside the Wikimedia Foundation.

When we change these assumptions, we evaluate existing data differently and end up with different statistics. Our own reports show where our assumptions result in a problematic representation of how things are. The "Report card", for instance, does not include Wikidata as one of the biggest projects when we measure individual contributions.

An increasing number of Wikipedias are using Wikidata to find information, be it Wikidata-based, Commons-based or even Wikipedia-based. Initial reports are that people like it. We don't know how often this functionality is used. We do not know to what extent people are adding statements or labels in order to improve disambiguation. We do not know how many people reach Commons, find the image they are looking for and download it for their own use.

We do not know to what extent Wikidata provides the only results when searching a Wikipedia. We do not even know what people are looking for and fail to find.

Statistics are meant to be actionable. It is not that hard to change the existing software so that it identifies pageviews that result from Wikidata-based functionality. It makes sense to do this when the data is considered useful, and the best motivation for doing it is that statistics gain importance when their results are actionable.

This is why we want these numbers and improved statistics. This is why we need the WMF statisticians to share this journey with us.
Thanks,
GerardM