Thursday, June 05, 2008

Lies, damned lies and statistics

Statistics are among the more important tools for understanding how things develop. The Wikipedia statistics are wonderful: they are informative and they show trends. Sadly, the numbers for the English Wikipedia are out of date because the number crunching requires bigger hardware.

The problem with statistics lies in their interpretation. When numbers are given a meaning that is not warranted, the conclusions drawn from them are, as the proverb has it, based on lies, even damned lies. The number of articles is often seen as all-important; even the Wikipedia portal is based on these numbers.

The Wikimedia Foundation is about providing people with information; for Wikipedia, that means encyclopaedic information. So when you want to rate the relevance of the Wikipedias, it is much more meaningful to know how many people actually use each Wikipedia and to rate the projects accordingly.

Once it is decided that the relevance of a Wikipedia lies in the number of people actually reading it, the fight for a high ranking becomes the kind of fight that we welcome. The number of articles then becomes only one factor in attracting more people. When the current emphasis shifts from raw numbers of articles to numbers of visitors, the energy now spent on improving the statistics will change for the better. People will want to understand what factors help gain more visitors, things like the relevance of articles and the quality of localisation. These two factors can be quantified as well: the first by the number of articles read together with the articles that are not available, the second by the localisation statistics.

We have had some bad fights as a consequence of the way the current statistics are understood. Once we can agree that the current numbers are not really relevant to achieving our goals, we should turn to the numbers that do matter.
