Thursday, June 05, 2008

Lies, damned lies and statistics

Statistics are one of the more important tools for showing how things develop. The Wikipedia statistics are wonderful: they are informative and they show trends. Sadly, the numbers for the English Wikipedia are out of date because the number crunching requires bigger hardware.

The problem with statistics lies in their interpretation. When numbers are given a meaning that is not warranted, the conclusions are, proverbially, based on lies, even damned lies. The number of articles is often seen as all-important; even the Wikipedia portal is based on these numbers.

The Wikimedia Foundation is about providing people with information; for Wikipedia, that means encyclopaedic information. So when you want to rate the relevance of the Wikipedias, it makes much more sense to know how many people actually use each Wikipedia and to rate the projects accordingly.

Once it is decided that the relevance of a Wikipedia lies in the number of people actually reading it, the fight for a high ranking becomes the kind of fight that we welcome. The number of articles then becomes only one factor in attracting more people. When the current emphasis on the raw number of articles shifts to the number of visitors, the energy now spent on improving the statistics will be put to better use. People will want to understand which factors help gain more visitors, things like the relevance of articles and the quality of localisation. These two factors can be quantified as well: the first by the number of articles read and the articles that are not available, the second by the localisation statistics.

We have had some bad fights as a consequence of the way the current statistics are understood. When we can agree that the current numbers are not really relevant for achieving our goals, we should turn to the numbers that do matter.
Thanks,
GerardM

2 comments:

Waldir said...

This topic is exactly what has been discussed for a while on meta. The discussion was moving towards a general consensus that the amount of traffic each Wikipedia receives would be a better indicator for ranking them in the portal. However, the discussion died off precisely when a proposal for a vote was made, which would include opinions gathered from all language editions of Wikipedia, to validate the discussion's conclusions. If you are so inclined, I'd love to have your collaboration in restarting the discussion to move forward with the vote. So, what do you say?

Anonymous said...

I cannot agree more. With FritzpollBot, any Wikipedia language project can have 2 million articles soon. Comparing Wikipedias by number of articles has become an anachronism.

Even better in the long run might be: number of readers per time unit that flag an article as satisfactory or better.