Wednesday, July 28, 2010

Failing #statistics (finally)

Now that Erik Zachte has announced issues with the statistics as published, it is a good moment to reflect. It is the aim of the Wikimedia Foundation to double its reach in five years' time, doubling our traffic. The expected result is expressed numerically, and consequently we require hard numbers.

The likes of Alexa and comScore also publish numbers on Wikimedia's traffic, so there are alternative figures that give us a second opinion. Their numbers, while good, are no substitute for the numbers we need for our own purposes.

The numbers are used in many ways and for many audiences. They are important for the GLAMs that contributed material to us. These same numbers provide the arguments for other GLAMs to work with us. They are used to learn how a competition is doing. They provide background numbers when we talk to the press on many subjects.

Our statistics are vital. When I asked for a slot for a panel discussion at Wikimania about statistics, the numbers ended up being quite different. I am now at a loss as to how to assess the numbers we have. I understand that some statistics will be approximated to what they should have been. Other numbers will not receive such royal treatment.

This mishap is painful and I really hope it is felt that way. As we have several people working professionally on statistics, as many studies are based on the numbers we provide, and as the Toolserver is another resource that relies heavily on our collecting the right numbers, it is fair to call statistics one of our primary processes.

For our other databases we have redundancies. I hope that those responsible for accumulating the data our statistics are based upon will tell us how our data collection will be made more robust in the future.
Thanks,
       GerardM