Saturday, November 26, 2016

The problem with #science explained with #Wikipedia

It is a recurring theme. People study a subject, and reality turns out to be different. The science is flawless, the results are impressive and important strides forward are indeed made. The study of heart disease is a great example; many studies resulted in an improved life expectancy for men, particularly white men. The Dutch Hartstichting is raising funds for new research because of this existing bias in research. Heart disease is the number one killer of women in the Netherlands. It is different in women, and this was not noticed before because heart disease in women was not studied.

Wikipedia, as it is commonly understood in research, has the same problem: it is not Wikipedia as we know it, it is English Wikipedia. My contributions to Wikipedia have not gone to English Wikipedia; they went to Dutch Wikipedia. As a result I will not be noticed as one of the most prolific contributors to Wikimedia projects, because my contributions to "Wikipedia" are hardly significant.

As I have blogged before, scientific papers do not get published when the research does not involve English Wikipedia. The consequence is that when people quote research, their quotes carry this bias, and strictly speaking what they say is not necessarily true of Wikipedia as a whole. The problem with biased research is that the policies of the WMF are based on these known "facts".

Nothing new so far. If we are honest, we all know this. So what can we do to remove some of the bias? The first thing is to devalue any and all research that covers English Wikipedia only; it covers less than half of what we do. The second thing is to evaluate research for its algorithms. When both the algorithms and the data are available, it is possible to run the algorithm on a more inclusive data set and check whether the results still hold. With the quality of Wikidata as a source for all the Wikipedias improving, such an approach is increasingly feasible. The last thing is for the Wikimedia Foundation itself to address this bias. With English Wikipedia accounting for less than 50% of its traffic and workflow, it would be good if a similar share of its efforts went to the bigger half of what we all do.

So what is the harm? We expect all Wikipedians to do largely what "Wikipedians" do. However, we are not all English Wikipedians. The needs of other people are not discussed and not taken seriously. We have seen wonderful examples of potential functionality showcased, but they are not taken further or into production because they do not fit the preconceived ideas of what we do; they are not part of the road map. The projects in Wikidata are not about Wikidata but about how to make us all part of one big data glob, and USING the data is only considered in relation to Wikipedia articles. We do not know how much Wikidata is used; some studies have been done, but they look at its use in "Wikipedia", and that is not relevant to me. Wikisource is gaining more and more content that may be valuable to our readers, but we do not market it because we never did marketing for Wikipedia. Several websites do only this, and what they offer could be much improved upon if we took Wikisource seriously.

It hurts us to consider only English Wikipedia, and this bias in research and policy is more damaging than the bias that English Wikipedians do consider.
Thanks,
       GerardM