Tuesday, July 14, 2015

#Wikidata - appeasing the #Wikipedia crowd

To understand #quality, you have to make assumptions. For Wikipedians, quality is in the source. This source is a statement somewhere that a given fact is true. This makes sense in the text-based Wikipedia.

One problem Wikidata faces is that many Wikipedians are sceptical about the quality of the information at Wikidata. "There are no sources" is the oft-repeated mantra. Who checked the fact, who approved it... Why should we trust the information? This resulted in a culture at Wikidata where, "because of the Wikipedians", things that should be obvious are no longer obvious. When the Freebase data is imported, it is not really imported; it is kept in purgatory, waiting for someone to say OK.

The consequence is an epic fail when you use tools to add information. Data that is in purgatory is not seen by the tools and consequently it is added again, without the sources and without the additional qualifiers known in purgatory.

Google and the Freebase community spent an inordinate amount of time and effort on making this information right. It was used by Google in its products, so WHY do we need to doubt its quality? Why should we think ourselves superior just because of the uneasy sentiments of Wikipedians?

The results are an epic fail because:
  • the newly added information is not seen in tools
  • there is no way to automatically compare it with other sources and thereby verify it
  • it demonstrates mistrust where it is not needed.
When we want to import data from external sources, it makes sense to verify its quality. However, this is best done at the gate, before information is added to Wikidata. When a source is of such a quality that we want it, we should import it, just as we did for all the settlements in China and just as we do all the time with incomplete data from many Wikipedias.

We should do better, and when we forget about Wikipedians for a moment, we know better.