Sunday, August 17, 2014

#Wikidata - sources or confidence

At this time Wikidata has more than 36,396,372 statements, and these statements are associated with some 15,335,451 items. The majority of these items have fewer than five statements and, even worse, for many items it is not known what they are about.

When you consider the quality of this data, there are two schools of thought. There are those who insist on sources with every statement, and there are those who have confidence in the validity of the data because they know where it came from.

Either way, when you want to assert that a specific approach is superior, it becomes a numbers game, and understanding the relative merits is what it is all about. When something is sourced, you can be confident that it was very likely correct at the time of the sourcing. There is, however, no certainty that the data remains stable. Confidence can be maintained by regularly comparing the data with what the source has to say.

When the data is regularly compared, it does not matter that much if Wikidata has source information itself. The source is typically one of the Wikipedias, and they are said to have sources; this may provide us with enough reasons for confidence. The comparison of data increases this confidence, particularly when multiple sources prove to be in agreement.
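
To make the idea of regular comparison concrete, here is a minimal sketch in Python. It assumes the Wikidata statements and the statements from each source have already been extracted into simple dictionaries; the function names, the data layout and the sample values are all hypothetical and only serve as illustration. It is not a description of how any existing tool works.

```python
def compare_statements(wikidata, sources):
    """Compare Wikidata statements with what each source says.

    wikidata: {(item, property): value}                 (assumed layout)
    sources:  {source_name: {(item, property): value}}  (assumed layout)
    Returns one row per Wikidata statement, listing the sources that
    agree and the sources that disagree (with the value they give).
    """
    report = []
    for key, wd_value in sorted(wikidata.items()):
        # Collect what every source says about this statement, if anything.
        claims = {name: data[key] for name, data in sources.items() if key in data}
        agreeing = sorted(name for name, value in claims.items() if value == wd_value)
        disagreeing = {name: value for name, value in claims.items() if value != wd_value}
        report.append({
            "statement": key,
            "wikidata": wd_value,
            "agreeing": agreeing,
            "disagreeing": disagreeing,
        })
    return report


def differences(report):
    """Keep only the rows where at least one source disagrees."""
    return [row for row in report if row["disagreeing"]]


# Invented sample data: placeholder item and property identifiers.
wikidata = {
    ("Q_EXAMPLE_1", "P_EXAMPLE_BIRTH"): "1952-03-11",
    ("Q_EXAMPLE_2", "P_EXAMPLE_BIRTH"): "1970-01-01",
}
sources = {
    "wiki_a": {
        ("Q_EXAMPLE_1", "P_EXAMPLE_BIRTH"): "1952-03-11",
        ("Q_EXAMPLE_2", "P_EXAMPLE_BIRTH"): "1970-01-02",
    },
    "wiki_b": {
        ("Q_EXAMPLE_1", "P_EXAMPLE_BIRTH"): "1952-03-11",
    },
}

for row in differences(compare_statements(wikidata, sources)):
    print(row["statement"], "Wikidata:", row["wikidata"], "sources:", row["disagreeing"])
```

The number of agreeing sources per statement is a rough measure of confidence; the rows returned by `differences` are the ones that deserve human attention.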

Practically, the basic building blocks to start comparing exist. It has been done before by Amir and he produced long lists of differences. Three things are needed to establish new best practices:
  • a well-defined place needs to be established where such reports can be found
  • communities need to understand that it raises confidence in their project
Thanks,
   GerardM
