Sunday, January 20, 2019

@Wikidata - #Quality in a #Wiki environment

What quality is, and what quality means in a data environment, has been studied often enough. Lots of words are spent on it, but one notion is always left out: what is data quality in a Wiki environment, and how does that translate to Wikidata?

First of all, Wikidata serves many purposes. The initial purpose of Wikidata was to replace the in-article "interwiki" links. They were notoriously difficult to maintain and often wrong. A single Wikidata item replaced the links for a subject in all Wikipedias, and this brought stability and a high level of confidence in the result. Over time the quality of the "interwiki" links went down; fewer people are involved in adding and curating these links, and it is seen as a quality issue when new items are generated for new articles, because they do not have statements and are often not linked. There have been protests against these new additions.

A second purpose is the use of Wikidata statements in Wikipedia templates. Assessing data quality becomes complicated as there are micro, meso and macro levels of quality at play. The micro level: is sufficient data available for one template in one Wikipedia article? The meso level: is sufficient data available for one template in Wikipedia articles on the same topic? The macro level: is the same data available for all interested Wikipedias, and do we have the required labels in those languages?

Quality considerations are driven by this approach. On the micro level you want all awards for a scientist to be linked on their item. On the meso level you want all recipients of an award to be linked to that award on their items. On the macro level you want all awards to have labels in the language of a Wikipedia and all local considerations to have been met.
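
The three levels above can be sketched as three small coverage measures. This is only an illustration under stated assumptions: the data is made up, the names (`recorded`, `expected`, `labels` and the three functions) are hypothetical, and nothing here queries Wikidata itself.

```python
# Hypothetical toy data; none of these identifiers exist in Wikidata.
# Awards currently recorded on each scientist's item.
recorded = {
    "scientist_a": {"award_x", "award_y"},
    "scientist_b": {"award_x"},
}
# Awards each scientist is known to have received (assumed reference list).
expected = {
    "scientist_a": {"award_x", "award_y"},
    "scientist_b": {"award_x", "award_y"},
    "scientist_c": {"award_x"},
}
# Languages in which each award has a label.
labels = {"award_x": {"en", "nl"}, "award_y": {"en"}}


def item_coverage(person):
    """Smallest level: share of a person's known awards present on the item."""
    known = expected[person]
    return len(recorded.get(person, set()) & known) / len(known)


def award_coverage(award):
    """Middle level: share of an award's known recipients that are linked."""
    recipients = [p for p, awards in expected.items() if award in awards]
    linked = [p for p in recipients if award in recorded.get(p, set())]
    return len(linked) / len(recipients)


def label_coverage(language):
    """Largest level: share of awards that have a label in one language."""
    return sum(language in langs for langs in labels.values()) / len(labels)


print(item_coverage("scientist_b"))  # 0.5
print(award_coverage("award_y"))     # 0.5
print(label_coverage("nl"))          # 0.5
```

Each measure answers a different question, which is why a single quality score for an item says little about how well a given function is supported.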

Standard quality considerations in a Wiki environment are not helpful; they are judgemental. People contribute to Wikidata and all have their own purposes. A Wiki is a work in progress, and when quality assessments are to be performed, the question should focus on the extent to which a specific function is supported. What people seek in support also changes; as long as there was no article for professor Angela Byars-Winston, it was fine to know about her only for one publication. Now that Jess Wade picked her for an article, it may be relevant that she is the first and so far only person known to Wikidata who was a "champion of change" and that more papers are identified for her.

Wikidata includes many references to scientific papers and authors. However, so far it serves no purpose. Allegedly there is a process underway that imports papers used as citations in the Wikipedias, but it is not clear which papers are used in which Wikipedia article. So far it is a big stamp collection, a collection of rapidly growing quality. A collection that highlights authors who are open about their work and who share the details of their work at ORCID. In effect, this data set indicates that the relevance of a scientist improves by being open.

Wikidata invites people to add and curate the data that is of interest to them. Particularly the esoteric data, data about subjects like African geography or Islamic history, needs a lot of tender loving care. It is where Wikidata and the large Wikipedias are weak. For as long as Wikidata is largely defined by the large Wikipedias it will reflect the same biases, and these biases will be hard to assess and curate.
Thanks,
      GerardM
