Tuesday, May 21, 2013

Most of the data in #Wikidata is curated

I got into an argument. Wikidata, it was said, should not exist because its data is overwhelmingly in need of curation. It is an argument I have seen before and I positively hate it. First of all, it is not true, and second, many people just do not get what Wikidata is about.

The bulk of the information in Wikidata replaces the old interwiki data. Good riddance to old, unwieldy, hard-to-maintain data. Everybody who used to be involved in the interwiki connections is now involved in Wikidata. This means that there is an existing community doing the same old thing but in a more efficient way.

Much of the information that is accumulating in Wikidata is data imported from other sources. When the German Wikipedia has an article on a particular person, it is linked to an external source that identifies this person with a number. This external source has a lot of data that may find its way into Wikidata, data that has already been researched for its validity. When data is imported from such trusted sources, it saves our community the work of adding that data by hand.

The data in Wikidata is available under a CC0 license. Anybody may use it. As data not present in the other sources finds its way into Wikidata, there will be people who feel strongly that this data has to be present and has to be correct. When you feel strongly about a specific category of knowledge, you can organise a workshop to add data and find sources to back up what you claim to be true.

This is what I do.

I strongly believe that using information boxes with data that is not from Wikidata is foolish. The only valid argument for having them is that Wikidata does not yet include all the attributes that are necessary. This argument is becoming more and more irrelevant as time goes on.