Saturday, September 12, 2015

#Wikidata's embarrassment of riches

Wikidata is constantly improving its content. Proof can be found in the people who point out issues and in the follow-up this generates. They add data, change data and remove data; Wikidata is better for it.

Now that the official Wikidata Query is live, it is even easier for people who understand SPARQL to query, compare and comment on Wikidata's content. As mentioned before, comparing data is the easiest way to improve both quality and quantity.

For this reason it is an embarrassment how a resource as rich as Freebase is treated; it might as well not exist. It lingers in the "primary sources tool", where a lot of well-intentioned work is done. In Q3/2015 there may even be a workflow to include even more data in it.

This tool is probably only relevant for static data, and static data is not necessarily the best data; actively maintained data is much to be preferred. If I understand things well, people may tinker with the data in this data dungeon, and it is then for the "community" to decide upon inclusion in Wikidata. It is not obvious what the community's arguments could be. It is not even obvious how any of this data compares to the quality of Wikidata itself; Wikidata's own quality has not been quantified either.

Once data is included, there are many ways to curate it, for instance by comparing it against other sources. This is obviously the wiki way because it invites people to collaborate.
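
To make "comparing against other sources" a bit more concrete, here is a minimal sketch in Python. It is not how Wikidata or the primary sources tool works; the function name `compare_sources`, the item identifiers and all the values are made up for illustration. It only shows the idea: put two sets of statements side by side and sort them into what agrees, what conflicts and what is missing on either side.

```python
from typing import Dict, List, Tuple

# A statement is keyed by (item identifier, property name).
# Both parts are hypothetical labels, not real Wikidata identifiers.
Key = Tuple[str, str]


def compare_sources(ours: Dict[Key, str], theirs: Dict[Key, str]) -> Dict[str, List[Key]]:
    """Sort every statement key into one of four buckets."""
    report: Dict[str, List[Key]] = {
        "agree": [],        # both sources have the same value
        "conflict": [],     # both sources have a value, but they differ
        "only_ours": [],    # the other source has nothing for this key
        "only_theirs": [],  # we have nothing for this key
    }
    for key in sorted(set(ours) | set(theirs)):
        if key not in theirs:
            report["only_ours"].append(key)
        elif key not in ours:
            report["only_theirs"].append(key)
        elif ours[key] == theirs[key]:
            report["agree"].append(key)
        else:
            report["conflict"].append(key)
    return report


# Made-up sample data, purely for illustration.
wikidata_like = {
    ("item-1", "date of birth"): "1952-03-11",
    ("item-2", "date of birth"): "1900-01-01",
    ("item-3", "country"): "Netherlands",
}
other_source = {
    ("item-1", "date of birth"): "1952-03-11",
    ("item-2", "date of birth"): "1901-01-01",
    ("item-4", "country"): "Germany",
}

report = compare_sources(wikidata_like, other_source)
for bucket, keys in report.items():
    print(bucket, keys)
```

The "conflict" and "only_theirs" buckets are the interesting ones: they are the lists people can be invited to work on, and their size over time says something about both quality and quantity.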

