Sunday, April 05, 2015

#Wikidata - #statistics

This chart is really interesting; it was made from the latest dump (2015-03-30). Item numbers are grouped into intervals of 100,000. So, for instance, (X,Y) = (192,73720) means that the interval [Q19200000, Q19299999] has 73,720 out of 100,000 items without a single claim.
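To make the bucketing concrete, here is a minimal Python sketch of how such counts could be produced. It is not the script used for the chart; the function names and the input format (pairs of item ID and number of claims, as one might extract from a dump) are assumptions made for illustration.

```python
from collections import Counter

BIN_SIZE = 100_000  # interval width described in the post


def bin_index(qid):
    """Return the interval index X for an item ID such as 'Q19212345'."""
    return int(qid[1:]) // BIN_SIZE


def bin_range(x):
    """Return the first and last item ID covered by interval X."""
    return "Q%d" % (x * BIN_SIZE), "Q%d" % ((x + 1) * BIN_SIZE - 1)


def count_claimless(items):
    """Count items without any claim per interval.

    `items` is a hypothetical iterable of (qid, claim_count) pairs;
    how these pairs are read from the dump is left out here.
    """
    counts = Counter()
    for qid, claim_count in items:
        if claim_count == 0:
            counts[bin_index(qid)] += 1
    return counts
```

With this, `bin_range(192)` gives the interval from the example above, and plotting the result of `count_claimless` per index would give the (X,Y) points of a chart like this one.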

It is interesting because it indicates to me that items are imported in batches. As the items within a batch are similar, it follows that even though many statements are added all the time, not every batch gets the same attention.

Reasonator has a "Random item" feature, and I used it to get a feeling for which items do not have any statements. It seems to me that they are mostly items with a sitelink to "another" language, and often they are about subjects in a relatively small class.

The items with a sitelink to "another" language are probably the most problematic. It is realistic to expect that many of them could be linked to another item if it were clear what the item is about. For that you need people who know the language and/or a way to figure things out using tools like Kian.
