Tuesday, December 24, 2013

#Wikidata - Statements per item

The two best #statistics for Wikidata show the statements per item and the labels per item. They are relevant because each, in its own way, indicates the usefulness of Wikidata. Items become useful when they are related to other items, and as more items get connected they make up a network of knowledge. The network is still immature, as most items have either no statement or only one statement connecting them.

The items that get the most tender loving care are probably the ones that people are most interested in. At this time it is blind luck when these most popular items get the attention they need and become some of the best connected. Certainly the items that are most popular with the Wikidata contributors do get more connections. People similar to the Wikidata contributors will love what gets done in Wikidata. As our community becomes more diverse, the Wikidata data will become more diverse as well.

Who do we recruit, what are we missing ... where is the data that allows us to be data driven?

The "labels per item" indicate how many languages have a word for an item. At this time most items have a label in only one language. Many items have or should have additional labels; this is particularly useful when someone is known by multiple names.
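To make the two statistics concrete, here is a minimal sketch of how they could be counted. It assumes each item is available as a dictionary shaped like the Wikidata entity JSON, with a "labels" map keyed by language code and a "claims" map keyed by property. The item ids, property ids and values in the sample are made-up placeholders, not real Wikidata content, and the function names are my own.

```python
from collections import Counter


def statements_per_item(entity):
    """Count all statements of an item, summed over every property."""
    return sum(len(group) for group in entity.get("claims", {}).values())


def labels_per_item(entity):
    """Count the languages for which the item has a label."""
    return len(entity.get("labels", {}))


def distribution(entities, measure):
    """Map each count (0, 1, 2, ...) to the number of items with that count."""
    return Counter(measure(entity) for entity in entities)


# Hypothetical sample data: ids, properties and values are placeholders.
sample = [
    {"id": "A", "labels": {"en": {"language": "en", "value": "a"}}, "claims": {}},
    {
        "id": "B",
        "labels": {
            "en": {"language": "en", "value": "b"},
            "nl": {"language": "nl", "value": "b"},
        },
        "claims": {"P_x": [{}]},
    },
    {
        "id": "C",
        "labels": {"en": {"language": "en", "value": "c"}},
        "claims": {"P_x": [{}, {}], "P_y": [{}]},
    },
]

statement_dist = distribution(sample, statements_per_item)
label_dist = distribution(sample, labels_per_item)

for count in sorted(statement_dist):
    print(f"{statement_dist[count]} item(s) with {count} statement(s)")
for count in sorted(label_dist):
    print(f"{label_dist[count]} item(s) with {count} label(s)")
```

Run over a full dump instead of three toy items, the same two distributions would give the kind of picture the statistics show.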

The most popular subjects, like countries and heads of state, typically have over 100 links. Consequently they have over 100 labels, because each article name serves as a default label. A bot runs regularly to make sure that there is a label for each link.

You will appreciate that items with more than 10 labels are the best known subjects. They are likely the most sought after subjects as well. All other subjects need labels in many more languages. Among them are the ones that indicate cultural bias. Items that have labels only in languages like Farsi, Hebrew, Georgian or Chinese are the ones for which I cannot provide labels in Dutch. There is a need to use dictionaries to populate Wikidata with more labels. This is still a new frontier.
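Finding the items that still lack a label in "your" language is simple in principle. The sketch below is only an illustration under the same assumption as before: items are dictionaries with a "labels" map keyed by language code. The function name and the sample items are hypothetical.

```python
def missing_label(entities, language):
    """Return the ids of items that have no label in the given language."""
    return [
        entity["id"]
        for entity in entities
        if language not in entity.get("labels", {})
    ]


# Hypothetical sample data: ids and label values are placeholders.
sample = [
    {"id": "A", "labels": {"fa": {"language": "fa", "value": "a"}}},
    {
        "id": "B",
        "labels": {
            "en": {"language": "en", "value": "b"},
            "nl": {"language": "nl", "value": "b"},
        },
    },
    {
        "id": "C",
        "labels": {
            "he": {"language": "he", "value": "c"},
            "ka": {"language": "ka", "value": "c"},
        },
    },
]

todo = missing_label(sample, "nl")
print(todo)
```

Such a list tells you where the work is; it does not tell you what the Dutch label should be, and that is where dictionaries come in.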

One function of the "concept cloud" is to find related items and their labels in "your" language. It is probably the best way of adding missing labels to relevant items.

What we need is more people adding labels in more languages. We have some tools to keep them occupied. What we do not know is which items are most used and sought after. They are probably the missing items that have the most impact.

Again, where is the data that allows us to be data driven in optimising Wikidata?
