Sunday, October 20, 2013

#Statistics; beware of the long tail ..

#Wikidata started by including the interwiki links. In the old system it takes at least two articles to have such a link, so it is odd that the majority of items in Wikidata have only one link.

The number of labels is closely related to the number of links; one label is typically added when a link is created. So when you look at the growth of the labels, it follows that the articles with no interwiki links have been added as well.

Byrial created some really interesting statistics that are informative and useful. For many languages the number of links is higher than the number of labels. When the associated article names are added as labels, this will probably swell the "fat head" more than the "long tail" in the distribution of labels.

Wikidata becomes useful for a language when the items that people search for most are included as labels. We need to know what people are looking for and fail to find. There are two categories: the red links and the failed searches.

It will be interesting to learn if the WMF statistics department knows where our search fails. If they do, we will know not only what labels to add but also what articles to write.
