Thursday, May 01, 2014

#Wikidata - #statistics; adding #human

The latest statistics are available. Many new items (445,778 or 3%) were created in Wikidata and the percentage of items without any statement rose. The challenge is to bring their number down from 4,946,696. For these items we only know there is an article about them.

There are two ways to bring that number down:
  • add statements
  • merge items
In order to merge items, we have to know things about them, so adding statements is a great first strategy. Many articles are about humans, so what is more obvious than stating that they are human when there is a reason to think so? A similar process is under way on the Italian Wikipedia, where basic information about people is added, including the fact that they are human.

The statistics are based on a dump from April 20th and since then some 176,729 humans have been identified. As more information gets added, it becomes possible to guess that two items may be the same. For instance, they may both be about humans with the same name who died on the same date, or who had the same occupation, or who are in similar categories. Maybe they have a strikingly similar "concept cloud".
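
The first of these guesses, same name and same date of death, can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionaries, their keys ("id", "name", "human", "died") and the identifiers are made up for the example and are not the Wikidata data model or any existing tool.

```python
from collections import defaultdict


def merge_candidates(items):
    """Group items marked as human that share a name and a date of death.

    `items` is a list of plain dicts with hypothetical keys:
    "id", "name", "human" (bool) and "died" (ISO date string or None).
    """
    groups = defaultdict(list)
    for item in items:
        # Items without a "human" statement or a date of death cannot be compared.
        if not item.get("human") or not item.get("died"):
            continue
        # Ignore case and surrounding whitespace when comparing names.
        key = (item["name"].strip().casefold(), item["died"])
        groups[key].append(item["id"])
    # Only groups with more than one item are candidates for a merge.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up sample data; the identifiers are not real Wikidata items.
sample = [
    {"id": "X1", "name": "Jan Jansen", "human": True, "died": "1950-03-02"},
    {"id": "X2", "name": "jan jansen ", "human": True, "died": "1950-03-02"},
    {"id": "X3", "name": "Jan Jansen", "human": True, "died": "1971-11-30"},
    {"id": "X4", "name": "Jan Jansen", "human": False, "died": None},
]

print(merge_candidates(sample))  # [['X1', 'X2']]
```

A match like this is only a hint that two items may be the same; someone still has to check before merging. Note that the item without any statement (X4) never shows up as a candidate, which is exactly why adding statements comes first.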

The likelihood of missing interlanguage links is quite high. Obviously this has improved a lot with Wikidata, but the number of items with only one link, 9,028,632, is high and it is rising. The question is very much what mechanisms we can come up with to find the items we should merge.
