Sunday, September 06, 2015

#Wikidata - #StrepHit, the package damaged the message

When a good idea is announced, the way the announcement is worded can completely overshadow the idea itself.

First the good news. StrepHit has the potential to become a valuable tool for bringing new content to Wikidata. It is all about Natural Language Processing, and consequently it is all about harvesting facts from text. The idea is to harvest structured facts together with their references, and to harvest references for statements that already exist. This is really welcome; it may prove to be important.

Now the bad news: the plan is based on a number of awful assumptions that prevent it from being taken seriously at first glance.

The best thing the authors can do is appreciate that what they are building is a tool. A tool that analyses text, a tool that can be trained to do a good job. A tool that can be integrated with other tools. A tool that is not defined by particular use cases or assumptions.

When such a tool runs in an optimal way, it is much like Kian. Kian runs and makes changes to Wikidata directly. This week it added 21,426 statements with a very high rate of certainty. Problematic data is identified and collected in lists, and this is where people are invited to make a difference.

Kian works in the Wiki way: it does its thing and it invites people to collaborate. It does not assume that people have to do this, that or the other. Contrast this with StrepHit, where the author suggests that people should not be allowed to add statements without references. As if that were not enough, StrepHit will not even add data to Wikidata itself; it considers the data it generates a "gift" and condemns that data to the "Primary sources tool". That is a sad place where valuable data lingers without finding its way into Wikidata.

StrepHit and tools like it may become valuable. That value will be in direct relation to how well they integrate with other tools. When StrepHit does, it will be great; otherwise it will sit in its corner gathering dust.
