Wednesday, December 09, 2015

#Wikipedia Signpost? Yeah right!

First have a read of the op-ed on the Wikipedia Signpost. Then come back.

I love its notion of "one ring that binds them all". They apparently did not watch the movie: the ring dissolved in Mount Doom and that was the end of it.

The article suggests a lot of things. It is all based on what others had to say, and it asserts that the quality of Wikidata is bad. This rests only on the assumption that because there are so few sources it must be bad. Nowhere in the article is it acknowledged that Wikidata is a wiki, and consequently what that implies is ignored. There is no notion that quality is anything but "it has a source", and doom is predicted not only with the "one ring" but also with "the tower of Babel" as an analogy. For me it qualifies as FUD.

When you strip away all that has nothing to do with Wikidata and its quality, there is not much left. There is no definition of quality and no notion of the current quality; it is suggested that it will get worse, but not why. It is a sad piece of prose pretending to have an answer. It is all about elsewhere and others.

Quality can be many things. It can be an error rate expressed as a percentage, it can exist in comparison to what other sources hold, it can be in the way errors are dealt with, it can be in the completeness of the data. It can be in the way you connect to others and in how you deal with the work of others. It can be in how your data is considered by others. Finally there is this most restricted form of quality where everything has to be perfect. Quality of this type is so far away from what a wiki stands for that it only deserves a shrug; it is patently foolish even when it is what we aspire to. Ironically, when Wikidata came into being, it started as an important quality improvement for Wikipedia. It largely solved Wikipedia's interwiki mess and made it manageable.

When people add data to any project, they make mistakes. This has been studied a lot, and depending on the manner of the edit, quality goes up or down. When people or processes check what has been done, some of the errors are easily spotted and remedied. This is one way of improving quality. Once data is fairly complete, it is the context of the other data that gives an indication of how likely a new edit is to be correct. It is for instance unlikely that an American Football player received an international refugee award.

Wikidata is severely incomplete. The statistics show that things are improving. The biggest problem is the number of items that do not yet state what they are about. Once this is known, a lot of additional work can be done.

One of the qualities of Wikidata is that people already find an application for it. VIAF, the database of the OCLC, for instance links to Wikidata in preference to the English Wikipedia. In this way they link books and authors in any language to all Wikipedias.

Originally, Wikipedia had operational values like "be bold". It was ok to stand on the shoulders of giants and make incremental changes, or wait for other giants to continue where you stopped. It is these values that enabled the growth of Wikipedia. As a wiki, Wikipedia has degenerated and it now has policy wonks determined to impose their notion of quality. Wikidata is immature and it needs to fight to remain a wiki. If anything, it is this anti-wiki sentiment that is holding Wikidata back.

If Wikidata is to have more quality, what can we do? We can import more data, for instance from Freebase. As in Wikidata, volunteers there have spent considerable time adding data. It has been a sincere effort and it deserves appreciation. It may have its issues, but these are the problems we have to deal with anyway. By doing this we finally reach out to the people from Freebase and recognise their effort. The least we can do is recognise their contributions that make it into Wikidata by citing Freebase as the source.

We can compare our data with data from other sources. This is the most obvious way of finding what may be in error. As a rule, the differences are where an error is most likely to be found. This is where it makes sense to go find a source to identify what is likely correct. This is where collaboration from Wikipedians is helpful, because the source for Wikidata is most likely a Wikipedia. Adding sources while curating differences is most effective and makes a real difference.
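The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_differences`, the item identifiers and the birth years are all made up for the example, and real comparisons would work against actual exports from Wikidata and the other source.

```python
def find_differences(ours, theirs):
    """Return the items that both sources hold but disagree on.

    Both arguments map an item identifier to a value (hypothetical
    structure, chosen for illustration only).
    """
    differences = {}
    # Only items present in both sources can be compared.
    for item_id in ours.keys() & theirs.keys():
        if ours[item_id] != theirs[item_id]:
            differences[item_id] = (ours[item_id], theirs[item_id])
    return differences


# Made-up birth years keyed by made-up identifiers.
our_data = {"item-a": 1901, "item-b": 1954, "item-c": 1987}
other_source = {"item-a": 1901, "item-b": 1945, "item-d": 2001}

# Only "item-b" is in both sources with a different value;
# that is the statement worth finding a source for.
to_check = find_differences(our_data, other_source)
```

Items that exist in only one of the two sources are not errors as such; they are candidates for completing the data, which is a separate job.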

The argument for adding data is easily explained from set theory. When there is no data to show, we have a 100% failure. When new data is 90% correct and we have processes in place to compare sources, we have a 90% improvement and work to do. When of the remaining 10% we can identify 80% as statements with issues, we can flag those and work on improving the data.
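To make the percentages concrete, here is a small worked example in Python. The batch size of 1000 statements is an assumption picked for easy numbers, and the function name `import_outcome` is made up; only the 90% and 80% figures come from the argument above.

```python
def import_outcome(total, correct_pct=90, flagged_pct=80):
    """Split an imported batch into correct, wrong, flagged and unflagged.

    Integer arithmetic is used so the counts stay exact.
    """
    correct = total * correct_pct // 100    # statements that are right
    wrong = total - correct                 # statements with an error
    flagged = wrong * flagged_pct // 100    # errors found by comparing sources
    unflagged = wrong - flagged             # errors nobody has spotted yet
    return {
        "correct": correct,
        "wrong": wrong,
        "flagged": flagged,
        "unflagged": unflagged,
    }


# Hypothetical batch of 1000 statements:
# 900 correct, 100 wrong, 80 of those flagged, 20 not yet flagged.
outcome = import_outcome(1000)
```

With nothing imported, all 1000 statements are missing. After the import, 900 are correct, 80 are known to need work and 20 remain as errors still to be found.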

The beauty of Wikidata is that it may be used in many languages. This is not without its issues, but as a consequence any item in Wikidata can be found in, for instance, the Tamil Wikipedia. By only adding the label in Tamil, the information may be nicely presented in Reasonator. In this way it is easy to start building information in a language and consequently work towards a Wikipedia article.

Given that Wikidata is based so much on Wikipedia data, it is obvious that Wikipedia has the most to gain from quality improvements in Wikidata. Those American Football players are still in the Wikipedia article, while there is more information in Wikidata for all the red linked winners. There are 14 Wikipedias that could already benefit from the data at Wikidata, and for other Wikipedias it becomes easier to include this award because there is no longer this maintenance requirement.

The same is true for mayors. Is the mayor of your town correct on every Wikipedia? Does it show who he or she is on every Wikipedia? It does not for mine. Is the number of inhabitants still correct? Wikipedia has a problem with such facts: there are too many of them, and given that facts may exist in 290-ish Wikipedias, who is going to do it all for all the cities and municipalities wherever in the world?

What Wikidata has to offer is collaboration. It is growing fast, its data is improving constantly, and arguably as more data becomes available, more eyes will see what is good and what is bad. As our tools become increasingly sophisticated, it is not that we do not need people to make a difference; it is that we are increasingly able to point out to them how they can make a difference. This will increasingly make Wikidata a place where we improve the sum of all available knowledge and share it widely.
