Sunday, May 27, 2018

#Wikidata - No #copyright on common knowledge

The approach to copyright for text and for data is, imho, utterly different. For text, you seek a reputable source and you cite it. All in all, that is a lot of work.

A proper approach to data is that you seek confirmation of what you already know, and it is encouraging when many sources agree on what is common to all of them. When you add new data, you will typically find most of what you care about through links from existing shared data, probably in multiple sources. This is not done by hand, as that is too much work; it is done by bot, and consequently there is not even any "sweat of the brow".

Arguably, common data exists as common knowledge. It is not proprietary to any one of these sources, and consequently claiming copyright, let alone a license, is problematic at the very least.

When data is specific to one data source, it is inherently problematic. It may be wrong, particularly when it differs from what other sources state. It follows that care is needed before this data is used. You then get into a manual process of reconciling and curating the data, and you may even decide to diverge from what all the others say. Both the confirmation and the creation of new data are actually research. That is not the same as using data from the other source. To my mind, this means that no burden of copyright applies.
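To make the distinction concrete, here is a minimal sketch of how a bot might separate data that several sources agree on from data that needs manual curation. Everything in it is an assumption for illustration: the function name `classify_claims`, the tuple layout `(source, subject, property, value)`, and the rule that two agreeing sources are enough. It is not how any actual Wikidata bot works.

```python
from collections import defaultdict


def classify_claims(claims):
    """Split claims into 'common' and 'needs_review'.

    claims: iterable of (source, subject, property, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # For every (subject, property), record which sources state which value.
    values = defaultdict(lambda: defaultdict(set))
    for source, subject, prop, value in claims:
        values[(subject, prop)][value].add(source)

    common = {}
    needs_review = {}
    for key, by_value in values.items():
        if len(by_value) == 1:
            # All sources that mention this statement give the same value.
            (value, sources), = by_value.items()
            if len(sources) >= 2:
                # Confirmed by at least two sources: treat as common data.
                common[key] = value
                continue
        # Single-source or conflicting data: leave it for manual curation.
        needs_review[key] = {v: sorted(s) for v, s in by_value.items()}
    return common, needs_review
```

Statements that end up in `common` are the ones many sources share; whatever lands in `needs_review` is where the reconciling and curating, the actual research, begins.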

When data from the Wikipedias is considered for Wikidata, the same considerations apply. When you think about it, it is quite bizarre: you take expressions in words and convert them into a qualifier that represents said words, words that can be in any language, words that may not even be what you see on your screen. The processing of texts may be automated, and it is easy to understand that from the input of all the Wikipedias alone a superset of data is created that is more than any one article. The notion that copyright can be legitimately claimed is problematic at best.

When you take all this on board, together with the fact that individual facts cannot be copyrighted, it is obvious to me that the choice of a CC-0 license for Wikidata is fortunate. A license implies copyright, but with this license it is given away. The claim of copyright is at best a defensive strategy.
