Wednesday, October 09, 2013

Even more heady stuff about #Wikidata and ontologies

In an interview you ask the questions, and the answers are not your own. If you are as lucky as I have been, you get food for thought, though not necessarily answers that settle an argument. Since this is my blog and I am commenting on Wikidata anyway, I decided to add some thoughts of my own.

Ontologies are, in their structure, representations of the world. Wikidata is used in over 280 languages, and these represent even more cultures. When one upper ontology is adopted in Wikidata, it will not fit many of them well. A good example is the conversion that followed the "main type (GND)" "person" controversy: it has been decided that this will become "instance of" "human" instead. This means that every person will be identified as a specimen of the species Homo sapiens. This may be technically correct, but I am sure that many people will feel alienated by it.

For me it is very important to keep in mind what Wikidata is for. It is first and foremost a repository of information that is meant to be used. The considerations of other data repositories are very much secondary to that. One key objective is to provide the information that is shown in infoboxes. The infobox for Ronald Reagan illustrates a few points well. It states that he was both governor of California and president of the United States. For each of these offices it gives a start and end date, a predecessor and a successor. In a classic semantic annotation, each of these dates and office holders would stand on its own. As a consequence, it would not be possible to tell which date belongs to which office, or who preceded or succeeded him in which office. This information can only be expressed as part of a relation, and qualifiers are the tool that is available for this.
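To make the point about qualifiers concrete, here is a minimal Python sketch. It is a simplified model for illustration, not Wikidata's actual data format: the property labels ("position held", "start time", "end time", "replaces", "replaced by") mirror Wikidata's, but the Statement class and the flatten and successor_in helpers are hypothetical, and the dates are given as years only.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item, with qualifiers that belong to this claim only.

    Hypothetical, simplified model; the property labels mirror Wikidata's.
    """
    prop: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)


reagan = [
    Statement("position held", "Governor of California", {
        "start time": "1967", "end time": "1975",
        "replaces": "Pat Brown", "replaced by": "Jerry Brown",
    }),
    Statement("position held", "President of the United States", {
        "start time": "1981", "end time": "1989",
        "replaces": "Jimmy Carter", "replaced by": "George H. W. Bush",
    }),
]


def flatten(statements):
    """Classic subject-predicate-object triples: the qualifiers float free."""
    triples = set()
    for s in statements:
        triples.add(("Ronald Reagan", s.prop, s.value))
        for q, v in s.qualifiers.items():
            triples.add(("Ronald Reagan", q, v))
    return triples


def successor_in(statements, office):
    """With qualifiers, the successor in a given office is unambiguous."""
    for s in statements:
        if s.prop == "position held" and s.value == office:
            return s.qualifiers.get("replaced by")
    return None


print(successor_in(reagan, "Governor of California"))  # Jerry Brown
flat = flatten(reagan)
print(len([t for t in flat if t[1] == "replaced by"]))  # 2, but for which office?
```

Flattened into triples, the item carries two "replaced by" values with nothing tying either of them to an office. Kept as qualifiers on each "position held" statement, the question "who succeeded him as governor?" has exactly one answer.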

In all this, "rigidity" is important. Ronald was an actor before he was a governor or a president. This rigidity, however, depends on the Wikidata community; it is and remains a wiki. The logic associated with an actor or a politician should apply equally to Arnold Schwarzenegger and to Ronald Reagan.

When it is indicated that somebody is a "human", it is obvious that this person was born, may have died and has a sex. These are obvious inferences; they can be suggested as easy and obvious additions, in a preferred order. Because they are easy and obvious, it should be possible to map this information consistently to other resources such as DBpedia.
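The inference described above can be sketched as a simple suggestion rule. Everything here is an assumption for illustration: the EXPECTED table, the dictionary representation of an item and the suggest function are hypothetical, although "date of birth", "date of death" and "sex or gender" are real Wikidata property labels.

```python
# Hypothetical table: properties worth suggesting for an instance of a class,
# in preferred order. The flag marks whether every instance should have one;
# a living human has no date of death, so that one is optional.
EXPECTED = {
    "human": [
        ("date of birth", True),
        ("sex or gender", True),
        ("date of death", False),
    ],
}


def suggest(item):
    """Return (property, required) pairs the item lacks, in preferred order."""
    suggestions = []
    seen = set()
    for cls in item.get("instance of", []):
        for prop, required in EXPECTED.get(cls, []):
            if prop not in item and prop not in seen:
                seen.add(prop)
                suggestions.append((prop, required))
    return suggestions


# An item is modelled here as a dict from property label to a list of values.
reagan = {"instance of": ["human"], "date of birth": ["1911-02-06"]}
print(suggest(reagan))  # [('sex or gender', True), ('date of death', False)]
```

Because the suggestions follow from the class rather than from the habits of individual editors, every human gets the same ones in the same order, which is what would make a consistent mapping to a resource like DBpedia feasible.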

Given that explicit links exist between Wikidata and an increasing number of external resources, it seems obvious that people will find a way to compare the data. The big advantage for Wikidata is that it has an explicit purpose that ensures its data will be used. It will be used by the widest possible public, because the aim is to share the sum of all knowledge.
