Friday, February 07, 2014

#Reasonator - Johann Sebastian Bach in so many words

When you look at Wikidata, it is very much like the proverbial glass. It is not even half full, but it is filling rapidly. Unlike a liquid, every new bit of data becomes part of a web of data. In that respect it is more like a telephone network, where every new endpoint makes the network more valuable to its users.

Like Wikipedia, Wikidata aspires to help people gain access to the sum of all knowledge. It already knows about more subjects than any Wikipedia, but what it knows is incomplete, sometimes even wrong, and often not accessible to everyone. Many of the Chinese villages and towns only have Chinese labels. Most of the locations in the USA do not refer to the lowest level of administrative unit they are in. For the towns and cities of so many other countries we do not have any data at all; at best we may know that they exist.

Like Wikipedia, Wikidata is a work in progress. Many articles are stubs, and many items have no statements. Bots use lists to generate more Wikipedia stubs. Some clever programming uses all kinds of substitution to create readable text, and often the data in the list is the basis for an infobox as well. That is fine; when you can do that, more power to you.
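To make the substitution idea concrete, here is a minimal sketch of how one row of a list could be turned into both a stub sentence and infobox lines. The template wording, the field names and the place itself are made up for illustration; this is not how any particular bot does it.

```python
from string import Template

# Hypothetical sentence template; the field names are illustrative only.
STUB_TEMPLATE = Template("$name is a $kind in $region, $country.")


def make_stub(row):
    """Substitute the values of one list row into the sentence template."""
    return STUB_TEMPLATE.substitute(row)


def make_infobox(row):
    """Reuse the same row as key/value lines for an infobox."""
    return "\n".join(f"| {key} = {value}" for key, value in row.items())


# A made-up row standing in for one entry of a bot's input list.
rows = [
    {
        "name": "Exampleville",
        "kind": "village",
        "region": "Example Province",
        "country": "Exampleland",
    },
]

for row in rows:
    print(make_stub(row))
    print(make_infobox(row))
```

The same row feeds both outputs, which is why a list that is good enough for a stub is usually good enough for an infobox too.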

Creating such texts is something that can be done once or multiple times. Magnus wants the Reasonator to produce a better narrative. The beauty of this approach is that more text, and even better text, will be generated when more information becomes available. Better texts will also become available when the routines that generate them become more clever.

For now the improved narrative is in English only. Improving the software is an iterative process. The first iteration is just to have something that works; the second iteration separates out the language-specific code, so that code for a different language can be used when it becomes available.
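What that second iteration could look like, as a rough sketch: the text is generated on demand from whatever data is present, and the language-specific part lives in its own function that is looked up by language code. The data keys, function names and sentence wording are my own assumptions for illustration; this is not the Reasonator's actual code or Wikidata's real data model.

```python
# Hypothetical item data; the keys are illustrative, not Wikidata's schema.
bach = {
    "label": "Johann Sebastian Bach",
    "pronoun": "He",
    "occupation": "composer",
    "birth_year": 1685,
    "birth_place": "Eisenach",
    "death_year": 1750,
    "death_place": "Leipzig",
}


def describe_en(item):
    """English-specific wording; each sentence appears only if its data exists."""
    subject = item.get("pronoun", item["label"])
    parts = [f"{item['label']} was a {item['occupation']}."]
    if "birth_year" in item and "birth_place" in item:
        parts.append(
            f"{subject} was born in {item['birth_year']} in {item['birth_place']}."
        )
    if "death_year" in item and "death_place" in item:
        parts.append(
            f"{subject} died in {item['death_year']} in {item['death_place']}."
        )
    return " ".join(parts)


# Language-specific code is kept apart; another language is one more entry here.
RENDERERS = {"en": describe_en}


def describe(item, lang="en"):
    """Use the renderer for the requested language, falling back to English."""
    renderer = RENDERERS.get(lang, RENDERERS["en"])
    return renderer(item)


print(describe(bach))
```

An item with fewer statements simply yields a shorter text, and the text grows by itself when the data grows. Supporting another language would mean writing one more function and registering it, without touching the rest.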

It will be interesting to see how popular this functionality will be. For language technology students or professionals it will be a fun project to have a stab at this kind of stub language. What will be interesting is what resources they will ask for in order to make their text really expressive.
