Tuesday, December 18, 2007

Interoperability is important

Wikipedia, and particularly the English-language Wikipedia, is a rich resource of information. The amount of information in it is staggering. Much of the information is duplicated in other Wikipedias and on other websites. This is great, because with more applications for the same data, more eyeballs will find what is in error.

I am subscribed to the DBpedia mailing list, and today I read about errors in Wikipedia that had to do with Wigan and Manchester City. Errors were found, and the gentleman wrote that he can and will make the necessary updates. His question is when DBpedia will reflect the changes.

When Wikipedia's data is analysed with tools and the results prove to be of value, this adds relevance to whatever enables such collaboration; typically that is the availability of dumps. When the data is analysed, a new work emerges. When that work has a completely different format, it becomes possible to mesh it with other data sources. This in turn will help establish the validity of the Wikipedia data and will allow the data to be extended.

When multiple data sources are meshed, the issues of copyright and licensing rear their ugly heads. You can create static and dynamic meshes. In a dynamic mesh you can build the mesh depending on what each person has access to. In a static mesh you can only include the data that is available to the least privileged person who will get access to it.
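To make the difference concrete, here is a minimal sketch in Python of how the two kinds of mesh could be built. Everything in it is an assumption made for illustration: the Source class, the licence labels, and the function names are hypothetical and do not describe how DBpedia or any other project actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A hypothetical data source with a single licence label."""
    name: str
    licence: str
    facts: tuple


def dynamic_mesh(sources, reader_licences):
    """Build a mesh for one reader: use every source that reader may access."""
    return [
        fact
        for source in sources
        if source.licence in reader_licences
        for fact in source.facts
    ]


def static_mesh(sources, all_readers):
    """Build one mesh for everybody: use only what every reader may access.

    Intersecting the readers' permissions gives the access of the least
    privileged person.
    """
    if not all_readers:
        return []
    common = set.intersection(*(set(reader) for reader in all_readers))
    return dynamic_mesh(sources, common)


# Illustrative data only; the facts are placeholders.
sources = [
    Source("open encyclopedia", "GFDL", ("fact from the open source",)),
    Source("closed database", "proprietary", ("fact from the closed source",)),
]
privileged_reader = {"GFDL", "proprietary"}
public_reader = {"GFDL"}

mesh_for_privileged = dynamic_mesh(sources, privileged_reader)
mesh_for_everyone = static_mesh(sources, [privileged_reader, public_reader])
```

In this sketch the privileged reader sees facts from both sources in the dynamic mesh, while the static mesh falls back to the open source alone, because that is all the public reader may see.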

The consequence is that many people and organisations will mesh sources, manipulate data, and publish without indicating what all the sources are. They will not name their sources because they do not want to be bound by all kinds of licenses and because they do not want to be hassled.

This DBpedia example shows that the presentation of facts is important. It demonstrates that interoperability will result in a better Wikipedia. It is important for Wikipedia to be as open and engaging as it can be. Frankly, when people analyse our data in a similar way to DBpedia, the result is a new work; it should not be considered derivative. Best practice is to publish sources, and this, more than the viral nature of a license like the GFDL or CC-by-sa, will drive collaboration and give Wikipedia more relevance.
