Tuesday, February 17, 2009

Domesday scenario

What do you call a project in which over a million people collaborated on information about their country? A project that was completed successfully and sold many units? A success.

What do you call the same project when, twenty-one years later, only two working systems are left?

The Domesday Project was very much a project of its day. Because a generation of British schoolchildren was involved in it, a lot of effort has gone into making its data available again.

There are lessons to be learned. Some of them seem obvious in our open content world, because we insist on Open Source and on licenses that are considered "free". There is, however, more to it: there are also the standards underlying the data. The basic standard we use is text, expressed in Unicode. Even this standard is not perfect, because some of the languages supported by the WMF have characters that are not yet encoded in Unicode.

Within this text we often express information in a structured way, for instance in infoboxes and tables. As long as it is rendered as HTML, I can read it; when I look at the wiki syntax, I am lost. When people data-mine Wikipedia, they have to write special software to parse these infoboxes and tables. One result of this is DBpedia, and the DBpedia community does a great job.
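To give an idea of what "special software to parse these infoboxes" involves, here is a minimal sketch in Python. It is not DBpedia's actual extractor; the function names `split_top_level` and `parse_infobox` and the sample infobox are hypothetical. It assumes well-formed wikitext and only handles one level of template parameters, but it already has to track nested `[[...]]` links and `{{...}}` templates, because a `|` inside a link such as `[[Pound sterling|Pound]]` must not be mistaken for a parameter separator.

```python
from __future__ import annotations


def split_top_level(body: str) -> list[str]:
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1          # entering a link or nested template
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1          # leaving a link or nested template
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level parameter separator
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return the named parameters of the first {{Infobox ...}} template (hypothetical helper)."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return {}
    # Walk forward to the matching closing braces, counting nesting depth.
    depth, i = 0, start
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i):
            depth -= 1
            i += 2
            if depth == 0:
                break
        else:
            i += 1
    if depth != 0:
        return {}  # unbalanced braces: give up rather than guess
    body = wikitext[start + 2:i - 2]
    fields = {}
    for part in split_top_level(body)[1:]:  # the first part is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample, not taken from a real article.
sample = """{{Infobox country
| name = United Kingdom
| capital = [[London]]
| currency = [[Pound sterling|Pound]] (GBP)
}}"""

print(parse_infobox(sample))
```

Even this toy version leaves the values in wiki syntax (`[[London]]`), so a real extractor still has to clean them up; that is exactly the kind of work the wiki markup pushes onto everyone who wants to reuse the data.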

The point is that it does not have to be this way. Were we to adopt Semantic MediaWiki for Wikipedia, we would adopt Open Standards that let us present our data in a way other computers can understand. This would help us achieve our goal of providing information to people, because others could use our data to provide a better understanding. In this way we would open up our data in a way that was not possible when either of the Domesday books was written, and we would make it truly free, because it would be available for innovative applications.
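Semantic MediaWiki lets editors annotate facts inline with the syntax `[[Property::Value]]` (optionally `[[Property::Value|display text]]`). The sketch below, a simplification that assumes only that basic form and ignores everything else SMW supports, shows why this helps: each annotation maps directly onto a subject/property/value triple, with no infobox-specific parsing. The regex, the function name `extract_triples` and the sample page text are hypothetical and for illustration only; SMW itself does not work this way internally.

```python
from __future__ import annotations

import re

# Matches SMW-style inline annotations: [[Property::Value]] or
# [[Property::Value|display text]]. The property name may not contain ':'.
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Turn each inline annotation on a page into a (subject, property, value) triple."""
    return [
        (page, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Illustrative sample text, not taken from a real wiki.
text = (
    "The capital is [[Capital::London]] and the currency is "
    "[[Currency::Pound sterling|the pound]]."
)

for triple in extract_triples("United Kingdom", text):
    print(triple)
```

Because the meaning is stated in the markup itself, any program can recover the same facts without knowing how a particular infobox template happens to be laid out.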