Thursday, April 30, 2009

Je maintiendrai

Today was "Koninginnedag". It is a national holiday in the Netherlands. Today a crazy guy tried to drive his car through a crowd to crash into the open bus that drove our royal family. Four people died.

Our Koninginnedag is a celebration; our queen goes out to meet people in their cities and villages. People celebrate this day everywhere.

What happened today is horrible. But what would be worse is if this kind of craziness made it impossible to celebrate Koninginnedag in our traditional way.

Queen Beatrix is loved by the Dutch population, and this sad affair makes it almost an act of bravery for her to go out in the future. What I hope for is that we will decide that we cannot be held hostage to this kind of terror. We should not allow terror to dictate our ways. I hope that Koninginnedag will remain the traditional Koninginnedag we cherish.

Tuesday, April 28, 2009

Serbian localisation

This is information cropped from the group statistics at Recently a lot of great work has been done for Serbian in the Cyrillic script. Hundreds of messages have been localised and this will make a big impact once these messages become available on all the WMF projects.

Serbian is one of the languages where the content is transliterated into either the Cyrillic or the Latin script. For all kinds of reasons this does not work for the MediaWiki messages. The technology to do this transliteration exists so I am wondering if it would be possible to have this implemented on
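To give an idea of what such a transliteration involves, here is a minimal sketch in Python that converts Serbian Cyrillic text to the Latin script one letter at a time. The function name and the capitalisation rule are my own assumptions for illustration; this is not the converter MediaWiki uses, and it ignores the special cases a real message converter would have to handle.

```python
# Serbian Cyrillic -> Latin, lowercase letters only; capitals are derived below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic to Latin (hypothetical helper).

    Characters that are not in the table, such as digits, punctuation or
    placeholders like $1, are passed through unchanged.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            if ch != lower:
                # Simplification: a capital digraph always becomes "Lj", "Nj"
                # or "Dž", even inside a word written fully in capitals.
                latin = latin.capitalize()
            out.append(latin)
        else:
            out.append(ch)
    return "".join(out)


print(to_latin("Србија"))
```

Going from Cyrillic to Latin is the easy direction because every Cyrillic letter has exactly one Latin equivalent. The reverse direction is harder, since "lj", "nj" and "dž" are not always digraphs.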

Sunday, April 26, 2009

New functionality on

When people take the time to help out with the localisation at, it is really important for us that the time they spend has a high impact. The other day Siebrand was happy to tell me that when you are localising a message, you can go to the messages for a particular extension. I did not get it. Now I do.

When you start from the language stats for a language like in this example for Welsh, you can select the "MediaWiki extensions used by Wikimedia". This gives you a list of messages that are not sorted in any way and consequently the localisations do not have the impact they could have.

The novelty is that once a message is selected, it is now possible to continue with the other messages for the same extension. Localising messages for the same extension has a much higher impact because a user is likely to find them together.

Thursday, April 23, 2009

Task Tracking with Semantic MediaWiki

I blogged about the first day of an extension new to WMF Subversion that was localised at I found a presentation about Semantic Tasks from a year ago. It is staggering to find all these gems that make MediaWiki an insanely great platform.

Wednesday, April 22, 2009

The World Digital Library

Under the auspices of UNESCO, the World Digital Library has made its début. This project aims to bring the cultural heritage of our world to the Internet. Many great institutions collaborate in this effort.

This resource is likely to become of immense importance to us because it aims to provide unrestricted, free public access to this material. What I like best about the World Digital Library is that it aims to cover the whole world, all continents, all countries and all cultures. I hope that the WDL will help us to counter the current bias we have in Commons towards Western-oriented material.

As a present to the World Digital Library at its opening, Durova has restored this picture from Egypt. We hope for a great collaboration with this great new initiative.

Tuesday, April 21, 2009

Google Summer Of Code

Four MediaWiki projects have been accepted in the Google Summer Of Code. Congratulations to the four students and their mentors.

Bleeding edge is MediaWiki on the bleeding edge. When Brion brings out the latest and greatest functionality in MediaWiki, the software has already been localised in many languages. This is because is running the absolutely very latest software.

Sometimes this goes wrong and it has an impact on itself. It clearly demonstrates the need for testing and the need for a critical eye before software goes live.

Monday, April 20, 2009

When you query geo data, you want a geo data result

I have been asked to review proposed talks at Wikimania. One talk was about Semantic MediaWiki and the other about Open Street Map. Both are great subjects for a MediaWiki conference because both add important functionality to MediaWiki.

When you think about it, it would be really cool if the results of an SMW query that returns geo data were presented on a map. At the moment there are two extensions to SMW that do just that: Semantic Layers and Google Geocoder.

Given that much of the primary functionality can be found in either extension, it should be possible to come up with something that would work for Open Street Map. Having support for both SMW and OSM would present a paradigm shift; it would make it easy to mesh Wikipedia in the big data world.
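As a rough illustration of what "a geo data result" could look like, here is a small Python sketch that turns query results into GeoJSON, a format that map software commonly reads. The input shape (rows of page title, latitude and longitude) and the function name are assumptions made for this example; this is not the actual output format of SMW or of either extension.

```python
import json


def rows_to_geojson(rows):
    """Convert (title, latitude, longitude) rows to a GeoJSON FeatureCollection.

    The row layout is a hypothetical stand-in for a query result.
    """
    features = []
    for title, lat, lon in rows:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"title": title},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative rows with approximate coordinates.
rows = [("Enkhuizen", 52.70, 5.29), ("Marken", 52.46, 5.10)]
print(json.dumps(rows_to_geojson(rows), indent=2))
```

Once the query result has this shape, any map layer that understands GeoJSON can show it, whatever the source of the map tiles.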

Sunday, April 19, 2009

Payment where payment is due

I was looking for a picture that illustrates "payment" and I found a picture of Judas being paid for his betrayal as one of the top results. It can be argued that this event was essential because without it Jesus would not have died on the cross.

Many people believe that payment is anathema to the Wiki spirit. Payment is believed to be incompatible with a Neutral Point Of View and consequently many people are ready to fight whenever there is even a suggestion of payment.

For me it is obvious that payments are not only made with money. The Wiki coinage is very much acknowledgment, attention, time and reputation. Our best people have a vision of what our world is or should be. They realise their vision because of an intrinsic motivation. This is why an Incubator, a Wiktionary and a independently develop their own policies and even software. People invest much of themselves in these projects and their return is first and foremost the growth and the quality of their project.

Sue Gardner shares what she considers relevant in her reader. This post she shared had me thinking about the subject of payment. I think that the WMF is making a great move by opening up the possibility for people to propose projects that may get funding. The financial aspect is important but in reality the most important coin the WMF has is attention, motivation and encouragement. WMF people like Brion are often in a position where they make the difference between something happening or not happening. Their time is a precious commodity because it is so highly valued and in short supply.

As the time of the WMF staff is considered to be this valuable, improving the effectiveness of their time is the best investment the WMF can make.

Saturday, April 18, 2009

ar, de, dsb, fr, gl, gsw, hsb, ia, ja, ksh, lb, nl, oc, pl

A new extension, "Semantic Tasks", became available. Now, twenty-four hours later, the localisation for fourteen languages has already been completed. In my opinion that is quite special.

Another thing I appreciate is that Semantic Tasks installs nicely in the Wikiation Extension Testing Environment. This is special for me because it is the first Semantic MediaWiki extension that just installs.

Semantic Tasks was developed for CcTeamSpace, the internal task and project tracking system used at Creative Commons. All these things I learned today and in my opinion all this really demonstrates the relevance and the quality of the MediaWiki ecosystem.


When you write a Wikipedia article, you are expected to provide sources that prove the assertions made in the article. There are good reasons to do this; for articles it is not only considered "best practice" but even essential. Current practice on many Wikipedias is that articles that do not provide reliable sources may be put forward for deletion.

A picture proverbially paints a thousand words, and when a thousand words in Wikipedia exist without any citations the least you can expect is a template indicating that a citation is needed.

The illustration can be seen as an advertisement for what to wear to prevent troll troubles. It is however a picture that comes with excellent provenance; the original picture is referred to and urges American ladies to save their country like Joan of Arc did for France.

Both the original picture and the derivation are properly sourced and consequently you have the information you need to appreciate these pictures for what they are. This can be exceedingly important because it is the provenance that allows pictures to express their thousand words.

Falsifications are not new, and they exist for a purpose, typically profit or propaganda. When you look up the word provenance, you find two aspects that are of relevance: the origin of something and the history of the ownership of something. Both are relevant to the illustrations that we use. Practically, it would be wonderful if we always referred to where the original material can be found. It is then for the museums, archives and libraries to provide the complete provenance of the material we use as illustrations.

In my opinion, we should always indicate where the originals of our illustrations can be found. This has nothing to do with copyright or licensing and everything to do with us providing information that can be trusted.

PS: we take orders for troll apparel.

Friday, April 17, 2009

Kurdish Wikipedia & Sorani

Kurdish is a macro language. This means that there are multiple languages that are Kurdish. When you read the Wikipedia article "Kurdish language", you find all the classic arguments about language and dialects. The Wikipedia article assumes that Kurdish is one language while it provides all the arguments why a differing opinion is sound.

The language policy of the Wikimedia Foundation requires that new languages are recognised as a language in ISO 639-3. The request for the Sorani language was quite special; we were told that the localisation for ku-arab was actually Sorani, but what topped even this was the request we received from one of the Kurdish admins to consider the 500 articles in the Arabic script in the Kurdish Wikipedia because they are in fact in Sorani.

Given that the Kurdish community itself supports the request for Sorani, and given that all the requirements are in place when you allow for such special circumstances, I consider the Sorani proposal to be extraordinary and ready for showtime when the other members of the language committee concur.

Wednesday, April 15, 2009

Africa helping itself on the Internet

In December I blogged about the Afrigen project. In this project people are asked to add CLDR information for their language. Now, after some months, there are results and I am impressed. Many languages have made a start and the first languages have completed all the information that is looked for in this standard.

In my opinion having quality information in the "Common Locale Data Repository" is a litmus test for readiness of a language for the Internet. The Afrigen project makes completed data available in their subversion.

The CLDR itself distinguishes levels of CLDR support; this includes how lists are sorted, how numbers are written and what a few languages are called. For this project to insist on a complete set of data takes courage but is in my opinion the right thing to do.

There are people who say that a language is on the map when it has its own Wikipedia; in my opinion a complete set of CLDR data has a much wider application.

An exploratory meeting at the Zuiderzee museum

Yesterday I went to the Zuiderzee museum; it has a rich collection about the area where my family comes from. It shows how people lived and worked in days gone by. The museum is named after the Zuiderzee, a body of water that became fresh water after the building of the Afsluitdijk in 1932, and it has a collection that is particularly focused on the people who lived near it.

The Zuiderzee museum is in the process of digitising its collection and quite a bit can already be found on the Internet at the "Geheugen van Nederland", which translates as the Memory of the Netherlands. Many Dutch archives and museums collaborate in this initiative that aims to digitise much of the Dutch cultural heritage.

After a number of telephone calls, we had a first meeting in Enkhuizen; it was very much a "getting to know you". We talked about what is important: licenses, metadata and how to make the heritage that interests us both interesting to our public. Having communicated a lot lately with people like Mathias Schindler, Wittylama and Durova helped prepare me for this meeting.

Given that the Zuiderzee museum is in the process of digitising its collection and is already looking at how collaboration can further its objectives, a possible relation will be much more collaborative. We did talk about our digital restoration efforts, the importance of having information with historic material, the different types of material in the collection...

One of the restorations done by Durova is this photochrom showing two girls from Marken. I had brought it along as a present, nicely printed at a high resolution. It is pictures like this that made Marken the tourist attraction it still is and as such I hoped that it would be of interest.

Currently there is an exhibition at the museum that aims to give a modern interpretation, by modern fashion designers, of traditional Dutch clothing. The colours were added later to the picture, but the question of what colours they should be will probably always remain a mystery.

In a couple of weeks we will be in contact again and explore further where collaboration / partnership may bring us.

Tuesday, April 07, 2009

ş and ţ or ș and ț

Bugzilla bug 17428 informs us that the Romanian language used to be written with the cedilla. This was wrong; there have been improvements in Unicode and it is now possible to write Romanian properly.

You have to use Windows Vista for this. I wonder how Linux supports Romanian.

As many people use legacy Windows systems, a straight conversion to the proper characters is not an option at this time for the Romanian Wikipedia. The bug report talks about a conversion similar to what is done for Chinese or Serbian. A problem is that this kind of software is not part of what the Internationalisation people at do.
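The character mapping itself is simple. As a minimal sketch in Python, assuming a plain one-to-one replacement of the four affected letters in both directions, it could look like this. The function names are made up for this example and this is not what the bug report proposes in detail; the code points are the ones Unicode assigns to these letters.

```python
# Cedilla forms (legacy) -> comma-below forms (correct for Romanian).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # ş -> ș
    "\u015E": "\u0218",  # Ş -> Ș
    "\u0163": "\u021B",  # ţ -> ț
    "\u0162": "\u021A",  # Ţ -> Ț
})

# The reverse mapping, for readers whose fonts lack the comma-below letters.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0219": "\u015F",  # ș -> ş
    "\u0218": "\u015E",  # Ș -> Ş
    "\u021B": "\u0163",  # ț -> ţ
    "\u021A": "\u0162",  # Ț -> Ţ
})


def to_comma_below(text):
    """Replace the legacy cedilla letters with the comma-below letters."""
    return text.translate(CEDILLA_TO_COMMA)


def to_cedilla(text):
    """Replace the comma-below letters with the legacy cedilla letters."""
    return text.translate(COMMA_TO_CEDILLA)


print(to_comma_below("Timi\u015Foara"))
```

A blanket replacement like this is only safe for text that is known to be Romanian, because ş with a cedilla is the correct letter in other languages such as Turkish.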

So what to do, who to turn to?

A sense of perspective and the monthly statistics

I had the privilege to be at the developer meeting in Berlin. I enjoyed it; it was a great meeting and many of the subjects that are dear to me moved forward. The best news was Open Street Map: this will become part of Wikipedia. I am quite happy with the reception of the Wikiation Extension Testing Environment, a framework for installing and testing MediaWiki and its extensions. There is a sandbox environment available to MediaWiki developers; they just have to ask ...

I had several great conversations; one of them was with Trevor Parscal of the Usability project. One of the things we discussed was the MediaWiki localisation. I was happy with Trevor's appreciation of the work done at It is indeed quite amazing to submit code in the evening to find in the morning that you have to sync your code because of the work done overnight. When new code gets to, Siebrand, Raymond or Nikerabbit have a look at it and work on the internationalisation and then make it available for localisation. Typically there are people ready for any new messages and it is indeed impressive to find something completely localised in a couple of languages the next day.

My perspective is different. For me is 310 languages, only 58 languages have all the most used messages localised and when you look at the group statistics over time you find that our localisation only goes up slowly. I know the hard work that goes into getting ready for a new project and while I welcome improved usability and functionality it comes at a price.

The good news is the numbers are still improving and Brion is quite happy to consider options that will make localisations available sooner. The good news is that Nikerabbit may work on the Translate functionality in a GSOC project. The good news is in little things like the Babel extension being available in Gujarati.

With apprehension I am looking at the many messages that will be created as a result of the usability project. It is not that the internationalisation will not be done; I fear that many languages will spend their time localising the new messages, messages that will of necessity be volatile. I would prefer it if these new messages were done when the localisation is ready for the production messages.

What will happen is that we will leave things as they are; people can work on what they like. It will be interesting to see what the fallout will be. When the new usability is the hit that I hope it will be, it may mean that we gain more localisers, that our statistics continue to improve slowly but surely, and that my fears prove to be unfounded.

Finnish Wikiversity

I am happy that the Finnish Wikiversity has been created :) Let the editing begin :)