Thursday, February 26, 2009


Microsoft was important to me. I used to be an MCP. For the last few years, I have had no reason to keep up with what they do. Vista came and the stories about it were repetitive; there were none that I found compelling. So, Microsoft was important to me.

At this moment Microsoft is making headlines that get my attention again. I learned that there will be a new version of their software, and that they are still embroiled with the US government after eight years of Republican rule. Their problems in Europe are not over either, and now they are starting a patent war with TomTom.

Microsoft was important to me. Then I became indifferent. Microsoft may become important again; sad...

A Commons portal for Media restoration

Commons has a new portal, one for Media restoration. As it is, it is beautiful, it explains well what restoration is about, and it gives a home to the growing community of people who restore.

Commons is a repository for freely licensed material, and its content should be of value to the projects hosted by the Wikimedia Foundation. Of these projects, Wikipedia is the biggest and, because of its encyclopaedic nature, many of its subjects are best illustrated with historical material. When you talk about Jerusalem, for instance, nobody flying into Ben Gurion airport will see Jerusalem like this.
When people talk about history, it is really hard to imagine how different things were, and for this reason historically important pictures like this are so powerful. There is a wealth of detail in a picture like this, details painstakingly restored to provide us with an impression of what once was.

Wikipedia and its sister projects need quality pictures like this, and it is right that the restoration process gets more attention. This portal will help to grow this community and, by giving more attention to the great work our restorers do, we will convince archives and museums to work with us, so that together we can provide our audience with a better picture of our past.

A GMAIL feature request

I received an e-mail from my friend Raoul. Raoul has a new e-mail address. I changed his profile and added his new e-mail address. Now should I remove his old e-mail address?
  • When I remove his old address, I lose the connection to all the mails I received in the past
  • When I keep his old address, I may use it to send him new mail
What I would like is for Gmail to have a "historic" bit for e-mail addresses so that I will be able to find the old mails from my friend Raoul.
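
A minimal sketch of what such a "historic" bit could look like as a data structure. The class and field names are hypothetical, and the addresses are placeholders; it says nothing about how Gmail actually stores contacts.

```python
from dataclasses import dataclass, field


@dataclass
class EmailAddress:
    address: str
    # Hypothetical flag: a historic address is kept for searching old mail
    # but is never offered when composing new mail.
    historic: bool = False


@dataclass
class Contact:
    name: str
    addresses: list = field(default_factory=list)

    def sendable(self):
        """Addresses that may be used for new mail."""
        return [a.address for a in self.addresses if not a.historic]

    def searchable(self):
        """All addresses, old and new, for finding past mail."""
        return [a.address for a in self.addresses]


raoul = Contact("Raoul", [
    EmailAddress("raoul@old.example", historic=True),
    EmailAddress("raoul@new.example"),
])
```

With this, the old address still links Raoul to his past mail through `searchable()`, while `sendable()` only ever offers the new one.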

Wednesday, February 25, 2009

OpenID is a good idea, but how to use it really?

At the moment I am testing and documenting the Wikiation Extension Testing Environment (WETE). This means that I find myself installing all kinds of MediaWiki extensions to see to what extent I can install them.

Many of the extensions implement good ideas, and particularly those I feel strongly about I try to implement with the Wikiation Installer. When I can install and test an extension successfully, I can better argue why that particular extension is important.

OpenID is one idea that I think should be supported. I have my OpenID and I have started to associate my profiles with it. I thank AboutUS, among others, for making my web presence more secure.

The Freebase Wikipedia extracts are available at Amazon webservices

In a post on the Freebase blog, it was announced that the Freebase extracts of Wikipedia are now available in the Amazon Web Services public data sets.

For me this is an indication of how relevant data from Wikipedia, and data derived from Wikipedia, is. This is only one of many projects where data derived from Wikipedia has a life of its own. It is exceedingly important that this movement of extracting data and mashing it with other sources is recognised for its potential. This is the frontier of data connectivity.

Because of its role as the premier encyclopaedic resource, Wikipedia has an important position. I hope that the Wikimedia Foundation will recognise this importance and give priority to the tasks that will further this movement.

Tuesday, February 24, 2009

Gmail is off line

Gmail is unavailable to me at the moment. The funny thing is that I only noticed this because I cannot get at my contact information. I have been happily reading my mail, sending out mail.

The annoying thing is that Google does not yet support the contact information off line.

Sunday, February 22, 2009

Lower Saxon or the nds nightmare

To be blunt, there is no such thing as one Lower Saxon language. This is best explained by the Language Family Tree at Ethnologue.

This tree shows that there is a "Low Saxon" group of languages and in it you can find "Saxon, Low". This "Saxon, Low" is what the standard refers to as nds.

So let us analyse this mess a bit more. The Dutch Low Saxon languages are on the same level as nds, which contains the German linguistic entities. This is the result of different methods being applied to the Dutch and German "Low Saxon" languages.

In Germany there is a group fighting to preserve the "Saxon, Low" language. They do this by creating an overarching orthography for all the "Saxon, Low" linguistic entities. The idea is that as a result there are enough people to save the "Saxon, Low" language. In order to further their cause they have made this orthography mandatory. The problem is that they borrowed heavily from the German language. This resulted in the split of the; Dutch people who read and write "Low Saxon" found their language was no longer welcome.

There is a request for a so-called "". This is problematic on many levels. The nds code excludes the Dutch languages and there is no code for "Low Saxon". There COULD be a code for "Low Saxon"; what is needed is a request to the maintainer of ISO 639-5.

At this moment there is a request to rename the nds language in into nds-de. You will now appreciate that this is technically not right. What would be right is to leave the localisation alone and apply for a code for "Low Saxon". This would then be the correct code to use with this new "Low Saxon" Wikisource.

When you think this is complicated, start thinking why this should not be part of the

Happy birthday Valerie

OpenID is a good idea, but how to use it really?

OpenID is a good idea; you log on once and once you are logged on, you are authenticated against your active credentials. The idea is simple and it makes the password hell manageable.

Passwords are a pain because there are too many places where you have to maintain them. When the Wikimedia Foundation introduced Single User Logon, it was great because it replaced 435 websites where I had a password with only one password.

I want to reduce the number of places where I have to enter a password because this gives me more control over my profile and my security. I would prefer it if I could use my bank's strong authentication to authenticate to my OpenID.

The problem is I cannot. I love it when the BBC writes: "Easy login plans gather pace" but for me the reality is different. I do not care that Yahoo, Paypal, IBM and Google are suppliers of OpenID; I want them to accept my credentials when I log on to their website(s).

Support of OpenID means first and foremost that you ACCEPT authentication. What I want is OpenID everywhere including Wikipedia because otherwise it is just a distraction.

Thursday, February 19, 2009

The importance of having a chapter

Every now and again, I read something on one of the mailing lists that is REALLY exciting. Something that may put a different perspective on things.

In a reply on Foundation-l, it was mentioned that a large donation of content is not going to one of our projects because there is no chapter. RYU Cheol asks if we would like to promote donations of content as much as donations of money.

Even though this opportunity is lost to us, it is a clear sign of why it is important to have chapters in as many countries as feasible; they help us achieve our goals.

Wednesday, February 18, 2009

An epic fail for Google and Life

Google hosts the Life photo archive. That is great. What is not so great is that they claim copyright where the copyright has clearly expired. There is an end to copyright and, while it is perfectly ok to watermark pictures that are obviously out of copyright, claiming copyright is not.

The idea is that when you like a picture, you buy a framed version of that picture. The problem with these framed pictures is that you do not see what you get. This picture of an unidentified native American from 1897, for instance, includes the full sized scan with the colour references and other stuff that is great when you restore a picture, but lousy when you want to hang the picture on a wall.

So in essence it is an epic fail. Copyright is claimed where this is indefensible, and the product sold is not how people want to see it in a frame. Even so, what is the point of a "premium frame with styrene glazing" when the picture is not of the quality of one of Commons' restored images?

Tuesday, February 17, 2009

Apertium, or benefitting from computational linguistics

There are Wikipedias in many languages. Some languages are written in multiple scripts. And as long as one script has sufficient information it is possible to convert it into another script. When this can be done with a program, it prevents an awful lot of work.

Several languages have a Wikipedia with multiple scripts. For some, like Chinese, there is software that converts from one script into another. For other languages it would be nice to have this as well. Fiji Hindi is one such; it is written in the Latin and the Devanagari script. A lot of work has already gone into the Latin localisation at, and it would be really beneficial if we could automatically and correctly convert it into the Devanagari script.

I had a talk with my friend Francis Tyers and he is quite willing to help with this. I know Francis from his work on the Apertium translation engine and conversions like this may be doable. Francis is happy to look into the possibility of such a conversion. The only thing we have to solve is how do we get the Latin text out and back into the message file again.

We are looking into the language part of this; now we need someone who can help us with the MediaWiki side of it. Can you help?

Domesday scenario

What do you call a project with over a million people collaborating on information about their country? A project that came to a successful end? A project with many units sold? A success.

What do you call the same project when, twenty-one years later, only two working systems are left?

The Domesday project is very much a project of its day and as a generation of British school children were involved, it has had a lot of attention to make the data available again.

There are lessons to be learned. Some of them seem to be obvious in our open content world. They seem to be obvious because we insist on Open Source and the use of licenses that are considered to be "free". There is however more to it. There are also the standards underlying the data. The basic standard we use is text, text expressed in Unicode. This standard is not perfect because some of the languages supported in the WMF do have characters not yet supported in Unicode.

In this text we often express information in a structured way. As long as it is seen as HTML, I can read it. When I look at the Wiki syntax I am lost. When people datamine Wikipedia, special software is written to parse these infoboxes and tables. The result is DBpedia and the DBpedia community does a great job.
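
To give an idea of what such parsing involves, here is a minimal sketch in Python. It is not how DBpedia does it; the function name and the sample infobox (with made-up values) are only for illustration, and it assumes a flat infobox without nested templates or piped links, which real articles rarely respect.

```python
import re


def parse_infobox(wikitext):
    """Extract key/value pairs from one flat {{Infobox ...}} template.

    Assumption: no nested templates and no "|" inside values.
    """
    match = re.search(r"\{\{Infobox[^|}]*\|(.*?)\}\}", wikitext, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that are not "key = value"
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample data, only to show the shape of the input.
sample = """{{Infobox settlement
| name = Example Town
| population = 12345
}}"""

parsed = parse_infobox(sample)
```

Even this toy version needs a regular expression and a handful of assumptions to recover two values that the author of the article already knew were a name and a population.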

The point is that it does not have to be this way. Were we to adopt Semantic MediaWiki for Wikipedia, we would adopt Open Standards that enable us to present our data in a way that is understood by other computers. This will help us achieve our goal of providing information to people because our data will be used to provide a better understanding. In this way we open up our data in a way that was not possible at the time when either of the Domesday books was written. We would open up our data and make it truly free because we make it available for innovative applications.

Mark Harmon is picture of the day - a Wikiportrait

A picture of Mark Harmon is picture of the day on the English Wikipedia. It is a great picture and there is a funny story behind it. The photographer, Jerry Avenaim, had provided the picture in a lower resolution version. As it is a great picture, it was proposed for "Featured Picture". Because of the low resolution it did not qualify, so a higher resolution picture was asked for and given. Now it is the picture of the day. :)

Many articles about living people do not have a picture. Having a quality picture with the article is typically in the interest of both Wikipedia and the person in question.

The Dutch chapter has a project called Wikiportret. This is a Dutch website where people are asked to contribute pictures to those articles that do not have one. It works really well, when you read Dutch. It includes all our hoops: free license, OTRS references, the works.

Now as so many people in the world do not speak Dutch, there is the opportunity to localise this project in English, French or whatever. There are people everywhere who merit an article but have no picture or a lousy picture. There is room for many more featured pictures :)

Sunday, February 15, 2009

How to present big files.

On Meta we proposed a project for the digital restoration of images. These images are typically public domain and they are often made available from a friendly archive. In the workflow of a restoration, there are several moments when a change is made that cannot be undone. When you work on restorations in a Wiki way, you have to save the work before such a change because this allows others to improve on them.

Images that are being restored may not be compressed because this introduces distortions that negate the restoration work. Once a restoration is completed, it may be compressed for use in Commons. For a normal illustration in a Wikipedia article compression is normal.

When you look at a really big picture, it can take quite some time before it is available to you; it slowly builds from top to bottom. It is like watching paint dry. It makes more sense to start with an outline of the image and drill down for more details. Djatoka is open source software written by the Los Alamos National Laboratory that does exactly that. When I read its specifications, it makes me feel really enthusiastic. Have a look at this ox for instance.

So the next question is: how do we make this functionality part of MediaWiki?

There is a wiki for the Wikipedia Usability Initiative and it has one nice surprise; it uses the Liquid Threads extension for its talk pages. I have started two threads on my user page and I find the way it feeds into the recent changes really intriguing.

The Liquid Threads provide a paradigm shift away from the current talk pages, and I do welcome having more control over the discussions. With the implementation of LQT in the Usability wiki, I think there is no need to have a separate implementation in the labs environment.

I wonder what it takes for a labs implementation to happen. Brion indicated in his FOSDEM presentation that Semantic MediaWiki is a mature technology, so I would welcome a labs implementation. I would love to learn what the procedure is for getting one.

Saturday, February 14, 2009


Orphanet is a website specialised in rare diseases and orphan drugs. It is considered to be a really relevant organisation. In their French newsletter, they found it necessary to express a negative opinion on Wikipedia. In a way it is sad. It is sad because even when Wikipedia gives the best information on rare diseases, its intention is not to be a specialist portal for rare diseases.

Organisations like Orphanet should appreciate Wikipedia for what it is: a place where encyclopaedic level information is provided. Encyclopaedic information is limited and when an organisation is part of the Wikipedia solution, it may feature as an external resource.


Wikipedia is about education. Many of the subjects in science are hard to grasp. How do you get your mind around a molecule, a protein?

The picture is one called 1D66.pdb, it is part of a group of five called "cartoons". When you look at the cartoons, you have the option to spin it or zoom in. The realisation that this is three dimensional really sinks in when you see it rotate.

The power of Jmol is understood. I can remember that Tim Starling looked at the code and found that it is possible to include malicious code.

When you read the Jmol article on MediaWiki, you will read that they have fixed this in their revision 10467. What I would like is to have the extension checked again for the WMF. I do think it is important that we share the benefit of visualisation tools like Jmol.

The problem is the capacity of the WMF to review code. There are too few people who assess code. This is a problem that prevents tools like Jmol from being (re-)assessed. It is a problem that prevents the work of several developers from going live. This makes MediaWiki inward looking and prevents a wider acceptance of MediaWiki in the rest of the world.

I wonder, if the assessment of Jmol would gain priority when localises this code ?

Friday, February 13, 2009

formerly known as Betawiki

Localisation and internationalisation for MediaWiki is done at We have always called it Betawiki, but it proved to be a source of confusion. The name of the project does not have a relation to the URL.

It is for this reason that the admins of have decided to abandon the Betawiki name. This blog has many articles referring to this project. All the articles have been given a new label.

This is just a name change. Effectively nothing changes; we will continue to rely on your help in making MediaWiki insanely great for all the languages that need localisation.

Brion presented at Fosdem

FOSDEM is a great conference. I enjoy going there and this year one of the highlights was the message according to Brion. The title of his presentation is: "MediaWiki's big code and usability push".

There are a few REALLY relevant things in his presentation. The one that really thrilled me was that Brion stated that Semantic MediaWiki is mature software and that it has a priority for him to provide a final evaluation of this code. An endorsement like this was not something that I expected.

The interface optimised for mobile phones was another thing that I found interesting. My initial impression was that only English is supported. This turned out not to be the case. Several languages are already supported. It turns out that there is a set of values that need to be known to make Wikipedia content available for other languages. When there is a clear specification, it should be possible to support many, many more Wikipedias.

There is much more in the presentation and yet more was discussed. Fosdem was awesome... Berlin is next in April, I expect it to be Wunderbar.

Monday, February 09, 2009

Sandro Botticelli

These are two versions of a self portrait of Sandro Botticelli. This picture was originally uploaded with a bot by Eloquence, and at some stage the picture was "enhanced". While I do agree that the original colours are a bit flat, just changing the colours does not a restoration make. There are other details to consider; the colours of the tree and the shrubs for instance do not look green any more.

Botticelli is a famous artist; it shows, because the picture is used on at least 150 pages in 38 projects. When art is presented, it is not a great idea to just change the original picture "because you can". It is to be preferred when the original remains available. When improvements are made to an image, adjusting the colour levels is only one aspect of a range of actions that should be applied as a whole.

Changing an image like this may be well intended but the link with what a picture "really" looks like is lost. In my opinion it is extremely important that the changes that are made are motivated and that the different phases of such a change are available. This allows other people to collaborate on a restoration. In this way, the image is available for improvement in a wiki way.

Nikerabbit's laptop

Nikerabbit is the principal developer at Betawiki. He came to Fosdem this year. We had a great time; it was great to have people like MinuteElectron, Siebrand and Nikerabbit meet Brion. It was important to discuss what we can do with the MediaWiki internationalisation and localisation and what we cannot. It was really important that Nikerabbit was there.

When we came back to our car, we had the unpleasant surprise that the car had been broken into. Nikerabbit's backpack with his laptop, his passport and his clothes was gone. Niklas is a student. He works as a volunteer. His work is indispensable.

If you can help Niklas out, it would be really welcome.

Sunday, February 08, 2009

How an extension gets to Betawiki

We are with a large group of MediaWiki developers at Fosdem. We talked with the Mozilla people. We had a look at their wiki, and they have a really interesting extension that supports videos from many video hosts. Siebrand had a look at its code and its license. He picked it up and, after some work on the code and the messaging, committed it to SVN. It is now at Betawiki. A thank you is due to Ruiz, the author of the code.

We have another thing to talk about with the Mozilla people.

Wednesday, February 04, 2009


There are all kinds of reasons why attribution is desired. It indicates the rights holder to some IP, it may be because people are proud of their work, it may be to direct people to another resource.

Many people associate this picture of a boar with me because it is used on the many profiles I have on the Internet. In a way, when people see my "zwijntje" they will associate it with me. Often pictures are shown beside threads and consequently this pig is associated with text attributed to me.

It is important to understand that attribution exists on two levels. There is the attribution that is the acknowledgment of effort and there is the attribution that is the consequence of copyright law and licensing. Many people mistake the first for the second.

Not all material in Wikipedia must be attributed. Most of the material that is being restored is in the public domain. As a consequence, copyright law does not apply to restored material and therefore there is no legal need to attribute the work to the person who restored the picture. Acknowledging this work is something that we do in our community. It is important that we do. By acknowledging our efforts in our community, a legitimate sense of pride can be felt by our contributors. People take pride in their number of articles, their featured articles, "did you know"s and featured pictures.

Pride in our achievements is provided by attributing the good work to the people who did it. Attribution in the legal sense is firmly a function of IP law; it is the kind of attribution I feel uncomfortable with.

The pain of providing stable targets

In the policy for new languages, there is a requirement for the localisation of the "most used" MediaWiki messages. These messages have been stable for over a year. This ensured that there was a stable target to shoot at.

Recently Siebrand did some profiling on the MediaWiki messages, and the "most used" messages were no longer the most used messages. Consequently there are new "most used" messages and the statistics for the most used messages took a nose dive.

I can imagine that people may be upset about this. But the reality is, we require the localisation of the "most used" messages because they are the most used messages. Profiling which messages are used most is a lot of work that is done only occasionally. I am grateful that the composition of the most used messages has changed. If anything I would welcome regular changes to the "most used" messages because it demonstrates vividly that localisation is shooting at moving targets.
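
To illustrate the idea of such profiling, here is a minimal sketch, assuming there is a log with one message key per use. The log format and the message keys are hypothetical; this is not a description of how Siebrand did his profiling.

```python
from collections import Counter


def most_used(message_log, n):
    """Return the n message keys that occur most often in the log."""
    return [key for key, _ in Counter(message_log).most_common(n)]


# Hypothetical log: one entry for every time a message was shown.
log = ["edit", "search", "edit", "history", "search", "edit"]
top_two = most_used(log, 2)
```

Run the same count on a log from a year later and the list may well come out differently, which is exactly why the target moves.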

Monday, February 02, 2009

Google is my friend

Waldir created a new Betawiki gadget. It is a real time saver. In a nutshell, when you are going to localise, Google Translate is used to translate the text from English into your language. We have tested it for several languages and we find that for 80% of the messages, the result is great. The other messages need some work.

All the messages that are translated by Google Translate are first FUZZIED. It is only a matter of removing the FUZZY tag to make the localised message available. Google proves again to be a friend because our localisers become more effective and, for the languages Google Translate supports, it will become easier to keep up with the ever growing number of messages that need localising.
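
A minimal sketch of this workflow, assuming the FUZZY tag is a literal marker at the start of the message text. The marker string and the function names are assumptions for illustration, not the gadget's actual code.

```python
# Assumed marker: a literal prefix in front of the translated text.
FUZZY = "!!FUZZY!!"


def mark_fuzzy(translation):
    """Flag a machine translation as needing human review."""
    return translation if translation.startswith(FUZZY) else FUZZY + translation


def approve(translation):
    """Remove the marker once a localiser is happy with the text."""
    return translation[len(FUZZY):] if translation.startswith(FUZZY) else translation


def is_usable(translation):
    """Only messages without the marker are made available."""
    return not translation.startswith(FUZZY)


draft = mark_fuzzy("Bewerken")  # as if it came from the machine translation
final = approve(draft)          # the localiser removes the tag
```

The point of the design is that nothing machine-translated goes live until a person has looked at it.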

The only sad thing is that there are so many languages Google Translate does not support yet.

Sunday, February 01, 2009

Dutch chapter's annual meeting

Yesterday I went to the annual meeting of the Dutch chapter. In a way it was a celebration of the things that went well. There was a lot that went well. In another way it was the moment to reflect on the things that did not go as well as was hoped for. For me it is always great to have such a group of wonderful people in one place.

The ambitions for this and the next year are high. We hope that Wikimania 2010 will come to the Netherlands. The aim is to extend and intensify our contacts with organisations and government so that we can promote a more inclusive and open culture. Thanks to Wikiportret we have uploaded many portraits of famous Dutch people to Commons, for instance this one of Carice van Houten. At the meeting the first tentative arrangements were made to internationalise this project for other countries and other languages.

At an annual meeting, there are the obvious and boring things: the financial statements, the discharge of the treasurer and the old board, the election of a new board. All these things happened. What I like most is having so many interesting people together, learning what they have been up to.

My appeal to Jimmy Wales

Jimmy, in a thread about counting characters, you express that you should be informed about such stuff. A reference is made to a manual of style discussion about scientific notation. Obviously it is very important and I am glad that you may make a difference in getting this important issue moving.

Jimmy, many of our Wikipedias are not doing so well. The problem for several of them is really basic. The characters used do not show up properly and reliably in modern browsers. The problem is a mix of MediaWiki, browsers and fonts. The issue is that no one is interested in championing a complete solution. Too many parties seem to be involved.

Jimmy, you have your connections in so many places. You are an "approved character". Your appeal is obvious. You can make a difference because really the issues can be solved. This issue needs a champion.

Jimmy, could you please help me make my dream come true that a word like "Mbɔ́tɛ" will show up properly everywhere in MediaWiki? Could you please help realise this dream that all languages can use MediaWiki as it is intended? Oh and Jimmy, when the Lingala Wikipedia for instance takes off, they will need scientific notation too.
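
To show why a word like this is hard, here is a small sketch that spells it out character by character. It assumes the word is encoded as an open o followed by a combining acute accent; the helper name is only for illustration.

```python
import unicodedata

# "Mbɔ́tɛ" written out explicitly, assuming open o + combining acute accent.
word = "Mb\u0254\u0301t\u025b"


def describe(text):
    """List each character as (code point, Unicode name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# NFC composes characters where a precomposed form exists.
nfc = unicodedata.normalize("NFC", word)
characters = describe(word)
```

Unicode has no single precomposed character for an open o with an acute accent, so even after normalisation the accent stays a separate combining character. Whether it ends up neatly on top of the letter then depends on the font and the browser, which is the mix of parties described above.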