Sunday, April 29, 2007

Orientation of text

When you have a text, you write either in a right to left direction, as in Arabic or Hebrew, or in a left to right direction, as in English or Russian. In OmegaWiki, we are going to support right to left orientation once Multilingual MediaWiki is integrated in our software.

I have given SignWriting a lot of thought lately. This is a script that goes top to bottom. What I just realised is that, as a consequence, a left to right word mixed into such an environment would also go top to bottom. It makes as much sense, however, to have a right to left word go in the same direction.

The problem is that when you have both RTL and LTR in a top down environment, it will seem odd.
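To make the RTL / LTR distinction concrete, here is a minimal sketch in Python of how software might guess the direction of a piece of text. It looks for the first character with a strong direction, using the bidirectional classes from the Unicode character database. The function name `base_direction` and the fallback to left to right are my own assumptions for illustration; this is not how OmegaWiki or MediaWiki decides, and it says nothing about top to bottom scripts like SignWriting.

```python
import unicodedata


def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strongly directional character.

    Hypothetical helper for illustration only.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-like and Arabic letters
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic and other left to right letters
            return "ltr"
    # Assumption: text without any strong character is treated as left to right.
    return "ltr"


for sample in ("OmegaWiki", "Русский", "שלום", "مرحبا"):
    print(sample, base_direction(sample))
```

Digits and punctuation have no strong direction, so a text that starts with a number is classified by the first real letter that follows.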


Friday, April 27, 2007

Google video

For a lark I checked Google Video for Wikipedia articles. It was no surprise to find stuff there. There was all kinds of stuff, including this political rant about articles and NPOV on Wikipedia. The guy was insulting and threatening a Wikipedian in a way for which I would ban him for a substantial amount of time. This is not what freedom of speech is about.

I have no opinion at all about what this guy is on about. What I do know is that he does not help his case. This kind of gangster mentality is really obnoxious. His problem is that by behaving in this way he polarises to the extent that I would not even consider his arguments.

Thursday, April 26, 2007

Skype ..

I have been using Skype for so long now. It is absolutely essential to me. Today I had a first, and the person I spoke to also had a first, in the way we were using Skype. To her Skype is an essential tool too.

I spoke with Valerie Sutton. Her first was to use Skype and actually listen to it. Valerie uses Skype with video and uses sign language to communicate.

For me it was a first to actually see Valerie while communicating on Skype. It was all really accidental; I was about to disconnect when I found that the video was actually supported as well.

We discussed licenses. I asked for some pictures to illustrate Wikipedia articles that are about sign languages. She told me that I could have any picture of her that I wanted and have it under a Free license. We discussed the quality of pictures as well.

Now that the conversation is over, I thought of this: I would like to have a video fragment with every Wikipedia article about a sign language and have someone who is native in that language sign it :)


How the Wiktionary interwiki bot is running

In a previous blog about Wiktionary quality issues, in a mail to the Wiktionary mailing list and in a post on the English language beer parlour, I reported the Polish Wiktionary's request / demand not to include the Russian and Vietnamese Wiktionaries in the interwiki links.

The general reaction is that the Polish can wish what they want and that I should be willing to accommodate them. It is however not a zero sum game. Whatever I do, it has repercussions and it is in the end for me to make the choice that fits me, as the runner of the bot, best.

I have talked with Andre Engels about this. The result is that I will not update the Polish Wiktionary any more. All other Wiktionaries will still be updated with information about all Wiktionaries including the Polish.

I really dislike this situation. The good news is that the Vietnamese Wiktionary is aware of this problem and is working on a solution; they just need time. The Russian Wiktionary is in my opinion a wasteland that is not really inviting to work on. Then again it is stubs on a grand scale.. people are working on it though. The Polish .. I am sorry that they feel this way. I do think that this only isolates them and that isolation is the major weakness of the Wiktionary projects.


Wednesday, April 25, 2007

The lamest technology mascots ever

Wired has an article with some mascots. The idea was funny enough to get my attention. I got two things out of it. I did not know about the existence of Wikipede or that we might have had a mascot if it had not been for "a slow death by consensus". The other thing is the way our encyclopaedic project is described: "Gangware encyclopaedia Wikipedia".. Funny


Monday, April 23, 2007

Lies, damned lies and statistics

It has often been said that the only "reliable" statistics are the ones that you compile yourself. Strike that, I am not great at this science of statistics, I just use them to blind people with what is made obvious in this way. A week ago we celebrated that we had 15.000 Expressions for German. Today we celebrate that we have over 10.000 Expressions in Italian.

According to Kipcool, the statistics are wrong. His statistics take into account the fact that we occasionally have reason to delete Expressions. According to his figures we have 14.020 German and 9.776 Italian Expressions. This means that we may celebrate the round numbers again at a later date.. :)

For the Destinazione Italia project in OmegaWiki, we will need translations in many languages with an emphasis on European languages. As we already have a lot of content in Italian, it is important to mark what is already there as part of this collection. In this way we know how many Expressions there are to process.

It seems to me that numbers and statistics tell a story; they tell as much about what is represented as about the person who uses the data. It is only after collaborating on statistics that you find what the numbers actually mean and what they truly represent. After such a process, they have become the statistics of a project and are owned by its community.


Saturday, April 21, 2007

Wiktionary quality issues

On the Wiktionary project I run the interwiki bot. The process is simple; when an article exists in another language Wiktionary spelled exactly the same, I create an "interwiki" link. This allows you to see the information on another language Wiktionary. The process is automated and unattended, and it works on all Wiktionaries.
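The matching step can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code of the actual bot: the function `interwiki_links`, the language codes and the sets of titles are all made up for the example. The only rule it implements is the one described above: a title on one Wiktionary is linked to every other Wiktionary that has an article spelled exactly the same.

```python
def interwiki_links(title, titles_by_wiki, home):
    """Return the codes of all other wikis that have exactly the same title.

    Hypothetical helper: titles_by_wiki maps a language code to the set
    of article titles that exist on that Wiktionary.
    """
    return sorted(
        code
        for code, titles in titles_by_wiki.items()
        if code != home and title in titles  # exact spelling match only
    )


# Made-up sample data for illustration.
titles_by_wiki = {
    "en": {"dispersion", "petition"},
    "pl": {"dispersion"},
    "ru": {"dispersion", "petition"},
    "vi": {"petition"},
}

links = interwiki_links("dispersion", titles_by_wiki, home="pl")
print(" ".join(f"[[{code}:dispersion]]" for code in links))
```

Because the match is on spelling alone, the link says nothing about the quality of the article on the other side.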

I have received a request from the Polish Wiktionary to stop adding interwiki links for the Russian and for the Vietnamese Wiktionary. The reason given is one of quality. On the Russian Wiktionary many of the articles are created by a bot and they do not provide good information. An example is dispersion; there is really nothing in there. The Vietnamese Wiktionary is more problematic because a bot was used to generate declension and conjugation tables of Russian words and they got it wrong.

The Russian Wiktionary has some 81.000 empty shells and refuses to remove them. The Vietnamese are not willing to remove their incorrect data.

I have been asked to stop including the Russian Wiktionary and the Vietnamese Wiktionary when I run the interwiki process. To be honest, I run the bot as a service and I do not think it is the right thing to do. I think the Vietnamese are wrong not to correct the wrong data that they have. I am less sure about the Russian approach; in essence each article is a stub. However, creating a Wiktionary in this way is like stamp collecting; you can look at it but there is no information about it.

Given how the process works, I am not sure that I can exclude either the Russian or the Vietnamese Wiktionary. The way it works is that I run explicitly on all Wiktionaries. When I exclude Russian or Vietnamese, I will probably end up removing all references to these projects. They are the third and fourth Wiktionaries in size.

When I do not exclude the Russian and the Vietnamese Wiktionary, the bot may end up being blocked on the Polish Wiktionary. This will also kill off the interwiki process.

From my point of view, using bots to generate content in a Wiktionary only makes sense when there is at least a link to the word in the base language. When the initial creation of stubs is followed by the enrichment of these stubs it is acceptable. For having information that is completely wrong, there is no excuse.

The question is, will there be a discussion about acceptable practices in Wiktionary? The questions are:
  • Can the Polish demand what they do?
  • Is having a project that consists mainly of stubs acceptable?
  • Is having incorrect data acceptable?


Tuesday, April 17, 2007

The Uyghur user interface

Uyghur is a Turkic language spoken by the Uyghur people in Xinjiang. According to the English Wikipedia, the language has been written in the Arabic, then the Roman and then again the Arabic script. Cyrillic is actively used as well.

The official script in China for this language is Arabic and has been since 1983. The ug.wikipedia has a Latin script user interface. The ug.wiktionary however is right to left. This means that there is at least a need for one user interface in the Arabic script and one in the Latin script. Given that Cyrillic is another actively used script, we need three message files for Uyghur.

When a language is expressed in so many ways, the current MediaWiki software does not allow for supporting Uyghur in a meaningful way. It will be good when Multilingual MediaWiki becomes a reality. The good news is that the prospects for this are really good. The last status report I heard was that some install routines have to be written and then people can start experimenting with this new functionality.

PS Check out what MLMW offers :)


Friday, April 13, 2007


SignWriting is one way of expressing signed languages. There are several ways of doing this; SignWriting is the one favoured by most people who actually write down the signed languages. There is a request for a Wikipedia for the American Sign Language or ASL expressed in SignWriting.

I am in favour of such a project. There are however a few relevant issues:
  • SignWriting is currently not supported in UTF-8 and MediaWiki expects UTF-8.
  • There is software specific to SignWriting; it can be used for a Wikipedia.
  • Due to the technical issues, there is no way it can use the Incubator.
Given the objective of the Wikimedia Foundation, this project is very much something that we should support. ASL is a very different language. Given that the organisation behind SignWriting is very much in favour of the creation of a Wikipedia, there is ample scope for the Wikimedia Foundation and the SignWriting organisation to work together and overcome all the relevant issues.

For me it is important that with this Wikipedia project the culture of the deaf will get a boost. When SignWriting becomes even more mainstream, it has the potential to become the script for many of the other sign languages as well.

Practically, I would like to start with this project on WMF servers using the wiki-like software that already exists. I would look for programmers and/or funding to enable MediaWiki to support SignWriting. I would look for funding to create a full implementation of the SignWriting glyphs in Unicode.


Thursday, April 12, 2007


When you cannot see, resources like Wikipedia are not available to you. When you cannot see, you need something like braille to make such resources available to you. You still have the issue of who is going to convert text to braille.

RoboBraille is an organization that converts to several different formats that can be used by braille readers that work with computers. The software works by receiving an e-mail with the text that needs conversion.

I can imagine that this engine could work for a website as well.. Consider what a difference it would make if Wikipedia were available in this way.. The converted pages can be cached like any other page. When there is an issue with doing this realtime or near realtime, it would be possible to do this for articles that are featured articles or that are part of a "final version".

I would love to see a solution like this become part of the service that we provide.. We aim to bring all information to all people.. blind people qualify :)


Tuesday, April 10, 2007

Linguists go Wikipedia

In a previous post I mentioned that as part of the funding drive for the Linguist List, the subscribers of the list were asked to vote with their wallet. They did, and an intern will be paid to organise an editorial update of the Wikipedia pages on linguistics. The idea is that the many thousands of linguists that are subscribed to the Linguist List will ensure that notable linguists like Eve Clark and Tanya Reinhart will get a mention.

Consider, this is a leading list of some 14,649 linguists that will be urged to help us improve both the quality and the quantity of the coverage of the field of linguistics.

I am really excited about this.


Saturday, April 07, 2007

A board of trustees or an executive board

In a post on his blog Tawker suggests that it was "widely reported why" Mr Wool left his job. This is not true. Mr Wool explicitly refrained from explaining why he left his job and in this way prevented a discussion both about the organisation and his role in it.

The organisational consequences drawn in the statements made in this blog are also very much wrong. The Wikimedia Foundation is in the process of setting up an organisation. This effort is hindered by a lack of funding and an unlucky choice in staff. As the staff becomes more professional, the board will be able to distance itself from the day to day affairs.


Friday, April 06, 2007


In OmegaWiki you can have many of the labels used in the data part in your language as well. For this to function, you have to select a language in the "user preferences" and there have to be translations of the words involved. Words like Estonian, Georgian or noun, verb, adjective will be shown in the selected language.

As the user interface is one of the most critical aspects for getting buy-in, it is something I often spend time on. Given that I do not speak languages like tiếng Việt, it is not always easy to find the right translations. For this language I was not able to find Gujarati. More bewildering was that I could not initially find the word Korean, it was there as korean with "tiếng Triều tiên" as its translation.

All in all I have added more than 10 translations in this latest session. I am sure that people who speak this language will have an easier time doing the same job.


Thursday, April 05, 2007


At OmegaWiki we have a word of the day. This word of the day is hopefully created before a new day starts. It is something that I find myself doing almost every day. Most often I just pick a word at random. Today I picked the word petition.

Today I signed a petition, the Alan Johnston Petition, at the BBC News website. I think it is really sad when journalists are the victims of political violence. It prevents news coming from corners of the world. In this way you can prevent news going out or coming in. By stopping the flow of information, the risk increases that the "other" will be seen as the enemy.

I am afraid the Palestinians are shooting themselves in the foot. You can also sign the petition.. or think some positive thoughts about this.


Tuesday, April 03, 2007

Linguist List's Wikipedia Update Vote

Linguist List is the premier mailing list that aims to provide a forum where academic linguists can discuss linguistic issues and exchange linguistic information. It is more than just a mailing list; it also provides fellowships to graduate students who serve in return as editors on the list.

Yesterday there was a message that, following an initiative on the Russian Wikipedia, they would pay for a graduate assistant to work half time for one semester to coordinate the improvement of the coverage of the field of linguistics in the English language Wikipedia.

When their community thinks it important, they have to spend $2000,- extra and this will pay for this initiative.

There is nothing that stops people not yet associated with the Linguist List from contributing to this funding drive as well. :)