Friday, February 29, 2008

More things I learned about signing

I noticed that when a signer signs and an interpreter translates into English, the conversation moves just as fast.

There was a comparison of the complexity of the use of gestures and signs. It turns out that mother tongue signers use the most of either. In many ways this does not surprise me because I understood that sign languages become more complex with a succession of generations of people signing. What I would expect is that it is particularly this group that has the biggest influence on the development of sign language.

Gesturing is considered a sign of engagement in the story, i.e. people gesture more when they are engaged. This is particularly true for speaking people.

WOW a presentation in ASL.. so I am listening to the interpreter :) So we have a presenter signing in ASL, the ASL interpreter translating into English and an interpreter translating the English into German Sign Language. And it is really funny; the speaker was asked to pause when a specific sign is used as an example, as for the German interpreter the sign has to be identified in order to present it in the translation. One of the Americans started to isolate the signs and this was picked up by the second German interpreter and passed on to the translating interpreter. The applause was unexpected... everyone started to wave ... WONDERFUL ... in the discussion there is synchronicity because signers can speak at the same time ... the interpreters cannot do this as well, as sounds really clash ... the applause was something else; everyone waving their hands over their head :)

For people that are familiar with SignWriting, I met Mieke van Herreweghe.. I discussed whether a Wiki would be of use in education in Flemish Sign Language.. It is certainly great to extend the number of available tools :)

For me to be at this conference is truly a privilege, I talked with several people about what my interest is; the realisation of a Wikipedia for American Sign Language. What several people tell me is that it will be primarily relevant for children as it is hard for people that sign to learn to read and write their sign language. So as so often, some see this chicken and egg problem; are there enough people for a Wikipedia ... For a change, I would argue that it is time for an omelette. Have a Wikipedia and allow it to grow slowly and use its momentum also to grow the written expression of sign languages.

One of the reasons why I am so privileged is that many of these presentations will not be generally available... Such a shame, scientists have to publish...

Wednesday, February 27, 2008

The Bamberg conference; sessions about signing and gesturing

I am in Bamberg, there is a meeting of the German society of linguists and I am sitting in on sessions that are themed about signs and gestures. These are scientific presentations. As particularly deaf people have an interest in this subject, there are not one but two people who translate into sign language. One translates in German Sign Language, the other in American Sign Language.

There is a lot of interest in these sessions; all seats are taken and for nine people there is standing room only. The first presenter is not able to speak English and presents in German. Sadly the ASL presenter does not know German ... The German translator does :) When you give a presentation on gestures and signs, I would expect that you want to be seen. I am perplexed that this man not only does not speak English but also sits on his chair, preventing people from seeing his gestures.

In the first session of the conference, the notion of theorising was considered to be of importance but its relevance is established when these theories are proven by further research. This means that the numbers should not be massaged to prove a point. What I wonder is what numbers, what research underlies what I will hear in the sessions about signs and gestures.

Watching a translator into a sign language is really interesting; one big thing is that to be effective, the translator has to be seen. In a full room with many people, this is not a given. I am not small, but another big person is in front of ME. :)

When sign language is to be written down, it seems really relevant to be able to have a video, it goes SO quickly from sign to sign. Then again this is also true for spoken language, this is what steno is about. So when you are to give a presentation to as diverse a group of people as this, what do you do? What is on your power point, do you use power point? Do you want to organise the people that are in a room based on what you can see where? Having a full view of a signer is of no use to me, being at a fair distance to a screen is nice.

The question whether a signer can gesture is something where I think I see an answer in front of me; the expression of one person signing to another is telling to me. The way the angle of the body is used is something that is not part of ASL as far as I know, but it is used to emphasise. These may not be seen as gestures but they are similar to the way I use my hands and my face as part of getting my message out, and those are considered gestures.

One presenter is really sympathetic to me because she keeps track of how the translators are doing; this is to prevent a gap from opening between what she is saying and where the translator has got to. It makes the presentation less fluent ... it is however NICE.

Before I went to these sessions, I thought about what to wear .. I decided on my SignWriting T-shirt.


Sunday, February 24, 2008

Asterisk at Fosdem

At Fosdem, I met some of the Asterisk developers. For those that do not know, Asterisk is the leading edge open source software for telephony. If I understand things well, it is even used in the WMF Office. I had a really wonderful talk with one of the senior Asterisk developers. What I learned is that for telephony, localisation is very much about the differences in telephony between countries. The dial tone and the exchange tone differ from country to country, and when a person is ringing in from a different country it helps THEM to understand what the status of their call is.

So where Internationalisation is about how telephone exchanges work for Asterisk and where Internationalisation is about language for what I am used to, it seems to me that a combination of the two is what makes a most compelling package.

Friday, February 22, 2008

International Mother Tongue Day

For me, the high point of the International Mother Tongue Day was the request for the fourth and final part for the localisation of MediaWiki in Telugu. I am really impressed with all the hard work that makes such a difference for the usability of our projects.

It is a well established fact that providing information in someone's mother tongue is most effective. With a Wikipedia where both the User Interface and the content are available in the same language we provide the best environment possible. When this is realised it is "just" a matter of maintaining and improving the quantity and quality of the project.

Thursday, February 21, 2008

Microsoft opening up ??

There is a lot of noise about Microsoft opening up much of its documentation. It is said that Open Source developers are free to use this documentation. It is only for people and organisations that are paid for delivering or supporting the resulting software that a license fee is to be paid.

The way I understand the GPL license, this is not allowed. When software is available under the GPL it is explicitly permitted to sell services for this software. In my understanding this is another case of smoke and mirrors. When this new documentation is available for people writing software that is licensed under the GPL, Microsoft cannot claim license fees down stream. OR people are not permitted to make software that is encumbered in this way available under a free license.

With more software being used that does not consider Microsoft's software, it will become increasingly problematic for Microsoft to support its customers. I think that Microsoft's objective is to gain free support for its products.

How long will it take for people to come to the same conclusion ? :) PS I am not a lawyer :)

Kenyan call for peace, "Wakenya Pamoja"

I had another look at the wonderful Kamusi project. On its main page there is a video of many Kenyan artists singing, imploring their fellow Kenyans for peace. The song, which is in Swahili, can be found in many other places, like YouTube, as well.

In translation the title "Wakenya Pamoja" means "Kenyans together". Have a listen, if only to hear how diverse Kenyan music is. It is a great song.


Monday, February 18, 2008

We want more localisations

Betawiki is doing fine. More messages are being localised all the time, and new and changed messages pop up as well. With Brion about to make a new release, this is the best moment to include more messages.

In an e-mail Siebrand included a table that shows that it is not just the small languages that we want to do better. These are the localisations for the languages of our 10 biggest Wikipedias.

| language | mostused | core | WMF |
| English | 100% | 100% | 100% |
| German | 100% | 100% | 99% |
| French | 100% | 100% | 100% |
| Polish | 100% | 100% | 80% |
| Japanese | 100% | 100% | 58% |
| Italian | 100% | 100% | 64% |
| Dutch | 100% | 100% | 100% |
| Portuguese | 100% | 100% | 100% |
| Spanish | 100% | 92% | 24% |
| Swedish | 100% | 100% | 100% |

What I find astonishing is how badly the extensions as used in the WMF for Polish, Japanese, Italian and Spanish are supported. Spanish does not even qualify as a language that is fully localised for MediaWiki !! Four out of ten of these languages do not qualify for a new project!!

I know that the Wikipedias in those languages may provide better support. You will have to agree with me that a lot of wasted effort is involved, because why should this work be done twice?

When needed, Brion provides updates to the MediaWiki releases; they are the security updates. With the continued maturation of BetaWiki, it has become possible to create localisation updates as well. What is needed is to arrange and agree the mechanism on how to do this. When is this to be done, with what regularity will we do this, how will it be announced and, the most important part, how will we inform our users?

We are still offering bonuses for work done on localisations for the languages spoken in Africa, Asia and South America. I even indicated to Danny Wool that we are happy to pay for the localisation of MediaWiki in Central Quiché (quc).

Friday, February 15, 2008

Spot the loonie

There is no obvious solution to this one.

In the last week, two great resources were pointed out to me. TED is the one I like best; it is a great resource with some brilliant talks given by some brilliant people. The subjects are all over the place but what struck me most was a talk about how much we get things wrong when our perceived reality about the third world is confronted with numbers. The other talk that was as revealing was about how to develop the Ethiopian market for agricultural products. Both speakers changed my thinking. I do thank Millosh for getting my attention by sharing TED on Google reader. The second resource was pointed out to me on the foundation-l mailing list by Pharos. He pointed out the many movies that can be found tagged as Wikimedia. I have had a look and indeed, I am so happy to see the many Wikimania recordings that I did not know existed. I am happy with what is there, I am sad for what might be there as well. Maybe more recordings can be added to this wonderful collection.


Dalecarlian is recognised by Ethnologue. And according to the information provided, some 1500 people speak it. It is the first time that I looked up a record in Ethnologue and found that it does not exist in the ISO-639-3.

I looked it up because someone volunteers to localise MediaWiki in Betawiki. The question is what to do with such a request. The first thing to realise is that the dlc code is unlikely ever to be used as an ISO-639-3 code. The second thing is that it IS a living linguistic entity and it will not be too hard to argue that if it is not a language, it is at least a dialect.

So allowing for the localisation of Dalecarlian is not problematic. From a standards point of view, there is only one issue to be resolved. What is the correct code that we should use for it?

Friday, February 08, 2008

How to deal with improved localisation

In Betawiki, a lot of localisation is done. Much of the work starts with importing the localisation from a Wikipedia. The existing work is expanded upon but it is also improved upon. An obvious example is the inclusion of plural support. When a lot of such work is done, it has been committed and it has gone live on all the Wikimedia Foundation servers .. you want to see it most of all on your own project.

This will disappoint, and it will disappoint for the best of reasons. There are two types of localisation; there is the language localisation, this is what we do at BetaWiki, and there is the project localisation. In the project localisation there are changes that are specific to the project. You do not want to overwrite these. What a moderator may want to do is delete all the local language localisations. In this way only the project localisations are left. In this way all subsequent improvements to these language localisations will become available when they do.
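The layering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MediaWiki implements it; the dictionaries, the `resolve_message` function and the message keys are hypothetical and made up for this example.

```python
# Hypothetical message stores, for illustration only.
ENGLISH_DEFAULTS = {"search": "Search", "edit": "Edit"}

# The language localisation, as maintained centrally (e.g. at Betawiki).
LANGUAGE_L10N = {"nl": {"search": "Zoeken", "edit": "Bewerken"}}


def resolve_message(key, lang, project_overrides):
    """Return the text for a message key.

    Assumed lookup order: a local override on the project wins,
    then the language localisation, then the English default.
    """
    if key in project_overrides:
        return project_overrides[key]
    language_messages = LANGUAGE_L10N.get(lang, {})
    if key in language_messages:
        return language_messages[key]
    return ENGLISH_DEFAULTS[key]


# A project that still carries an old local copy of a language message.
overrides = {"search": "Zoek"}
before = resolve_message("search", "nl", overrides)

# The moderator deletes the local copy; the central localisation shows through.
del overrides["search"]
after = resolve_message("search", "nl", overrides)
```

In this sketch `before` is the stale local text and `after` is the improved central text, which is why deleting the local language localisations lets later improvements arrive by themselves.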

When a language is not a language

Kurdish is not a language, Chinese is not a language. They are both macro languages. Macro languages are a way to allow one ISO-639 language identifier to refer to multiple other ISO-639 language identifiers. Chinese is in many ways easy to understand; these are many languages that are all using the same script, the same writing system; a script that is not sound based. Given that the Library of Congress was the maintainer of the ISO-639-2, it is perfectly understandable that many languages were considered the same, were considered to be Chinese.

Kurdish is distinctly different. Kurdish is also very much a political story. The Kurdish language and the Kurdish people and culture are seen as problematic in all the countries where the Kurds live. Linguistically, Kurdish is considered to be three languages. Central, Northern and Southern Kurdish. These languages are written in the Arabic and Latin script.
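To make the idea of a macro language concrete, here is a minimal sketch in Python. The codes are the ISO-639-3 identifiers for the macro languages and for some of their member languages; the Chinese list is deliberately partial, and the table and function names are hypothetical, chosen only for this illustration.

```python
# Macro language code -> individual ISO-639-3 codes it refers to.
MACROLANGUAGES = {
    # Kurdish: Central, Northern and Southern Kurdish.
    "kur": ["ckb", "kmr", "sdh"],
    # Chinese: partial list only (Mandarin, Yue, Wu); there are more members.
    "zho": ["cmn", "yue", "wuu"],
}


def is_macrolanguage(code):
    """True when the code stands for a group of languages in this table."""
    return code in MACROLANGUAGES


def individual_languages(code):
    """Expand a macro language into its members; other codes map to themselves."""
    return MACROLANGUAGES.get(code, [code])


kurdish_members = individual_languages("kur")
```

A request tagged with a macro language code is therefore ambiguous: the code alone does not say which of the member languages is meant.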

In the localisation of MediaWiki at Betawiki, there is Kurdish in Latin and in Arabic script. The question is, is it Kurmanji or Sorani, because I have been told that transliteration is how you get the Arabic version.. I have been told that it is Sorani.

These questions have implications. Take for instance Persian or Farsi. There are successful WMF projects that start with fa. The language is also a macro language; it is divided in Western and Eastern Farsi. I am quite positive that the is in Western Farsi. But the url should be

There are people that I know and respect that tell me that the difference between Farsi and Dari is only 2 to 3 % and that consequently this difference is an abomination. Other equally respectable people tell me that the difference between Farsi and Tajik is only 2 to 3%. Nobody has so far insisted on bundling the two together.

The request for the is in the discussion stage. It does fulfill the requirement for localisation. There is a sufficient group of people interested in making this project a success. It is just that I do not feel comfortable with the designation fa for this project that prevents me from changing it to the "eligible" status.

What I want is to have the "" renamed.


Thursday, February 07, 2008

Wikiversity on UNESCO's CI News

Wikiversity has a course called "Composing free and open online educational resources" that targets teachers and teacher-students without prior knowledge about free and open educational resources. This course will start on March 3rd 2008 and a registration for participation is needed.

The course will be given in English but resources in other languages can be created as part of the course work for this nine week course.

I think it is absolutely splendid that Wikiversity is able to get UNESCO's attention and have it included on the CI News. Getting attention with cool projects makes a difference; in this way Wikiversity creates its own space and relevance.

The students will start creating educational material in many languages as part of the course. It is exactly the MediaWiki environment that is well suited for this. Currently there are projects in over 250 languages and the quality of the localisation of MediaWiki is improving continuously; in January alone localisation was done in over 130 languages on Betawiki.

So good news from Wikiversity and I hope that the course will go really well :)

Tuesday, February 05, 2008

When a language is no longer that language

Gothic is a truly ancient language. It is dead; it has not been used in earnest for a long time. The Wikipedia article is quite clear about this; "There are only a few surviving documents in Gothic, not enough to completely reconstruct the language".

There is a Gothic Wikipedia. I cannot read it, I do not have the necessary fonts and that is not a problem. What I do find problematic is what I learned about the localisation effort on Betawiki. Siebrand asked about the change in the font used because prior to the new work, the localisation was in Gothic. The answer is that the messages are now transliterated into the Latin script. One of the arguments used is that "the original fonts are more difficult".

I have a problem with reconstructed languages. They are no longer the language itself and they should not be tagged as such. By moving to another script, this project has moved even further from what I think is acceptable. This move from the Gothic script to the Latin script is not acceptable to me. When the problem is with finding a proper font, this has to be tackled.

I do urge the people interested in the Gothic to be truthful to that language. This is in my opinion already quite impossible but as they stop being truthful to the language by ditching the script, I am of the opinion that they lost the license to call it a Gothic Wikipedia.



Kotava is a constructed language, it is recognised as an ISO-639-3 language (avk), and they have started their encyclopaedic project. They are doing really well at making sure that their language is supported well in MediaWiki; currently 76.56% of the core messages have been localised in their continuing effort.

With the recognition of Kotava in the ISO-639-3, Kotava was added to OmegaWiki as a language that can be used for editing. What I am really happy with is the way Kotava is supported. The first thing that was done is localise the user interface in Betawiki, and today I received a file with translations and definitions of the languages of the ISO-639-1.

For OmegaWiki Kotava is a great example on how to support a language. Everything that can be done has been done. When Brion makes the next release of MediaWiki we hope to upgrade to this release. The nice people at Betawiki are planning on supporting the last stable release starting with the next release. This would mean that we will be able to update the MediaWiki localisation as well as the OmegaWiki localisation.

In the mean time, I am proud that we support Kotava already this well.

Sunday, February 03, 2008

One requirement for the Portuguese Wikiversity has been met

When a subsequent project is requested for a language, one of the requirements is that the localisation for that language has to be completed. It means that all the MediaWiki messages and all the messages of the extensions used in the Wikimedia Foundation projects have to be localised.

This requirement has now been met for the Portuguese Wikiversity. Malafaya, a member of the Portuguese community, finished the localisation today. He notified me, because he knew of the request for a pt/wv. He is not part of this new project but he is passionate about Wikipedia and he likes to tackle the issues where he can make a difference.

This week Betawiki migrated to a server with better specifications. The effect has been immediate, the response time has gone down and the number of messages that have been localised has increased. This is a moment where you can write about a sponsor... We found that Netcup, the ISP for BetaWiki, was willing to help us with a bigger server... It is not that they expect us to write about their hosting, but we can and we are happy to.


Is localisation for plurals supported for your language?

When you localise software, the information that is provided in a message may differ. There can be none, one or more instances of what the message informs about. Depending on the numbers involved, the sentence differs and support for singular and plural is needed.

MediaWiki supports this. However, it is not obvious how plural support should work as it differs from language to language. Welsh for instance has more than five forms of plural to consider. When you want your language to support plurals, it needs to be clearly defined how this should work and once this is done, you either post a bug to Bugzilla or ask the fine people at Betawiki to set it up for you.

Nikerabbit has recently implemented the plural support for Lithuanian. As a reaction a bug was entered for Russian with a request for plural support. On the Betawiki tasks list there was a comment that there are many more languages that could profit... Hr, Cu, Cs, Be, Be_tarask, Sk, Sl, Sr_el, Sr_ec ...

When plural support has been implemented, it means that a lot of messages have to be revisited to create the right text. These messages can be found on the "List of warnings" where messages with problems are indicated.

Supporting plurals happens in two phases; first the software is adapted for a language and then all the messages with plurals have to be adapted. It is a lot of work but it sure improves the user experience :)
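To give an idea of what "adapting the software for a language" involves, here is a minimal sketch in Python of the commonly documented three-form plural rule for Russian, one of the languages mentioned above. It is an illustration under that assumption, not the MediaWiki code; the function names and the example word are chosen for this sketch only.

```python
def russian_plural_index(n):
    """Pick one of three plural forms for Russian.

    0: numbers ending in 1, except those ending in 11 (1, 21, 101)
    1: numbers ending in 2-4, except those ending in 12-14 (2, 23, 104)
    2: everything else (0, 5, 11, 12, 100)
    """
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# The three forms of "file", in the order the rule above expects.
FILE_FORMS = ("файл", "файла", "файлов")


def render_file_count(n):
    """Phase two: a message supplies its forms and the rule selects one."""
    return f"{n} {FILE_FORMS[russian_plural_index(n)]}"
```

The first phase is the rule itself, done once per language; the second phase is that every message with a number in it has to supply all the forms, which is why so many messages need to be revisited.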

Friday, February 01, 2008

Group statistics in time

The Group statistics in time provide you with information about the localisation of MediaWiki. The numbers are impressive. In a year the number of languages that have a full localisation has doubled. In the last month alone 5 more languages have been completely localised.

The most impressive news is that in just a month 25 more languages have the most often used messages localised. I expect that this will have an effect in reaching out to the readers of our wikis. I hope that we will help more of them to become editors and consequently be better able to bring information to more people.

There are still bounties to be won for languages in Africa, Asia and Latin America. The first claims for Tajik, Northern Sotho and Marathi have been completed. We really want MediaWiki to be the software that is the preferred way of informing people in any language on the Internet.


OmegaWiki in Telugu

OmegaWiki has its first messages localised in Telugu. To celebrate this, I made తెలుగు the word of the day. :)