Thursday, October 30, 2008

Proof of concept for an ASL wikipedia.

I just received the wonderful news that Steve Slevinski has created a proof of concept for a MediaWiki that supports SignWriting. This is great news. It means that a Wikipedia in American Sign Language is that much closer.

This demonstration is not only relevant for American Sign Language. SignWriting is a script, and given that it enables the writing of any sign language, many more languages now have a real prospect of a Wikipedia in their own language.

Wednesday, October 29, 2008

Radio history, digitised and on-line

The Radionieuwsdienst ANP has been broadcasting radio news bulletins since 1937. In 1993, when the ANP moved house, many of the texts were found in the cellar. This treasure trove for the Dutch language was preserved and stored at the Koninklijke Bibliotheek. The 1.8 million sheets of paper have now been digitised and made available on the website of the KB.

Material like this is invaluable for understanding how a language evolves. It is also the perfect indicator for learning when certain topics became of general interest. I hope that there are many resources like this available to the public. For the Dutch Wikipedia it is a great resource to serve as a source.

Tuesday, October 28, 2008

Bishnupriya on a cutting edge

Bishnupriya is a language of the Indian subcontinent. There is a Wikipedia in this language; it currently has 23k articles and is number 52 in size. This makes it, surprisingly, bigger than the Hindi Wikipedia at number 54. It is a language that only recently registered with me, and I added it as a language to OmegaWiki. I am currently adding loads of translations to the collection of countries and territories, and Bishnupriya is on the edge of getting its own status bar in this collection. This means that 10% of the countries will be known to OmegaWiki in Bishnupriya.

Currently 23 of the 26 words in Bishnupriya are countries and it is therefore arguably not that big an occasion. It is however part of me raising awareness for "other" languages, how to give languages like Bishnupriya or Maldivian some presence. In my opinion, celebrating little events like this seems appropriate.

It is great for me to enjoy the little things because many little things slowly add up. Sharing these good things may get a smile on your face ... :)

Monday, October 27, 2008

Building a tower of Babel

MediaWiki is a big undertaking. It is ever growing in functionality and it is attracting more and more developers to the platform. An increasing number of websites outside of the Wikimedia Foundation use it for the many qualities of the software, and consequently there is an increased need for supporting this great platform.

When Wikipedia started, like at the start of the building of the tower of Babel, everybody spoke the same language and there was one God. Now there are many languages and many people decry a lack of faith and a lack of community. At the same time, the English language Wikipedia alone has grown to a staggering 2.6 million articles, there are many sister projects in other languages and sister projects with other aims, and the WMF has grown into a multi-million dollar organisation.

All projects have their own momentum and each project has its own issues, but when new functionality is added to MediaWiki, it is immediately there for all to use. The people localising at Betawiki have an increasingly hard time keeping up because of the size of the software and the increase in development.

The problem is that when the user interface is too hard to use, people will not use it. There are several issues that are a factor.
  • insufficient localisation
  • inconsistent terminology
When the localisation is not sufficient, when people do not know this other language, they cannot edit. When the words used for a specific action differ from application to application, people will be confused and will not edit. There are solutions but they require a lot of attention and investment.

The first thing we could do is reduce what needs to be localised. This may mean that MediaWiki becomes more modular and that these modules can be turned on when needed. A good example is the "Flagged Revisions". Another strategy would be to first localise the messages that have proven to be stable; those would be messages that have not changed since the previous release.
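The "stable messages first" strategy can be sketched in a few lines. This is only an illustration of the idea, not how Betawiki works: the message keys, the texts and the function name stable_messages are all made up for the example, and it assumes the source messages of two releases are available as simple key-to-text mappings.

```python
def stable_messages(previous, current):
    """Return the keys whose source text is identical in both releases."""
    return sorted(
        key for key, text in current.items()
        if previous.get(key) == text
    )


# Hypothetical message keys and texts, for illustration only.
previous_release = {
    "edit": "Edit",
    "history": "History",
    "search": "Search",
}
current_release = {
    "edit": "Edit",
    "history": "View history",          # changed since the previous release
    "search": "Search",
    "review": "Review this revision",   # new in this release
}

# Messages that did not change are the ones to offer to translators first.
translate_first = stable_messages(previous_release, current_release)
print(translate_first)
```

With a list like this, a translator for a new language could start with the messages that are least likely to be invalidated by the next release.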

We need localisation, any localisation, and inconsistent terminology is only a problem once there is such an initial localisation. Even with consistency within MediaWiki, there is still a problem when its terminology differs from that of the other software people use.

So what can we do to make things better? The WMF should invest in the tooling at Betawiki. There are already several plans that will improve the Betawiki functionality; they will improve efficiency, and this is likely to increase the throughput of translators. For me the biggest advantage of such an investment is that the WMF will become more aware of what it is to support software and projects for 297 languages.

Tuesday, October 21, 2008

Localising for the fundraiser of the Wikimedia Foundation

The Wikimedia Foundation uses MediaWiki for its software. The localisation of this software is done at Betawiki. At Betawiki we support several other applications for their localisation. The benefit for these other applications is that Betawiki has a great community of translators.

Every year, the WMF has its fundraiser. Every year the message needs to be tuned and consequently the software is spruced up. You would expect that the WMF would use Betawiki for the localisation of their message. They do not. When you look at their translation request you will find only 30 languages. You have to work on Meta, so as a translator you will not find help that explains what the message is about.

I think it silly that the WMF thinks that a Wiki is the universal tool appropriate to translate text. These messages are to be used in software; these wiki pages have to be converted... For me it demonstrates that the WMF does not have a strategy on how to deal with multiple languages. It also gives the impression that all the "other" languages and their communities are not considered to be relevant.

Friday, October 17, 2008

Science of the past is still science, is still relevant

Wired has a most interesting article "Forgotten Experiment May Explain Origins of Life". It is worth a read, please do. 

For me the most interesting part of the story is that this science still exists because its material was handed down, after the death of the original scientist, to one of his former graduate students. Using modern tools on the original material provided information that was not available at the time.

A lot of scientific work has been done in the past. Good science is repeatable, but as this example shows, the results are dependent on the tooling available. As the results of experiments were still available it was cheap and easy to analyse again and it led to new insights.

This is a lucky break that resulted from the understanding of competent scientists. It is a lucky break because much scientific fact is no longer available to us. Many papers were published and are not really available because they have not been digitised. Some material may still be available on diskettes, but who can find this material in the Internet age?

Scientific endeavour has been compared to "standing on the shoulders of giants". The works of these giants often exist only in a paper trail, and in order to have their work available, it needs to be digitised. I am convinced that many results as profound as the work of Stanley Miller are there for the finding.

Thursday, October 16, 2008

Obstacles to participating in WMF projects

Sue wrote in her "Report to the board August 2008" about an initiative on Meta, an article called "New contributor objections". It is a great initiative, and it took me little time to find that this article addresses the objections of people who have a problem contributing to the BIG Wikipedias. I have added some big hurdles that people who might be active on the smaller projects have to overcome:

  • People are not aware of the existence of Wikipedia / the existence of a Wikipedia in their language
  • An insufficient localisation of the software for their language
  • A Commons that is essentially English only
Most people that are engrossed in their Wikipedia forget that other Wikipedias exist. It makes sense because the creation of an encyclopaedia is an immense task and it takes a lot of dedication. The problem I find is that the WMF activities make the biggest projects bigger while hardly any attention is given to the smaller projects.

When you want to extend the reach of what we do, you have to make sure that people know about us. You have to reach out to where we are weak. The key focus should be on the people that can strengthen the projects that we already have.

When you look at localisation, you will find that Betawiki is a best effort project where volunteers localise MediaWiki. But it is already hard to maintain the existing localisation. With more and more programmers working on MediaWiki, it will become increasingly difficult to maintain the support for all our languages.

While Commons is a great project, it is hardly useable for people who do not read English. Consequently Commons is not used as a resource by many projects to upload the images that reflect their world. This means that Commons is biased to a western point of view.

What I would love to learn from the WMF is how they want to address these issues, because these issues are what prevent us from being what we aim to be.

Wednesday, October 15, 2008

Blog action day: Poverty

Today, October 15th, is "blog action day". The aim is to talk about poverty and raise awareness, initiate action and shake the web.

Yesterday, I read on my Google reader feed that UNESCO promotes a publication called "Why languages matter". This publication gives a great concise run-down, with plenty of examples, of the power of approaching people in their own language. The subtitle of the publication is: "Meeting Millennium Development Goals through local languages".

When you read this document, you get the feeling that local languages provide an essential tool in the fight against poverty, hunger, lack of education, child mortality, gender inequality ... The text is full of success stories. It clearly states the obvious message that when you state your message in the language that people understand, you may be heard.

The Wikimedia Foundation provides the opportunity to write in many languages. Regularly projects in new languages are approved. As the support for the mother tongue of people provides such clear benefits, we can be proud of our continued support for so many languages. The trick is how to reach out to all the people for whom a Wikipedia is not yet available in their language.

An encyclopaedia provides a lot of basic information. When this basic information is available in the languages we support, it is not unlikely that it is the first encyclopaedic resource for many languages. When you look realistically at our resources, you will find that most Wikipedias are just not of a quality where they can make a difference.

Our challenge is to overcome the problems that exist. Our challenge is to find the resources so that we can enable language communities to create their content.

Monday, October 13, 2008


ASCII or the "American Standard Code for Information Interchange" is a limited set of characters and it is used for all kinds of applications. Yesterday I talked about support for multiple languages on the Internet and mentioned that .org domains can be named in multiple scripts.

I was told this was pretty useless because e-mail addresses are one of the uses of the domain name and they are firmly ASCII. Pretty useless because e-mail applications do not allow for e-mail addresses in anything but ASCII. I just checked and my e-mail application does indeed not allow for it.

So my question is: what e-mail applications DO allow for e-mail addresses in multiple scripts, and is it only a problem of the client applications or are there also problems in the infrastructure of the Internet?
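To make the two halves of the question concrete, here is a minimal sketch of where ASCII gets in the way. It assumes Python's built-in "idna" codec as the way to get the ASCII form of a domain name; the function name ascii_form and the example addresses are made up for illustration. The domain part of an address can be rewritten in ASCII this way, while the part before the @ has no comparable conversion in this sketch and is simply rejected.

```python
def ascii_form(address):
    """Return an all-ASCII form of the address, or None when the part
    before the @ contains non-ASCII characters."""
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        # No ASCII rewrite is attempted for the local part.
        return None
    # The domain is converted to its ASCII ("xn--") form.
    return local + "@" + domain.encode("idna").decode("ascii")


# Hypothetical addresses, for illustration only.
print(ascii_form("info@bücher.example"))
print(ascii_form("jürgen@example.org"))
```

The first address comes out with an ASCII domain; the second returns None because its local part is not ASCII.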

Thursday, October 02, 2008

On the monthly Betawiki info

After the release of MediaWiki 1.13 many of the most used messages were altered. The effect of this is obvious in the monthly Betawiki statistics. Last month 112 languages supported 98% of these messages; this month there are 92. This means that we have had people fixing the FUZZIED messages for 92 languages, or 28.75% of the languages we support in Betawiki. When you look at the list of Wikipedias, you will find that number 92 has 7488 articles...

The localisation of the extensions is doing well; it continues to improve both for the extensions used by the WMF and for all the other extensions. The numbers for these are small, but it is heartening to see them improve.

Personally I was happy when I was told that the languages that were added recently are doing fine. These are the languages with projects in the Incubator or that had their project started in the last year.
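The share quoted above can be checked with a little arithmetic. Note that the total of 320 supported languages is not stated in the statistics quoted here; it is an assumption, inferred from the two figures given (92 languages being 28.75 percent of the total).

```python
languages_last_month = 112   # languages at 98% of the most used messages last month
languages_this_month = 92    # the same count this month
supported_languages = 320    # assumption: inferred from 92 / 0.2875, not quoted directly

share = 100 * languages_this_month / supported_languages
dropped = languages_last_month - languages_this_month

print(f"{share:.2f}% of the supported languages are at 98% or better")
print(f"{dropped} languages fell below 98% since last month")
```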

NB Today the Egyptian Arabic Wikipedia has been waiting for 77 days for its project to be created. I have asked developers, I have asked Sue and Erik. If there is one thing where the language policy of the Wikimedia Foundation breaks down, it is in a timely implementation of its end result. Given that this is an official policy, there is no excuse.