Sunday, March 30, 2008


Yesterday I brought a friend of mine to the railway station. I accompanied him by bike as he had come on his tricycle. His bike is a marvel of engineering, and as it is a folding bike he is allowed to take it on the train; this enabled him to come to the party we were at.

In 2004 he came with a rollator, another important gizmo that allows him to extend his mobility. At the time we worked on a Wikipedia article on that subject. I had a look at the article today and I find that it has little to do with the "walking frame" that is supposed to be the equivalent on the English Wikipedia.

Before my friend (we were together in secondary school) got on the train, I asked him to take a photo of himself on his bike and one of his bike folded. As these things matter, I asked him to write an article on Wikipedia or send me the photos so that I can write an article about it.

My friend used to ride a racing bike and he toured long distances... Today I am happy that he has a tricycle and was able to be there.

Summer time

It is that time of year for me when the clock is advanced one hour... summer time. It is a moment when two songs come to mind, summer time and summer time. Both show my age and the age; both are on YouTube and neither has a film.

The comments on the second are lovely; a 13 year old: "classic jazz song. I am 13 and this is my fav song of all time." with a reply: "I'm 76 and I'm glad someone your age can enjoy this wonderful performance".

Anyway, I woke like always and it is early in the day. So it will just be light an hour longer I guess.

Wednesday, March 26, 2008


Dzongkha is a language spoken in Bhutan, India and Nepal. It is the official language of Bhutan. With currently 71 articles, the Dzongkha Wikipedia is not really a success story.

For projects of this size, every active person makes a real difference. A new person came along today, Tenzin, and he has indicated that he wants to localise the most relevant messages for Dzongkha on Betawiki. He has created a user account on dz.wikipedia.

The Dzongkha Wikipedia predates the Language committee. So one new person has to make all the difference. I hope that this project will soon have several people working on their Wikipedia and, with the basic localisation done, a growing group of people finding use for their Wikipedia.

Ahead of the pack

The Wikimedia Foundation has a beautiful statistics presentation of the growth versus the size of its Wikipedias. Erik Zachte was inspired by the beautiful statistics in Hans Rosling's Gapcasts. This presentation is available in two ways:
  • an 8 MB Flash presentation
  • a JavaScript-only 'Wikimedia Projects Growth Animated'
The second version requires HTML 5; this is a brand new work in progress and consequently it is only available in the latest browsers. Firefox 3, in its beta 4 incarnation, is currently the only browser able to run it.

With the browser wars hotting up, it is great that Firefox can be found ahead of the pack where it really counts; ahead in supporting standards. Standards that are the same for everyone. I am also really happy that the WMF is actively at the forefront of stimulating standards.

Saturday, March 22, 2008

How many languages for the WMF and MediaWiki

In a recent conversation the question was asked how many languages we will have in the Wikimedia Foundation. A guess can be made at how many languages will have a project by the end of the year and how many languages will have a presence on the Incubator.

The only statistics available on the number of languages over time are the ones at Betawiki. As the localisations for most WMF projects have been imported, they give a fair indication of the WMF growth. The list of Wikipedias informs us of 255 Wikipedias. Many of these have been closed because of inactivity. This has its effect in Betawiki as well; for many of the 275 languages there is no localisation activity either. However, for some of the requested languages like Hiligaynon there is no localisation activity yet.

Jimmy said at some stage that a Wikipedia starts to become relevant when it has more than 1000 articles. There are currently 152 such projects. For me, the minimum number of articles needed for a project to be viable is 500; we have 216 projects that qualify.

In February over 150 languages contributed to Betawiki; my prediction is that we will have some 280 Wikipedias by the end of the year and 185 of these will have more than 500 articles. The Incubator may have some 25 new languages. For Betawiki I expect some 320 languages with 210 being actively maintained.

I expect that there is a relation between the quality of the localisation and the relative health of a project. I hope that by the end of the year we will have a better understanding of this relation.

Friday, March 21, 2008

Negative growth in the Betawiki statistics

This week's extension of the week is Central Auth. The functionality of Central Auth is better known as "Single Logon". As the first experiments with Central Auth are taking place, this extension has been made an "extension used by the Wikimedia Foundation". This means that more messages are part of this group in Betawiki.

Regularly messages appear, change or get deprecated. This has an impact on the statistics of the localisation. With the appearance of Central Auth, several languages no longer make the grade to be counted in the "group statistics in time".

Given that Single Logon is applicable to ALL WMF projects, it does not make sense at all to localise locally. By localising in Betawiki, the messages will become available on all wikis and the statistics will once more show the improvement in the support for our readers and editors.

Please help your language and localise in Betawiki.

Thursday, March 20, 2008

What languages ?

Supporting releases; this is what we do it for

Don Osborn wrote on the Afrophone mailing list about the use of MediaWiki by ICANN for its Internationalized Domain Names evaluation. In the IDN evaluation, people are asked to test their applications, e.g. browsers or mail clients, with domain names that contain characters from scripts other than the Latin script.

Don wrote because Amharic was one of two newly added languages. He wrote that they use a wiki for this and indeed they use MediaWiki :) I had a look at the version they are using; they are on 1.10 while the latest release is 1.11 and release candidate 1.12.0rc1 is out.

For Amharic this difference is big. In Betawiki we have had a lot of activity for Amharic; 89.79% of all the MediaWiki messages and 4.69% of the messages used in the extensions used by the WMF are now localised. The most used messages were only finished on the 17th of January.

Other languages whose scripts ICANN does not yet support in its IDN evaluation are Georgian and Bengali. Both are languages with a lot of recent localisations.

The application by ICANN of MediaWiki for its IDN evaluation demonstrates that our localisation effort is relevant outside the WMF as well. We are happy that with the 1.12 release there will be a continued localisation effort. The details on how later localisations will be made available are not known to me yet, but I have been told that there will not be an issue.

Tuesday, March 18, 2008

Tatar Wikipedia; request for deletion

With 4053 articles, the Tatar Wikipedia is of a respectable size. Getting a request for deletion for a project of this size seems outrageous. So there must be more to it.

The proposer writes that he has been implementing the use of the ISO-8859-9 code page for the Tatar language. According to the proposer, there are 150 articles that are "semi useful" and the rest are bot generated articles.

The reason for the proposal is that Tatar is not written in the Latin script. The Tatars live mainly within the borders of what used to be the USSR and in Russia it is required to write Tatar in Cyrillic.

There are several issues:
  • Wikipedia is to bring information to people. When people do not read the Latin script in Tatar, what is the point?
  • There has been a battle raging with admins blocking each other
  • There is no existing precedent that deals with these issues
  • The language committee has explicitly recused itself from all projects that existed before its start
As this is the type of Wikidrama that people do not know about, the type of Wikidrama that does not get press coverage, I would like to learn how we as an organisation, as a community, are going to deal with this.

Monday, March 17, 2008

Linguistlist update

On the mailing list of the Linguistlist I read the following....

Dear readers,

This letter is a progress report on LINGUIST's Wikipedia Update Project. You may remember that Fund Drive 2007 donors voted to earmark some funds for a LINGUIST team to look into filling some of the gaps in linguistics and language articles that appear in Wikipedia. Our goal was to make Wikipedia an even better resource for our field and the general public by asking the linguistics community itself to help improve the reliability of information in the "encyclopedia that anyone can edit." Our mandate was to get the word out about which articles needed revision or completion and to issue periodic calls for volunteers to edit such entries and/or add new entries. Our role was to serve as a hub for facilitating such updates and to report back to you on the results of our efforts.

We began our project in earnest in the summer of 2007. LINGUIST's Wikipedia team logged itself in as a user Linguistlist and merged its project with the existing WikiProject Linguistics page. We found that Wikipedia administrators and editors had already identified some 700 articles as "linguistic stubs," flagged as being incomplete or lacking in some respect: needing fuller content, having content that was "contested" by some, lacking proper references or citations, or having missing or empty links. We put these stubs on a "watchlist" to keep track of what got updated and when. From this large list of stubs, we then created smaller lists by linguistic subfield; these included biographic entries of linguists and specific articles in phonetics, phonology, morphology, syntax, and sociolinguistics.

We then issued calls for volunteers, one per month for a total of six through the end of December 2007. Each call included a report on subscribers who had responded to our calls. Some reported having edited or created new articles while others promised to edit specific entries in the future. Our respondents came from all over, as close to home as Chicago and New York and as far away as Serbia and the West Indies. All LINGUIST activities on the update project, including the full text of each call for action, are logged on our user page.

We are pleased to report the following activities concerning linguistics articles:
  • 13 articles were completed or newly created by subscribers
  • 12 articles had references added
  • 33 links were added to existing articles
  • 28 articles were de-stubbed by virtue of being judged self-sufficient
The Wikipedia team was able to devote six months to the project, given a timeframe that was necessarily limited. Because we are also involved in other LINGUIST projects, we are no longer recruiting participants for the update project.

The Wikipedia team is extremely grateful to all of you for your collective interest in the project and especially to those of you who made specific contributions to Wikipedia entries. Overall, we are pleased that we were able to perform a public service by calling attention to the quality of linguistics articles in Wikipedia and inviting you to join in as active Wikipedians for the greater good of our discipline.

In closing, we thank you, as always, for being there for us: we depended crucially on you for this particular project and could not have done it without you! We hope we were able to make a small difference with this project. Please remember that when you donate to this year's Fund Drive, you are contributing to similar projects aimed at providing you with the broadest range of resources in the field of linguistics.

With best wishes,

The Wikipedia Update Team
Roxana Ma Newman
Hannah Morales
Luiza Newlin Lukowicz

Saturday, March 15, 2008

Betawiki stats for the 25 biggest Wikipedias

Siebrand made a list of the 25 biggest Wikipedias and their localisation. It is an interesting list and, as with all such statistics, it shows the moment. The Polish, for instance, are currently working hard while there is not much activity for Spanish.

What I find amazing is that Esperanto and Volapük do better than Turkish. Danish is doing fine for the MediaWiki messages but messages for extensions need a lot of work.

For those that are among the 25 biggest, I think that there should be a sense of "noblesse oblige". All the work done to create the content needed to be a top 25 Wikipedia deserves that the user experience is perfect. The localisation is part of this.

Tuesday, March 11, 2008

Another obvious reason to support SignWriting

The notion of SignWriting is one that I have taken a fancy to. For those who do not know, SignWriting is the only script capable of writing sign languages. There are many sign languages; according to Ethnologue there are 121 of them.

The arguments for SignWriting are
  • when a language is written, its culture is able to write its own history
  • it stimulates the native preservation of both the language and the culture
  • children who learn to write their native language first do better academically
At the conference in Bamberg I learned that many translators of a sign language learn the language in university. They learn it like you learn a foreign language. The obvious reason why SignWriting can be so relevant is that it is so much easier to learn a language when it has a written component.

This has been staring me in the face ... only now I see it :)

Just a comparison

There are two projects that have their origin in the Wikimedia Foundation. Both are associated with people who have a claim to fame. Danny Wool used to work for the WMF and he seriously contributed to projects. Gerard Meijssen never worked for the WMF but he seriously contributed to projects. Danny's project is called Veropedia, Gerard's project is called OmegaWiki; one is a .com, the other a .org. The first is based on Wikipedia, the other has its origin in Wiktionary.

Danny used to be involved in fund raising for the WMF. Now his vitriolic contributions have a direct negative effect on the ability of the WMF both to function and to raise money, if only because of all the time wasted reacting to his allegations but also because people may find the reputation of the WMF tarnished.

Gerard is raising funds for his projects but these funds have a direct positive effect for the Wikimedia Foundation and its projects. Funding for localising MediaWiki and for improving the development infrastructure of MediaWiki. Funding for localising Commons.

Typically people are not interested in the positive approach; when Gerard urges people to localise at Betawiki because it has so much more impact, he gets told that he is not interested in the WMF projects. When people read Danny's negative comments, they react like "where there is smoke there must be fire".

Both organisations are outside of the WMF. One gets media attention, the other does not. One demonstrably hurts the functioning of the WMF, the other does not.

I wish that people would look at the bottom line: what is achieved by the actions of people. When the only justification is a right to the freedom of speech, then by all means let Danny have his say. But as what he has to say is about things that may have happened over a year ago, there is a limit to the credibility of the platform he has. As he does not want to see the positive things that are happening now, he effectively makes himself part of the problem and not a part of the solution.

The WMF has a budget outlining what it wants/needs to spend. The ability to get funding from the public is limited.

When a venture fund wants to donate serious money to the WMF, it is important to know if it comes with strings attached. The WMF currently has people with a serious ability to understand contracts and lawyerese. This is a major departure from the past. There are seriously rich people and organisations who appreciate the WMF for what it is and what it aims to do. Many of them can easily help the WMF with its finances. The WMF, as an organisation, has its house in order and increasingly has the ability to make use of the rich network that has been built by Jimmy Wales.

Making use of the network Jimmy built can only be done with a positive attitude and appreciation of his accomplishments. It should be obvious but let me spell it out; when connecting to the people and organisations that know and appreciate Jimmy, dissing Jimmy is for the WMF effectively the same as shooting ourselves in the foot.


Wednesday, March 05, 2008

MediaWiki support for the big languages

MediaWiki is used for projects in many languages. Wikipedia is the project best known for its many languages. The WMF Sitematrix gives the distribution of the languages and its projects. When you consider the usability of the MediaWiki software, you would expect that there is a correlation with the size of the project.

When you consider the top 10 Wikipedias, nine need localisation and the localisation for German, French, Swedish, Japanese, Dutch and Portuguese is fine. Italian and Polish need work done on the extensions as used by the WMF while Spanish needs work done on both the MediaWiki messages and the extensions.

I am really happy how the quality of the Japanese localisation has improved. It is well covered for usage in the Wikimedia Foundation and we are awaiting the creation of a Wikiversity in Japanese.

Italian, Polish and particularly Spanish are the odd ones out. I am amazed that the communities for these languages have not picked up the need for proper localisation. The Japanese example has shown that it takes only a few good men to make a big difference.

When you look at other languages like Hindi that have so many speakers, you really wonder why it is doing so poorly, especially when you compare it with other Indian languages like Bengali, Marathi, Telugu, Malayalam and Tamil.

When you are part of a language community, check out the statistics for your language. Help us at Betawiki to support your language; there is plenty of room for improvement in most languages.

Tuesday, March 04, 2008

Jimmy Wales is not a saint

Jimmy Wales is not a saint. He does not need to be a saint, his role is not to be a saint. He is the one person who is seen and known as the public face of Wikipedia. Jimmy has given many effective presentations on the subject, for instance at TED. Without Jimmy there would not have been Wikipedia. I have met Jimmy on many occasions and I can testify that he fulfils his role really well.

There are some people who have a beef with Jimmy, who have a beef with Wikipedia and who are invariably negative about this. I find this behaviour nauseating. To me it is clear that these people have put themselves firmly outside of what is acceptable behaviour on Wiki.

Danny Wool was the personal assistant of Jimmy and he was for some time an employee of the Wikimedia Foundation. He is now spouting gossip about what happened during this time. For me, the one person that I find despicable is not Jimmy but Danny. His moral values that allow him to be this "tell all" I find beyond contempt; it is vindictive and it is very POV.

The fine job Jimmy is performing for the Wikimedia Foundation is one of marketing. It may come as a surprise to some but the marketing of Wikipedia is still essential. In all the studies done on the Wikipedias there has not been a comparative study on the shared values of Wikipedia, particularly the NPOV, and it is my belief that without communications on many levels the POV differences between the projects will only grow more rapidly.

It is imho absolutely essential that the evangelisation for Wikipedia continues and in doing this Jimmy has proven himself a boon.

I have no doubt that people like Danny will continue their selfish destructive behaviour. I am sad that they do this without apparent consideration of the consequences of their actions. As they do not offer any credible alternative either, I find their utterances reprehensible.

As to publications like "Valleywag", as long as they know how to spell Wikipedia or Jimmy Wales they offer free publicity. This is if you believe in "any publicity is good publicity".


Sunday, March 02, 2008

On the monthly Betawiki info

In February there were contributions to Betawiki for 156 languages. Again, this is REALLY good news when you look at where we are coming from. The quality of the MediaWiki localisation has improved a lot. Ten more languages support the most relevant messages, four more the core messages, seven more support the extensions used by the WMF and two more languages support more than 65% of all the extension messages.

I am happy to agree that there is still a long way ahead of us. One big milestone we want to reach is when two hundred languages have posted localisations in one month. This would not make things perfect, it would make things better.

What is important to realise is that there is nothing static in the numbers; MediaWiki development is a continuous process, new messages make their appearance, messages change. It takes continuous effort to maintain the quality of MediaWiki.

Dump of the English language Wikipedia

After a long wait, a really long wait, it was announced that the dump of the English language Wikipedia has finally finished successfully. It took 58 days to complete and yielded a 17 gigabyte compressed file that will become some 2 terabytes when unpacked.

This news has led to a flurry of activity; Erik Zachte now has the opportunity to run his fantastic statistics again, and the dbpedia people now have the opportunity to run their software on something new. Many of the studies under way to analyse Wikipedia have asked where they can PLEASE find the latest data ....

It is important that a dump has successfully finished. It is equally important that this process proves to be repeatable. Many people have said that Wikipedia is not really open content when you cannot get its data. It is great to see that all the effort to create a dump has finally succeeded and proves that Wikipedia IS open content.