Monday, December 31, 2007

Happy New Year

Some of my friends are already in the New Year, some like me are celebrating the old and awaiting the New Year. The SignWriting you see is the Czech sign language.

One of my hopes is for SignWriting to do well in the new year. I hope that many people will learn to read and to write their sign language. My wish for all of us is that the best is still ahead of us... Happy New Year :)

Linguistic tolerance

The Wikimedia Foundation has a policy about what linguistic entities it accepts and what linguistic entities it does not accept. When a linguistic entity is recognised as such and has an ISO-639-3 code, it is considered a language. As a rule, the language committee gives a conditional approval to requests for languages that have a code.

There are problems with this policy. As I wrote elsewhere, a language may be dead. A dead language is a language that is no longer actively used: no new terminology is being coined and nobody uses it actively. Good examples are Hittite (hit) and Akkadian (akk). In my opinion they can have a Wikisource, but a Wikipedia is problematic because you cannot write in these languages for a modern public without changing the language into something completely different. It does not even make sense to have a MediaWiki localisation for such languages.

There are more problems. What to do with languages where, from within the culture, it is prohibited to write the language down? What to do with languages that have few speakers? What to do with languages where few people are truly literate in their language? What to do with constructed languages?

The biggest problem with all these issues is one of competency. Who is competent to judge that a language is truly dead? At what level are there sufficient people in a community to support a language for a WMF project? How do we judge the quality of our projects and, as importantly, who is to judge? Also, does the WMF have a responsibility for less resourced languages?

Brianna blogged about the Volapük Wikipedia. For her and for many others, Volapük became an issue because its community had the audacity to create enough articles to be noticed. People like Brianna feel offended because it upsets the notion of what Wikipedia is. Brianna introduces the notion of a "language ego", but I am sure she will agree that every non-dead language deserves its place under the sun and that only the people who communicate in a language can realise a WMF project. The good news is that there is plenty of sun and it is not expensive to have another language.

When people equate artificial languages with languages without merit, they have a problem. Many languages have started out in this way. One of the more interesting examples is Italian; it was standardised by Dante and used as a lingua franca until the unification of Italy, when it became an official language. Another example is Sardinian, where a constructed, merged linguistic entity that has not been recognised by the ISO-639-3 registrar is recognised in Italian law.

When you compare Volapük with Klingon, the biggest difference is that Volapük allows you to express all modern subjects.

For me the issue of the Volapük Wikipedia is a non-issue. I know three people that speak Volapük; not all of them are involved in this project. Given the competency of the people that are, there are no issues in getting the information in Wikipedia right. Wikipedia has always allowed for a project to evolve and insisted on the independence of communities. I am sure that in the end both Wikipedia and the Volapük Wikipedia will emerge stronger from all this.


Sranan Tongo

Sranan Tongo is a creole language spoken in Suriname. A request was made for a Wikipedia in this language and as it does have an ISO-639-3 code (srn), granting a conditional approval was a formality.

What is becoming a comfortable routine is requesting the great people at BetaWiki to support another language. People that know the language can now start with the most used messages. I wish the proposers of this new project well, and I hope to hear from them when they consider themselves ready for the big time :)


Saturday, December 29, 2007

Farmer; tactical technology

Farmer is an extension for MediaWiki. Farmer helps with the maintenance of wiki farms and enables changes to the configuration from a Web interface. The reason why I blog about it is that it is one of those little gems that may make a difference in making MediaWiki more popular.

MediaWiki can be found in an NGO in a box. This is an initiative of Tactical Tech, an organisation that aims to "demystify technology for non profits". The big selling points for MediaWiki are the many people that know MediaWiki through Wikipedia and the many languages that are supported by it.

Recently Farmer was welcomed as an extension in BetaWiki, and the developers active at BetaWiki have been working hard at improving the messaging of Farmer. Farmer will make the maintenance of MediaWiki less challenging. With its messages translated in more and more languages, MediaWiki becomes more and more tactical technology.


Friday, December 28, 2007

Firefox 3 beta

I have been bold; I have installed Firefox 3 beta. I cannot say it is all good, but for many things it is much better. I would not have installed it without Firefox supporting Chatzilla. Chatzilla is mandatory for me.

What I like:
  • When I click on an Arabic text it will cleanly select the whole word.
  • URLs are shown much more cleanly
  • Firefox is still a great program, it seems more stable and responsive
What I do not like:
  • I do not like the presentation of the browsing history
  • There is a bug that has a tab go to the beginning of the page

Monday, December 24, 2007

Localisation fast and furious

When things reach a certain maturity, visible things can happen really quickly. Another Christmas present is this presentation of the localization of MediaWiki. It shows the quality of the localisation of the MediaWiki software.

You will see a lot of red, which is not good. This is a typical situation of the "cup being half full", as the list includes more languages. With a new visualisation you cannot compare, so you do not notice the many recently added extensions. You do not notice the many recently added languages. You do not notice that there would have been more red for many languages. :)

If you want to help MediaWiki, help us improve the MediaWiki localisation for your language at BetaWiki.


Friday, December 21, 2007

BetaWiki exports .po files

BetaWiki has given us a splendid Christmas gift. Nikerabbit, Hashar and Siebrand have developed an export tool for the MediaWiki system messages into the .po format. This format is the standard used for many open source applications.

The most important thing about this format is that there are many tools that support it for off line translation. Many translators will not work on-line. For many languages we do not need to accommodate off line translation when there are sufficient people willing to maintain the localisation. However, when you look at the statistics, you will find that there are many languages not supported or poorly supported in MediaWiki. Hindi is a good example. The Wikipedia is well localised, but for Hindi only 3.99% of the system messages are translated. Hindi is spoken by 180.000.000 people...

For Hindi, .po files will not be the solution. Collaboration between Indian Wikimedians and the BetaWiki administrators will be a better solution. There are also languages like Neapolitan where it helps to localise and then have the localisation proofread. When the number of collaborators for a language is small, it is typically easy and safe to work off line. You do not have to wait for loading and saving, and you can combine it with a translation memory to make it efficient.

Nikerabbit has now created a .po importer. What he is looking for are translations to test this new functionality...
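
For those who have never looked inside a .po file, here is a minimal sketch in Python of what an entry looks like. The message keys and texts are made up for illustration, and putting the message key in msgctxt is my assumption; the BetaWiki export may well lay things out differently.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted .po value."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def po_entry(key: str, source: str, translation: str = "") -> str:
    """Format one message as a gettext .po entry.

    Assumption: the message key is carried in msgctxt. An empty msgstr
    marks a message that still needs translating.
    """
    return (f'msgctxt "{po_escape(key)}"\n'
            f'msgid "{po_escape(source)}"\n'
            f'msgstr "{po_escape(translation)}"\n')


# Hypothetical messages, not taken from the real MediaWiki message files.
messages = [
    ("example-save", "Save page", "Pagina opslaan"),
    ("example-preview", 'Show "preview"', ""),
]

# Entries in a .po file are separated by a blank line.
po_text = "\n".join(po_entry(*m) for m in messages)
print(po_text)
```

A file like this can be opened in any of the off line translation tools that understand the format, and the untranslated entries are simply the ones with an empty msgstr.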


Thursday, December 20, 2007

A great Christmas card

The question: what language is it, and what does it say?

Happy holidays,

Wednesday, December 19, 2007

WOSI, a really cool Open Source/Standards project

WOSI is a project of a Dutch school, the HVA, working on a software environment for "woningcorporaties". A woningcorporatie is an organisation that is involved in public housing. As organisations they are genuinely capital intensive, and their IT requirements are complex and evolving.

The objective of the school is to provide an environment for their students that will give them a real feel for what it is like to work in the ICT business and teach them Open Source and Open Standards. Students that are part of this project will experience that a project does not start from scratch, there is always something to build upon, there are always conflicting requirements, there is always the need to ensure interoperability because the use of Open Standards is a precondition.

The WOSI project is in its second year, and it is growing in size. Students from other disciplines are getting involved as there is overlap with other specialties like communications and marketing. More woningbouwcorporaties are interested in the project, as well as other schools. The great thing is that as Open Source and Open Standards are key to this curriculum, professionals will be released to the job market that know how to apply these notions in real world scenarios. This is likely to prove the biggest boon of this really cool project.


Tuesday, December 18, 2007

Interoperability is important

Wikipedia and particularly the English language Wikipedia is a rich resource of information. The amount of information in it is staggering. Much of the information is duplicated in other Wikipedias and other websites. This is great, because with more applications for the same data, more eyeballs will find what is in error.

I am subscribed to the DBpedia mailing list and today I read about errors in Wikipedia that had to do with Wigan and Manchester City. Errors were found and the gentleman wrote that he can and will make the necessary updates. His question is when will the DBpedia reflect the changes.

When the data of Wikipedia is analysed with tools, and when the results are found to be of value, it adds relevance to what enables this collaboration. It typically relies on the availability of dumps. When the data is analysed, a new work emerges. When it has a completely different format, it is possible to mesh it with other data sources. This in turn will help establish the validity of the Wikipedia data and will allow for the extension of the data.

When multiple data sources are meshed, the issues of copyright and licensing raise their ugly heads. You can create static and dynamic meshes. In a dynamic mesh you can build the mesh depending on what the person has access to. In a static mesh you can only include the data that is still available to the least privileged person who will get access to the data.
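
To make the difference concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Record type, the numeric access levels and the sample data are assumptions of mine for illustration, not how any existing mesh works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str     # where the data came from
    value: str      # the data itself
    min_level: int  # access level needed to see it (assumed numeric scale)


def dynamic_mesh(records, viewer_level):
    """Build the mesh per viewer: include whatever this viewer may see."""
    return [r for r in records if r.min_level <= viewer_level]


def static_mesh(records, audience_levels):
    """Build the mesh once: include only what the least privileged
    member of the audience may see."""
    return dynamic_mesh(records, min(audience_levels))


# Hypothetical sources and access levels.
records = [
    Record("open encyclopedia", "population figure", 0),
    Record("subscription database", "detailed census table", 2),
]

print(len(dynamic_mesh(records, viewer_level=2)))        # both records
print(len(static_mesh(records, audience_levels=[0, 2])))  # only the open one
```

The static mesh is cheaper to publish, but it always shrinks to the most restrictive case; the dynamic mesh keeps everything, at the price of checking access for every viewer.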

The consequence is that many people and organisations will mesh sources, manipulate data, publish and not indicate what all the sources are. They will not do that because they do not want to be bound by all kinds of licenses and because they do not want to be hassled.

This DBpedia example shows that the presentation of facts is important. It demonstrates that interoperability will result in a better Wikipedia. It is important for Wikipedia to be as open and engaging as it can be. Frankly, when people analyse our data in a similar way to DBpedia, the result is a new work; it should not be considered derivative. Best practice is to publish sources and this, more than the viral nature of a license like the GFDL or CC-by-sa, will drive collaboration and give Wikipedia more relevance.


Sunday, December 16, 2007

Localisation of MediaWiki

When you wonder in what languages MediaWiki has been localised, and to what extent the localisation is usable, BetaWiki has some great statistics.

The localisation statistics show the languages that have a central localisation and the percentage of the messages that have been done. They clearly show that the MediaWiki localisation leaves a lot to be desired; for some 144 languages fewer than half of the messages have been localised. At this moment there are 235 languages known to MediaWiki. When you compare this to the 253 languages that have a Wikipedia and add the languages that are starting in the Incubator, you get a clear picture of how much effort is needed to better support the readers of MediaWiki projects.

When you look at the statistics, the glass is half full, and it is filling. On average five languages are introduced every month and more than 500 messages are translated every day. The languages that are in the Incubator are doing well; Seeltersk for instance has done an astonishing 99,3%.

One of the latest innovations in BetaWiki is the core top 500 messages. They contain the most important messages, and with these messages translated, MediaWiki is usable for a language. BetaWiki has a dedicated team of people that make MediaWiki, and as a consequence MediaWiki projects, usable for many people. With your help, we can improve the localisation even further. One message at a time will slowly but surely provide proper support for all the languages MediaWiki supports.


Friday, December 14, 2007

It is perfect after all

In my latest post I wrote about the Oostvaardersplassen. Today I received a mail telling me that fish will in future be able to swim into and out of the Oostvaardersplassen.

This makes me perfectly happy. Now I know how the water flows from the Oostvaardersplassen into the "Wilgenbos" and I know that there will be a lot of work that needs doing. But when Staatsbosbeheer, as it does, states that fish will be able to swim in and out .. really great news.


Wednesday, December 12, 2007

Vindication of a kind

One of the Wikipedia articles I am proud of is the Dutch article about the Oostvaardersplassen. The Oostvaardersplassen are close to where I live and I think it is one of the best examples that nature is something that not only evolves, but also can be engineered. I have followed its development with considerable interest and my favourite point of view has been that the water management has been detrimental to the natural diversity.

I visited the Oostvaardersplassen this weekend, and I learned that a small dyke will be removed, leading to a more natural distribution of water and a more dynamic water level. This will have a huge impact on the fish stock; the current population of mainly mature carp will make room for many more smaller fish. This will allow many small herons and other fish eaters to find their niche.

The one remaining question for me is if fish will be able to freely migrate in and out of the nature reserve. It would be grand if this is the case.. As only one dyke is mentioned, I do expect it to be great but not "perfect".

Monday, December 10, 2007


The word of the day for OmegaWiki should be burglary. It is not, as another word had already been entered. I was sleeping and woke up because I heard the breaking of glass. I looked out of my window and saw someone breaking and entering. I called the police; they arrived quickly.

After all the excitement, I find it hard to go back to sleep.. Anyway, this is real life drama. Not dramatic, but it keeps me from sleeping.

NB the word of the day is íshokkí.


Wednesday, December 05, 2007

Shameless plug ...

A friend of mine sent me what she called a "shameless plug". I agree with her, there is no shame in announcing that wikiHow is supporting the Dutch language.

As I absolutely approve of great projects doing great things, I am happy to shamelessly plug wikiHow and I wish it and all its language versions great editors and a great audience.


Monday, November 26, 2007

Wiktionary upset

When I look at the Wiktionary website at the moment, it does not show yet that the French language Wiktionary has more articles than the English language Wiktionary. I think it is absolutely wonderful because if anything it shows that you should not take things for granted in wikis.

Not taking things for granted is a healthy attitude. There is an inherent bias against the French language Wiktionary in the Alexa numbers. However, I am impressed by the numbers quoted.

All the bigger Wiktionary projects have used bots to build up their content. I can imagine that a healthy rivalry will make the numbers go even higher, this would benefit the users of Wiktionary because I trust the Wiktionary communities to watch the quality of the content :)


Wednesday, November 21, 2007

Pride in a language

Many people feel strongly about their culture, their language. What I find special are the people that go the extra mile to promote their language. That is why I am grateful when I notice languages like Spanish, Georgian, Breton and now Eastern Yiddish having a champion that makes a difference.

It is especially interesting to see how, with an ever increasing amount of terminology, the information becomes rich. Rich both for the people who are interested in learning the language and for the people that want to learn other languages starting from these languages.

I am grateful when people find in OmegaWiki a tool that helps to document and service their language. It is an imperfect tool but its redeeming quality is that it is getting better as we go along.


Tuesday, November 20, 2007

Thank you Mycom

Today started horribly; my Skype was not working and my headset was to blame. I could not call using Skype, so I had to use my plain old telephone to call abroad :( I then tested my system and it indicated that my microphone was not working. I went to my computer supplier and I was told that my headset came with a two year warranty !!

I got home and it was still not working. So I played with my configuration and I could not get it to work. So I went again to my computer supplier Mycom and we fiddled with all kinds of values. For whatever reason we got it to work but we were not able to pinpoint what the issue was. I think that it may have to do with an upgrade of the Skype software that I did the other day but I am not sure. In the end, the friendly service I got was the highlight of the day :)


Friday, November 16, 2007

69 page document in SignWriting

I received an e-mail that mentions a 69 page document in SignWriting. At this moment such a document in SignWriting is still considered to be exceptionally large. It does prove that people are able to write documents in their sign language that are this big. As more texts are written, it will not be special much longer.

The reason for me to mention it is that it indicates that SignWriting is stepping over a threshold. It is enabling American Sign Language to be a literary language. This is another step closer to the realisation of a Wikipedia for ASL.



Tuvin is a language spoken in Russia, China and Mongolia. There is a Wikipedia project on the Incubator that is dormant. What is exciting is that there is activity to localise MediaWiki in the Tuvin language.

On the Tyawiki there is a MediaWiki installation that aims to create a repository about Tyva. With the localisation of Tuvin in MediaWiki, they will be able to do a much better job.

I welcome this first localisation effort I am aware of that is driven from outside the Wikimedia Foundation. It demonstrates how MediaWiki is getting recognition for the outstanding software it is.


Thursday, November 15, 2007

Flags for languages

I got into a discussion about what flag should be shown for a particular language. There were two groups of people: one side was in favour of a historical faction and the other was in favour of another historical faction. The argument was quite heated. At some stage I asked what the flag for English should be ...

Languages are spoken on both sides of a border. Languages are spoken by people who do not recognise countries or flags. Languages are separate from nationhood. Consequently it is in my honest opinion wrong to associate languages with flags. There are no obvious symbols for languages, and many languages would have to share the same flag. OmegaWiki is not likely to ever associate a language with a flag.



On Alexa, OmegaWiki has for the first time gone through the 100.000 traffic rank. Passing this number gets you better graphics in Alexa, so it is really welcome news. The daily rank of 99.813 is extraordinary compared with our three-month rank of 485.925. It does indicate, however, that something is working in our favour, or maybe that we are doing something right.

When you compare the OmegaWiki statistics with the Wiktionary stats, Wiktionary is doing really great. With a daily rank of 1.601 it is time for the Wikimedia Foundation to demonstrate that they have a valuable resource in Wiktionary :)


Tuesday, November 13, 2007

Political science

With some dismay I read this article on the BBC-website. It is about science and the cost of science. It is said that there should be better planning so that ahead of time it is clear how much a given project will cost. This should mean that "the wider scientific community and industry should contribute to the decision-making process".

Effectively the right honourable gentleman is looking for a reduction in the cost of science by increasing the involvement of even more people to select and manage scientific projects. Effectively this will lead to less money for doing science. Effectively it will not be the scientists who select the projects that are considered to be of scientific value.

The dismay I feel is because it is likely to lead to yet more "politically correct" science. Science that has more to do with what the expedient results should be and not with scientific relevance. When you consider the huge amounts of administrative and other overhead, it is a wonder that scientific research is still practised.


Sunday, November 11, 2007

When is a project alive

A month ago I blogged about dbpedia. Dbpedia is very much alive. I have had a look at it and like what they do. I subscribed to their mailing list and again, dbpedia is very much an active project. As always there are things I do not like; their ideas on copyright and licenses are defensive and as a consequence overly restrictive. It prevents cooperation instead of fostering cooperation.

Today, an anonymous person replied to this blog entry. The suggestion is made to cooperate with the SWAD Europe group. They have a website and a blog, but it all stopped in 2004. So I am wondering about all these projects, all this effort that just stops. Projects that may be valuable and, given that people promote them in 2007, may still be alive. For me there is no way of knowing.

I have an idea how I would use semantic data in OmegaWiki. What I am not so sure about is how semantic web applications would use OmegaWiki data. In essence OmegaWiki is multi-lingual and exporting it in anything but a machine readable version only, would strip what I think is valuable in OmegaWiki.

Collaborating with for instance a SWAD Europe group makes sense. People can suggest cooperation; it should however be a two way street. Just pointing out that there are others does not help me much.


Wednesday, November 07, 2007

Congratulations OLPC

I was happy to read this..

The World Language Documentation blog

It is with pleasure that I inform you that the World Language Documentation Centre has a blog. As the WLDC is ambitious in what it wants to achieve, and as many of these objectives will take time, it is great that there is a blog where the board members of the WLDC can publish about what they find of relevance.

I hope that the many members of the WLDC board will find the time to blog because this will help you appreciate the amazing qualities that you find in these people. As I have the privilege to be on this board as well, some of the subjects that I have written about in the past will now be covered on the WLDC blog ..

I hope you will find the WLDC blog of enough interest to follow it in your RSS reader.. :)


Sunday, November 04, 2007


"A speech disorder in which the flow of speech is disrupted by involuntary repetitions and prolongations of sounds, syllables, words or phrases, and involuntary silent pauses or blocks in which the stutterer is unable to produce sounds". This is as I have always understood stuttering or stammering as the Brits have it.

With great surprise I learned that stuttering also occurs in sign languages. I learned this from a mailing list that deals with sign languages and SignWriting. The implications are quite profound. It means that stuttering is not necessarily a speech disorder and consequently when it is not, speech therapy does not work.

This similarity in the problems between signed and spoken languages indicates that the format of communication is incidental. The same mechanisms are at play and therefore one is as good as the other. To me this seems obvious, yet many people rate their own method of communication as superior. The spoken language is superior when communicating with me, as I am dumb when it comes to signed languages...


Saturday, November 03, 2007

Who is Frank Thompson, and why include them in Wikipedia

Frank Thompson features on the "List of mayors of Yarra". He is "blue linked" so there must be an article on him, right? Clicking on the link gets me Frank Thompson who was a member of the House of Representatives. It is more or less easy to fix and I am sure that someone will.

For me it is interesting to see how Wikipedia deals with what is considered relevant. To me these people are irrelevant; both misters Thompson are no longer in office. But they are deemed to be noteworthy enough to link to where there might be an article.

In a similar way there are articles about pop stars who had a single hit in 1962, there are articles about wide receivers that only played one season.. There is a lot of information that is of no importance and that is fine.

What astounds me is that when an article about Kotava, a constructed language, is written in the German and the English Wikipedia, it is speedily deleted. When it is then indicated that this language is en route to being recognised in the ISO-639-3 code, the comment is speedily deleted as well. The article that was deleted was more than a stub, it cited sources and I did not write it.

I would love to understand why a mayor of Yarra, a 1962 pop star or a 1956 wide receiver are "relevant" and a language like Kotava is not.


Tuesday, October 30, 2007

Kotava - another constructed language

Kotava is a constructed language. There is not yet an article in the English Wikipedia, but there are articles in eight other languages.. (myth busting; if it is not in the English Wikipedia, it is not in Wikipedia).

At this moment Kotava is not eligible for a Wikipedia, it will not be enabled for editing in OmegaWiki. It does not have an ISO-639-3 code yet. What is special is that there are clear indications that the process for a code is under way. The code is likely to be "avk".

At OmegaWiki there is a Kotava enthusiast who has started a lot of the preparations for another language. The language will be enabled once the code is official. As for Wikipedia, they may ask for a Kotava Wikipedia. With the ISO process under way, the language committee does not have to do anything until the code is granted.

It is great that the language committee has reserved the right to do nothing..


WCN, the network - an unsung hero

When you are at a conference and the networking just works, you will not hear anyone talk about it. At the Wikimedia Conferentie Nederland NOBODY mentioned the network; it was just there and it just did what it was supposed to do.



Sunday, October 28, 2007

Google docs - published presentation

When I am not signed on in Google docs, blogger, any Google application and I look at the URL that I used in my blog I get this screen. At the very bottom it indicates that I can have a look at the presentation. It then works for me..


Saturday, October 27, 2007


Today the Wikimedia Conferentie NL was held in Amsterdam. It was a great occasion. It was impossible to do justice to the program; they had three tracks, and to choose one presentation over the other was an injustice to what was missed.

I had the privilege to give a presentation, and as I had to do some serious travelling to be there, I considered to what extent I could reduce what I took with me. I decided that with the latest Google application I did not need to bring anything. I could rely on there being a network, as Kim brought his Merakis, which are still on the old functional software, and assuming there would be one functional laptop should not be a problem either.

To prove that it was indeed the Google presentation tool, I selected one of the available backgrounds. It is different. What I need in a presentation is basic. I need to show some texts and some screen dumps. I think there is some need to polish the handling of the screen dumps.

One of the nice things is that the presentation can be made available. So have a look and let me know what you think.


Monday, October 22, 2007


Breton is a Celtic language spoken by some of the inhabitants of Brittany (Breizh) in France. According to Ethnologue over half a million people speak the language.

There is an organisation that is actively promoting the Breton language. I am really happy to inform you that they have taken the trouble to localise the system messages of OmegaWiki and have started to add translations in Breton. I have sent a file with many of the languages that are in the ISO 639-1. This combination will localise most of OmegaWiki in Breton.

The argument that proved convincing is that you get all the information we have. By steadily increasing the translations available in Breton, the experience will improve.


Saturday, October 20, 2007

Import, export

In OmegaWiki we provide some statistics. One of them is a breakdown of the Expressions per language. It is interesting because it shows what people are working on. It is a good indicator because after the initial import from GEMET, all the new work was done by hand. We are now experimenting with the import and export of data that are in collections and this will change things quite a bit.

The export creates a txt file with the columns separated by tabs. The columns are the number identifying the DefinedMeaning, and combinations of the Expressions and Definitions. The reason why we start with this export is that it is still much quicker to translate in a spreadsheet than it is to translate on the web. These experiments are done in a test environment and we hope to bring it live soon. I can already send you a file if you are interested :)
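
To give an idea of how such a tab-separated file can be used outside a spreadsheet, here is a minimal sketch in Python that lists the DefinedMeanings still lacking a translation. The column names, the header row and the sample rows are assumptions of mine for illustration; the real export may be laid out differently.

```python
import csv
import io

# Hypothetical sample: a DefinedMeaning number, an English Expression and
# Definition, and a column for the target language (here Dutch).
sample = (
    "dm_id\texpression_eng\tdefinition_eng\texpression_nld\n"
    "1001\thorse\tA large hoofed mammal.\tpaard\n"
    "1002\twater\tA clear liquid.\t\n"
)


def untranslated(tsv_text: str, target_column: str) -> list[str]:
    """Return the DefinedMeaning ids whose target column is still empty."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["dm_id"] for row in reader
            if not (row[target_column] or "").strip()]


missing = untranslated(sample, "expression_nld")
print(missing)
```

Because the DefinedMeaning number travels with every row, a translated file can be matched back to the right meaning on import, whatever was done to the other columns in between.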

When we start to import, it is likely that languages can make quite a jump in the statistics. I hope it will encourage people to help us by providing translations particularly for the less resourced languages.


Friday, October 19, 2007


I chatted with Duesentrieb the other day. He mentioned dbpedia. I had another look and it is really a great resource. For those that do not know, dbpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

It does a great job and it is different from that other project that deals with structured information, Semantic MediaWiki, in that it operates by data mining information from Wikipedia while Semantic MediaWiki is an integral part of a MediaWiki project.

The great thing about dbpedia is that it explicitly encourages interlinking. With interlinked data, data that can be found in another resource becomes available, limited only by the quality of the interface.

You might ask how OmegaWiki fits into all this. Dbpedia's information is in English while OmegaWiki allows for the representation of information in any language. Both OmegaWiki and dbpedia link to Wikipedia articles and consequently, where the two share a link, the information can be mashed together.

In Wikiprotein there is a large amount of medical information available. This information does link to external databases. Given that the medical articles relate to external databases, there is an opportunity to link the data to Wikipedia articles. With Wikipedia articles linked in this way, more specialists will find their way to the Wikipedia medical articles and this in turn will make more enriched information available.

The thing to consider now is how OmegaWiki can benefit from dbpedia. One of the issues is the difference in license. Then again, dbpedia provides its algorithms and consequently the result does not necessarily come under the license that dbpedia posts.


Friday, October 12, 2007

Domain names in other scripts

The Washington Post is reporting that ICANN is experimenting with URLs in other scripts. This is good news. For people that do not read or write in a language that uses the Latin script, it is a big handicap to have to type a URL in the Latin script; even http://ar.ويكيبيديا.org is problematic. In order to enable people, you have to have the whole string in the appropriate script.
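
For those who want to see what happens to such a name technically: the domain name system itself only carries ASCII, so a label in another script is converted to an "xn--" form. A minimal sketch in Python, using its built-in "idna" codec (which follows the older IDNA 2003 rules); the host name is the example from above and I make no claim that it is actually registered.

```python
# The Arabic label from the example above, between a Latin-script
# language code and a Latin-script top level domain.
label = "ويكيبيديا"
hostname = "ar." + label + ".org"

# Each label is converted separately; ASCII labels are left untouched.
ascii_form = hostname.encode("idna")
print(ascii_form)

# Converting back gives the original name again.
print(ascii_form.decode("idna") == hostname)
```

Note that the "ar" and the "org" stay in the Latin script, which is exactly the part that still needs an equivalent in other scripts.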

From a localisation point of view, this is one of the ultimate challenges. Consider: the .org has two components, the dot (.) and the org. The org needs an equivalent in all scripts. This top level domain or TLD is just one of many. ISO 15924 defines many scripts, and a particular combination that is auspicious in one language can be the equivalent of wtf in another. Choosing these codes is not trivial. There are not only TLDs but also ccTLDs or country code top level domains.

We have agreed that ويكيبيدي is Arabic for Wikipedia. Wikipedia is very much an international movement. We have agreed that Wikipedia is to be used for the Latin script in our domain name. The question is whether ويكيبيدي will be accepted to represent Wikipedia in the Arabic script and, when there are multiple ways of writing Wikipedia, whether we need to register all these domains.

To make it even more confusing, what will the rules be when it comes to domain squatting? I can imagine that a brand is only registered for one script and not necessarily for another. I wonder what the position of the WMF will be; I am sure that it has not been considered yet.

ICANN is courageous; they are now experimenting with the technical issues and this will show that it can be done. I expect that the next part will be a proposal on how all the top level domain names are to be "transscripted". Then, it will become interesting because from that moment onwards the Internet will be truly global in its reach and no longer centered on one language or script.


Saturday, October 06, 2007

A picture paints a thousand words

I mentioned that in OmegaWiki we are really moving on the localisation. I am thrilled to announce that a lot of work has been done for Spanish, French, Portuguese, German and Dutch. But the icing on the cake is that for two other scripts a start has been made; for Serbian and Georgian the localisation has started.

I think the Georgian script is pretty :)


Now with grammatical gender

In OmegaWiki we now have support for grammatical gender. When we know how to say it in "your" language, we can show it. :)


OmegaWiki vs Semantic MediaWiki

Both OmegaWiki and Semantic MediaWiki are providing semantic support. There are people that have expressed that OmegaWiki should not include particular types of data because Semantic MediaWiki does a better job.

Both OW and SMW are extensions to MediaWiki, so at first glance it seems like a reasonable suggestion. The two extensions however do completely different things.

Semantic MediaWiki will shine when it becomes part of a project like the English Wikipedia; when key data that is in the article is marked, it will provide a great improvement in making these facts available. SMW even provides a really rich environment to query the information. It is absolutely great and it is absolutely mono-lingual.

OmegaWiki is at this moment very much a stand-alone application. It does not derive data from anything; it is great at presenting the same data in many languages. This means that when we know that the information exists, we will show it in "your" language.

SMW can export and when OW can import, we have the best of both worlds; that is to say we have the best of both worlds when we can link the resources to each other. As OmegaWiki is not encyclopaedic and does not want to be, it is our stated intention to link to Wikipedia articles. As the SMW is tightly linked to the Wikipedia articles, this may be just the trick.

The suggestion that OmegaWiki would only have the semantic information that is in Wikipedia is incorrect. In Wikiprotein we already have a really rich set of annotations of proteins. This is the kind of information that is not encyclopaedic. It enables scientists to maintain information on "their" proteins. The language of the science of proteins is English, however many proteins are known in other languages. It is rich that all this can integrate as it brings diverse information together.

Both OmegaWiki and Semantic MediaWiki have their own, different strengths. Within Open Progress we use Semantic MediaWiki for our internal wiki. It works absolutely fabulously. I love both MediaWiki extensions!


Thursday, October 04, 2007

Changes in the user interface

A DefinedMeaning in OmegaWiki can have a lot of data associated with it. China borders on so many countries and seas that it is just a bit much. So it makes sense to bundle certain types of information together and keep them separate. This has a profound impact on what the data looks like. So far I was happy when we had the data on the screen but now it becomes possible to think about where it makes sense to have the data. I asked Erik if the incoming messages could be at the bottom of the page.

Well, some things seem like miracles and we can have them in five minutes. The problem with the collations is that at this moment they have to be done by hand. This means that it does not scale. With the "borders on" example however, we have a great showcase WHY we need to be able to sort these texts and also why they should be Expressions in OmegaWiki like all the other information that we hold.

Really, I could not be happier with the progress that has started to happen :)


Wednesday, October 03, 2007

Is it a Wiki ?

OmegaWiki is a wiki. Some people however disagree; they consider the fixed format conclusive evidence why it is not. As the software is maturing, this argument loses a lot of its lustre. Obviously we have been saying that these people are wrong all along :)

The best argument why OmegaWiki is a wiki is that people can add or change little items one at a time; they are not compelled to do everything in one go. As this is acceptable, our data has to be correct but does not necessarily need to be complete.

With the new terminological support, we can now indicate that a language is a language; all kinds of additional information can be added once you have stated that it is a language. Have a look at French for instance, the "incoming relations" and the annotations provide a lot of information. At the moment of writing it is not yet clear that French is an official language of France for instance.

With the expansion of existing classes and with the expansion of the class attributes, information will become more available and integrated. The best bit is that as the OmegaWiki specific user interface can be translated in many languages, people will be challenged to ensure that the right terminology is used in translation.

With the new functionality OmegaWiki became much more wiki. It is there for everyone to see, and everyone is cordially invited to have a look, create a user, add some Babel templates and have a go at it.


Monday, September 24, 2007

Self promotions on BBC-News

The BBC-news website is my primary website for news. It provides great information and I have read it for many years. As I am driving less nowadays, I do not listen as often as I used to to the BBC-Worldservice (648 AM).

Recently there has been an increase in the number of video fragments. At the same time I find that I am less likely to watch them. The reason: every fragment is now preceded by a promotion and it is annoying and distracting to the point where I do not bother any more.


Sunday, September 16, 2007


When kids go to school for the very first time in Germany, they get a "Zuckertüte". A Zuckertüte is a big cardboard cone filled with sweets and little gifts.

The word Zuckertüte is a good example of a word where I do not expect translations in any other language as it is a real German tradition. This does not mean that there cannot be translations of the definition.

I hope that the grandchildren of this little lady have a great day at school tomorrow :)


Friday, September 14, 2007


When I started to record pronunciations for Wiktionary, I used Audacity. It is a great tool but hey, when you are a serial recorder, you want to have a tool that assists and helps you to do it efficiently. Shtooka is a great tool. I think I blogged it in the past.

Now they have surpassed themselves. There is now the Shtooka explorer. It allows you to listen to the many, many pronunciations in several languages they have recorded. It is an absolutely gorgeous application that really shows off an already great project :)

Thursday, September 13, 2007

Social networks

I have been using social networks for some time now; LinkedIn, Plaxo, ecademy and facebook are where you can find me. As I have invested a considerable amount of time, it is relevant to consider if they are worth the time and effort.

LinkedIn, Plaxo and ecademy are at a considerable disadvantage because in order to get the full functionality, you have to spend money. It then becomes relevant to understand the potential benefits of these networks. All three provide basic functionality that is free and as there is no real overlap yet, I maintain a presence there.

Facebook is what I have been looking at lately. What it has right is that it tries to connect people, the groups they belong to, the causes they champion and the organisations they are associated with. When you combine it with the potential to program extra functionality for facebook, you get a more compelling package than its competition.

It is however lacking in other ways. For one, its security is minimal. I do not mind telling the world that I am on facebook, but I prefer that only friends and friends of friends can see who my friends are. I would happily leave Plaxo behind if facebook had the same security for sharing personal and work information.

The one thing all the social networks have in common is that they are proprietary. To make their functionality useful to me, I have to trust them with my information. I do however not know if the implementation of their security can be trusted. Were they to use something like A-Select I would feel more comfortable because it would allow enough eyeballs to vouch for the authentication process. Even though it would be a great step forward, it would be better if all the software were Open / Free software. Authentication is one aspect; the authorisation within the application itself can still make my data insecure.

Who to trust, why to trust ...


Tuesday, September 11, 2007


Hehe is a language spoken in the Iringa Region, south of Gogo in Tanzania by some 750,000 people. There is no article about this language on the English Wikipedia but there is an article on the people who speak this language; the Hehe.

Even though there are many references at the back of the article, it is marked as not citing references or sources. I do agree however that this article would benefit a lot from wikifying and the creation of supporting articles. It is a good example of the amount of work that needs doing to make the English Wikipedia relevant as a resource for Africa.


Monday, September 10, 2007

Sassarese and Sardinian

There is a Wikipedia in the Sardinian language. It uses the sc ISO-639-1 code. What was known as Sardinian became srd in the ISO-639-2. In the ISO-639-3 it was recognised as a macrolanguage; practically what was called Sardinian was split into four languages.
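
For those who like to see such a split as data, here is a small sketch in Python of how the codes relate. The four member codes are the ones registered under the srd macrolanguage in ISO-639-3 (Campidanese, Logudorese, Gallurese and Sassarese); the function and table names are made up for this illustration.

```python
# ISO-639-1 code mapped to its three letter equivalent.
ISO_639_1_TO_3 = {"sc": "srd"}

# ISO-639-3 macrolanguage mapped to its individual languages:
# sro Campidanese, src Logudorese, sdn Gallurese, sdc Sassarese.
MACROLANGUAGES = {"srd": ["sro", "src", "sdn", "sdc"]}


def individual_languages(code):
    """Return the individual language codes that a given code stands for."""
    code3 = ISO_639_1_TO_3.get(code, code)
    return MACROLANGUAGES.get(code3, [code3])


# The code of the existing Wikipedia covers four languages ...
print(individual_languages("sc"))
# ... while a code for an individual language only stands for itself.
print(individual_languages("sdc"))
```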

The Italian government has officially recognised the Sardinian language or the "Limba Sarda Comune". This is in essence a constructed language as it tries to make one language out of the four "dialects". One of the effects has been that some people prevent others from writing in one of the four languages on the sc.wikipedia.

The language committee of the Wikimedia Foundation has a request to approve a new language; one of the Sardinian languages, Sassarese with ISO code sdc.

There are two problems to deal with:
  • The "Limba Sarda Comune" is not recognised as a language
  • The proponents of the "Limba Sarda Comune" reserve the sc.wikipedia for their language
This issue is political. The first thing that I understand when you go to the official website is the notion of identity and indeed, to create one Sardinian identity it would be instrumental to have a unifying language. However, the map of the Sardinian languages is clear, the island is divided in four.

Given that the language committee has as one of its rules that political arguments are not accepted, there are a few conclusions that we should make.
  1. Sassarese can have a conditional approval
  2. We urge the proponents of the Limba Sarda Comune to ask for the recognition of this newly constructed language from ISO.
I have had a chat with Debbie Garside about all this, and I understand that it is necessary to apply for an ISO-639-3 code before an IANA language code is likely to be approved. At least fifty published works in the Limba Sarda Comune will be required.

Saturday, September 08, 2007

Wikizine but more relevant WalterBE

Walter is a long-standing member of the Wikimedia Foundation. He is a steward, he is the editor-in-chief of Wikizine. He organised the first official meeting of people of the nl.wikipedia; there were only two people and we had fun.

People who know Walter know him as a soft-spoken can-do person. He has done much of the organisational work on the Dutch Wikipedia, organising elections, being involved in OTRS from the start. He is the press contact for Belgium ... As a steward he did much good, he is a member of the communication committee ...

Walter has indicated that he has grown away from the community and as such his motivation for Wikipedia and Wikimedia stuff has gone downhill. He does not feel that he can properly represent the WMF and the community, he is disappointed in the lack of cooperation around Wikizine ... What has so far prevented him from stopping is his sense of responsibility to the Wikizine readers. Wikizine has been a labour of love for Walter; there has been little input from the community to inform Walter about the latest news, and nobody except the proofreaders shared the burden of this well received periodical.

Well, I can only be sad that Walter finds his commitments a burden. I do hope that he will know and remember how much he is appreciated for the work that he does and has done. Really, to me Walter is one of the most important Wikimedians.


Friday, September 07, 2007

My friend Bèrto 'd Sèra

I think Facebook sucks, they are not able to write my friend's name correctly ...

Kamusi, The Internet Living Swahili Dictionary has been taken offline.

Kamusi is a great lexical resource for Swahili. With a lot of great effort this became one of the really relevant Internet projects of Yale University. It is with sadness that I found that it has been taken offline.

It is sad that such a sterling effort is endangered by what seems to me a minor issue. It is sad because the many users of Kamusi are now without their Swahili dictionary.

I contacted Martin Benjamin, the editor of Kamusi, and I learned that the World Language Documentation Centre is willing to help out with the hosting of Kamusi. Martin and the WLDC are looking for the best way forward; maybe another university can take over where Yale has dropped the ball, maybe Yale will reconsider ...

When I know how to get to the new Kamusi website, I will let you know.


Monday, September 03, 2007

Cool application

I found this tool called Touchgraph Google Browser. It allows you to see how a particular website is connected to other websites. It gives you a nice presentation with links between the many websites that are connected. It is nice to compare for instance a, a and an


Saturday, September 01, 2007

A computer that works for Luna and Marco

Luna and Marco live in Italy. In Italy it can get hot. It gets so hot that the computer they use overheats. Their computer only works early in the morning and late in the evening. They can push at the edges a bit because Marco found that an Ubuntu live CD gives them some more time than Windows XP does.

Their mother (Luna and Marco are five years old) also has a computer. Her computer only works reliably in the hot Italian summer with an external fan pointing at the computer. The computer is raised a bit from the desk to improve ventilation even further.

In all the talk about computers something simple like the environment is hardly mentioned. PCs and laptops are designed to work in an office environment. In an office environment it is expected that the temperature is regulated.

The OLPC is made for kids and it will work in a hot environment. Luna and Marco would love to have one. It is sad that their mother has to use a computer that cannot take the heat and, a computer that is not as cool.


Thursday, August 30, 2007


Some of the earliest writing that has been preserved for us is written on stone, or in baked clay. The writing system used in what was called Mesopotamia is called the cuneiform script. Many of these clay tablets can be found in museums, collections and in archaeological sites.

Today I came across a really nice website called VirtualSecrets; it provides you with a tool that allows you to translate English into Assyrian/Babylonian, Sumerian and Egyptian. It is a really nice tool and I get the idea that it should be good because museums use this tool as well.

Soldiers of many nations are currently in Mesopotamia and they are bound to bring souvenirs home. Some of them will be original clay tablets. Many of these tablets will be useless without the context where they came from. But given a machine translator, it would be possible to have a text translated from photos.

Photos of the clay tablets in an archive would make a digital collection. The texts can be translated and with an ever increasing amount of material in such a repository, individual tablets that are currently out of context may fall into place after all.


Wednesday, August 29, 2007

Music and languages, the sound of it

It seems obvious that a language can be associated with its culture. Music in particular provides a nice window on both language and culture. It is really good fun to listen to music from all different parts of the world. With Sabine I have listened to and watched much music on youtube. There has been much that I would otherwise never have heard of.

Not knowing the language, you listen to the sound. The language in the music that I heard from Berto, Piedmontese, is so different from Neapolitan. The Neapolitan sounds more like Spanish and Piedmontese is more like what I associate with Tirol. Knowing the geography and some history it makes some sense.

At this moment I really like I Musicatoria. Have a listen, I think you will like it :)


Sunday, August 26, 2007

Language support in applications

When an application says it supports certain languages, it often means that the application has been localised in these languages. This difference is significant, because many applications are able to support all the languages that have Unicode support.

One of the problems for people who want to use one of these "other" languages is the inability to just state that they are writing in their own language. The Unicode enabled applications should have a list of all the languages that are supported in Unicode and thereby provide the most basic level of support. This way people are enabled to enrich their document with the right meta data for their language.

Many people would expect that all the official languages of the world are supported in Unicode. This is sadly not the case. Brianna informed me that several official Indian languages are not fully supported in Unicode.

In order for applications to know what languages they can support on this most basic level, there is a need for a public database that keeps this information up to date. Yes, it would be great if it also includes a link to a font that is needed as well.


Tuesday, August 21, 2007

French sign language in Togo

As so often I am working on completing the ISO 639-3 coverage in OmegaWiki. This time I added fsl or French Sign Language. In the information on Ethnologue it says that it is taught in one school in Togo.

Given the information on Togo, it is the only sign language known. There are no other sign languages that have been recognised and, if this is the case, French Sign Language is certainly as good as any other. However, I cannot help but wonder if there are no other sign languages. It seems odd to me that deaf people do not have their own sign language.

When there is a native sign language, it may be that because of the French Sign Language being taught at school this language is marginalised. When however the local language is strong, it may be that this French language is isolated ...

Really, languages are a really interesting subject. :)


Sunday, August 19, 2007

Inspiration for working on content in OmegaWiki

When you are working on a wiki, you want to make sure that the work you do has relevance. Certainly when there is much content to be added, it is not difficult to find more to add. When you add the right words, the content gains relevancy for some constituency of a project.

At Wikimania I showed that we provide real-time semantic support for Wikipedia. The information that it uses is content that exists in all the databases of OmegaWiki. This means that when you look in one of the articles in this dump of the English language Wikipedia, you will find many concepts that do not yet exist. When you add these concepts, the Semantic Support will only become better once we have the process of updating the new terminology implemented.

Another way it is stimulating is that the existing concepts exist mostly in the UMLS database. In order to start providing Semantic Support for other languages, we need translations. This is done by creating a DefinedMeaning in the Community Database and linking it to the UMLS database.

It turns out that we already have a number of languages that are really doing well considering that we have mapped almost 4% of the records of the Community Database.


Wednesday, August 15, 2007

Erik Moeller wrote on his blog " & paying for free culture". In many ways it is a really nice idea. All the Micropledge projects as they are currently listed have one big failing: they do not state how much money is needed for any of the projects.

Erik proposed and probably pledged some money for a project called "RSS extension for namespaces with smart quality filtering". Twenty dollars have been pledged (US / Canadian / Taiwan it does not say..) and two people bothered to comment on the project; they like it :) I however would not pledge money to it as it is completely unclear how much money is needed for this job.

Micropledge is a young project and it deserves a chance. So I asked the CTO of Open Progress to come up with a budget that would allow an expensive developer to do the job. Some money would also be set aside for the necessary overhead. The point is not that Open Progress wants to do this job, the point is that there will be at least one Micropledge project that has a target amount associated with it.

In many projects like Rentacoder, you can post a project and developers can bid for the project. I sincerely hope that a good developer will want to do the job and will do it for less than the amount we will post. Bidding for projects is however not something that Micropledge caters for at this moment in time.

When in the end it turns out that enough money has been pledged, Open Progress will do the job. For us it is an experiment as well. Will posting a realistic amount of money get more pledges, will we in the end be asked to do the job?


Saturday, August 11, 2007

Court Rules: Novell owns the UNIX and UnixWare copyrights! Novell has right to waive!

Groklaw reports some of the best news. We owe a debt of gratitude to Novell for defending the GPL and for fighting the shameless opportunism of the people that came new to SCO.


Friday, August 03, 2007

Equipment for an Internet traveller on the cheap

So you get on a plane and land somewhere outlandish. You have your instructions on how to get to your hotel and then what.. Well, you want to get back on the Internet. You brought your power adapter, and an extension block and all the stuff you need to load the battery of your phone, your PDA and what not.

Your PC does not pick up a network, so you get out your Meraki router and with the extra long distance antenna you have you find a Fon network. An Ethernet cable is welcome; it allows you to reconfigure the Meraki if need be. When you are in luck, the hotel provides you with Wi-Fi, but with your own router you can connect to your preconfigured network and connect a bit more securely to your own network and move from there.

Before you leave the hotel, you check again the position of the hotel on your GPS system, you take your directions and move to your destination. With some luck you have the GPS location of your destination and off you go. The good news of the GPS is that even when you cannot read the street signs, you now have a tool that tells you whether you are getting closer to or further away from your destination; it beats getting a cab to get to your destination.

Typically you do not bring a T-shirt to sleep in, when you are really lucky you have a new clean one for every night of your stay.. T-shirt, proof of: “been there, done that”.


Wednesday, July 25, 2007


WiktionaryDev was pointed out to me and there is much to like. I really like the fact that they brought some tables to the party. The system now knows when there is content in a language. I experimented a bit and added both the word Cherokee and a word in that language.

Superb is the possibility to indicate that two labels indicate content in the same language, this brings the content of Greek and Greek (modern) for instance under the same heading. Having indexes for each language is really powerful; it allows people that are interested to work on one specific language.

The WiktionaryDev functionality builds upon the standards that were adopted in the en.Wiktionary. It is absolutely fabulous that the hard work of standardising Wiktionary results in all this new functionality.. :)


Tuesday, July 24, 2007

Licenses are often not even a nice pain

Many people produce content and many people collaborate on such content. They have a reason to do so; they want to make their work available to the world, so they select a license. Often, this content is produced as an extra to something else.

You would create a spell checker under the GPL, but it is incompatible with the GFDL, the CC-by-sa ... You would create a machine translation engine under the GPL ...

The best reason for selecting a Free or Open license for software is because you want to ensure that the freedoms will remain available to the people who receive the programs downstream. In essence it is a defensive measure using copyright law.

Facts are different from programs. You cannot copyright facts. You can copyright collections of facts. Large collections of facts are available under a Free / Open license and these are incompatible with other Free / Open licenses. This means that you either take these collections and just use them, or you get into long discussions about licenses. Either option has its issues.

When you get into discussions about licenses, you have to indicate that the license does not liberate the facts for use in another setting. People get really grumpy, even upset, when they are told that their favourite Open or Free license is the issue.

The worst thing happens when you are asked to cooperate on a project. A project that has obvious merits. You are asked to help out because you know the subject matter. You are asked to help and comply with their license. You are asked to collaborate for free but you are not permitted to use your own work. When you then tell these people that you do not want to collaborate because their Free / Open license is an issue, you first get stunned silence and disbelief and then you get the same old religious arguments why their license is best. To me licenses are only great if you believe in the copyright system. I believe the copyright system is evil.

In OmegaWiki, we make our community data available under a combined license, CC-by and GFDL. In this way we reach out to both the Free and the Open communities. The people that use our data downstream can pick either license or when their license is more restrictive, they can even relicense our data. Our data will remain Free / Open. People can come back to us and improve and append our data and from OmegaWiki it will be available to the people who use the data downstream.

We are happy to cooperate with anyone. We are happy to collaborate on any database of facts but we have to insist that we work in our community environment. Facts need to be liberated and be available to all. This notion I learned from the people I met in the Open Access world.


Monday, July 23, 2007

Happy news

This article on the BBC-news website made me smile. It is excellent news and it indicates a big milestone towards a Free Culture.

Congratulations to all the people who are involved and make it happen :)


Sunday, July 22, 2007

Harry Potter and the Deathly Hallows

The number of e-mails on the Wikimedia mailing lists has been somewhat lower over the last few days. I expect that many, like myself, have enjoyed or are enjoying the latest tome by J.K. Rowling. It is a great time sink. If I had been smart I would have saved it for the plane to Taipei... Resistance was futile.


Friday, July 20, 2007

Ishi could have spoken Coos

According to a study from 1962 there were two people in Oregon who spoke Coos. It is not an unreasonable assumption that in 2007 nobody speaks Coos any more. Suppose that Ishi had some profound things to say; his message was recorded and it was recorded on a wax cylinder in 1911.

So let us consider this situation. Ishi spoke Coos, his language is now extinct and he used the technology of the day and left a recording of his message. We do not speak the language any more and we have problems with technology less than hundred years later. This wax cylinder is owned by the Phoebe Hearst Museum of Anthropology and we might be lucky; there could be a complete set of annotations including translation of Ishi's words. When we are lucky, we can listen to the the Ishi recording 96 years later and have some understanding through the possible annotation; a window is opened in our past.

When Ishi had spoken his message in English, we would consider it to be easier for us to understand it. The message would still be from a different culture, it would still require the same annotations for us to understand it properly. Now Bill who was also living in Oregon, had an as profound message. Suppose Ishi and Bill knew each other, Ishi's background would be Coos, and Bill's background would be Welsh. Our ability to understand their true message would depend on our understanding of that time. Without sufficient understanding of the culture, the profound message of either Bill or Ishi will not reach us. It would be an artefact of a museum, an artefact to be studied.

For Bill and Ishi it might have been of great significance that their profound message was recorded. For Ishi it would be natural to communicate in Coos; it was his language, and it is not unlikely that it was only recorded and annotated because it was Coos. Bill's message was not recorded; his English and his message were not considered of similar significance.

Much of what is said and done is done only for the present moment. My message is written in English because it is the best way for me to convey my message. Many of Sabine's messages are in Neapolitan when she reaches out to that particular audience. My message is written on Blogger, and I do not spend much thought on its format. If it is saved for posterity at all, it will be thanks to the effort of the Internet Archive. If people cannot read or understand it in 100 years' time, I do not really mind, as they are not my intended audience.

Much of what we do on our Wikis is for our current audience. Our content is transient, its shelf life is limited. We aim to bring information to our public and we want to do this now. We provide Free content, and when a wealth of content is available in a format like Flash, we should imho provide it because we aim to provide the best possible service now. With the continued development of Gnash, I feel reasonably safe that a future generation will still be able to experience some of what our day and age is about. The stuff that I really enjoyed.. well that is another story.. some people try to preserve it..


Thursday, July 19, 2007

Ishi meets IRENE

Ishi was a Native American who was recorded in 1911 on a wax cylinder. It is unique, authentic material, and playing it back can damage the wax cylinder. There are many such recordings, and for those who have an interest in such things the invention of IRENE is absolutely fabulous.

IRENE is a method that uses a camera instead of a needle to find all the little grooves in the track. With this information it is able to emulate what a needle would find and play the music. There are many priceless recordings and it will be great when they are digitised. This will ensure that they are less likely to get lost.

Obviously, given the age of this material, it is all in the public domain.


Wednesday, July 18, 2007

Today's word of the day: hypoxia

An article on the BBC news website informs you about the dead zones of the Gulf of Mexico. There are several reasons why I like the word hypoxia. The first is that this word is not part of the GEMET thesaurus even though it is a genuine term that deals with the environment. OmegaWiki does know the term, and as such it shows that an interested community can add value to a resource that is considered to be authoritative.

The sad thing about hypoxia is that it is preventable. Hypoxia was a phenomenon that happened in the Wadden Sea as a result of the pollution that came in from the Rhine. As a result of cleaning up this river, the hypoxic areas or dead zones have diminished, and the areas where seagrass is growing are on the rise again. The same could happen with the Mississippi and the Gulf of Mexico. The main thing required is to prevent nitrate and phosphate from getting into the waterways. It is well known how this can be done; it just takes the political will to make it happen.

My interest in this subject can be understood from the fact that I wrote most of the articles about the freshwater fish of the Benelux on the Dutch Wikipedia.


Sunday, July 15, 2007


There is an interesting read about copyfraud on Slashdot; it refers to a paper published by the Social Science Research Network. I have been reading this paper for some time now and it does a good job of explaining the problems with copyright claims on works that are in the public domain. The issue is very much that, as the paper explains, nobody cares about what is technically an offence.

Organisations that deal in copyfraud are legally fraudsters. While reading this paper, one question that comes to my mind is: how can industries that do not respect the law themselves on such a massive scale expect their customers to respect the law?

The paper describes the many ways in which copyfraud prevents people from using material that is in the public domain. There have been many threads on the WMF mailing lists about this subject and it is quite clear that our projects would benefit enormously from a strengthened public domain.

This paper does address the issue of how the public domain can be strengthened. It mentions, among other things, that courts have denied copyright enforcement to those who had dirty hands because they asserted copyright on public domain material.

Industry protects its copyright through organisations that represent it. With an industry massively breaking the law by claiming copyright where it is not theirs to claim, the moral footing of their representatives is undermined. There is case law where the court denied copyright enforcement to copyright owners with unclean hands. Many industries have engaged in the implementation of digital rights management. These implementations do not take into consideration the fact that copyrights expire. Consequently I would argue that these implementations are broken by design and that they are not a legally correct implementation of copyright restrictions. I think it could also be argued that, given the massive copyfraud perpetrated by these industries, copyright enforcement should not be granted to organisations that represent the whole of an industry.


Saturday, July 14, 2007

Deletions of pictures

At one time I was a prolific writer on Wikipedias. This was when there was no Commons yet. At the time all pictures had to be present at the individual wikis in order to be seen. In 2004, we won the Prix Ars Electronica and I sent an e-mail off to the organisation behind this prize; we got permission to use their logo with our articles. There was a restriction that it was to be used with an article about the prize or its winners. A reasonable restriction because we are talking about their logo, their trade mark.

I have had two instances now of people insisting on deleting this logo because it is not Free. It will probably be deleted from the Dutch Wikipedia, where the information about the permission is documented, and after this it will be deleted from the English Wikipedia because the reference for the permission will no longer be readable.

There are several issues that I want to raise. Commons is, as far as I am concerned, not as good as it could be because it does not have a way to deal with the restricted use of logos. Organisations allow the use of their logo within limitations that they have to insist on because the logo is part of a trademark.

Permissions given on one project are often referred to on another. This is not considered when the licenses of pictures are evaluated. This is understandable because there is so much material. Much of the older material does not have all the templates and doodahs because these did not exist at the time. The consequence is that much is lost because of this insistence on compliance with later policies.

For the photos I made, I do not care too much if they are kept on projects or not. Everybody can make a photo of a building, an animal ... The problem is with material that is of benefit to our projects that we do not consider because it requires licenses that are by necessity restrictive. Restrictive in a way that even the organisation that puts the restriction on cannot help.

One additional benefit of accepting a license for logos and stuff would be that as a result our own logos would no longer have a "status aparte" on Commons.


Friday, July 13, 2007

Proposal: localisation sure but enable languages first !!

The Pan African L10N had a conference in Morocco in February. I learned about it through a blog I read today. I am really happy with the reported progress in the number of languages being supported. I am however coming more and more to the conclusion that there is a need for a stage before actual localisation that will provide a service to the bilingual people of a language.

At this stage, the support of a language is very much an all or nothing affair. There is a localisation or there is nothing. This is not how it needs to be. When a language is known to exist, the lowest level of support for that language is the acknowledgement that this language exists. This is currently not done, and I think it is a missed opportunity.

The first thing to consider is what languages and linguistic entities exist and how you support them. This is a surprisingly complex question. Languages are recognised in the ISO 639 standard. There are several versions of the standard, and not all languages have a script that is supported in Unicode. Even when a script is supported in Unicode, it does not mean that an associated font is available for a language. The consequence of these two points is that a subset is needed on computers. On the other hand, the currently recognised versions of ISO 639 do not recognise orthographies or dialects or other entities that make a difference to how documents are to be supported.

This is not an issue the organisations that develop and localise software want to tackle. For them this is a distraction. Deciding what linguistic entities can be supported is something that is best addressed by one organisation that exists to deal with issues like these. The World Language Documentation Centre (WLDC) is that organisation. Through its association with Geolang and because of its board of experts in many of the relevant fields, it is already in a prime position to do the research that goes into the development of ISO 639-6.

With the WLDC and Geolang able to provide researched and verified information about linguistic entities that can be safely supported, it is then up to the applications to at least acknowledge the existence and allow a user to create content in that language. As more information becomes available, spell checkers can be added specific to that linguistic entity. In this way slowly but surely the functionality grows without the need to first localise the application.
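The staged approach described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `SupportLevel`, `LanguageSupport` and `LanguageRegistry` are hypothetical and do not come from the WLDC, Geolang or any existing application. The point is that an application can accept content in a language as soon as the language is acknowledged, long before it is localised.

```python
from dataclasses import dataclass
from enum import IntEnum


class SupportLevel(IntEnum):
    """Hypothetical stages of support, lowest first."""
    ACKNOWLEDGED = 1   # the application knows the language exists
    SPELL_CHECKED = 2  # a spell checker has been added
    LOCALISED = 3      # the user interface itself is translated


@dataclass
class LanguageSupport:
    code: str   # language code, e.g. an ISO 639-3 code
    name: str
    level: SupportLevel = SupportLevel.ACKNOWLEDGED


class LanguageRegistry:
    """Illustrative registry; not an existing API."""

    def __init__(self):
        self._languages = {}

    def acknowledge(self, code, name):
        # Acknowledging a language twice keeps the existing entry.
        return self._languages.setdefault(code, LanguageSupport(code, name))

    def upgrade(self, code, level):
        # Support only ever grows; a lower level never downgrades an entry.
        entry = self._languages[code]
        entry.level = max(entry.level, level)
        return entry

    def can_create_content(self, code):
        # Acknowledgement alone is enough to write in the language.
        return code in self._languages

    def has_spell_checker(self, code):
        entry = self._languages.get(code)
        return entry is not None and entry.level >= SupportLevel.SPELL_CHECKED


registry = LanguageRegistry()
registry.acknowledge("nap", "Neapolitan")
registry.upgrade("nap", SupportLevel.SPELL_CHECKED)
registry.acknowledge("hit", "Hittite")
```

In this sketch both languages can be used for content straight away, while only the first has a spell checker; localisation would simply be a later upgrade.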

In a way this is a solution for a "chicken and egg" problem. This problem is solved when you think of it in an evolutionary way. First there was the egg, the support of the language, and then the chicken evolved, the localisation of the application.


Thursday, July 12, 2007


I have postponed it long enough; I need to be able to pay abroad, and I have to be able to pay to and receive money from outside of Europe as well. To me this is not a straightforward affair. In Europe it is not customary to pay by credit card, and transferring money within the EU either costs nothing or costs as much as a national transfer.

Credit card payments cost money, PayPal payments cost money. I would expect that credit card payments are more expensive, and it seems customary that you pay your PayPal bill with a credit card as well. I have the option to pay a PayPal bill directly with my checking account. Intuitively this seems to be the less expensive option. So I have opted for this.

The question I am left with is how PayPal compares with transferring money using traditional banking methods. As PayPal has existed for some time now, I am sure it will have had an effect on the banks. PayPal benefits from ubiquitous Internet and highly automated procedures; these benefits are available to banks as well...

One of these days I will know the answer to these questions.


PS Yes, I am going to Taiwan :)

Wednesday, July 11, 2007

Finally, Ariana may stay

I have known Ariana for many years; I see her every so often. She is a nice, bright woman, a registered nurse and a university student. She is also from Kosovo; she escaped the atrocities of war and fled to the Netherlands.

When Ariana got her nursing diploma she was not allowed to work; she was a refugee. After many years she finally got her permit to stay in the Netherlands. Many of the people that know Ariana have been appalled by the, in our eyes unjust, policies of the Dutch government. My mother was not the only one willing to provide her with a sanctuary and sabotage these loathed policies that deal with refugees.

I am happy that Ariana can stay and now has more of a future. It is a shame that she has had to suffer all these years of uncertainty and doubt that have added to an already troubled past.


Sunday, July 08, 2007

RTFM, read the fine manual

SignWriting is the script that gives sign languages the potential of writing. It is way beyond an experiment; it is used in daily life. One of the telling things is that its users have message pads with layouts that indicate who called and who left a message.

There is a mailing list, and in the last week there were two things that really got my attention. The first one said that not all the symbols that make up the total character set are to be used for one language. There are for instance characters specific to the Italian and the Ethiopian sign language. This led to the observation that there is a need to identify the characters that are used for a particular language. This is similar to, for instance, the Latin script, where only so many characters are used for a language.
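To make the idea of a per-language subset concrete, here is a minimal Python sketch using the Latin script comparison from above. Everything in it is an assumption for illustration: the subset names and contents are made up and are not authoritative alphabets, and a SignWriting version would work with symbol identifiers rather than letters.

```python
import string

# Hypothetical per-language subsets of one script; illustrative only.
ALLOWED_CHARACTERS = {
    "example-basic": set(string.ascii_lowercase),
    "example-extended": set(string.ascii_lowercase) | set("äöüß"),
}


def unexpected_characters(text, language):
    """Return the letters in text that fall outside the language's subset."""
    allowed = ALLOWED_CHARACTERS[language]
    return sorted({ch for ch in text.lower() if ch.isalpha() and ch not in allowed})


print(unexpected_characters("Straße", "example-basic"))     # ['ß']
print(unexpected_characters("Straße", "example-extended"))  # []
```

A check like this does not say a character is wrong, only that it is not part of the subset registered for that language.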

In the same way I appreciate how this latest amazing story unfolds; there is a lot of documentation on how to use the SignWriting characters... People do not really read it. This is of course to be expected, but it brings these wonderful moments where people find that there is more to it. That, like for any other language, the basic tenets have to be really understood. That you can go find a character, a movement, and it will be readable but it might prove not to be the best one. What makes it so nice to read is the wonder and delight these people express that such great documentation exists.

I am impressed with SignWriting and I believe that it would be good if serious funding found its way towards the further development of SignWriting. To sum up some of the reasons why: kids who learn to write their first language first have an easier time learning the dominant written language that surrounds them, and it gives the deaf people of the whole world the opportunity to express themselves in their own language. It also allows for a better preservation of so many cultures that do not have anything but video.


Saturday, July 07, 2007

Another BBC journalist comes home

Another BBC journalist came home. I was so happy to write about Alan Johnston coming home. In the same week Frances Harrison is coming home from Iran. Frances was the bureau chief of the BBC in Tehran. In her last presentation she writes how people are afraid to talk to the BBC because they fear for their liberty, their freedom. Frances writes how she struggles with her conscience because she has to justify her need as a journalist with the danger that goes with it.

I am moved; I know several Iranians and they are wonderful people. I have been to concerts of Persian music; it is enchanting. I am saddened because with this continued breakdown of the free exchange of ideas, with the deterioration of the freedom of the press, this wonderful country, these wonderful people may become painted even more as an enemy of our culture.

Frances writes: "The Islamic system of government has deliberately erased much of what was Persian culture and it is only by looking hard that you can catch glimpses of the past." It is self-righteous politicians that try to make the world in their own image. It is self-righteous politicians that do not allow themselves and their own actions to be judged in the same way as they judge others.