Saturday, September 30, 2006


I run the pywikipedia bot for the Wiktionary projects. I have done this for quite some time, and what I do is a "public service". The software is quirky; when it works, it works well. That was true until recently, when it started blanking pages for no apparent reason.

This seemed a good moment to update the software. That did not work because of authentication problems: SourceForge decided that I had to change my password. Thank you, SourceForge. Even after that, it still did not work.

I asked Andre Engels to have a look. The result: after some major surgery, the bot works again. It also works differently for me now; it assumes that I have a user account on every Wiktionary. This was already more or less the case. It now checks more Wiktionaries than before to see whether an expression exists there.

All in all, I hope, and expect, that this solves my problems with running the pywikipedia bot.


The patron saint for the translators

Today, the 30th of September, is the feast day of St Jerome. He is best known as the translator of the Bible from Greek and Hebrew into Latin, and he is recognized by the Vatican as a Doctor of the Church. It is also International Translation Day.

WiktionaryZ will be a tool for everyone; it welcomes people from all countries and languages. Typically you will not find much in it that is political or religious. That does not mean we should not take a moment's notice; we had Ramadan as the word of the day, and today I blog about St Jerome.


Thursday, September 28, 2006

European Day of Languages

On the 26th of September, the European Day of Languages was held. We did not know. This means that either we are not into languages or there was not much marketing for this event.

The European Centre for Modern Languages has a website that used to have posters, agendas and all kinds of information about this day. The material just does not load on my computer. It may be that the moment has passed; on the other hand, many of the other pages on their website do not load for me either.

I really wonder: if you have a website that does not work for people, does it serve its purpose? Anyway, there is always next year... the 26th of September :)


Monday, September 25, 2006

Cooperation? Not with Debian...

I learned how "nice" it is to be "Free" at any price. Firefox, my favorite browser, is Open Source. A great amount of effort went into making Firefox popular, including full-page advertisements in the New York Times. The marketing of Firefox is truly one of the success stories of the Open Source world. Firefox is a trademark, it has a slick logo, and the Mozilla Foundation protects its assets.

The Debian distribution is one of those "Free" distributions that does not accept that derivatives of logos are not permitted, and as a result it does not allow these files in its distribution. The Mozilla Foundation insists that the logo and the name go together. The result is a mess in which Debian is likely to rename Firefox.

In my mind this is foolishness. Logos and trademarks are there to help products like Firefox create a market. Denying this is, IMHO, as stupid as the Wikimedia Foundation allowing only its own logos in the Commons repository while refusing the logos of other organizations.

It is much better to appreciate that logos have a special place, and that it is great when organizations allow their logo to be used with encyclopedic content. For now this is not to be, because such use is considered not to be "Free". In effect, this denies others a Freedom that the organization reserves for itself. It is not consistent; it is a muddle. It is bad practice.


Sunday, September 24, 2006

When Mohammed doesn't come to the mountain...

Some things are inevitable. WiktionaryZ has always tried to be a project about content. Getting people's attention has been difficult because of the recurring questions: what is WZ about, what is its relation with the Wiktionary projects, what is its relation with its partners and with paying developers, and, last but not least, what is the relation between WiktionaryZ and the Wikimedia Foundation?

At some stage, the "Special Projects Committee" of the WMF issued a resolution stating that it wants to host WiktionaryZ. Combined with Jimmy Wales' wish to have WiktionaryZ in the WMF, this put us under some pressure. On the other hand, our experience with the InstantCommons project taught us that even friends need contracts to stay friends, and also that when you want to get things done, it is often best to do them yourself.

With the election of Erik to the board of the Wikimedia Foundation, it seems that the mountain has come to Mohammed. Erik has been, and still is, a key contributor to WiktionaryZ. This should make for a great relationship between the WMF and the WiktionaryZ community and partners.

We have invested a lot of ourselves in WiktionaryZ. We intend to invest even more in the success of WiktionaryZ, and we have plans for how it should integrate with the WMF projects. We see a bright future, but as guardians of the project we will protect what WiktionaryZ stands for and nurture what we think WiktionaryZ will make possible.


Thursday, September 14, 2006


BOINC, the Berkeley Open Infrastructure for Network Computing, is something I learned about the other day on a BBC World Service program. I was fascinated by the wealth of options it provides. It is quite similar to the "Ligandfit" application that I have been running for the last 2 years and 153 days. It differs in that there is a much bigger range of things you can work on. Both have projects that I recommend.

Given that many people like myself have loads of computer cycles going to waste, it makes sense to do something with them. I do two things with my excess computing power: I run bots that do maintenance work on Wiktionary, and I run Ligandfit.

Do you have cycles to donate?

Wednesday, September 13, 2006

Transliteration, do we need to do things twice?

At WiktionaryZ we support all the scripts that Unicode supports. This means that we already do Serbian in both the Cyrillic and the Latin script. We also support Mandarin in both the simplified and the traditional script. We want to support Cherokee, which is written in both the Cherokee and the Latin script.

Many of these conversions can be done by a program or, as for Mandarin, with databases that contain both versions of the script. Jeffrey V. Merkey gave us permission to use a long list of Cherokee words, and also to use a program called chr2syl that does the transliteration automatically. There is a similar program for Serbian.

WiktionaryZ is at a pre-alpha stage, so not even the basic functionality is available yet, but would it not be nice if, when we add a Serbian word, we automagically got the other version?
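For Serbian, the Cyrillic-to-Latin direction is mechanical enough to sketch in a few lines, because each of the 30 Cyrillic letters corresponds to exactly one Latin letter or digraph. The Python below is only a minimal illustration under that assumption; it is not the Serbian program mentioned above, and the function name `serbian_cyr_to_lat` is made up for this example. It deliberately skips Latin-to-Cyrillic, which is ambiguous.

```python
# Serbian Cyrillic -> Latin transliteration (illustrative sketch).
# Each Cyrillic letter maps to one Latin letter or digraph, so this
# direction is deterministic. The reverse is not: "nj" may be the single
# letter "њ" or the two letters "н" + "ј".
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def serbian_cyr_to_lat(word: str) -> str:
    """Transliterate a Serbian word from Cyrillic to Latin script.

    Characters not in the table (spaces, hyphens, Latin letters) pass through.
    An uppercase Cyrillic letter becomes a capitalised Latin letter or digraph
    ("Љ" -> "Lj"), which suits title case but not words written in ALL CAPS.
    """
    out = []
    for ch in word:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)               # not Serbian Cyrillic: keep as-is
        elif ch.isupper():
            out.append(lat.capitalize())  # "dž" -> "Dž", "đ" -> "Đ"
        else:
            out.append(lat)
    return "".join(out)
```

For example, `serbian_cyr_to_lat("Београд")` gives `"Beograd"` and `serbian_cyr_to_lat("Љубав")` gives `"Ljubav"`.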


Tuesday, September 12, 2006


Papiamento is a language with two orthographies: an Aruban and an Antillean version. Today I have the opportunity to add this language to WiktionaryZ. The problem is that I have no clue what the code for the language should look like.


Sunday, September 10, 2006

Antonyms in WiktionaryZ

An antonym is "a word or phrase that has exactly or nearly exactly the opposite meaning to another word or phrase". In WiktionaryZ, meaning is associated with what is called a DefinedMeaning. A DM differs from the way people usually think about concepts: when you approach a concept from an Expression, the DM links that Expression to both its synonyms and its translations.

The technical question, when implementing antonyms in WiktionaryZ, is whether they are still antonyms. Traditionally antonymy is considered within one language, but because of the way WZ implements things, this does not hold within WZ.

Antonymy has its own problems too. Whether one concept is considered the opposite of another is often a cultural notion. The question then is whether an antonym translates in the first place.
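Why antonymy stops being monolingual can be shown with a toy data model. In the hypothetical Python sketch below, antonymy is recorded once, as a relation between two DefinedMeanings; the class and field names only borrow WiktionaryZ terminology and are not its actual schema, and the Dutch/English example words are chosen for illustration. Because a DefinedMeaning groups synonyms and translations, looking up the antonyms of an English word returns Dutch words as well.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expression:
    spelling: str
    language: str  # ISO 639-3 code, e.g. "eng" (English), "nld" (Dutch)


@dataclass
class DefinedMeaning:
    dm_id: int
    # Synonyms and translations all hang off the same DefinedMeaning.
    expressions: set[Expression] = field(default_factory=set)


# Hypothetical data: two DefinedMeanings, each with English and Dutch words.
dms = {
    1: DefinedMeaning(1, {Expression("hot", "eng"), Expression("heet", "nld")}),
    2: DefinedMeaning(2, {Expression("cold", "eng"), Expression("koud", "nld")}),
}

# Antonymy stored between DefinedMeaning ids, not between words.
antonym_pairs = {(1, 2)}


def antonyms_of(expr: Expression) -> set[Expression]:
    """All Expressions, in any language, whose meaning is opposite to expr's."""
    result: set[Expression] = set()
    for dm in dms.values():
        if expr not in dm.expressions:
            continue
        for a, b in antonym_pairs:
            if dm.dm_id == a:
                result |= dms[b].expressions
            elif dm.dm_id == b:
                result |= dms[a].expressions
    return result
```

Here `antonyms_of(Expression("hot", "eng"))` returns both "cold" (English) and "koud" (Dutch), which is exactly the cross-language effect described above.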


Thursday, September 07, 2006

Internationalisation or internationalization or I18N

Many people know of our interest in localisation and internationalisation. We need it for WiktionaryZ because we want a user interface in every language that we support, and we want to support all languages.

One of us was approached by a really interesting company, asking whether we know people who work in I18N and who would be interested in doing a great job for a great company.

I find it really funny that we are seen as being (or becoming) relevant to this subject...


Wednesday, September 06, 2006

A small annoyance

The user interface is important; it is how people learn to work with a software environment. In WiktionaryZ the content is in the "WiktionaryZ" namespace. This is already counterintuitive, as the default namespace only supports what is done in WiktionaryZ.

The small annoyance is that "New entries" shows you only what is new in the default namespace. That is not really relevant.


Lies, damn lies and statistics

A lot of number crunching is done on Wikipedia content, particularly on the most famous version, the English-language Wikipedia, which gets most of this attention. My favorite number-crunching project is Wikiword, which tries to extract semantic content from Wikipedia. It is a great project.

Many statistics seem to prove whatever the researcher sets out to prove. Aaron Swartz wrote a really nice article on who writes Wikipedia. It is great because it challenges the conventional wisdom, which is that a small group of people writes Wikipedia. Aaron makes it plausible that it is the anonymous users who contribute most of the letters in the articles you read in Wikipedia.
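The point that the chosen metric shapes the conclusion can be made concrete. The Python below is a deliberately simplified sketch: the edit log and user names are invented purely for illustration, and summing the characters each user added is cruder than counting who wrote the letters that survive in the version you actually read. It only shows that "most edits" and "most text" can name different people.

```python
from collections import Counter

# Invented edit log for one article: (user, characters added).
# These numbers are made up to illustrate the effect, not measured data.
EDITS = [
    ("regular_editor", 5), ("regular_editor", 3), ("regular_editor", 8),
    ("regular_editor", 2), ("anonymous_1", 1200), ("anonymous_2", 900),
]


def top_by_edit_count(edits):
    """User with the most edits, regardless of how much text they added."""
    counts = Counter(user for user, _ in edits)
    return counts.most_common(1)[0][0]


def top_by_characters(edits):
    """User who added the most characters in total."""
    chars = Counter()
    for user, added in edits:
        chars[user] += added
    return chars.most_common(1)[0][0]
```

With this log, `top_by_edit_count(EDITS)` is `"regular_editor"` (4 small edits), while `top_by_characters(EDITS)` is `"anonymous_1"`.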

What is particularly important to me are the consequences for WiktionaryZ. The suggestion is that we have to make it easy for the casual user, and that people contribute to the things they know and care about. An intuitive screen helps; WYSIWYG helps the people who are the writers, while wiki syntax is for the editors, the people who make things pretty.

The user interface of WiktionaryZ needs to be compared with that of Wiktionary. Wiktionary is flat text, and almost every Wiktionary is essentially different. WiktionaryZ will have one user interface for all languages, and contributions made by some will be available to all. I trust that this gives us an edge.

The flip side of the coin is that a limited group of people does indeed do the EDITING. What we do not yet have in place are tools that help editors, and editors are the people who will prevent WiktionaryZ from becoming a mess. In particular, merging and deleting [[DefinedMeaning]]s and [[Expression]]s will be important to get right.

There are ideas on how to do this, but they are not mature yet.


Monday, September 04, 2006


No, I will not talk about the spelling of the word. I want to raise the subject of defining colours. There are many colours, and in order to agree on what a colour IS, you cannot really rely on words. For colours there is a standard: the RAL codes. This effort to bring quality to the delivery of colours started with some 40 colours and now defines some 1900.

Lemon yellow is such a colour; it has the code RAL-1012. Because this code has existed for a long time, the colour it indicates has been named in numerous languages; suurlemoengeel is the name of the colour in Afrikaans. When a colour is defined by its RAL code and you use the names associated with that code, the words have an identical meaning. This is obvious, as this is what the RAL codes are for.

The question is how to list the RAL codes themselves in WiktionaryZ. The RAL colours are a collection, and we can describe them as such. We can also include the RAL codes as expressions; the zxx language ("no linguistic content") seems the obvious choice to me. We can do either, and we can do both.
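Doing both can be sketched with a small, hypothetical data structure: the RAL code is stored as one more expression of the colour's meaning, in language "zxx", and the collection of RAL colours is simply an index keyed on that expression. The Python below is an illustration under those assumptions; the names `DEFINED_MEANINGS`, `RAL_COLLECTION` and `colour_name` are made up and do not describe how WiktionaryZ stores data.

```python
from typing import Optional

# Hypothetical DefinedMeaning records: language code -> expression.
# The RAL code is itself an expression in "zxx" (no linguistic content).
DEFINED_MEANINGS = [
    {"eng": "lemon yellow", "afr": "suurlemoengeel", "zxx": "RAL-1012"},
]

# The same colours described as a collection, keyed by their RAL code.
RAL_COLLECTION = {dm["zxx"]: dm for dm in DEFINED_MEANINGS if "zxx" in dm}


def colour_name(ral_code: str, language: str) -> Optional[str]:
    """Name of the colour with this RAL code in the given language, if known."""
    dm = RAL_COLLECTION.get(ral_code)
    return dm.get(language) if dm else None
```

For example, `colour_name("RAL-1012", "afr")` returns `"suurlemoengeel"`; an unknown code or a missing language returns `None`.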


Sunday, September 03, 2006

The Chinese language

There is no such thing as "the Chinese language". This should be no surprise to anyone. When you learn more about the Chinese languages, it does not take long to realise that people usually mean Mandarin in the simplified script when they talk about "the Chinese language". For WiktionaryZ, this is not a position we can leave as it is; WiktionaryZ is to be a lexicological, terminological and ontological resource in every language. So we will rename Chinese to Mandarin (simplified), and we will add Mandarin (traditional) and Min Nan to the languages that we support in WiktionaryZ.

The basis for this is the way we have embraced ISO 639-3, and the experience we have gained with Serbian and English. For Serbian we have two scripts, and this works well. For English, words that are universal are English, while specifically US-American words are now English (American).
The only thing left for English is to add English (British).
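One way to picture "an ISO 639-3 code plus a qualifier" is a small language registry. The Python sketch below assumes that each supported language is an ISO 639-3 code (cmn for Mandarin, nan for Min Nan, srp for Serbian, eng for English) plus an optional variant label for a script or national variety; the `Language` type and `BASE_NAMES` table are hypothetical and not how WiktionaryZ actually records languages.

```python
from typing import NamedTuple, Optional

# ISO 639-3 code -> base language name (only the languages discussed here).
BASE_NAMES = {"cmn": "Mandarin", "nan": "Min Nan", "srp": "Serbian", "eng": "English"}


class Language(NamedTuple):
    """A supported language: an ISO 639-3 code plus an optional variant label."""
    code: str
    variant: Optional[str] = None  # a script or a national variety

    def name(self) -> str:
        base = BASE_NAMES[self.code]
        return f"{base} ({self.variant})" if self.variant else base


# Hypothetical list reflecting the plan above.
SUPPORTED = [
    Language("cmn", "simplified"),
    Language("cmn", "traditional"),
    Language("nan"),
    Language("srp", "Cyrillic"),
    Language("srp", "Latin"),
    Language("eng"),               # words that are universal English
    Language("eng", "American"),   # specifically US-American words
    Language("eng", "British"),    # the one still to be added
]
```

`Language("cmn", "simplified").name()` gives `"Mandarin (simplified)"`, while `Language("eng").name()` is just `"English"`, so the universal words and the national varieties share one ISO code.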