Saturday, June 25, 2005

Old spelling / new spelling / alternative spelling

When you think about an Ultimate Wiktionary, including all words of all languages is a given. That is ambitious enough. You do not need anything more, right?

Dutch spelling will change in 2006. The change will undo artificial spellings, turning paardenbloem back into paardebloem, which is how it has always been pronounced. As a result, many words will be spelled wrongly from 2006 onwards.

In October 2005 a list of words with their old and new spellings will be published. This means we have to cater for this list in the Ultimate Wiktionary, so, alas, the Ultimate Wiktionary has to be even more ambitious.

So this is the time to start experimenting. I am using the word Imbiß as an example; in modern German it is spelled Imbiss. I have introduced two new templates. The first is placed before everything else on the page; it signals that this is an old spelling and gives the correct one. The second says that the spelling used to be correct.

Having a date for the change makes the information even more valuable. When UW is used in software for optical character recognition (OCR), it can serve as a second pass after the initial pass that did the scanning. Knowing when a spelling changed allows a spellcheck that fits the age of the scanned document, which can improve the quality of the OCR output.
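To make the idea of a dated spellcheck concrete, here is a minimal Python sketch, assuming each spelling of a word is stored together with the period in which it was the standard form. The names (`Spelling`, `expected_spelling`, `flag`), the lexicon and the cut-over date are hypothetical illustrations, not part of the Ultimate Wiktionary. A second OCR pass looks up each recognised word and flags it when it does not match the spelling that was standard on the document's date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Spelling:
    """One spelling of a word and the period in which it was the standard form."""
    form: str
    valid_from: Optional[date]   # None: standard since before our records begin
    valid_until: Optional[date]  # None: still standard today

    def valid_on(self, when: date) -> bool:
        starts_ok = self.valid_from is None or self.valid_from <= when
        ends_ok = self.valid_until is None or when < self.valid_until
        return starts_ok and ends_ok


# Hypothetical cut-over date, for illustration only.
CHANGE = date(1998, 8, 1)

# Hypothetical lexicon. str.casefold() folds "ß" to "ss",
# so "Imbiß" and "Imbiss" share the lookup key "imbiss".
LEXICON: Dict[str, List[Spelling]] = {
    "imbiss": [
        Spelling("Imbiß", None, CHANGE),   # old spelling
        Spelling("Imbiss", CHANGE, None),  # current spelling
    ],
}


def expected_spelling(word: str, document_date: date) -> Optional[str]:
    """Return the spelling that was standard on document_date, or None if the word is unknown."""
    for spelling in LEXICON.get(word.casefold(), []):
        if spelling.valid_on(document_date):
            return spelling.form
    return None


def flag(word: str, document_date: date) -> bool:
    """Second OCR pass: True if a known word does not match the spelling of its time."""
    expected = expected_spelling(word, document_date)
    return expected is not None and expected != word


# Usage: a scan from 1990 should read "Imbiß"; a scan from 2010 should read "Imbiss".
print(flag("Imbiß", date(1990, 5, 1)))  # False
print(flag("Imbiß", date(2010, 5, 1)))  # True
```

Keying the lexicon on `str.casefold()` is a convenience for this German example only; other spelling changes (such as the Dutch ones) would need their own normalisation or an explicit list of variants per entry.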

One more thing to consider is that some spellings are local to a certain region or country. Rudolf Heß is written Rudolf Hess in Switzerland, because the "scharfes S" (ß) is not used there. So words that still have their "scharfes S" in German are spelled differently in Switzerland. That is just spelling. Some words, or their meanings, are not known to all German speakers, like "Paradeis", which Austrians know to be a "Tomate".
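Regional variation can be sketched in the same spirit. In this minimal Python sketch, assuming entries are keyed by a hypothetical concept name and regions by BCP 47-style tags such as `de-CH` and `de-AT`, a rule covers the systematic Swiss difference (ß written as "ss"), while a per-region table lists words like Paradeis that no rule can derive. The names `ENTRIES` and `regional_form` and all entries are illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical entries: concept name -> {region tag -> form used there}.
# "de" holds the default German form.
ENTRIES: Dict[str, Dict[str, str]] = {
    "street": {"de": "Straße"},
    "Rudolf Hess": {"de": "Rudolf Heß"},
    "tomato": {"de": "Tomate", "de-AT": "Paradeis"},
}


def regional_form(concept: str, region: str) -> Optional[str]:
    """Return the form of a concept used in a region, or None if the concept is unknown."""
    forms = ENTRIES.get(concept)
    if forms is None:
        return None
    # Lexical differences (Paradeis vs. Tomate) must be listed explicitly.
    if region in forms:
        return forms[region]
    default = forms.get("de")
    # Spelling differences can follow a rule: Swiss usage writes "ss" instead of "ß".
    if default is not None and region == "de-CH":
        return default.replace("ß", "ss")
    return default


print(regional_form("Rudolf Hess", "de-CH"))  # Rudolf Hess
print(regional_form("tomato", "de-AT"))       # Paradeis
```

Separating the rule from the table keeps the data small: only the genuinely irregular regional words need their own entries.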

I increasingly appreciate that linguists find it astounding that we are attempting to make the Ultimate Wiktionary a reality. What makes us try is that, for us, it was a natural growth path from Wiktionary. So we meet our problems one after another rather than all at once. The issues are there to be solved, and they can be solved. Getting them one at a time helps because it keeps you from being overwhelmed by complexity; for us it is just a matter of refactoring.

