Saturday, May 28, 2005

Chippewa or Ojibwe

Within the Wikimedia projects we have some rules. One of them is "ignore all rules", but that is a different story. The rule about which languages we support is based on a few things; for me, the existence of an ISO 639 code is the most important one. I have been "bold" in using the ISO 639-3 codes; this standard is not even ratified yet, but it does include many more languages...

Today I was working on some translations from the English Wiktionary, and the word "fruit" had two translations in languages I had not seen before. One of these was the "Ojibwe" language. There is no mention of this language in ISO 639-3, so I had a problem: the language codes that indicate which language a word belongs to are based on this standard.

As so often, Google turned out to be my friend: the Ojibwe are better known as the Chippewa, and the code for the Chippewa language is ciw. So I was pleased to have a code to go with the Ojibwe language.

In the Ultimate Wiktionary there will be no reliance on the existence of an ISO 639 code. We could have more languages and nobody would realise...

Thanks,
GerardM

Wednesday, May 25, 2005

babel

"babel" is a template used on several Wikimedia projects. It is intended to tell others about a person's language skills. I am not particularly good at languages; I only care to post about three languages on my user page. The rating is funny because you rate yourself: I do not think much of my German skills, but Sabine thinks I should be a de-2... She is a "profi" (a pro) at this language :)

A template like this can be used not only to indicate who has some expertise in a language, but also as a filter on the recent changes in an Ultimate Wiktionary. This could work in the same way as the inclusion or exclusion of bot edits.
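A minimal sketch of such a recent-changes filter, assuming each user's babel boxes are available as codes such as "de-2" or "tr-3", that a bare code like "nl" (or "nl-N") means a native speaker, and that every edit is tagged with the language of the entry it touched. The names `parse_babel`, `Change` and `filter_changes` are hypothetical, chosen for illustration only.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

NATIVE = 4  # assumption: rank a native speaker above level 3
# A babel code: an ISO 639 code, optionally followed by -0..-3 or -N (native).
BABEL_CODE = re.compile(r"^([a-z]{2,3})(?:-([0-3]|N))?$")


class Change(NamedTuple):
    title: str     # the entry that was edited
    language: str  # language code of the edited translation or definition
    user: str      # who made the edit


def parse_babel(boxes: Iterable[str]) -> Dict[str, int]:
    """Map language code -> self-assessed level from a user's babel boxes."""
    skills: Dict[str, int] = {}
    for box in boxes:
        match = BABEL_CODE.match(box.strip())
        if match is None:
            continue  # not a babel code; ignore it
        lang, level = match.groups()
        skills[lang] = NATIVE if level in (None, "N") else int(level)
    return skills


def filter_changes(changes: Iterable[Change], skills: Dict[str, int],
                   min_level: int = 2) -> List[Change]:
    """Keep only the changes in languages the reader rates at min_level or higher."""
    return [c for c in changes if skills.get(c.language, 0) >= min_level]
```

For a reader whose boxes are `nl`, `en-3` and `de-1`, the default threshold shows the Dutch and English edits and hides the German ones; lowering `min_level` to 1 brings them back. Inverting the predicate would hide edits instead of showing them, much like hiding bot edits.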

Another use would be to help show who shares knowledge of a language; this in turn could help form a community for that language. I can appreciate that there could be a need for a "village well" or a "kroeg" (a pub) for each community.

One thing I did find is that this is also a bit of information that needs LOADSA localisation. There are three templates for each language; Oscar, one of my Dutch Wikipedia friends, is a tr-3, and there is no template for that yet. His being only a tr-3 means that you cannot expect him to write this template in Turkish :)

Thanks,
GerardM

Friday, May 20, 2005

Licensing and mirrors

Most of the Wikimedia projects are licensed under the GNU FDL. It works relatively well for content served from our servers. Many people have issues with it, but given its rules and regulations the content is free and will forever be free. That is genuinely cool.

As I recently explained, we also want to use the data in Ultimate Wiktionary for purposes other than serving it from our servers. I mentioned the .dict data format; data in this format is also used offline. To create this data, you take a subset of the data we hold. Given the license, we would have to credit every contributor to each word. This is not practical. What is practical is to refer to the UW for the history of every word.

As the Ultimate Wiktionary is a new database, it is best to start with an appropriate license that is free and prevents the data from becoming unfree.

As the UW contains the free data, it does not pay to be too concerned about mirrors that host UW data. Given time and the enthusiasm of our community, the UW content will grow and therefore has the potential to outcompete such mirrors in all important ways.

Thanks,
GerardM

Monday, May 16, 2005

RFC 2229 and dict

One of the important things for an open content project is cooperation. Currently every Wiktionary has its own community and, when Ultimate Wiktionary becomes a reality, we hope that many Wiktionarians will find their home in the Ultimate Wiktionary.

The biggest challenge, however, will be to grow both the content and the community. As there is still a limited number of languages present, we need to grow a presence, a community, for the missing languages. For many languages, including my own mother tongue, there is no comprehensive coverage yet. We will be looking for content, both added by our community and gained by incorporating existing glossaries, wordlists and thesauri.

Today I learned about a third way of making free content available: RFC 2229, a protocol that provides people with dictionary information over the Internet. The trick here is that there is a database behind it, and the server provides the information wherever it is available. So from a user's point of view it would be great if we cooperated with dict.org.
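RFC 2229 is a simple line-based protocol over TCP port 2628: the client sends a command such as `DEFINE * "fruit"` (the `*` asks for all databases), and the server answers with numbered status lines, where each definition starts with a `151` line and ends with a line holding a single period. The sketch below is a minimal client, assuming the public server at dict.org still answers on the standard port; `define` and `parse_define_response` are hypothetical names, and the sketch ignores words containing double quotes and every command other than DEFINE.

```python
import re
import socket

DICT_PORT = 2628  # the TCP port assigned to the DICT protocol
# 151 "word" database "database description": a definition text follows
_DEF_HEADER = re.compile(r'^151 "([^"]*)" (\S+)')


def parse_define_response(lines):
    """Turn the lines of a DEFINE reply into (database, text) pairs."""
    definitions = []
    it = iter(lines)
    for line in it:
        if line.startswith("552"):      # 552: no match
            return []
        if line.startswith("250"):      # 250: command complete
            break
        header = _DEF_HEADER.match(line)
        if header is None:
            continue                    # e.g. "150 n definitions retrieved"
        body = []
        for text in it:                 # a text block ends with a lone "."
            if text == ".":
                break
            # A line that really starts with "." is sent with an extra dot.
            body.append(text[1:] if text.startswith("..") else text)
        definitions.append((header.group(2), "\n".join(body)))
    return definitions


def define(word, database="*", host="dict.org", port=DICT_PORT, timeout=10.0):
    """Send one DEFINE command and return the parsed definitions."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="\r\n")
        reader.readline()               # the 220 greeting banner
        sock.sendall(f'DEFINE {database} "{word}"\r\n'.encode("utf-8"))
        lines, in_text = [], False
        for raw in reader:
            line = raw.rstrip("\r\n")
            lines.append(line)
            if in_text:
                in_text = line != "."   # still inside a definition?
            elif line.startswith("151"):
                in_text = True
            elif line[:1] in ("2", "4", "5"):
                break                   # final status: 250 ok, 552 no match, ...
        sock.sendall(b"QUIT\r\n")
    return parse_define_response(lines)
```

Calling `define("fruit")` needs network access; the parsing half can be tried offline by feeding `parse_define_response` a captured list of reply lines. Keeping the two apart means a UW export could be checked against the protocol without a live server.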

There will be two issues that need to be resolved. We use the GNU FDL for our content and the GPL for our software, while they use the GPL. In order to cooperate we will need to work something out. Licenses are a necessary evil, but it would be a travesty if free licenses turned out to be mutually exclusive.

RFC 2229 compliance would be for Ultimate Wiktionary mark II... It is funny that Ultimate will be as much a work in progress as everything else. Not really a surprise :)

Thanks,
GerardM

Thursday, May 12, 2005

Ziekenhuis.nl again

Yesterday I mentioned that we received a big body of work: a Dutch medical wordlist with Dutch descriptions and English translations. People downloaded this list. A Portuguese contributor who is active on the Portuguese Wiktionary has started translating the words into Portuguese. He found some potential errors as well: is a diuretic a "diurese" or a "diureticum"??

More than 18 people have downloaded the file already. This is extraordinary when you consider that it was posted on an Italian list for translators :)

I understand that some work may be done to translate the list into German... Is free content not exciting to watch? Given these effects, I am convinced that we are growing a community that will make the Ultimate Wiktionary a success :)

Thanks,
GerardM

Wednesday, May 11, 2005

Ziekenhuis.nl

Ultimate Wiktionary will gain its prominence partly because of what it aims to do and partly because we will actively look for glossaries and coherent bodies of work. The theory is that the whole is more than the sum of its parts.

Ziekenhuis.nl has a glossary of 1834 academic medical words, each with a description and an English translation. I received permission to use it in Wiktionary, and it is a perfect addition to the EU thesaurus of medical terminology. Both will be included in the UW.

The Ziekenhuis.nl words will be included in the Dutch Wiktionary. I have a few reasons for doing this. One is that my sister teaches nursing and it is nice for her to have it. Another is that I want to create some buzz about Wiktionary and UW. Finally, as I aim to have words pronounced, I might as well start with these.

As these words are now available under the GNU FDL, Sabine also mentioned them on some mailing lists and made the file available for download. It would be stellar if this resulted in some translations...

Thanks,
GerardM

Sunday, May 08, 2005

It is lonely at the head of a curve

Call it hubris, but I am making a difference in the Wiktionary world. I have made it possible for programming on the Ultimate Wiktionary to start. I got in contact with GEMET, and it may result in something great. I will be the first to acknowledge that this is an idea whose time had come, and that what I have done would not have been possible without the inspiration of people like Sabine Cretella.

But it is hard to communicate the vision that we have: why UW may make a difference, what will be included, why UW may gain a big community, and why it will matter in practice what UW becomes.

People complain that I am not forthcoming with information about what is going on. But I am at a loss as to how to be more open and communicative. The problem is that there are so many separate groups of people who are getting to know UW and want to be informed. It is becoming increasingly difficult to be all things to all people.

Today I finally finished and sent an article to LISA; it took me two weeks to write because I consider working with them vitally important. The best thing that can happen is that we get a bigger community around UW. That will make things even more difficult, but UW will then depend much less on a few people.

Thanks,
GerardM

Saturday, May 07, 2005

Swear words

When you compile words in a dictionary, at some stage you will be asked: what about swear words? My answer would be: why not? If someone insults me, I want to know what is said and what it means. Another argument is that many resources do not carry swear words, so it is hard for translators to find proper translations. So as far as I am concerned, we should include them.

As swear words are a rare resource in the lexicological sense of the word, including them would enhance the Wiktionary. When we combine several relevant collections, the total is more than the sum of the parts.

One question I have no answer to yet: a word like "godverdomme" (God damn me) only makes sense with a religious connotation. Should it be in a religious glossary, or should we treat it as we treat categories and have "Christian swearwords" as a subcategory of both "Christianity" and "swearwords"??

Thanks,
GerardM

Wednesday, May 04, 2005

Ilse and inspiration

Today I went to Ilsemedia to discuss cooperating on Ultimate Wiktionary. Ilse is a company focused on the Dutch-language market. They want to add lexicological content to the results produced by their search engine. They were looking to implement something themselves, but they are interested in cooperating with Wiktionary on this. They produced a mock-up of what it might look like. They are interested in Ultimate Wiktionary because it is free in both software and content.

We discussed many things about Wikimedia, among them Commons, which currently holds some 90,000 pictures. The problem is finding something. We came up with an interesting idea that helps both Commons and Ultimate Wiktionary. To motivate you to have a look: it has a nice picture of horses :)

Thanks,
GerardM

Tuesday, May 03, 2005

New resource found :)

In the Dutch Wikibooks there is a resource that is lexicological by nature: a list of words that are foreign to the Dutch language, each with a description and Dutch equivalents. As it uses the same license as Wiktionary, it is a good candidate for inclusion. It is a glossary in its own right.

Thanks,
GerardM

Monday, May 02, 2005

Windows SP-2

Windows SP-2 is old news; it has been around for quite some time. It is considered so important for all Windows XP users that it will become mandatory. Some security patches require SP-2 to be installed.

For translators it is a major pain in the rear end, as the software they bought will not work with SP-2. Many applications either have no upgrade path or involve parting with dineros. The consequence is that these people lose out on either security or functionality.

One thing about Wiktionary is that it will be free of charge. It is an Internet phenomenon, and the only thing it lacks at the moment is cooperation between the different language-based communities and their content. Isn't it funny that Microsoft's SP-2 is a reason why some translators hope that Ultimate Wiktionary will be a success??

Thanks,
GerardM