Saturday, December 31, 2005
We discussed the long-standing need for single login. Because it is a dependency for many other projects related to Ultimate Wiktionary and Wikimedia in general, Brion will, if at all possible, start working on it soon after the release of MediaWiki 1.6. Brion will also look into Surfnet's A-Select in order to interface Wikimedia with other authentication service providers. There is a need for outside authentication to Wikimedia projects, especially Wiktionary, and A-Select has the potential to provide it. However, Brion feels that thinking about federation is only possible after the Wikimedia-internal authentication problems are fully resolved.
Together with Erik, we discussed the need for handling multiple languages inside a MediaWiki installation, which is obviously related to Wikidata and UW and will likely be one of our next development milestones. While Brion sees this as a quite complex problem, he did agree that the situation as it currently is - that multilingual projects like Meta and Commons have no language-awareness whatsoever - is broken. What we agreed to do is to send him specifications to review before any implementation begins. Brion also pointed out some current and potential problems with MySQL: that certain UTF-8 characters cannot be stored with the proper charset encoding, and that it may not be possible to have multiple sorting orders on a field without duplicating it. (In this context, we debated the need for following important standards such as CLDR for locale data.)
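To make the storage problem a bit more concrete, here is a minimal sketch. It assumes that the limitation Brion referred to is the one where MySQL's "utf8" charset stores at most three bytes per character, so characters that need four bytes in UTF-8 do not fit. The function names are my own and only serve as illustration.

```python
def utf8_length(char: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(char.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """Check whether every character of the text needs at most three bytes.

    Assumption: the target column uses a charset limited to three bytes
    per character, as described in the paragraph above.
    """
    return all(utf8_length(ch) <= 3 for ch in text)


# A few sample characters from different scripts.
samples = {
    "Latin e": "e",                      # 1 byte
    "Latin e with diaeresis": "\u00eb",  # 2 bytes
    "CJK ideograph": "\u8a9e",           # 3 bytes
    "Gothic letter": "\U00010330",       # 4 bytes, outside the three-byte range
}

for name, char in samples.items():
    print(name, utf8_length(char), fits_three_byte_utf8(char))
```

A check like this could be run over imported data before it is written to the database, so that problematic expressions are found early instead of being silently damaged.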
Besides meeting Brion, we made an appointment with two officials of Wikimedia Germany to discuss the potential for cooperation in different areas. So far, the signals are positive.
Monday, December 26, 2005
We have been given permission by the European Environment Information and Observation Network (EIONET) to host the GEneral Multilingual Environmental Thesaurus (GEMET). This showcases our wish to have great, relevant content in many languages, structured in the way that thesauri are.
The data is not complete yet and the content will be improved. To put it in perspective, this read-only implementation of a Wikidata database is a proof of concept. It will be expanded slowly but surely, not only to improve the technical features but also the information and the user interface.
We really welcome all your constructive comments.
Sunday, December 25, 2005
What is important is that even if this software is not and does not become Free software, we will be able to cooperate. This is the key thing of the Ultimate Wiktionary project.
PS I received an e-mail from Erik that he is importing the GEMET data into a Wikidata database. It does a cool 1,000 records a minute.
PS2 I am also celebrating Christmas. But I am in two minds.. I want this soo bad. How much do I want to be like a Santa bringing good cheer.. :)
Saturday, December 24, 2005
Well, what do we find under our virtual Christmas tree? Today I received an e-mail from Erik, who wants meat on the bones of his present. He does want "versioned tables" to be included, even though the main purpose is that we have something to show :) . It is still very much intended to be on-line in the coming day/days.
Yesterday, I had a conversation with Jimmy Wales. We discussed many things. I was really happy to learn that he considers it necessary to have "committees" that take care, on behalf of the board, of Wikimedia Foundation projects that do not get the attention they deserve. Wiktionary would be one of these projects.
In the last month many things have happened; I hinted at some and not at others. Some of the highlights are the potential cooperation with several organisations. Two of these I want to highlight: the GEvTerm project and the ProZ organisation.
- GEvTerm is based on a great idea. When you have an international event somewhere, many people will congregate in one place. They need to communicate, but it cannot be expected that all people share one common language. The idea of GEvTerm is to concentrate on translations that are associated with the particular event and make them available to ease the human interaction.
- ProZ is a member-based organisation of professional translators. It serves the largest community of translators. ProZ has been active building glossaries, and they have their Kudoz, where colleagues can help out with particularly problematic translations. They were about to create their own dictionary, and we were lucky to get into contact with them through Sabine, who is a ProZ member. We are now talking about how we can create one fabulous resource together.
I wish everyone the most joyous of Christmases..
Friday, December 23, 2005
The name of the project does not determine that the eventual project will be found at "http://ultimate.wiktionary.org". There are reasons for it and there are reasons against it. It is cheap to name it like this, as the domain wiktionary is already owned by the Wikimedia Foundation. The second reason is that a fair number of people already know this name, and finally it does link the old with the new. A big argument against is the use of this "ultimate" label. The functionality of the software will grow, but at first there will not be much that deserves this accolade.
Now there is this opportunity: what name to pick, and what arguments to use?
Wiktionary2 is another great name. It also gives a great link to the current Wiktionaries and it symbolises well the big technological step that it represents. One thing that is problematic is that some people suggested it could be seen as a version number; this would mean that the future might bring us a "http://wiktionary3.org". This is not a sensible way of doing things, as you do not want to reflect version numbers in your domain name.
I hope that people will like these suggestions, and there is room for many more.
Tuesday, December 20, 2005
Ultimate Wiktionary is there to be used. My definition for success is "when people find an application for the data that we did not think of". There is however nothing wrong with us coming up with new ways in which we can extend the potential use.
Particularly interesting to me are the changes that extend the community of users. We want the scientists and the translators, but we could also have the puzzlers. For me it would be really cool if UW became a challenge to my mother; she likes her crosswords and her cryptograms. Puzzlers are interested in synonymy and definitions, so by adding this one field in the Expression table, the first step is taken to charm yet another group of people into the Ultimate Wiktionary...
Sunday, December 18, 2005
Wikidata is not the same as Ultimate Wiktionary and consequently has requirements of its own. It has language requirements of its own. It may need longer texts, and it may require texts in a format that Ultimate Wiktionary frowns upon, like capitalised expressions. As we are investigating the use of TBX for the static part of Ultimate Wiktionary, it made sense to think about TMX as well for this issue. This means that we need some basic stuff to deal with handling translation projects. I have come up with this extension of Ultimate Wiktionary; this data design makes use of tables that are part of UW and may as a result become part of MediaWiki proper.
I realise that when we implement this, we have the core of a translation / localisation workflow. This makes sense when you consider that Wikipedia, one of the biggest websites of this world, exists in 212 different languages. When a MediaWiki message is changed, who is going to do the translation? I doubt that there is one organisation that can do that well on a continuous basis. As I am a firm believer in using standards AND in eating my own dogfood, this is my first take on this issue.
Saturday, December 17, 2005
That in a nutshell describes the situation for many standards, and as far as I am concerned, I would prefer a definition that includes relevancy: "A standard is a standard when a standard body says so and when it is freely available for adoption". When a standard is not freely available, it means that the standard will not be adopted by some for monetary reasons. The consequence is that money removes relevancy from a standard when it leads to it not being adopted.
In my mind the worst thing that can happen to a standard is that it is ignored or not adopted.
Friday, December 16, 2005
Many of these things are the legacy of a paper-based origin. In a digital resource with some magic linking "plague" and "bubonic plague", one would suffice. The problem is in how to make the Ultimate Wiktionary relevant. When we do include "plague, bubonic" in some way, we allow for the one-to-one linking from the Unified Medical Language System to Ultimate Wiktionary and vice versa. It would even allow for the inclusion of UMLS data in Ultimate Wiktionary.
My current thinking is about two options. I know that in lexicology they have some annotation to describe the relation in which a word stands within a sentence. The other option is to have an AlternateRepresentation table that links an Expression to the preferred Expression.
I do want this annotation anyway; what I do not know is whether this annotation is aware of capitalisation.
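The second option, the AlternateRepresentation table, can be sketched roughly like this. It is only an illustration of the idea: the field names, the identifiers and the in-memory dictionaries are my assumptions, not the actual UW data design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """One row of a hypothetical Expression table."""
    expression_id: int
    spelling: str


# Hypothetical contents of the Expression table.
expressions = {
    1: Expression(1, "bubonic plague"),
    2: Expression(2, "plague, bubonic"),
}

# Hypothetical AlternateRepresentation table:
# id of the alternate Expression -> id of the preferred Expression.
alternate_representation = {2: 1}


def preferred_expression(expression_id: int) -> Expression:
    """Return the preferred Expression for the given Expression.

    An Expression without an entry in the table is its own preferred form.
    """
    preferred_id = alternate_representation.get(expression_id, expression_id)
    return expressions[preferred_id]


print(preferred_expression(2).spelling)
```

With such a link, an outside resource that uses the inverted form "plague, bubonic" can still be matched one to one, while the dictionary itself only has to maintain the content of the preferred Expression.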
Thursday, December 15, 2005
Ultimate Wiktionary wants to be open, welcoming to all communities and.. yes, I was human, so I was convinced that Term was a better term. However, it is one of those words with multiple meanings, particularly in the worlds of terminology, lexicology and thesauri. After some discussion we came to the conclusion that this is not the right word to describe what we mean, and also that it is not really neutral. So, a new word was agreed upon: LexicalItem.
There are some more changes that we decided on in
There are loads of things that I have learned and that I am still internalising. When I have, there will be several other changes.
Oh, the great news is that many of these changes are inspired by the great people that I met In
Tuesday, December 13, 2005
Sunday, December 11, 2005
- A word is specific to the dialect
- A word is used in all areas where the language is spoken
- A word is not used in the dialect but is specific to the parts where the dialect is not spoken
This approach is not problematic when you consider the Dutch and Belgian situation. Languages / dialects like Andalusian are much more problematic, because people will bring political dimensions to it. The history of the creation of new Wikipedia projects often proves that a language is a dialect with an army.
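The three cases in the list above could be recorded as a simple classification per word and dialect. This is a minimal sketch under my own assumptions; the names are hypothetical and not part of the UW design.

```python
from enum import Enum


class DialectUsage(Enum):
    """How a word relates to one dialect of a language (hypothetical names)."""
    DIALECT_ONLY = "specific to the dialect"
    EVERYWHERE = "used in all areas where the language is spoken"
    OUTSIDE_DIALECT_ONLY = "specific to the parts where the dialect is not spoken"


def used_in_dialect(usage: DialectUsage) -> bool:
    """True when speakers of the dialect use the word."""
    return usage in (DialectUsage.DIALECT_ONLY, DialectUsage.EVERYWHERE)


def used_outside_dialect(usage: DialectUsage) -> bool:
    """True when the word is used where the dialect is not spoken."""
    return usage in (DialectUsage.EVERYWHERE, DialectUsage.OUTSIDE_DIALECT_ONLY)


for usage in DialectUsage:
    print(usage.name, used_in_dialect(usage), used_outside_dialect(usage))
```

The classification itself stays neutral; whether something is called a language or a dialect is exactly the political question it does not try to answer.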
Saturday, December 10, 2005
There are other resources that have importance to people interested in lexicology. Logos in its dictionary provides a rich tapestry of words with translations. In its link to wordtheque, you find the words in their context. In the philosophy of Logos, this often provides as clear an idea as a definition would do. A link to a publicly available resource is not available through the resources of the TST centrale.
In the Kudoz open glossaries of ProZ, you find a rich resource of hard-to-translate words. When you start looking for resources that have a relevance for the creation of dictionaries, there are many resources that are not created in a "scientific" manner. Practically, they can be extremely useful. It is a shame that the scientific resources are not Free and consequently that they make the "unscientific" resources unavailable for the enrichment.
Anyway, as long as these resources are used side by side there is nothing that stops the research of lexicology. As the Wikipedias are a rich resource of contemporary language, and as its content is categorised as to subject matter, it is good to know that scientists are free to use it for their research. I checked it with Jimmy Wales and he was happy to confirm this.
Tomorrow I will be going to Berlin. We will be talking about interfacing the Ultimate Wiktionary using the TBX standard.
Friday, December 09, 2005
Yesterday, I was at a conference for Dutch language lexicologists. It was my first such thing and it was a grand experience. Lexicologists have always been abstract people to me; now they have faces. They exist in many shapes and forms and they largely do many different things. They work together in many ways and to me they do marvellous things.
The difference in our approach and the scientific approach can be given in one word: scientific. What we try to do with Ultimate Wiktionary is not scientific. Being scientific has never been considered. Our outlook has always been practical. We want to do practical things with our dictionary. Publishing a scientific paper is not practical to us. That is not what our goal is.
Given this difference in approach, there is still very much that we can do for each other. By building a resource that is useful but not complete, it may have a limited scientific value, but it does have a value. Being built by people who do not necessarily share the same methodology, it may be chaotic, but it still has a scientific value. Even with all these "issues", a project that makes lexicons relevant to people who typically do not care is probably the most valuable gift we can give to the science of lexicology. If we can make lexicons relevant and exciting, there will be new people who will find their way into this profession.
Monday, December 05, 2005
Last year we started to collaborate on Christmas wishes and it was good fun. It is so funny to see a text in alphabetic script and not have a clue as to how it is pronounced.. "Përshumvjet Krishtlindjen dhe Gëzuar Vitin e Ri". Last year we were as ambitious as this year; we would love more people to translate and say: "Merry Christmas and a happy New Year!" in their language..
We hope and expect that the first tangible results of all the effort that has gone into Ultimate Wiktionary will be our Christmas gift. In the meantime we will also do some more work on our Christmas glossary. Have a look and see how you can make the glossary yours as well :)
Friday, December 02, 2005
When disruptive technology appears, it changes business as usual. It has done so in our society from the moment when innovation was considered to be good. Innovation was never considered to be universally good, but it led to our current society, with a number of people having it good in a way that could not be conceived one hundred years ago. In a way, with the ever increasing speed of communication, new ideas get an audience with an ever increasing speed.
Wikipedia is an encyclopedia, it is internet based and it is growing as quickly as new servers can be brought online. It can only do this because of the huge pent up demand for affordable information that has a neutral point of view. Important is the realisation that Wikipedia is not one but many encyclopedias. Every month there is yet another language that gets its own Wikipedia.
These Wikipedias all have the ambition to equal the star Wikipedias like the German and the English Wikipedia. They will have to grow from a small project where everybody knows everybody to a project where even the heroes of last year are not known by all anymore. Slowly but surely these projects create Free information and get the recognition for the viability of the languages they express.
Certainly when there are few resources in a language, the impact that a wikipedia may have is big. Comparatively Wikipedia cannot be as important for languages like English and German as it could be for Swahili. It will take its own good time..
With all this talk about disruptive technology, it is fun for me to predict that Wikidata and Ultimate Wiktionary will be disruptive in their own right. They will be so in more ways than one. I am anxious to see how conservative the Wikimedia crowd will prove to be. If they are as I expect them to be, they will allow both Wikidata and Ultimate Wiktionary to develop their potential.
Thursday, December 01, 2005
The problem is that we have to pay outside of the European Community. So we do get into silly stuff like currency and costs.. We have to find out what the cheapest way is to get money elsewhere.
It is a problem but I prefer this to not having code finished.