Saturday, March 25, 2006

A new concept: the "Regime"

In WiktionaryZ we want to have a lot of data: lexicological, terminological and thesaurus information. Information may come from many sources, including reputable ones. And yes, in WiktionaryZ people will be able to add and modify data. When new information fits in with the existing data, it will be important to know where it fits in from a quality point of view.

Enter the Regime. A regime would be a procedure that people can submit to. It would be optional. There will be many areas where we can develop these regimes. It may be that a regime is developed or executed with organisations that we will partner with. One thing, to state the obvious: our data is Free, so the process will be transparent and the resulting data will be Free.

It is an idea that we came up with recently. We have discussed it with some people, and now we are interested in what you think of it.


Saturday, March 11, 2006

Babel templates and WiktionaryZ

I discussed how to proceed with WiktionaryZ with Dvortygirl. What she said was that it is time to ask people to get involved. According to the planning, we hope to have an editable version of the relational data at the end of the month. The software will be a "pre-alpha release"; the importance of this release is that it shows that we can edit relational data in a wiki.

WiktionaryZ is a fully functional wiki. This means that we can add content: we can create users, templates and categories. We should do so when it makes sense, and it does. When we start the coming test, we will start with people who understand what WiktionaryZ is about. This means that they understand the concept of the DefinedMeaning. Another factor that will help us decide whom to ask is the languages these people master.

WiktionaryZ will for now not be available to anonymous users. People can create a user account. When they add Babel information to their user page, we will learn who has expertise in what language. The templates we start with have been copied from en.wikipedia, and we hope we will get many more templates that will allow us to have five levels: the native speaker, and levels 1 to 4 to indicate a growing level of proficiency.

The scope of the test is limited; the priority is learning the edit process for relational data. What works and what does not? What improvements will be needed to make us ready to meet the "great unwashed"? We will also be able to work on information that will be important in later phases, things like names of languages and other terminology that is likely to end up in the user interface.


Tuesday, March 07, 2006

If I had money that I could freely spend ...

If I could hire a programmer who would give me one extra piece of functionality, something that is not scheduled at this moment, what would I have him do? What if I had a budget for at least a few weeks of work?

Interwiki links
I hate these things. They are always out of sync. Many people put a lot of hard work into getting them right, while the problem keeps getting bigger and is not solved at all. It feels like a waste. When a project grows, it has articles that need to be linked. As more and more Wikipedias grow, the number of articles that need updating grows rapidly. Small projects are not easily integrated. It costs a lot of resources.

With a centralised database, we could link an article to another article, and by inference the link would be known to all the articles that are linked to it. As all the articles are about the same subject, we could check whether the article names are translations. When they are, this is a basis for linking to the lexicological content of WiktionaryZ.
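
To make the idea of "by inference" concrete, here is a minimal sketch in Python of how a central store could work. It is only an illustration, not how MediaWiki or WiktionaryZ does it: the class name InterwikiIndex, its methods and the sample article titles are all made up for this example. Each article is a (language code, title) pair, and linking two articles merges the groups they belong to.

```python
class InterwikiIndex:
    """Hypothetical central store that groups articles about the same subject.

    Uses a union-find (disjoint set) structure: linking two articles merges
    their groups, so every member of a group learns about the new link.
    """

    def __init__(self):
        # Maps each article to its parent; a root is its own parent.
        self._parent = {}

    def _find(self, article):
        """Return the representative (root) of the group the article is in."""
        self._parent.setdefault(article, article)
        root = article
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every article on the path at the root.
        while self._parent[article] != root:
            self._parent[article], article = root, self._parent[article]
        return root

    def link(self, a, b):
        """Record that articles a and b are about the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def links_for(self, article):
        """Return all other articles known to be about the same subject."""
        root = self._find(article)
        return sorted(
            other
            for other in list(self._parent)
            if other != article and self._find(other) == root
        )


index = InterwikiIndex()
# Only two links are entered by hand ...
index.link(("en", "Sea otter"), ("nl", "Zeeotter"))
index.link(("nl", "Zeeotter"), ("ja", "ラッコ"))
# ... yet the English article also knows about the Japanese one.
print(index.links_for(("en", "Sea otter")))
```

The point of the sketch is that nobody has to enter the English to Japanese link: it follows from the two links that were entered. A bot would no longer need to visit every wiki to keep the links in sync.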

Inflection boxes
When a verb, a noun or an adjective changes according to given rules, it makes sense to have inflection boxes. They are generated using templates on many Wiktionaries, but it makes more sense to have software that allows us to build these boxes: software that associates the inflections of one language with the inflections of other languages for the purpose of translation.

Better support for tools like OmegaT
OmegaT is a CAT tool; it helps translators with their work. I would do two things to OmegaT. I would have it read directly from a MediaWiki wiki and, when the translation is finished, write it to another wiki. I would also have its translation glossary function make use of WiktionaryZ.

Yes, there would be a quid pro quo: when a translator adds a word to the glossary, it would be fed back to WiktionaryZ.


PS: What would your suggestion be?

Wednesday, March 01, 2006

Pictures in WiktionaryZ

One of the more famous quotes is "a picture paints a thousand words". As the interwiki bot is running again, I check its work every now and again. When I look at the entry [[ラッコ]], you will forgive me when I say that it is Japanese to me. However, the picture that can be seen there allows me to translate it to "zeeotter".

Pictures are great. We do need them with the WiktionaryZ content. Having said this, the fun starts. We can decide to allow only pictures that can be found in Commons. There are precedents for it. It solves the problem of which upload rules would apply; they would be the strict rules of Commons.

The other thing you run into is illustrations that are problematic for some cultures. In my country, people have as many genitalia as in any other, they look more or less the same as in any other, they are as rare as in any other, and a picture would not be problematic. In many countries this cannot be done. One great example of this can be found here. Now when you read this in some 10 days' time, this picture may not be there anymore. It does however show you the sensitivities that need to be considered with pictures.