Wednesday, January 17, 2007

Standards ... we need them

I had another conversation with someone who has a big interest in the Apertium machine translation software. OmegaWiki has something to offer to tools like Apertium: it is a database, it intends to have all words of all languages and, last but not least, its data will be available under a Free license.

To be of relevance to tools like Apertium we need to have conjugations and inflections. This is something that we have always planned to include. What is new to me is that people want morphological information as well. Including the data is not much of an issue; as it is only text, it will not really add to our hardware costs.

I then asked the question: is there a standard format for morphological data? Apparently there is no standard for it. So what do you do when there is no standard? You can select one existing way of representing morphological data, and this will piss some people off, or you can create a new standard and piss off everyone.

The good news is that we first have to deal with the things that come first: being able to include conjugations and inflections. So there is some time before the question of a format actually becomes relevant. The thing is, it will become relevant, and it is likely that we will do something. When we select one way, I hope it will be flexible enough to allow for conversion to other formats as well.

