Tuesday, February 17, 2009

Apertium, or benefitting from computational linguistics

There are Wikipedias in many languages, and some of those languages are written in more than one script. As long as the text in one script carries sufficient information, it is possible to convert it into the other script. When this can be done by a program, it saves an awful lot of work.

Several languages have a Wikipedia with multiple scripts. For some, like Chinese, there is software that converts from one script into another. For other languages it would be nice to have this as well. Fiji Hindi is one such language; it is written in both the Latin and the Devanagari script. A lot of work has already gone into the Latin localisation at translatewiki.net, and it would be really beneficial if we could convert it automatically and correctly into the Devanagari script.

I had a talk with my friend Francis Tyers, and he is quite willing to help with this. I know Francis from his work on the Apertium translation engine. A conversion like this may be doable, and Francis is happy to look into the possibility. The only thing we still have to solve is how to get the Latin text out of the message file and the converted text back in again.
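To make that last step a bit more concrete, here is a rough sketch of the round trip, assuming the messages have already been read into a simple key/value mapping. Everything in it is hypothetical: the function names, the message key, and above all the tiny syllable table, which is only there to show the mechanics and is not a real Fiji Hindi orthography. The greedy longest-match lookup is a stand-in; the actual conversion would come from whatever Francis builds with Apertium.

```python
# Hypothetical toy table for illustration only. It is NOT a real
# Fiji Hindi Latin-to-Devanagari mapping.
TOY_TABLE = {"na": "न", "ma": "म", "ste": "स्ते"}


def transliterate(text, table=TOY_TABLE):
    """Greedy longest-match replacement; unmatched characters pass through."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(longest, 0, -1):
            chunk = text[i:i + size].lower()
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # Nothing matched: keep the character as it is
            # (spaces, punctuation, parameters such as $1).
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_messages(messages, convert=transliterate):
    """Return a new mapping with the same keys and converted values."""
    return {key: convert(value) for key, value in messages.items()}


if __name__ == "__main__":
    latin = {"greeting": "namaste $1"}  # hypothetical message key and text
    print(convert_messages(latin))
```

The message keys stay untouched and only the values are converted, so the result could in principle be written back in the same shape as the Latin original. How the messages are actually read from and written to the MediaWiki message files is exactly the part that is still open.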

We are looking into the language part of this; now we need someone who can help us with the MediaWiki side of it. Can you help?
