Friday, February 10, 2006

What is a language

When I was in school, I had to learn several definitions of "intelligence". My favorite still is: "Intelligence is what the intelligence test measures". Many people use a similar definition for what a language is: "a language is a language when it has its own language code".

From a technical point of view, such a definition is beneficial. For all the major languages it is simple: there is obviously a language code that describes them. For some languages there is a language code because people in the West take an interest; tlh is the code for one such language. For other languages it is more problematic; some of them have a code, but that can make the problem worse. Some tools rely on the codes being there and being applicable; OmegaT, a CAT tool, for instance relies on the codes that exist in its programming environment. This is a serious problem because this programming environment supports ISO-639-1, and a code for the Neapolitan language only became available with ISO-639-2. Consequently, such a translation tool does not support MANY languages. Even ISO-639-2 does not really help: Kurdish is acknowledged not to be a single language; it is considered a language family that consists of at least three languages. These languages are recognised in ISO/DIS-639-3.
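To make the limitation concrete, here is a minimal sketch in Python. It assumes a hypothetical tool that validates language codes against hard-coded tables; the tables are small illustrative excerpts, the function name is made up, and none of this is meant to describe how OmegaT is actually implemented.

```python
# Illustrative excerpt of two-letter ISO-639-1 codes (not the full table).
ISO_639_1 = {"en": "English", "nl": "Dutch", "it": "Italian", "ku": "Kurdish"}

# Illustrative excerpt of three-letter ISO-639-2 codes (not the full table).
ISO_639_2 = {"eng": "English", "nap": "Neapolitan", "tlh": "Klingon", "kur": "Kurdish"}


def is_supported(code, tables):
    """Return True when at least one of the given code tables knows the code."""
    return any(code in table for table in tables)


# A tool that only knows ISO-639-1 has no way to accept Neapolitan.
print(is_supported("nap", [ISO_639_1]))
# Adding the ISO-639-2 table makes the same code acceptable.
print(is_supported("nap", [ISO_639_1, ISO_639_2]))
```

The point of the sketch is only that the set of "languages" a tool accepts is exactly the set of codes in whatever tables it happens to ship with.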

While ISO/DIS-639-3 is a huge improvement, it gets opposition from several quarters. Many people, particularly software developers, are of the opinion that some 8,000 languages is too many. Other people are of the opinion that the number of languages is not big enough. In addition, a language that used to have its own code can come to be considered a dialect of another language, as is happening with the Twi language. How this will be appreciated by the people who speak Twi is anybody's guess. Twi is considered to be part of the Akan language; this article on the Akan language is indeed another example of the systematic lack of attention Africa gets.

For a CAT tool, it really matters that users can use the tool to its fullest potential. This does not mean that standards should not be supported; it means that multiple standards should be supported AND that users can introduce user-defined languages as well.
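What that could look like is sketched below in Python. This is one possible design under my own assumptions, not the way any existing CAT tool works: the class name, method names and the "x-mydialect" code are all hypothetical, and the code tables are tiny excerpts.

```python
class LanguageRegistry:
    """Resolves language codes against several standards plus user additions."""

    def __init__(self):
        self._standards = {}     # standard name -> {code: language name}
        self._user_defined = {}  # code -> language name

    def add_standard(self, name, codes):
        """Register a code table under the name of its standard."""
        self._standards[name] = dict(codes)

    def add_user_language(self, code, name):
        """Register a language that no loaded standard describes."""
        self._user_defined[code] = name

    def lookup(self, code):
        """Return (source, name) for a code, or None when it is unknown."""
        # Assumption: user-defined entries win over the standards.
        if code in self._user_defined:
            return ("user", self._user_defined[code])
        # Standards are consulted in the order they were added.
        for standard, codes in self._standards.items():
            if code in codes:
                return (standard, codes[code])
        return None


registry = LanguageRegistry()
registry.add_standard("ISO-639-1", {"ak": "Akan", "tw": "Twi"})
registry.add_standard("ISO-639-2", {"nap": "Neapolitan"})
registry.add_user_language("x-mydialect", "A dialect no standard lists")
```

With this setup, `registry.lookup("nap")` is answered from the ISO-639-2 table, `registry.lookup("x-mydialect")` from the user's own additions, and an unknown code gives `None`. Letting user-defined entries take precedence is a choice made for the sketch; a real tool might prefer to warn when a user code shadows a standard one.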

