Monday, July 25, 2011

#Unicode is not that complicated

When good people associate Unicode with this #XKCD strip, things are wrong, seriously wrong. The value of a standard is in its acceptance. Unicode is best known for its work on scripts, and it is this work that makes Wikipedia possible. Without the standardisation brought about by Unicode it would be impossible to support the 270+ languages that have a Wikipedia, as well as the languages working towards their own wiki in the Incubator.

As there is confusion about Unicode, let's analyse what is said. First of all, Unicode is a work in progress. As a consequence there is no guarantee that every script has been encoded, and there is even less of a guarantee that a font is available, let alone a freely licensed font. One reason the situation is not good is that many organisations consider the development of Unicode encodings for scripts, and of fonts, to be out of scope.
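To make "work in progress" concrete, here is a minimal sketch that asks the Unicode database bundled with Python whether a code point has been assigned yet. The helper name `is_assigned` and the three sample code points are my own choices for illustration. The answer depends on which Unicode version your Python ships with, and it says nothing about whether a font exists for the character.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """Return True if the code point is not in category Cn (unassigned)
    in the Unicode database bundled with this Python."""
    return unicodedata.category(ch) != "Cn"


# The Unicode version this Python knows about; newer versions encode more scripts.
print("Unicode version:", unicodedata.unidata_version)

# A Latin letter, a Devanagari letter, and a code point in the Greek block
# that is still reserved.
for ch in ("A", "\u0939", "\u0378"):
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{ord(ch):04X}", is_assigned(ch), name)
```

A character that is missing here is not necessarily missing from the standard; it may only be missing from the version of the data your software carries.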

Unicode is developing an additional standard, the CLDR or Common Locale Data Repository. While this standard is important, many of the languages supported by the Wikimedia Foundation are not represented in it, and the languages that are represented do not have complete data. Consequently there is no authoritative source for many a date format, currency format or sorting order.
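A small sketch of why the sorting order needs locale data at all. It uses only Python's built-in sorting, not CLDR, and the word list is made up for the example. Without per-language rules, software falls back on code point order, which few readers would call alphabetical.

```python
# Sample words chosen for illustration; "\u00e9clair" is "éclair" with a precomposed é.
words = ["zebra", "Zulu", "apple", "\u00e9clair"]

# Plain sort compares code points: uppercase before lowercase, accented letters last.
codepoint_order = sorted(words)
print(codepoint_order)

# Ignoring case helps a little, but "é" still sorts after "z".
casefold_order = sorted(words, key=str.casefold)
print(casefold_order)
```

Where "éclair" belongs relative to "apple" and "zebra" is a per-language decision, and that is the kind of data a locale repository has to supply.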

Unicode is a consortium of companies and organisations. The companies in particular have an interest in ensuring that the technology reflects their investments. When individuals are singled out negatively, particularly people like Michael Everson, who does not represent company interests and who is involved in open content, it becomes clear that Unicode should do what it is supposed to do and provide support for languages, all languages, and scripts, all scripts.

