Thursday, September 08, 2011

The name is Bon, James Bon

Santhosh is one of the special agents in the fight to bring language support to the Internet. Identifying him in English is easy; all the characters used to transliterate സന്തോഷ് തോട്ടിങ്ങല്‍ are available to identify Santhosh for who he is.

The last character in his name is the zero-width joiner, the "zwj" (U+200D). According to some, this character is not available for identification purposes. Without the "zwj", the name looks different:
  • സന്തോഷ് തോട്ടിങ്ങല്‍
  • സന്തോഷ് തോട്ടിങ്ങല്
Santhosh is a Malayalam name, and for Malayalam there is an alternative way of encoding this final letter in Unicode. So technically it is possible to write his name in Malayalam without the "zwj".
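To make the difference between these spellings concrete, here is a minimal Python sketch that lists the code points at the end of the surname. It rests on an assumption that the post does not spell out: that the alternative encoding meant here is the atomic chillu letter U+0D7D. The helper `describe` and the variable names are made up for this illustration.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Tail of the surname as written in this post: LA + VIRAMA + ZWJ.
CHILLU_L_SEQUENCE = "\u0d32\u0d4d" + ZWJ
# Assumption: the alternative encoding is the atomic chillu letter U+0D7D.
CHILLU_L_ATOMIC = "\u0d7d"

name_with_zwj = "തോട്ടിങ്ങ" + CHILLU_L_SEQUENCE
# What is left when a system refuses the joiner and simply drops it.
name_without_zwj = name_with_zwj.replace(ZWJ, "")
# The same name with the three-code-point sequence swapped for one letter.
name_atomic = name_with_zwj.replace(CHILLU_L_SEQUENCE, CHILLU_L_ATOMIC)


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, name in [
    ("with zwj", name_with_zwj),
    ("zwj dropped", name_without_zwj),
    ("atomic chillu", name_atomic),
]:
    # Show the rendered name and its last few code points.
    print(label, name, describe(name)[-3:])
```

The three strings differ only in their last few code points, which is exactly where the two bullet points above part ways.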

It becomes more interesting when you write Sri Lanka in Sinhala; this cannot be done without a "zwj". From a Wikimedia Foundation point of view, the way the Unicode report "Unicode Identifier and Pattern Syntax" treats many languages as being of "aspirational" or "limited" use is not really workable. Our aim is to support all scripts and to identify people by their name; their real name.

As we do identify people, an implementation of this Unicode specification is important to us. Having people like Mr തോട്ടിങ്ങല്‍ in the driver's seat will surely get us the best result. It may even get us a reference implementation.

