Sunday, March 06, 2011

#CLDR and #GLIBC - what makes a #standard for locales

When the standard for locales, the CLDR, does not provide the information you need, what are you to do? And when the locale information in GLIBC conflicts with the information in the standard, what then?

In a perfect world, such dilemmas do not exist. In the real world they do.

There are two basic issues: the definitions of the locale information need to be harmonised, and the information held in the two formats needs to be merged, verified and codified. Finally, there is culture.

In the standards world, the inclusion of new data can take years. In software, once people with the right seniority have gone through the motions of defining and testing a new locale, it goes into production, and before you know it everyone using the latest version of GLIBC supports it in their applications.

The GLIBC data for India, for example, supports Konkani and Sindhi. There are keyboard layouts, fonts and spell-check dictionaries. Depending on possible collaboration on the font, Saurashtra can be supported in months, not years.

GLIBC is great for everyone who uses it exclusively. The problem is that most people involved in language support in the Free Software world have to deal with the reality of an outside world where the CLDR defines the standard.

Saurashtra rendered with an experimental font

With the definitions harmonised and the existing data included in the CLDR, there will still be a need to develop support for new languages and locales. When the information for every new language or locale is also provided to the CLDR for inclusion, it is only a matter of correcting the implementation for that language or locale when the standards boys and girls find an error.


cibu cj said...

How do I see data included in GLIBC for a specific language, say, Konkani?

GerardM said...

Santhosh says:
$ cat /usr/share/i18n/locales/xx_YY
That's the locale definition file for an existing language.
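
If you do not know the file name for a language, one way to find it is to list the locale definition files for a territory first. What follows is only a rough sketch, not an official GLIBC tool: the function name list_locale_sources is made up for this example, and the default directory is the one from the command above, which may differ or be packaged separately on your distribution.

```shell
# Hypothetical helper: list the GLIBC locale definition files for one
# territory code, for example IN for India.
# Assumption: the definitions live in /usr/share/i18n/locales unless
# another directory is passed as the second argument.
list_locale_sources() {
    territory="$1"
    dir="${2:-/usr/share/i18n/locales}"
    # Match names of the form xx_YY and xx_YY@modifier.
    for f in "$dir"/*_"$territory" "$dir"/*_"$territory"@*; do
        # An unmatched pattern stays a literal string, so skip non-files.
        [ -f "$f" ] && basename "$f"
    done
    return 0
}
```

Calling `list_locale_sources IN` should print one file name per line; any of those names can then be passed to `cat` as in the command above.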