Picard is recognised as a language. However, software in common use, whether proprietary or open source, is not configured to mark a document as being written in that language. The language recognition software of a company like Google is not yet able to recognise it by its characteristics. And because a bug keeps the Picard Wikipedia out of the Wikipedia traffic statistics, it can be argued that Picard does not exist on the Internet at all.
Evolution of percentages of English-speaking Internet users and web pages
The UNESCO research documents the issues in measuring linguistic diversity on the Internet from a traffic perspective for a few languages, but it does not look into what enables such traffic. Nor does it explain why it is so hard to extend the research to the long tail of Internet traffic.
Part of the metadata of a document, on the Internet or elsewhere, is an indication of what language the document is in. Software typically knows about only a subset of the recognised languages. So one valid metric is: which languages does software allow you to write in? For OpenOffice, for instance, it is essential that the locale data is available in the CLDR. The CLDR data is public, and statistics can be created from its development. As you can imagine, there is no data for the Picard language ...
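To make the idea of such a metric a bit more concrete, here is a rough sketch in Python. It compares a list of recognised language codes with a list of locale identifiers that some piece of software supports, and reports which languages are left out. Everything in it is an assumption for illustration: the function name `coverage` is made up, the locale list is a tiny hand-written sample and not real CLDR data, and `pcd` is the ISO 639-3 code for Picard.

```python
def coverage(recognised, supported_locales):
    """Compare recognised language codes with the locales a program supports.

    Returns (covered, missing, ratio), where ratio is the share of
    recognised languages that have at least one supported locale.
    """
    # A locale such as "en_US" or "en-GB" starts with its language code.
    supported_langs = {
        loc.replace("_", "-").split("-")[0].lower() for loc in supported_locales
    }
    covered = sorted(code for code in recognised if code in supported_langs)
    missing = sorted(code for code in recognised if code not in supported_langs)
    ratio = len(covered) / len(recognised) if recognised else 0.0
    return covered, missing, ratio


# Hypothetical sample data, NOT taken from the actual CLDR repository.
recognised = {"en", "fr", "nl", "pcd"}
sample_locales = {"en_US", "en_GB", "fr_FR", "nl_NL"}

covered, missing, ratio = coverage(recognised, sample_locales)
print("covered:", covered)
print("missing:", missing)
print("share of recognised languages supported:", ratio)
```

Run against the real list of recognised languages and the real locale data of a given program, a count like this would show how long the list of missing languages actually is.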
When UNESCO includes such statistics in its linguistic diversity report, it will become clear how much needs to be done in order to make support for linguistic diversity a reality.
Thanks,
GerardM