Thursday, March 04, 2010

UNESCO document on measuring linguistic diversity

The UNESCO publication "Twelve years of measuring linguistic diversity in the Internet: balance and perspectives" is an update to a previous UNESCO study on this subject that was issued for the World Summit on the Information Society in 2005. It is based on research done from 1996 to 2008.

I browsed this document and I do not like it. It does include some parts that I could use but it seems to me to be a regurgitation of things that others have done. The percentage of languages used on the Internet is based on Google statistics. The problem is that Google only recognises a limited number of languages.

I would expect this report to include information on the things that prevent languages from getting a presence on the Internet; words like Unicode, CLDR or even locale are not found. Information on the percentage of documents that accurately flag their language, another relevant statistic, is missing.
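A statistic of this kind could be approximated roughly as follows. This is only a minimal sketch, assuming a sample of HTML pages is already available as strings; the names `LangFlagParser`, `declared_lang` and `share_with_lang_flag` are hypothetical, and the sketch only checks whether a page declares a `lang` attribute on its `<html>` element, not whether the declared language is actually correct.

```python
from html.parser import HTMLParser


class LangFlagParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            value = dict(attrs).get("lang")
            # An empty or valueless lang attribute counts as "not flagged".
            if value and value.strip():
                self.lang = value.strip().lower()


def declared_lang(page):
    """Return the declared language tag of one HTML page, or None."""
    parser = LangFlagParser()
    parser.feed(page)
    return parser.lang


def share_with_lang_flag(pages):
    """Return the fraction of pages that declare a language (0.0 if empty)."""
    if not pages:
        return 0.0
    flagged = sum(1 for page in pages if declared_lang(page) is not None)
    return flagged / len(pages)


if __name__ == "__main__":
    # Hypothetical sample; a real study would need a crawled corpus.
    sample = [
        '<html lang="fy"><body>Goeie</body></html>',
        '<html><body>Hello</body></html>',
    ]
    print(share_with_lang_flag(sample))
```

Checking whether the flag is accurate would additionally need a language detector to compare against the declared tag, which is outside this sketch.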

I do applaud UNESCO for having an interest in linguistic diversity, but I am not convinced that the methodology used for this document helps languages find their way to the Internet.
Thanks,
GerardM

3 comments:

CrisisMaven said...

Thanks for pointing out this document, a course of study that I didn't yet have in my comprehensive link lists for hundreds of thousands of statistical sources and indicators on my blog: Statistics Reference List. And what I find most fascinating is how data can be visualised nowadays with the graphical computing power of modern PCs, as in many of the dozens of examples in these Data Visualisation References. If you miss anything that I might be able to find for you or if you yourself want to share a resource, please leave a comment.

GerardM said...

The document is available in English and French ... this is the French version: http://unesdoc.unesco.org/images/0018/001870/187016f.pdf
Thanks,
GerardM

Daniel Prado said...

GerardM, please read carefully. The methodology did not use Google's language recognition system at all (which is not really useful, as you say). We applied our own system for detecting words by language. Google was just used ... as a search engine, that is, to search for occurrences of words.
Daniel Prado