Friday, December 24, 2010

The #Malayalam enigma

Traffic for the Malayalam Wikipedia has gone down dramatically while the number of articles and the number of editors have gone up. WHY??

When you look at the statistics, you will see that traffic halved in a couple of months. So what happened? Just as importantly, can we do something about it, and do we need support to turn this situation around?

The first clue can be found on the main page. It says:

Reading Problems? Click here

There are two scenarios: either you see no text, or there are too many spelling errors.

When you look up the word "Karnataka" in English on Google, you get about 10,700,000 results. When you look up the word "കർണ്ണാടക" in Malayalam in Unicode 5.1 you get 2,760 results and when you look up the word "കര്‍ണ്ണാടക" in Malayalam in Unicode 5.0 you get 117,000 results.

Are you surprised that Wikipedia uses the latest edition of Unicode? Are you surprised that Google distinguishes between two words that are actually the same word written with different characters? The real surprise is that the Malayalam community has found a way to show the same text with fonts in either version of Unicode.

The big fall in traffic is because many people use typing tools that produce Unicode 5.0 to type and search the keywords. Search engines understand them as different. There needs to be canonical equivalence between the old and new Unicode. In the above example, both കര്‍ണ്ണാടക and കർണ്ണാടക should give the same results.
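To make the difference concrete, here is a minimal sketch in Python of what converting between the two spellings could look like. It assumes the difference is limited to the six chillu letters that Unicode 5.1 added as single characters (U+0D7A to U+0D7F), which older tools write as consonant + virama + zero width joiner. The function names to_atomic and to_sequence are made up for this illustration; this is not the conversion software the community uses. The sketch also shows that standard Unicode normalization (NFC) does not treat the two spellings as the same, which fits with search engines seeing them as different words.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu letter added in Unicode 5.1.
CHILLU = {
    "\u0d23": "\u0d7a",  # NNA -> CHILLU NN
    "\u0d28": "\u0d7b",  # NA  -> CHILLU N
    "\u0d30": "\u0d7c",  # RA  -> CHILLU RR
    "\u0d32": "\u0d7d",  # LA  -> CHILLU L
    "\u0d33": "\u0d7e",  # LLA -> CHILLU LL
    "\u0d15": "\u0d7f",  # KA  -> CHILLU K
}


def to_atomic(text):
    """Rewrite 5.0-style sequences (consonant + virama + ZWJ) as 5.1 chillu letters."""
    for base, chillu in CHILLU.items():
        text = text.replace(base + VIRAMA + ZWJ, chillu)
    return text


def to_sequence(text):
    """Rewrite 5.1 chillu letters as 5.0-style sequences."""
    for base, chillu in CHILLU.items():
        text = text.replace(chillu, base + VIRAMA + ZWJ)
    return text


# "Karnataka" in both spellings, written with escapes so the
# invisible joiner cannot get lost in copy and paste.
old = "\u0d15\u0d30\u0d4d\u200d\u0d23\u0d4d\u0d23\u0d3e\u0d1f\u0d15"  # Unicode 5.0 style
new = "\u0d15\u0d7c\u0d23\u0d4d\u0d23\u0d3e\u0d1f\u0d15"              # Unicode 5.1 style

# The strings differ, and NFC normalization does not unify them.
print(old == new)
print(unicodedata.normalize("NFC", old) == unicodedata.normalize("NFC", new))

# Converting in either direction makes them match.
print(to_atomic(old) == new)
print(to_sequence(new) == old)
```

A search engine could, in principle, apply one of these conversions to both the indexed text and the query so that either spelling finds the same pages. Whether any search engine does so is outside what this sketch can show.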

The question then becomes how we convince the search giants of this world to include both versions of the written words in their results, because the big fall in traffic is due to the conversion of all the Malayalam texts to Unicode 5.1.

The good news is that our Malayalam community is quite happy to help the search engines with conversion software because we want our traffic back.
