Monday, April 04, 2011

#Search in the languages of #India

When Santhosh wrote about searching, it was hard to understand what his idea would be used for. The idea is to search in multiple Indian languages, and the rationale is that Indians speak more than one Indian language and therefore benefit from such a feature.

There are at least 452 Indian languages and not everybody speaks the same combination. How to use such a multi-language search is not immediately obvious. There is however one detail that does have obvious implications:
The inflections of the words സാമ്പാര്‍ – സാമ്പാറും, സാമ്പാറു etc are also found as results. This is the kind of search we need in Indic languages, not just the letter by letter comparison we do for English.
Having a search algorithm implemented that considers inflections within one language does make a big difference. Having this available for use within Indic Wikipedias will make them a much richer resource.
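
To make the difference concrete, here is a minimal sketch in Python. It is not Santhosh's algorithm; the suffix table and the names `TOY_SUFFIXES`, `toy_stem` and `inflection_search` are made up for illustration, and a real Malayalam search would need proper morphological rules. It only shows why an exact, letter by letter comparison misses the inflected forms while a comparison on a crude stem finds them.

```python
# Toy suffix table: an assumption for illustration only, not a real
# Malayalam stemmer. Escapes keep the exact code points unambiguous.
TOY_SUFFIXES = [
    "\u0d31\u0d41\u0d02",  # റും
    "\u0d31\u0d41",        # റു
    "\u0d30\u0d4d\u200d",  # chillu "r" written as ര + virama + ZWJ
    "\u0d7c",              # chillu "r" as a single code point
]


def toy_stem(word):
    """Strip the first matching toy suffix, if any, and return the rest."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def inflection_search(query, words):
    """Return the words whose toy stem equals the toy stem of the query."""
    target = toy_stem(query)
    return [w for w in words if toy_stem(w) == target]


# Hypothetical word list built from the example in the quote above.
base = "\u0d38\u0d3e\u0d2e\u0d4d\u0d2a\u0d3e"   # സാമ്പാ
query = base + "\u0d30\u0d4d\u200d"             # the search term from the quote
words = [
    base + "\u0d31\u0d41\u0d02",                # സാമ്പാറും
    base + "\u0d31\u0d41",                      # സാമ്പാറു
    "\u0d1a\u0d4b\u0d31\u0d41",                 # an unrelated word, ചോറു
]

exact_hits = [w for w in words if w == query]   # letter by letter comparison
stem_hits = inflection_search(query, words)     # comparison on the toy stem
print(len(exact_hits), len(stem_hits))
```

The exact comparison finds nothing because none of the stored forms is identical to the query, while the stem comparison returns both inflected forms and leaves the unrelated word out.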
