Sunday, March 27, 2011

Supporting multiple languages in one article

Many #Wikipedia articles include text in other languages. There are many reasons for this, but technically these foreign passages are not marked as such. For people this is no problem; it is easy to spot what is in a different language. For computers, rendering the text is not much of a problem either; if a character is not available in one font, it will likely be available in another.

When it is reasonably certain that the majority of our audience does not have a suitable font, current practice is to include a screenshot. While functional, this is not the best solution because search engines cannot read text inside an image. For Wikimedia projects like Wiktionary, including foreign text is the rule rather than the exception, and many of its readers see the rectangles that stand in for characters missing from the fonts on their system.

With multiple languages present in a text, a spell checker will mark much of that text as errors. It does this because it is not aware of the changes in language.
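
To make the spell checker problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny `WORDLISTS` stand in for real dictionaries, and `misspelled` is a hypothetical helper, not how any actual spell checker works. It only shows that the same sentence produces a false error when the language change is unknown and none when it is marked.

```python
# Hypothetical, tiny word lists standing in for real dictionaries.
WORDLISTS = {
    "en": {"the", "dutch", "word", "for", "bicycle", "is"},
    "nl": {"fiets"},
}


def misspelled(segments, default_lang="en"):
    """Return (word, lang) pairs not found in the word list for their language.

    `segments` is a list of (text, lang) tuples. A lang of None means the
    segment is unmarked, so it is checked against the default language.
    """
    errors = []
    for text, lang in segments:
        lang = lang or default_lang
        known = WORDLISTS.get(lang, set())
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'")  # drop surrounding punctuation
            if word and word not in known:
                errors.append((word, lang))
    return errors


# The same sentence, once without and once with language marking.
unmarked = [("The Dutch word for bicycle is fiets.", None)]
marked = [("The Dutch word for bicycle is", "en"), ("fiets.", "nl")]

print(misspelled(unmarked))  # the Dutch word is flagged as bad English
print(misspelled(marked))    # nothing is flagged
```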

With the languages in our articles properly marked, a spell checker can prevent obvious errors, a search engine can do its job, and we can offer additional targeted support such as input methods and web fonts.
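
One way such marking could look in the generated page is the standard HTML `lang` attribute, which browsers, search engines and spell checkers can read. The Python sketch below is only an illustration under that assumption; `mark_language` is a hypothetical helper, not an existing MediaWiki function.

```python
from html import escape


def mark_language(text, lang, rtl=False):
    """Wrap text in a <span> that declares its language.

    `lang` is a language code such as "nl" or "he". Set `rtl` for
    right-to-left scripts so that the text direction is declared as well.
    """
    direction = ' dir="rtl"' if rtl else ""
    return f'<span lang="{escape(lang, quote=True)}"{direction}>{escape(text)}</span>'


print(mark_language("fiets", "nl"))
print(mark_language("שלום", "he", rtl=True))
```

With the language declared like this, software can choose a suitable font, dictionary or input method for each span separately.
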
Thanks,
      GerardM