Wednesday, March 30, 2011

Using the Ş or Ș In #Romanian

The definition of the subset of #Unicode characters used for the Romanian language is quite clear: only Ș and ș, the S with a comma below, are correct for the letter șe. This does not mean, however, that everybody uses the comma below the S rather than a cedilla.
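The two forms look alike but are distinct Unicode characters. The short Python sketch below lists the comma-below letters used for Romanian next to the cedilla letters they get confused with. For each one it prints the official Unicode name and the canonical (NFD) decomposition, which shows the base letter plus either COMBINING COMMA BELOW or COMBINING CEDILLA. It only uses the standard `unicodedata` module. The `LETTERS` list is just an illustrative choice.

```python
import unicodedata

# Comma-below letters (correct for Romanian) followed by the
# cedilla letters they are often confused with.
LETTERS = ["\u0218", "\u0219", "\u015E", "\u015F",
           "\u021A", "\u021B", "\u0162", "\u0163"]

for ch in LETTERS:
    # NFD splits each precomposed letter into a base letter and a
    # combining mark, making the comma/cedilla difference explicit.
    base, mark = unicodedata.normalize("NFD", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{base} + {unicodedata.name(mark)}")
```

The first line printed is `U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW: S + COMBINING COMMA BELOW`. The cedilla letters decompose to `COMBINING CEDILLA` instead.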

Before Unicode became commonplace, typing a comma below a character was really hard. As a consequence, many, many Romanian texts are spelled incorrectly; people got used to writing with a cedilla.

In later versions of MS Windows, the Romanian keyboard layout makes it easy to write correct Romanian. For people stuck with the wrong keyboard, we can easily have Narayam provide a modern input method for Romanian.

This is currently a big deal for the Romanian and English-language Wiktionaries. They are in the process of correcting every occurrence of a wrongly written șe and țe. This is quite an undertaking, because it affects interwiki links to other Wiktionaries as well.
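The links are affected because the two spellings of a word are different strings, even though they render almost identically. A link written with one form therefore does not match an entry titled with the other. Below is a minimal Python sketch of the "find every wrong occurrence" step. It assumes precomposed characters. The helper name `find_cedilla_letters` is hypothetical, and the sketch is not the tooling the Wiktionaries actually use.

```python
import re

# S and T with cedilla (Ş ş Ţ ţ): wrong in Romanian, where the
# comma-below forms (Ș ș Ț ț) are correct.
CEDILLA_LETTERS = re.compile("[\u015E\u015F\u0162\u0163]")

def find_cedilla_letters(text):
    """Hypothetical helper: return (offset, character) for every S or T with cedilla."""
    return [(m.start(), m.group()) for m in CEDILLA_LETTERS.finditer(text)]

# The same word spelled both ways: they look alike but compare unequal,
# so a link to one spelling does not reach an entry titled with the other.
old_title = "\u015Ftiin\u0163\u0103"  # ştiinţă, cedilla forms
new_title = "\u0219tiin\u021B\u0103"  # știință, comma-below forms
assert old_title != new_title
assert find_cedilla_letters(old_title) == [(0, "\u015F"), (5, "\u0163")]
assert find_cedilla_letters(new_title) == []
```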

It also means that they want to ensure that only a correct șe or țe is written in Romanian. This is complicated by the fact that Ş is correct in, for instance, Turkish. Being able to identify the language of a text is therefore quite important.

The solution currently implemented on the Romanian Wikipedia is to convert any s or t with a cedilla to a proper șe or țe. As a consequence, Turkish names of people and places are likely to be spelled incorrectly.
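A blanket conversion like this can be sketched in a few lines of Python. This is an assumption about the general approach, not the code the Romanian Wikipedia actually runs. The text is first normalized to NFC, so that a decomposed "s + combining cedilla" also becomes the precomposed ş. Then `str.translate` swaps each cedilla letter for its comma-below counterpart. The function name `to_comma_below` is hypothetical. The usage lines show both effects described above: a Romanian word is fixed, and the Turkish place name Şişli is altered too.

```python
import unicodedata

# Map each cedilla letter to its comma-below counterpart.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})

def to_comma_below(text):
    """Hypothetical helper: unconditionally replace S/T with cedilla by S/T with comma below."""
    # NFC first, so decomposed sequences become precomposed cedilla letters.
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)

# A Romanian word is corrected: ştiinţă -> știință
assert to_comma_below("\u015Ftiin\u0163\u0103") == "\u0219tiin\u021B\u0103"
# ...but a Turkish place name is changed as well: Şişli -> Șișli
assert to_comma_below("\u015Ei\u015Fli") == "\u0218i\u0219li"
```

Because the function never looks at the language of the text, it cannot tell a Romanian word from a Turkish name. Avoiding that would require knowing, or detecting, the language first.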

PS: Please note that the font used for the title does not cope with these characters.
