Monday, April 17, 2006

What to do with stuff that is good but not standard

What should we do when we can get a lot of content that is good in some respects and lousy in others? German uses different characters than English, and there are alternative ways to write Ä/ä, Ö/ö, Ü/ü or ß. When you write German, these characters should be used, and not "ue" for instance. So when should we accept German content that is non-standard?

I have been thinking about this for a few days. The answer became obvious to me: it is in the database. Once we have the MisSpelling table, the community can identify the words that should have an umlaut. With the proper logic, what we show to our public will be proper German with the umlauts. But the first thing is to have the MisSpellings.
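
As a rough illustration of the idea, here is a minimal Python sketch. It assumes the MisSpelling table can be treated as a simple lookup from a non-standard spelling to the standard one; the names `MISSPELLINGS`, `display_form` and `display_text` and the sample entries are hypothetical and are not part of WiktionaryZ.

```python
# Hypothetical stand-in for the MisSpelling table: each entry maps a
# non-standard spelling to the standard German spelling. In practice the
# community would maintain these entries word by word.
MISSPELLINGS = {
    "Muenchen": "München",
    "schoen": "schön",
    "Gruesse": "Grüße",
}


def display_form(word: str) -> str:
    """Return the standard spelling if one has been recorded, else the word itself."""
    return MISSPELLINGS.get(word, word)


def display_text(text: str) -> str:
    """Apply display_form to every whitespace-separated word in the text."""
    return " ".join(display_form(word) for word in text.split())


print(display_text("Gruesse aus Muenchen"))
print(display_form("Feuer"))  # not in the table, so it is left unchanged
```

A per-word lookup is used here rather than a blanket replacement of "ue" with "ü", because a word like "Feuer" contains "ue" and must not be changed.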

Thanks,
GerardM

4 comments:

MovGP0 said...

If you mean the US-ASCII versions of the umlauts, then the following representations are possible:

Ä ↔ Ae
ä ↔ ae
Ö ↔ Oe
ö ↔ oe
Ü ↔ Ue
ü ↔ ue
ß ↔ sz → ss

The last one is also needed for words which are written in all uppercase letters, because there is no capital letter for ß.

But I can't see any misspelling here. Maybe you can give an example to clarify your meaning.
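
The representations listed above could be sketched in Python roughly as follows. This is only an illustration: the names `ASCII_FALLBACK` and `to_ascii` are made up for this example, and ß is mapped straight to "ss".

```python
# Mapping from German special characters to their US-ASCII representations,
# taken from the list above (ß is written as "ss").
ASCII_FALLBACK = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ß": "ss",
}


def to_ascii(text: str) -> str:
    """Replace each special character with its ASCII representation."""
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)


print(to_ascii("Grüße"))
print(to_ascii("Übung"))
```

Going in this direction is mechanical. Going back is not, since not every "ue" or "ss" in an ASCII text stands for "ü" or "ß".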

GerardM said...

For German it is plain wrong, and I do not really care about US-ASCII when documenting German spelling.

Thanks,
GerardM

Anonymous said...

I don't think that a misspelling table is the right thing. Misspellings such as transliterations and other character representations are spellings after all. But we must be able to distinguish those from the «right» spellings.

GerardM said...

When you consider a transliteration, would you consider it an expression of the original language or an expression of the target language? Some see transliterations as something you can "standardise", while others consider them specific to the target language.

Indeed, the "right" stuff will not be represented in the MisSpelling table. When we are to have transliterations, they will be stored separately. At this moment in time WiktionaryZ does not support transliterations.

Thanks,
GerardM