It becomes even more interesting when the text is rich in all kinds of annotations, as in this dictionary. It is the first Malayalam-English-Malayalam dictionary, by Dr Herman Gundert. There are all kinds of opportunities here, ehm, problems.
There is no OCR for Malayalam yet, so the transcription has to be crowdsourced. The annotations are cryptic and only make sense in a dead-tree dictionary. When you digitize such a text and it remains flat text, it is extremely hard to use the public domain content elsewhere, in OmegaWiki or in Wiktionary for instance.
Santhosh is experimenting with Semantic MediaWiki. It will allow for exportable information, and that will make the data gained much more useful.
Thanks,
GerardM