Monday, May 28, 2012

#Language, #script, #Unicode, #font and web fonts

Making the Internet globally accessible is more than running cables. It is also about making sure that you can read and write any language. Once all this is in place, people enabled in this way can share in the sum of all knowledge.

Few people are as intimately involved in supporting languages as Michael Everson. He is known for encoding scripts into Unicode, work that requires both technical and linguistic expertise, and he is also a publisher of books written in minority languages.

Michael at Chogh Zanbil - Cuneiform .. :)
Are all scripts registered yet ... do we know them all in ISO 15924?
No, not at all. The best-known scripts have been given four-letter codes in ISO 15924, but we tend to be conservative for lesser-used scripts, and try to co-ordinate with proposals for encoding them in the Universal Character Set (a.k.a. ISO/IEC 10646 or Unicode).

Several scripts are not yet encoded in Unicode. Many of them are used by living languages. How do those languages cope?
A script (or character) not encoded can't be used in interchange. People can either use the Private Use Area or hack an existing encoding.
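
As a rough illustration of the Private Use Area workaround mentioned above, the sketch below finds characters in a text that fall in a Private Use Area. It relies on the Unicode general category "Co" (private use) as reported by Python's standard `unicodedata` module. The function name `private_use_chars` and the sample string are made up for this example. Such characters have no agreed meaning, so they only display as intended when sender and reader share the same font and private convention.

```python
import unicodedata

def private_use_chars(text):
    """Return the characters in text whose general category is 'Co' (private use)."""
    return [ch for ch in text if unicodedata.category(ch) == "Co"]

# Hypothetical sample: three ASCII letters followed by two private-use code points.
sample = "abc\ue000\U000f0001"
for ch in private_use_chars(sample):
    print(f"U+{ord(ch):04X}")
```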

What does it do to the cultures involved?
The lack of an encoded script prevents a language from using its script effectively in any computer environment.

Is it known how many scripts used by living languages are not yet encoded?
I don't think we have kept a quantitative inventory. And we always discover something new. I know of a number of specialist scripts like SignWriting and Blissymbols which have not been encoded. We are working on some other scripts, like Woleai and Afáka, and a number of West African scripts, but it is very difficult to contact the user communities to get feedback. There is a huge technological divide. (Not for SignWriting or Blissymbols: for those the problem is a lack of funding to do the work.)

Is it known how many scripts used by dead languages are not yet encoded?
Again, we don't keep a quantitative inventory. The Roadmaps on the Unicode site are as good a checklist as anything.

Several scripts are encoded but there is no freely licensed font for them. Why is this not part of the process of encoding for Unicode?
The Universal Character Set is a character set. Both ISO/IEC JTC1/SC2/WG2 and the Unicode Technical Committee work to study character and script proposals, give the characters the right properties, and get them encoded. It is not the function of either committee to establish implementations, or to give them away. The work is already voluntary (and expensive).

MediaWiki supports web fonts ... What relevance does this have for you, and what opportunities are there for the Wikimedia communities?
It is a great opportunity for Wikimedia to exploit some of the generosity of the many people who have donated to the foundation, and to make good use of the skills of people who have expertise in the Universal Character Set and in font design.

What impact will the availability of freely licensed fonts have on the availability of information in those scripts?
For instance, right now anyone viewing any Wikipedia in any language may encounter text in Ol Chiki, or in Runic, or in the simple International Phonetic Alphabet, and pages have to apologize to the reader because their computer may not display the material correctly. This is *bad* for the encyclopaedia.
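
To make the point concrete, here is a minimal sketch of how one might check which of the scripts named above a piece of text draws on, and so which blocks a web font would need to cover. The block ranges are the Unicode code-point ranges for IPA Extensions, Runic and Ol Chiki. The `BLOCKS` table and the function `blocks_used` are hypothetical names for this illustration and are not part of MediaWiki or any font tool.

```python
# Code-point ranges of three Unicode blocks mentioned in the interview.
BLOCKS = {
    "IPA Extensions": (0x0250, 0x02AF),
    "Runic": (0x16A0, 0x16FF),
    "Ol Chiki": (0x1C50, 0x1C7F),
}

def blocks_used(text):
    """Return the sorted names of the listed blocks that occur in text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in BLOCKS.items():
            if lo <= cp <= hi:
                found.add(name)
    return sorted(found)

# Three runic letters plus the IPA schwa.
print(blocks_used("\u16a0\u16a2\u16a6 and \u0259"))
```

A real check would need the full Unicode block or script data, since IPA in particular also uses letters from other blocks.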

What difference would it make if the Wikimedia Foundation were to become a player in the development of fonts?
People using the encyclopaedia would be able to see the information without worrying about seeing ☐☐☐☐☐☐ ☐☐☐☐☐! From a personal point of view, I can say that at various conferences over the past two years, I have spoken with people in the Wikimedia Foundation, and with people from another very large organization, about this matter -- specifically about exploiting my own expertise in the Universal Character Set and in the provision of rare scripts and characters in web fonts -- yet nothing has resulted. I think the message has got through. But so far no one in either organization has decided to take the necessary principled decision that in order to ensure that the information in the Free Encyclopaedia is actually available to people who use it, complete UCS support should be provided in a suite of freely-available and maintained webfonts.

Provenance is the basis for the establishment of facts. Is transcription in the original script essential?
Why wouldn't it be? That's the source text. Encoding it correctly means that it can be interpreted by the reader if he or she wishes to consult the primary source. Anything else obliges the reader to use someone else's interpretation. Of course expertise is needed, but the closer one can get to the primary source, the better.

Michael, why "Alice's Adventures in Wonderland"?
I love languages, and it has been a great honour for me to publish Alice for the first time in a number of minority languages which might otherwise never have seen the text. Alice is available in the following languages: Cornish, English, Esperanto (Kearney), Esperanto (Broadribb), French, German, Hawaiian, Irish, Italian, Jèrriais, Latin, Lingua Franca Nova, Low German, Manx, Mennonite Low German, Borain Picard, Scots, Swedish, Ulster Scots and Welsh, and several other translations are being prepared.
