Wednesday, March 14, 2012

#multilingweb - #Arabic and #Hebrew are #RTL

Amir asked me to put the following question to people like Richard Ishida at the Multilingual Web conference:
"How hard would it be to allow element directionality to be assigned according to lang? In HTML4 and in the current draft of HTML5, <span lang="ar"> has dir="ltr" unless specified otherwise, and I find that ridiculous."
The usual replies Amir gets are:
  • Backwards compatibility
  • Many websites already use HTML5 even though it is not yet finished, and this change would break them
His answers to these:
  • If a document explicitly specifies that it is HTML5, directionality should be assigned by lang by default.
  • Add an attribute to the root HTML tag, something like <html dir="bylang"> or <html dirbylang="true">.
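To make the proposal concrete, here is a minimal sketch of how a browser might resolve each element's direction, with and without an opt-in on the root such as the hypothetical dirbylang attribute. The element model (ElementNode), the RTL_LANGS table and the byLang flag are all assumptions made for illustration; neither HTML4 nor the HTML5 draft defines this behaviour.

```typescript
type Dir = "ltr" | "rtl";

// Hypothetical element model: only the attributes that matter here.
interface ElementNode {
  lang?: string;  // value of the lang attribute, if present
  dir?: Dir;      // value of an explicit dir attribute, if present
  children: ElementNode[];
}

// Assumed, deliberately incomplete list of primary language subtags
// whose usual script is written right to left.
const RTL_LANGS: string[] = ["ar", "he", "fa", "ur", "yi"];

// Resolve the direction of every element in document order.
// byLang = false mimics today's default (inherit from parent, else ltr);
// byLang = true mimics the hypothetical <html dirbylang="true">.
function resolveDirs(
  node: ElementNode,
  byLang: boolean,
  inherited: Dir = "ltr",
  out: Dir[] = []
): Dir[] {
  let dir: Dir = inherited;
  if (node.dir) {
    dir = node.dir; // an explicit dir attribute always wins
  } else if (byLang && node.lang !== undefined) {
    const primary = node.lang.split("-")[0].toLowerCase();
    dir = RTL_LANGS.indexOf(primary) !== -1 ? "rtl" : "ltr";
  }
  out.push(dir);
  for (const child of node.children) {
    resolveDirs(child, byLang, dir, out);
  }
  return out;
}

// Root, then <span lang="ar">, then <span lang="ar" dir="ltr">.
const doc: ElementNode = {
  children: [
    { lang: "ar", children: [] },
    { lang: "ar", dir: "ltr", children: [] },
  ],
};
console.log(resolveDirs(doc, false).join(" ")); // ltr ltr ltr
console.log(resolveDirs(doc, true).join(" "));  // ltr rtl ltr
```

Because an explicit dir still wins, documents that already set dir would not change; only elements that carry lang without dir would. The weak spot is the lookup from the primary language subtag, which is exactly where the objections in the comments below come in.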
Amir has aired this view before, and one answer he got from some "standards people" was: "It would be very problematic to do it, because most web developers don't use the lang attribute". This is rather funny, because the proposal would only have an effect where the lang attribute is used. In any case, at some point the public at large will no longer care about previous versions of HTML, because all the websites still in use will have moved on.

What we need is proper metadata for all languages, and such data can hide very nicely inside a browser. Website developers do not know, and do not want to know, about all the linguistic niceties needed to support a multilingual web. In many ways it makes sense to provide language support from inside the browser.

At the Wikimedia Foundation we don't just support a lot of languages, we are also well aware of the languages we support. Ours is a worldwide community of people who can openly complain to us about poor support for their language, and they expect their complaints to be read and acted upon.
Thanks,
     GerardM

2 comments:

GerardM said...

Remember why Flash is not used by the Wikimedia Foundation? It is not open source.
Thanks,
GerardM

Richard Ishida said...

At the W3C we are often asked by developers, particularly those who are not familiar with internationalization, whether it's possible to conflate the language attribute with one thing or another, but it is always problematic to overload the semantics of the language attribute in this way.

As with encodings and language, there is not always a one-to-one mapping between language and script, and therefore directionality. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code az can be relevant for either.
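A small sketch of the point about Azerbaijani: BCP 47 language tags may carry a script subtag (az-Latn, az-Arab), and only that subtag settles the direction; a bare az stays ambiguous. The function name and the RTL_SCRIPTS list (a small, assumed subset of ISO 15924 codes) are hypothetical, and treating every other script as ltr is itself a simplification.

```typescript
type Dir = "ltr" | "rtl";

// Assumed subset of ISO 15924 codes for scripts written right to left.
const RTL_SCRIPTS: string[] = ["Arab", "Hebr", "Syrc", "Thaa", "Nkoo"];

// Hypothetical helper: infer direction only from an explicit script subtag.
// Returns undefined when the tag names no script, because the language
// subtag alone (e.g. "az") does not determine the direction.
function dirFromScriptSubtag(tag: string): Dir | undefined {
  const subtags = tag.split("-");
  for (let i = 1; i < subtags.length; i++) {
    const s = subtags[i];
    if (/^[A-Za-z]{3}$/.test(s)) {
      continue; // extended language subtag (e.g. "cmn" in "zh-cmn-Hans")
    }
    if (/^[A-Za-z]{4}$/.test(s)) {
      // Subtags are case-insensitive; normalise the script to title case.
      const script = s.charAt(0).toUpperCase() + s.slice(1).toLowerCase();
      return RTL_SCRIPTS.indexOf(script) !== -1 ? "rtl" : "ltr";
    }
    break; // region, variant or extension: no script subtag present
  }
  return undefined;
}

console.log(dirFromScriptSubtag("az-Arab")); // rtl
console.log(dirFromScriptSubtag("az-Latn")); // ltr
console.log(dirFromScriptSubtag("az"));      // undefined
```

Note that a bare "ar" also returns undefined here: the function refuses to guess. A browser that wanted to guess would need per-language data; the IANA Language Subtag Registry's Suppress-Script field is the kind of metadata such a table could draw on.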

Furthermore, to manage inline bidirectional text you sometimes need to use multiple dir attributes in a way that would not necessarily correlate with what you'd expect when using language attributes.

In addition, the dir attribute can have values that aren't compatible with language attributes, such as the auto value that comes with HTML5 (and who knows, there may be more in the future).
In some other markup systems you may express directional overrides as values of the dir attribute.
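The auto value shows how far dir has moved away from lang: its result depends on the element's text, not on its language. HTML5 resolves it from the first character with strong directionality. The sketch below approximates that with hard-coded code point ranges instead of the Unicode Bidi_Class property and ignores the real algorithm's rules for skipping certain descendants; the function name is hypothetical, and what to do when nothing strong is found is left to the caller.

```typescript
type Dir = "ltr" | "rtl";

// Rough approximation of the "first strong character" rule behind dir="auto".
// Assumptions: RTL = Hebrew letters (U+05D0-U+05EA) and basic Arabic letters
// (U+0620-U+064A); LTR = any character with distinct upper/lower case forms
// (Latin, Greek, Cyrillic, ...). Digits and punctuation are skipped as weak.
function firstStrongDir(text: string): Dir | undefined {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if ((code >= 0x05d0 && code <= 0x05ea) || (code >= 0x0620 && code <= 0x064a)) {
      return "rtl";
    }
    const ch = text.charAt(i);
    if (ch.toLowerCase() !== ch.toUpperCase()) {
      return "ltr";
    }
  }
  return undefined; // no strong character found
}

console.log(firstStrongDir("2012: مرحبا")); // rtl
console.log(firstStrongDir("Hello שלום"));  // ltr
```

Two elements with the same lang can therefore end up with different directions under dir="auto", which is why it cannot be derived from the language attribute.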

These are values that do very different things than what the language attribute is intended for, and it's much cleaner (and simpler, in the end, too) to use separate constructs for the separate semantics.