Tuesday, October 31, 2006

The importance of good standards

At the Internet Governance Forum, Vint Cerf said that changing the way the Internet works to accommodate a multilingual Internet raises concerns. The question raised in the BBC article is whether this is a technical issue or not.

Interoperability on the Internet is possible when the standards allow for it. The current URL system is based on the Latin script. This was a sensible choice in the days when computing was developed in America; it made sense in a world where the only script supported by all computers was the Latin script. These days, computers all support UTF-8. All modern computers can handle any script out of the box, which means that all computers are inherently able to represent all characters. This does not, however, mean that all computers are able to display all scripts; even my computer does not support all scripts, and I have spent considerable time adding all kinds of scripts to my operating system.
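As a small illustration of how the Latin-based URL system copes with other scripts: the IDNA standard (RFC 3490) maps a non-Latin host name to an ASCII-compatible form before it goes into the DNS. A minimal sketch, using Python's built-in "idna" codec:

```python
# A non-Latin host label must be mapped to ASCII before it can appear
# in the DNS; IDNA does this with the Punycode encoding.
label = "пример"  # Russian for "example"

# Python's built-in "idna" codec implements IDNA 2003.
ascii_form = label.encode("idna")
print(ascii_form)  # an ASCII-compatible form with the "xn--" prefix

# The mapping is reversible, so the original script is not lost.
assert ascii_form.decode("idna") == label
```

The point is that the underlying infrastructure still only speaks Latin-script ASCII; other scripts are accommodated by encoding around it, not by the standard itself becoming script-neutral.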

The next issue is for content on the Internet to be properly labelled with the language it is in. Here there is a big technical issue: the standards only acknowledge the existence of a subset of languages. The result is that it is not possible, using the existing standards, to indicate the language of any given text.
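A toy sketch of the problem. The codes below are taken from the ISO 639 code tables: Northern Frisian has an ISO 639-3 code (frr) but no two-letter ISO 639-1 code, so any system restricted to the two-letter codes simply cannot tag text in that language:

```python
# A few entries from the ISO 639 code tables (not the full lists).
iso_639_1 = {"nl": "Dutch", "ru": "Russian"}
iso_639_3 = {"nld": "Dutch", "rus": "Russian", "frr": "Northern Frisian"}

def tag_language(code: str) -> str:
    """Build an HTML-style lang attribute, accepting only ISO 639-1 codes."""
    if code in iso_639_1:
        return f'lang="{code}"'
    raise ValueError(f"no ISO 639-1 code covers {code!r}")

print(tag_language("nl"))   # works: Dutch has a two-letter code
# tag_language("frr")       # fails: Northern Frisian only exists in ISO 639-3
```

The function name and the selection of codes are mine, chosen for illustration; the underlying asymmetry between the two code tables is real.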

Yes, the net will fragment into parts that will be "seen" by some and not "seen" by others. This is, however, not necessarily because of technical restrictions, but much more because the people and services involved do not support what is in the other script, the other language. When I, for instance, ask Google to find лошадь or paard, I get completely different results, even though both times I am asking for information about Equus caballus. In essence this split of the Internet already exists. The question seems to me to be much more about how to make a system that is interoperable.

The Internet is interoperable because of the standards that underlie it. With the emancipation of Internet users outside its original area, these standards have to become usable for users of the Latin, Cyrillic, Arabic, Han and other scripts alike. It seems to me that at the core of this technical problem is the fact that the current standards are completely Latin-oriented and still focused on what used to be good enough. At this moment the codes that are used are considered to be human-readable. I would argue that this is increasingly not the case, as many of these codes are only there for computers to use. Once this becomes accepted fact, it will matter less what these codes look like, because their value will lie in being unambiguous.

For those who have read this blog before, it will be no surprise that the current lack of support for ISO 639-3 language codes is one of my hobby horses. As I have covered this subject before, I will not do so again. What I do want to point out is that insisting on "backwards compatibility" is more likely to break the current mould of the Internet than to preserve it.
