Thursday, March 29, 2007

Scripts on the Internet

I am back from the ICANN conference in Lisbon, and I have the T-shirt to prove it :) At a meeting the BSI proposed to create a standard that will describe how a mandated list with a ccTLD for every country or territory is to be produced in other scripts. This would be an industry standard that would eventually be adopted by ISO. Having such a list would be truly beneficial in getting to the stage where the registrars for these registries know what codes to use. I have been told that a list has already been compiled for the Arabic and the Cyrillic scripts.

On the BBC News website there was a great story that explains why it is so important to allow for content in the language that people speak ..

One other thing I learned is that you can already have a .org domain name that is in a UTF-8 script. To me this is quite important because it proves that supporting UTF-8 is something that can already be done. It would make such a difference if the Internet were as functional for everyone else as it is for those of us who read and write the Latin script.
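To make this a bit more concrete: as far as I understand it, such a name is not stored in the DNS as UTF-8; it is converted to an ASCII form that starts with "xn--" (Punycode). Below is a minimal sketch of that conversion using the "idna" codec that ships with Python. The domain "bücher.org" and the helper names `to_ascii` and `to_unicode` are made up for illustration only, and the built-in codec follows the older IDNA 2003 rules, so a registry may apply stricter ones.

```python
# Minimal sketch: convert an internationalised domain name to the ASCII
# form used in the DNS and back, using Python's built-in "idna" codec.
# The domain below is only an illustration, not a claim about a real site.

def to_ascii(domain: str) -> str:
    """Return the ASCII ("xn--...") form of a Unicode domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII ("xn--...") domain name."""
    return domain.encode("ascii").decode("idna")


ascii_form = to_ascii("bücher.org")
print(ascii_form)              # the form a resolver would look up
print(to_unicode(ascii_form))  # the form a person would read
```

Only the labels that contain non-ASCII characters are rewritten; the ".org" part stays as it is.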


Monday, March 26, 2007


Today I am at the ICANN conference in Lisbon. For me it is quite special to be here. One of the nice surprises was meeting a gentleman who edits the Wikipedia article on Implicit Certificates. These are special certificates for use in smaller devices. He mentioned that within an hour of his writing this article, relevant edits were made to it.

The great thing for cryptographers is that Wikipedia supports LaTeX and, just as relevant, that it can be updated when new developments happen. This makes Wikipedia an excellent environment to maintain information in this domain.


Thursday, March 22, 2007

Citizendium and licensing

In yet another diatribe Dr Sanger informs us about the differences between Wikipedia and Citizendium.

Two things struck me most: contributors have to give a non-exclusive license to Citizendium, AND the license is now to be the Creative Commons CC-by-nc license. The consequence is that only the Citizendium organisation can license commercial use, obviously for a price. They assume that as it is to be written by experts it will have value. It also means that once licensed, the licensee can do whatever it likes.

When you compare Citizendium with Wikipedia, you have an English-only versus a multi-lingual project. You have a project that covers almost everything and a project with a few thousand articles. You have a Free project and a project that is increasingly restrictive. You have a project that informs the world with NPOV information and a project that is to be written by experts.

Really I think Dr Sanger is doing a great job promoting Wikipedia by increasing the differences.


Wednesday, March 21, 2007

Supporting the visually impaired

The content that is created in many Wikis is really relevant. Wikipedia for instance provides you with unsurpassed encyclopaedic information. Its popularity can be deduced from Alexa's current ranking for today .. number 9. For us users and editors of Wikipedia it is like a roller-coaster; it is exhilarating :)

For people who are visually impaired, several of the Wikipedias have projects where people record the articles .. Really nice, really relevant.

Today I learned about new software promoted by UNESCO called Sakrament Libreader; it allows for text to speech conversion for English, Russian and Belarusian. Just consider, the English Wikipedia takes, according to Alexa, 53% of our traffic. There is a certain percentage of people who are visually impaired compared to the unimpaired.

Consider what would happen if the Wikimedia Foundation were to support and promote this kind of functionality. Not only would many people be helped by this, it would also give an impetus to make this functionality available for other languages.


Saturday, March 17, 2007

IATE became available

IATE or the Inter-Agency Terminology Exchange became available as a resource on the Internet. An introduction to IATE explains nicely what IATE is about; it is about the terminology of an organisation; the European Union. It aims to demystify the jargon used by the organisations of the EU.

In the past IATE was accessible as well; it proved popular and it was quickly hidden behind a password. This time you can get access by using the URL and it hopefully means that this resource is now officially available.

When you compare IATE to what it replaces, it is a massive step backwards from a copyright point of view. EURODICAUTOM was available under a much less restrictive license. It would be nice if the EU would take a page out of the US book; most of the information provided by that government is available without restrictions.


Tuesday, March 13, 2007

Google Summer of Code

Google has announced its third Google Summer of Code. This is an annual event where students develop on Open Source projects. This is definitely one of those activities that does a lot of good. It is one way whereby Google makes its mantra of "don't be evil" work well.

For Open Progress, we have entered for the first time; we have a nice mix of MediaWiki and OmegaWiki based projects. All these projects are dear to us. We have shown Brion our list, and we are likely to work together on these.

What struck me is that when you apply for the GSOC, it is compulsory to have a mailing list. This is the traditional way of doing things. I am subscribed to many mailing lists. I think mailing lists suck big-time. There is so much repetition, and the signal to noise ratio is typically quite bad. I do not understand why people do not use a wiki to document and discuss.

I think this is one of those instances where software development proves to be conservative. When you follow the subjects you are interested in on a wiki, you can use watch lists to make a selection, you can use RSS to follow the changes on a low bandwidth wiki.

Because you refactor what is there, there is no need to repeat so much. When you have discussions that are getting out of hand, backrooms can be opened for those quarrelling. Maybe I am an idealist to see it this way .. oh well ..


Sunday, March 11, 2007

Fon or Meraki .. I want the functionality of both !!

In the field of Internet connectivity, WIFI is the thing that keeps a road warrior and Internet junkie sane. It also keeps him poor. The amount of money some companies dare to charge for connectivity is tantamount to highway robbery. Places where you have to spend lots of time, like hotels and airports, are particularly unfriendly. In airports it is especially galling; you have to be there so many hours in advance, you are often delayed .. it would make more sense to provide it for free and in that way keep the punters happy.

With many Internet organisations increasingly dictatorial in what you can and cannot do, with intellectual property organisations only interested in filling the pockets of the big companies, WIFI is the next place where the people can create a commons away from these established malpractices.

Fon is a WIFI sharing organisation where you provide access to other fonistas by sharing the Internet connection. In one of the more innovative approaches they are targeting the neighbours of Starbucks with free routers. The business model is that you can pay a small amount for access to the network.

Meraki is a WIFI sharing organisation where you provide Internet connection to an area using a mesh network. By including repeaters in strategic places, the area covered can be quite extended, and multiple Internet access points can be part of the same network. When you operate a network, you can determine who can access the network and even provide access for money.

I would like to have a mix of both. At home Meraki would be ideal because my ISP uses a different technology from the ISP of my neighbour. This means that I will improve the connectivity both for myself and for my neighbours. It also allows for providing truly local information. Fon would be ideal when I am away; with an increasing number of fonistas, my providing connectivity is an assumption of good faith that will find its sweet rewards.

Combining the two would be awesome. I would not hesitate long to go that way.


Wednesday, March 07, 2007

Localisation of MediaWiki

There is a policy for new languages in the Wikimedia Foundation. One of the key things is that we want to prevent new abominations of projects where the language is not what is advertised. We have seen these in the past; one of the worst of these is what is called the Belarusian Wikipedia; the people in control of this project prevent the use of the Belarusian language as it is used in Belarus. This is a really awful situation and the language committee has asked the board repeatedly to act on this.

What we want to achieve is that new projects will promote cooperation instead of establishing division. Even though linguistically there is a substantial difference between the different forms of English, it is generally accepted that there will be only one English Wikipedia. This means that when the differences between two languages are smaller than those between the different forms of English, the language committee is not likely to approve a new project.

When there is merit for a project in a new language, the people promoting this language have to show their commitment in the Incubator. This is where the environment in that language for the requested project is set up. What is expected is that there will be a number of well written articles about different types of subjects, that there is a main page, and that the most visible parts of the user interface are localised.

The sad thing is that the aspiring projects cannot conform yet to the requirements; when a new Incubator project is set up, the message file for the new language is not created. What is needed is for one of the developers to create the necessary files so that the localisation can start.

For many languages that got a project in the past, there are no message files either; it means that the localisation is done locally and that this effort does not lead to the localisation of the MediaWiki software.
I am really pleased that for one language, Marathi, many of the messages have been imported into SVN by Nikerabbit. In a few days Marathi will be supported in all WMF projects. :)

I would welcome it if the Wikimedia Foundation gave the support of the minor projects a priority. At this moment it has none. The creation of message files in the Incubator for all languages and, when a language becomes a project, the inclusion of the first localisation into the MediaWiki software is the bare minimum. When we boast that we have localisations in some 250 languages, it should be a verifiable truth.


About dictionary writing

Connel MacKenzie is one of the English language Wiktionarians who has had a big influence on the development of the Wiktionary project. He published his notions about what a dictionary should be. As I have posted a response to what Erin McKean, the current editor in chief for the NOAD, said in a presentation at Google, it is nice to write a response to this as well.

For Connel, the project is about the English language. This is a big difference in approach to what Wiktionary is said to be about. He wants to limit it to those words that are part of the 600,000 most common terms. This is problematic because how do you judge something to be common? What is common in one branch of the English language might not necessarily be common in another. He is of the opinion that "freak" terms should only be there in a sanitized form. To me it is important that a term is clearly and fully explained. When you "sanitize", it is not clear if the full meaning survives for someone who does not know the term. By disallowing multiple word entries, you lose the connection to those entries that are single entries in another language.

In his commentary, Connel writes about the restrictions that face Wiktionary as a consequence of its flat file format. You cannot segregate different types of content when the basic technology does not support it. In that respect he would be better off being part of OmegaWiki, as its technology allows for all the things he is looking for.

He hopes to get a useful Wiktionary when he has a dozen programmers available for such a project. At the same time he despairs that, because of "the current anarchy", it may take ten to twenty years.

I do admire the constructive work Connel has put into Wiktionary. I doubt that Wiktionary will ever become useful other than as a resource where you can look things up on the Internet.


Tuesday, March 06, 2007

Upper ontologies

"An upper ontology attempts to create an ontology which describes very general concepts that are the same across all domains". As almost always there is a Wikipedia article about this. Given that the subject is difficult, there is since August 2006 a request to clean up this page .. I have to agree that the subject is interesting and potentially controversial.

For OmegaWiki, an upper ontology is one of those things.. An upper ontology defines the broad strokes, and as you descend, more and more aspects may be inherited from higher levels. This means that an upper ontology has practical implications. It also means that we will eventually have in effect an upper ontology by default.

As an upper ontology creates the concepts that are true across domains, Wikis for Professionals will want to define how their ontologies fit into the upper ontology. As they are bound to have overlaps with other domains, conflicts between domains will regularly be found. As one WfP needs to link into other domains, the question of primacy will rear its ugly head. One WfP was there first, but this other domain is not its competency... A new WfP does have the competency but it disrupts the existing WfP...

An initial selection of an upper ontology will be crucial, and its evolution will be exceedingly important when OmegaWiki is to be bound by the integration into it. Personally I expect that a different model will arise; one where on the one hand great care will be given to the evolution of the upper ontology while on the other hand functionality will be created irrespective of the upper ontology.

Consider, when you know that something is a plant, you can infer all kinds of things about it. It is not really necessary to know how it fits in the greater scheme of things. It would be nice, but it is not required. When a lot of functionality is determined in such a way, there will be the question of how this will fit together; it is therefore my prediction that these two forces will eventually find a balance. As OmegaWiki matures, this balance will become increasingly stable.
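The idea of inheriting aspects from higher levels can be sketched in a few lines of Python. This is only an illustration under my own assumptions; it is not how OmegaWiki stores anything. The `Concept` class, the `inferred` method and all the concept names and attributes below are hypothetical.

```python
# Minimal sketch of attribute inheritance in a concept hierarchy.
# All names and attributes here are hypothetical examples.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None
    attributes: dict = field(default_factory=dict)

    def inferred(self) -> dict:
        """Collect attributes from the top level down; lower levels override."""
        inherited = self.parent.inferred() if self.parent else {}
        return {**inherited, **self.attributes}


# A hierarchy that is linked into an upper level.
organism = Concept("organism", attributes={"alive": True})
plant = Concept("plant", organism, {"photosynthesises": True, "mobile": False})
oak = Concept("oak", plant, {"woody": True})

# A "plant" that is not linked into anything above it still lets you
# infer things about an oak; you only miss what the upper levels know.
unlinked_plant = Concept("plant", attributes={"photosynthesises": True})
unlinked_oak = Concept("oak", unlinked_plant, {"woody": True})

print(oak.inferred())
print(unlinked_oak.inferred())
```

The second hierarchy is the "functionality irrespective of the upper ontology" case: it works on its own, and linking `unlinked_plant` to `organism` later only adds what can be inferred.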