Tuesday, June 26, 2007

What language would be the one that provides the DefinedMeaning

At OmegaWiki we are of the opinion that the most valuable resource we have is the time of our contributors. It is what motivated us to think beyond Wiktionary. We were wasting our time because we were interested in more than one language. What we were working on was nice, but it was hard to become relevant and there were too few of us to do the work well.

Now that we have OmegaWiki, we can enter data once and it is there for everyone. We already have over 3,000 Expressions in Georgian, and they are available to people using a Georgian, an English, a Dutch and many more interfaces. This makes OmegaWiki more relevant for Georgian than the Georgian Wiktionary, which has only 34 articles. We have some 11,000 Italian Expressions; this is roughly equivalent to the number of articles in the Italian Wiktionary that are actually about an Italian word. Our Expressions are available in all those user interfaces.

We are open to working together with other organisations and people. This is also standard practice in the Wiktionaries; much of their content comes from different sources and is uploaded by bot. It is sad, however, that once the data has been imported, it is no longer possible to contribute back to the original source.

We are discussing how to collaborate with a great resource that has more than 50,000 concepts in one of the major African languages, with translations in English. This will give us enough concepts to have a relevant amount of content in two languages. The point is, they create their data in a completely different way: people suggest a word with a definition, and an editor validates and publishes it. So what could such a collaboration look like?

Well, when our community adds content in this language, it can be seen as a suggestion to their editor. Once the editor accepts new content, it can be added to their database. Now here is the bit that some may find controversial: they hold the copyright to their database, and our choice of the CC-by license makes this collaboration possible. They retain their copyright and their license. This is not possible with the GFDL.

When we import this data, which language should provide the defining Expression and the defining Definition? I would say this African language :)


Monday, June 25, 2007

Bolango, anyone?

Bolango is a language. It is spoken by some 20,000 people in Sulawesi, Indonesia. There is also a place called Bolango where some 5,000 people speak this language. That is to say, in 1981 some 5,000 people in Bolango spoke the language; how many still speak it today I have no way of knowing.

Bolango is the 1000th entry in OmegaWiki's ISO 639-3 collection. There are over 7,000 languages recognised by this standard and for most, if not all, of them we have a portal, so many more words will need to be added before this collection is complete. This notion of a work in progress makes it very much a wiki. As people work on particular content, OmegaWiki becomes more relevant for that domain. Typically, not that many words are needed to cover the concepts associated with a specific subject, and once these associated concepts have been defined, texts about any of them will not yield many new terms.

Working on a list like ISO 639-3 is different. It is a list, and all these languages are quite distinct. It is mainly the vocabulary of linguists that connects one language with another. A collection like the one I am working on becomes relevant when people involved with languages find it relevant. Until that time, it looks like stamp collecting :)


NB: I just created a Bolango stub on Wikipedia.

Sunday, June 24, 2007

Did I miss something?

With increasing amusement I have been reading Kelly Martin's assessments of the people aspiring to become WMF board members. Many of the candidates have had the pleasure of Kelly's caustic pen. At some stage, however, it got too much for me. Her latest had a few gems in it. Frieda, who happens to be the president of the Italian chapter, was attacked because a photo showed some cleavage. I had to go back and check... did I miss something?

Given Frieda's position, it is only natural that she wants to make the chapters more prominent. As for Kelly's assessment that the Italian chapter is run as a social club and that chapters are for social networking: well, it is clear that Kelly needs some education on this. Given that she refers to Clay Shirky's hell, I wonder what her role would be in such a social network.

In her "Nonbovine Ruminations", Erik Moeller was seen in at least as friendly a light. Erik took the trouble to answer; his conclusion after a lot of debunking: "Nothing I could say, no explanation I could offer, would allow you to rationally analyze what is really happening. You know all about me already, after all."

The high point comes in what Kelly has to say about Danny. Here I feature as "Erik's trained attack dog". Well, some time I must show you the certificate proving the training I needed to qualify.

Given all these pleasantries, there is a blog entry that explains why Kelly does not run herself: "it would only be good to get a great deal of attention (that is, drama)". Well, apparently she has a craving for drama; it is the best way I can explain her ruminations. I also expect that this strengthens her position in her home community; in her own words: "I am widely disliked at the English Wikipedia".

The voting for three seats on the WMF board is imminent. I hope that the people who bother to vote take the time to read what board members do and what the candidates have as their platform, and in particular come to an understanding of whether the person they consider voting for will have a positive and collaborative influence on our organisation. There is a lot of work to be done, and when the board is burdened by people with a negative attitude and outlook, it will be that much harder to get something done.


Friday, June 22, 2007

Godwin's law

The Poles want to get their way in the European Union. They are talking about the number of Poles that would exist if there had not been the Second World War. There would have been SO many more Poles...

Well, think of it this way: if there had not been a Second World War, how many Germans would there have been... or Russians... or Jews?

Really, our Mr Godwin is right: do not use the Nazis or the Second World War as an argument; it only demonstrates that you have failed.


Wednesday, June 20, 2007

Logos dictionary

The Logos dictionary is one of my favourite resources on the Internet. I know the people well, I respect their work and their dedication, I even worked there for a month.

Logos has the ambition to make their resource the premier lexical resource on the Internet. They work hard on it, and I regret that we are not working together. We are not working together because any cooperation needs to be based on a much more wiki way of doing things. The software Logos uses does not provide a wiki environment, and the structure of its database does not allow for the functionality that I would expect. This is one reason why we have persisted in developing OmegaWiki.

One issue that prevented cooperation was the license. The Logos dictionary did not have one; all it had was a copyright statement. This was fixed at the time: a GFDL license was added to the copyright. This means that all of the data was, and has always been, copyright Logos.

Sadly, I became aware that Logos has removed the license information. As the copyright holder of all the data, they are allowed to do this. For collaborators and users of the Logos dictionary it is now less clear what the status of this data is. When you translate the Logos "Quote of the Day", are you, for instance, entitled to attribution?

Facts cannot be copyrighted, but collections of facts can. Without a license it is less clear what you can do with data that has been freely available on the Internet for such a long time. Be advised that this resource is older than any of the licenses that are currently so popular.

Really, I would love to collaborate with anyone, but I would especially love to collaborate with Rodrigo, Cinzia, Gianni and Magdalena.


Tuesday, June 19, 2007

Why I should join Citizendium

Today was yet another day to swell my head. I was invited to a discussion with one of the greats of the Internet. We talked for more than an hour and I was told that I was one of the thinkers of the Internet...

Thirty minutes later I received an e-mail from this amazing professional and academic organisation that started with "Dear Dr. Meijssen". In this e-mail I was urged to do something. Something I now have done...

I am part of an organisation that says I am really cool. Well, I work feverishly to achieve what this organisation wants to do. The other people in this organisation ARE doctors, professors, leaders in professional organisations...

So, Dr Sanger, what do you say: should I join Citizendium?


Saturday, June 16, 2007


Yesterday I had a meeting in the village of Lage Vuursche for the second time. The first time we were in a nice restaurant. It did not have WIFI for its customers: "There is no demand for this." I found this reaction astounding, as I had just asked for it.

This time we were in the restaurant next door, the "Kastanjehof". It did have WIFI; we had called ahead to check. We sat outside, it had stopped raining, and we had our talk and our WIFI.


Friday, June 15, 2007

Meta data for education

Wikipedia has proven itself to be really valuable for educational purposes. This is best illustrated by the support the Wikimedia Foundation gets from Kennisnet. If other countries followed the example of the Netherlands, the WMF would not have a cash crunch.

Given that Wikipedia is used particularly by students, it makes sense to provide a better service to the educational process. IEEE-LOM is a standard way of describing learning object meta data. Meta data can be associated with many of the resources that exist in the many projects of the WMF. There are two issues to consider: the data is relational in nature, and the data has to be made available.

At the Holland Open Software Conference I met Erik Duval, a professor from Leuven who has a wealth of expertise on IEEE LOM. He astounded me when he said that people should not enter the IEEE LOM data; automated processes should. Given that Erik is part of the organisation behind the IEEE LOM standard, this means that such processes must exist for at least some of the dialects of this standard.

The data is relational. With the functionality that we created for OmegaWiki, we can have relational data natively in a MediaWiki environment. It is just a matter of associating this data with the articles that need tagging. This is not rocket science but the application of parts that already exist.

For the Wikimedia Foundation, supporting educational meta data is a great method of making its content more relevant. The resulting use in education will provide a powerful argument that investing in the sustainability of its organisation is in the interest of national educational systems.

I invite the Wikimedia Foundation to consider this, not only as a method of raising funds, but first and foremost as a way to make good on its aim: to provide information to the world.


Thursday, June 14, 2007

More Anthere at the Holland Open Software Conference

I have been on the lookout for the video recording of Anthere's presentation at the Holland Open Software Conference; it has not surfaced yet. What I did find was an exclusive interview that Anthere gave to the ANP. As ANP is a press agency, the same content could be found in several places, for instance here and here.

Several topics that were not covered in her speech can be found in the interview:
  • Anthere expressed the wish to adapt our software and improve its accessibility; the visually impaired are specifically mentioned
  • There is more about promoting our content in developing countries: not only do we want to make our content available there, it is equally relevant to make their information available in our projects
  • With the growth of the Wikimedia staff, it is increasingly important for the WMF to be well organised and not to have finance as its only top priority
  • Quality has a high priority, particularly in the bigger projects
    • Self promotion is mentioned
    • The lack of sources is mentioned as a reason for deletion on some projects
    • Anthere indicates that she would not follow the Citizendium model where specialists check up on articles, because they are fallible too.

What do you do for the smaller projects?

As I mentioned earlier, Anthere gave a great presentation at the Holland Open Software Conference. One question that was asked and answered is still going around in my head: "What does the Wikimedia Foundation do for the smaller projects and languages?" Anthere's answer was truthful: we do not do much. If it does not happen now, we wait for it to happen later.

It is a truthful answer, and given the resources we have as an organisation, there is not much more that we can do. There are all the issues, all the dramas and all the opportunities of the big projects to deal with. These are to be dealt with either by the community of a project itself or by the staff of ten people who keep some of the biggest websites in the world going.

I discussed this with people like Sabine, and the conclusion is very much that there is no single Wikipedia. There is an English, a German, a Neapolitan, etc. Wikipedia. They are all individual projects. They may share many of the basic values, but in the end these communities, these projects, are very much left to themselves.

So what do we do for the smaller projects? We very much want these projects to succeed. We now insist on some initial content and some initial localisation before we start a project in a new language, but really, once they are started they are on their own. There is no evaluation and no monitoring of the project, and only when things are deemed to be REALLY problematic does it get attention.

So what should we do for the smaller projects? There are people and organisations who are willing to pay money for content in specific languages. This content can be truly in the spirit of Wikipedia even when it is geared towards certain subjects. One of the best reasons for accepting and promoting this is that by creating a supply, a demand will follow.

Having created content on many Wikis, we have a grasp of what it takes to create content. The most relevant deliverable however is not the content, the hardest and most valuable deliverable is the creation of a community. This is hard because you cannot buy a community, this is hard because it is not clear what a community will consider to be important and this is hard because their opinion may not coincide with what is important to you or to an organisation that makes a content creation project possible.

We live in a world where deliverables need to be measurable. Content creation can be measured; you can pay a translator or a writer. You can spend money and deliver a product that includes interwiki links, wiki links and images and that conforms to style guides. But you cannot guarantee that you will build a community at the same time. Building a community takes time; it means that the people who make up this community need to be able to influence the process. All the right things can be done, but there is no guarantee that an autonomous community will evolve.

Anthere is right in many ways when she says we do not do much to help new projects. The Wikimedia Foundation cannot do much because it does not have the resources, but it will happen: the smaller projects will take off. This process can be helped along by the creation of content. Growing the projects is a process; it takes people and effort. The growth of projects can be accelerated with investment; however, the mix has to be right to make a project truly part of the Wiki movement.


Wednesday, June 13, 2007

The day after the Holland Open Software Conference

Anthere gave the keynote speech in Amsterdam at the Holland Open Software Conference. It was a great speech, great because it was wholly dedicated to the Wikimedia organisation and its finances. This was possible because nobody at the conference indicated that they did not know what Wikipedia was. For me there was little that was new; it was, however, great to hear it put together coherently. Many avenues of funding are closed to the Wikimedia Foundation because of the real or perceived problems they would cause within the community. Many avenues of funding are still very much closed because there is not sufficient staff to deal with acquiring funding. Anthere did a great job. It was a great opener because it put the sustainability of Open/Free projects very much on the map of the conference.

It was well publicised on the Dutch Wikimedia projects that the community was invited to attend the conference. It was cool to see Siebrand, Galwaygirl, Effeietsanders and Brabo. It is, however, a mystery to me where all the others were. This was a high-grade conference with many talks extremely relevant to what we do, and there were so few of us. A missed opportunity.

For me there were several new connections to make. I liked what Erik Duval said about forms to fill in for meta data like IEEE-LOM; he knows that much of this can be done in an automated way. I liked many of the things Massimo Mauro had to say about the European Union; I hope we will be able to cooperate in OmegaWiki. I loved the presentation of Eliane Metni. She connected the spirit of the open and free community with education in a compelling way: it is great that there is so much, but it has to be adaptable in order to be useful in the setting of an educational process. A tool may be great, but when it does not fit it is useless. There were many more, but these are some of the highlights for me.

Next year the Holland Open? I will do my best to be there again :)


Sunday, June 10, 2007

The point of OmegaWiki

Sabine wrote a great blog the other day about how combining content from Wikinews Positano news and OmegaWiki creates something that is really fascinating. Please read her blog first if you have not read it yet.

The great thing is that things have moved forward already. Martin Mai, who is with the University of Bamberg, has written a first incarnation of software that assists the reading of an article on the Positano News. It puts pertinent information one click away: it gives you translations and definitions, and it even lets you go to OmegaWiki for more details.

For me the most important point is that much more will be possible because OmegaWiki has its data in a database and not as MediaWiki pages. What Martin created is very much a mock-up. It even works, but it is only a start. More is possible; it needs a fertile mind and some cool programming.


Monday, June 04, 2007


There is a long article in InformationWeek about reputation. I read it with much interest. It covers most of the ground. The question that I do not find answered is: what does reputation buy me, and why would I care about an online reputation?

My reputation is a consequence of the things that I have done. Some of the things I do have made me recognisable; I learned at a Wiki meet that my standard salutation, "Hoi", made me the first person recognised as an individual by someone who is working hard to understand the Wikimedia Foundation. This is my 250th entry in this blog, and a growing group of people reads it. I have written tons of articles in Wikipedia, Wiktionary and Commons and am now particularly active in OmegaWiki. But my reputation as I have it is a consequence of what I have done. It does not necessarily give me credibility; in Doctor Sanger's eyes it won't, as he advocates certification instead of reputation.

Much of the talk about reputations is defensive in nature; it is about vandalism, about pretensions, about why we should trust a resource and to what extent. This negative emphasis is self-defeating because it does not value how a reputation helps in achieving goals. The cost of this absolutely negative appreciation is that after a controversy a person like Essjay is written off. He was once one of the most valuable Wikipedians, and this was based on the good work that could be observed.

The biggest problem that I can see with looking at reputation in a negative way is that you do not allow people to be wrong, and as a consequence you frighten people off. People with a stellar reputation find it necessary to write under a pseudonym in Wikipedia because they fear for their reputation. Requiring people to be identified prevents some of them from contributing.

It is valuable to know what things are wrong. Scientific publications are a celebration of positive discoveries. The dark side is that the discovery of things that are wrong is not as readily published or known. Many, many experiments are repeated over and over again because the fact that something is wrong, and the proof of why, is not published.

To me, a person gains his reputation by the work that he does. The more work done, the more opportunity there is for issues to arise; however, it is only the people who do things that make mistakes. The people who allow themselves to be wrong should be celebrated. They are the giants on whose shoulders we can see further.


Friday, June 01, 2007

A long tale ISO 3166-2:US

At OmegaWiki we are awaiting some new functionality. Erik is away for a meeting and it will go live when he gets back. In the meantime I have been playing with a new collection: ISO 3166-2:US, a list of US administrative units including states and territories.

I was prompted by a guy who is working really hard to add vocabulary in Khmer. He had already done the countries and territories of the world, and he followed that up with this info. What I really liked was the distribution of the translations in other languages. There was already quite a lot; some languages were almost complete. While adding the collection, I added the Dutch translations.

I hope people will find the collections a good start for adding translations in their language. The Swadesh list and the language lists are particularly relevant. The language lists matter most because they are used in the user interface.

What you notice in all these collections is that there are some languages that are complete or almost complete, and then there is a long tail of languages with some translations. With time we will get more languages, but we will also get more languages completed for the collections.