Wednesday, January 31, 2007

To skype or not using Skype

Skype is one of the most important tools for me. I contact loads of people this way, and they are all over the world. OmegaWiki is possible because using VoIP is easy. I talk with people on every continent except Antarctica. For me it has proved to be an essential tool.

The problem is that several of my most important contacts do not use Windows, and the quality of the Linux version of Skype does not cut it. Skype freezes the sound system erratically, or it just does not install. One of my friends wants me to use SIP; he threatened to stop talking to me.. Remember, this is an important contact for me.

We had a look around and found that we can connect a SIP service to Google Talk. This worked great; I can connect using Google Talk. If one thing is lacking, it is that I cannot initiate the connection to the SIP servers; they have to connect to me.

As I needed to find out whether SIP would work for me, I installed Gizmo Project. It does basically everything that Skype does, but it uses a standard protocol instead of a proprietary one.

My conclusion: to skype, I do not need Skype. I would probably be better off if none of my contacts used Skype at all. The message to Skype is simple: the one thing they have is a user base. Their support for Linux sucks, and this makes people look for greener pastures. So, FIX IT!!!


Monday, January 29, 2007

Scientific publishing

Scientific publishing, I read a year ago, is the most profitable part of the publishing business. Its profits have grown faster than inflation for the last fifty years. To be beneficial, scientific publications need to be an essential tool for scientific development, but they have degenerated to the point where many university libraries cannot afford what should be an essential tool.

Newton wrote: "If I have seen further it is by standing on the shoulders of Giants." In this day and age, when it is effectively impossible to read all the literature published within a scientific specialty, there is a crisis. The crisis is that it is impossible to read everything anyway, and much of what is out there cannot be read because it is unaffordable.

When a system is broken, inertia will prevent things from changing. When a tipping point is reached, things break; they break fast, and if there is nothing to replace them, they break badly. For the publishers the prospects are bleak. They have made it plain that they plan to play dirty; they have hired a PR man who is known for defending the indefensible. This can only bring the breaking point that much closer.

By resorting to half-truths and lies, they will alienate the very people they rely on. They will destroy the carefully created façade of an industry that benefits science. They will destroy much of the goodwill they depended on. In a way it is sad; in another way, some call it evolution.


Saturday, January 27, 2007

Dictionary content in Wikipedia

In the early days, there were articles in Wikipedia that were short. They were deemed too short; they were only a definition.. not even a stub. They got deleted. People started to add the kind of information you find in dictionaries. That got deleted too. "Wikipedia is an encyclopaedia" was the argument. The quarrels were intense, and at some stage Brion Vibber created Wiktionary, a project that aims to include all words of all languages.

Today I was back in the "kroeg", the Dutch Wikipedia's village pump. The same argument still exists; there are still articles that are little more than definitions. They still get deleted, and these deletions are still protested. In a way it was comforting; I was back and it was as if nothing had changed.

The same old argument is used: copy the information to Wiktionary. What these Wikipedians do not know, or forget, is that Wiktionary has its own format, and content that does not comply with its rules gets .. deleted.


Friday, January 26, 2007

Sample sentences

One of the functions in OmegaWiki that is currently not popular is the ability to add sample sentences. Sample sentences are a type of annotation at the level of synonyms and translations; they are "text attribute values".

You may appreciate that I look at many resources to learn what functionality might be relevant. What I am starting to appreciate is that there are two relevant approaches to sample sentences. One is to have a modern sentence that demonstrates the word well. The other is to have quotes from great authors like Shakespeare. The "Bard" may be a great writer, but many of his sentences are hard to appreciate for students who are new to the language. Often there is also some semantic drift that makes it even harder to understand what is meant.

For linguists, sample sentences showing the earliest and the later uses of a word are important. There are therefore two constituencies for sample sentences. The question is how to deal with them; annotating with quotes makes sense for the linguistically inclined. There may even be a need for a "modern usage" flag.

One approach that I really admire is the one taken by Logos in their Logosdictionary: it shows you the word in a corpus. This way the supply of sample sentences never runs short, as it is sustained by the corpus itself. It would be cool to emulate this with Wikipedia content.
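How Logos builds its corpus view is not described here, but the basic idea can be sketched as a keyword-in-context (KWIC) concordance: find every whole-word occurrence of a word in a body of text and show it with some surrounding context. The function name `keyword_in_context` and its `width` parameter are hypothetical, and a version for Wikipedia content would need real tokenisation and a far larger corpus; this is a minimal sketch, not Logos's method.

```python
import re


def keyword_in_context(word: str, text: str, width: int = 30) -> list[str]:
    """Return each whole-word occurrence of `word` in `text`, with up to
    `width` characters of context on either side (a simple KWIC concordance)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        # Take the raw context windows, then collapse runs of whitespace.
        left = " ".join(text[max(0, match.start() - width):match.start()].split())
        right = " ".join(text[match.end():match.end() + width].split())
        # Right-align the left context so the keyword lines up in a column.
        hits.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return hits


# A tiny stand-in corpus; in practice this would be e.g. Wikipedia text.
corpus = ("The toilet is running. Fix the toilet flapper first. "
          "Toilets are everywhere.")

for line in keyword_in_context("toilet", corpus, width=10):
    print(line)
```

Because the pattern uses word boundaries, "Toilets" is not counted as a hit for "toilet"; a dictionary would probably want to search for all inflected forms of the word instead.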


Wednesday, January 24, 2007


LinkedIn is one of those social networking websites; it allows you to build a network of people you are associated with in a business sense. One of the people I am associated with is Polish, and his name contains the ń.

This should be no problem for a networking website; what software does not support Unicode? Yet the name of my friend is spelled incorrectly, and for my network it is essential that people can live anywhere on this globe and have their name written in whatever character set.
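The post does not say how LinkedIn garbled the name, so what follows only illustrates two common ways a character like ń gets lost: UTF-8 bytes read back as Windows-1252, and text forced into Latin-1, which has no ń at all. The surname and both function names are hypothetical; only Python's standard codecs are used.

```python
def utf8_read_as_cp1252(text: str) -> str:
    """Encode text correctly as UTF-8, then wrongly decode the bytes as
    Windows-1252: the classic 'mojibake' mix-up."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def squeeze_into_latin1(text: str) -> str:
    """Force text into Latin-1 (ISO 8859-1), which has no 'ń';
    characters it cannot encode become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


name = "Stańczyk"  # hypothetical Polish surname containing U+0144 'ń'

print(utf8_read_as_cp1252(name))                    # StaÅ„czyk
print(squeeze_into_latin1(name))                    # Sta?czyk
print(name.encode("utf-8").decode("utf-8") == name)  # True
```

The last line is the point: if every layer stores and reads the name as UTF-8, it survives unchanged.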

I have asked LinkedIn to look into this. For me it is important.


Tuesday, January 23, 2007

Google ads

Google has noticed that I have something to do with Tagalog. Their AdSense targets me with an advert for translators who offer their services for Tagalog. I find it funny and noteworthy, as we have supported this language in OmegaWiki for a month now. I am probably one of the few who have added Tagalog translations to OmegaWiki. It is uncanny, because this is the only thing I am aware of having done with this language.

When you read this as a request for more people to help out with Tagalog, you are also right :)


Saturday, January 20, 2007

Toilet and doorknob

Terminology is everywhere. Translations are needed all the time. A dear friend of mine is an engineer and is improving her Spanish. She does this by regularly chatting with someone who speaks .. Spanish and wants to improve his .. English. Both of them are into wikis, and I learned that some technical material in English was sent, and this resulted in this beautiful article on how to fix a running toilet. OmegaWiki does have the word for toilet, but do we have flush, tank, lid, flapper, bowl, float, valve, overflow tube??

The wikiHow article needs translating, for instance into Spanish, because there are running toilets wherever you find toilets. OmegaWiki would love to have the terminology... it is a doorknob (another word we should have :) ) that will open the door to many more opportunities for adding terminology to OmegaWiki.


Wednesday, January 17, 2007

Standards .. we need them

I had another conversation with someone who has a big interest in the Apertium machine translation software. OmegaWiki has something to offer to tools like Apertium: it is a database, it intends to have all words of all languages, and last but not least its data will be available under a free license.

To be relevant to tools like Apertium, we need to have conjugations and inflections. This is something we have always planned to include. What is new to me is that morphological information is desired as well. With the current cost of computer hardware, including this data is not much of an issue; as it is only text, it will not amount to a big increase in hardware costs.

I then asked the question: is there a standard format for morphological data? Apparently there is no standard for it. So what do you do when there is no standard? You can select one way of representing morphological data and this will piss some people off, or you can create a new standard and piss off everyone.

The good news is that first things come first: we need to be able to include conjugations and inflections. So there is some time before this actually becomes relevant. The thing is, it will, and it is likely that we will do something. When we select one way, I hope it will be flexible enough to allow conversion to other formats as well.
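No format had been chosen, so the following only sketches the idea of keeping one neutral record and converting it to other formats on export. The `InflectedForm` record, the tag labels and both exporters are hypothetical. The second exporter imitates the shape of an Apertium dictionary entry (surface form on the left, lemma plus symbol tags on the right), although real Apertium dictionaries usually group forms into paradigms (`<pardef>`) rather than listing each form separately.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

# Extra entity so a tag containing '"' cannot break the XML attribute.
_ATTR_ENTITIES = {'"': "&quot;"}


@dataclass(frozen=True)
class InflectedForm:
    """Hypothetical neutral record: one inflected form of a lemma plus its tags."""
    lemma: str
    form: str
    tags: tuple[str, ...]


def to_tsv(entry: InflectedForm) -> str:
    """Export as a tab-separated line: form, lemma, tags joined by '+'."""
    return "\t".join([entry.form, entry.lemma, "+".join(entry.tags)])


def to_apertium_style(entry: InflectedForm) -> str:
    """Export as an Apertium-style <e> entry with one <s n="..."/> per tag."""
    symbols = "".join(f'<s n="{escape(tag, _ATTR_ENTITIES)}"/>' for tag in entry.tags)
    return (f"<e><p><l>{escape(entry.form)}</l>"
            f"<r>{escape(entry.lemma)}{symbols}</r></p></e>")


plural = InflectedForm(lemma="toilet", form="toilets", tags=("n", "pl"))
print(to_tsv(plural))
print(to_apertium_style(plural))
```

The design choice is that the stored data commits to no external format; adding support for another tool means writing one more exporter, not migrating the data.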


Tuesday, January 16, 2007

Blogger out of Beta

The Blogger software is no longer a beta product. I have been blogging with Blogger for some time now, and now I can use its new functionality.

With the software out of beta, a lot of great new functionality has become available to me. I am pleased to the extent that I want to say "thank you". I like that I can label my posts, that options can be set for each individual post, and that readers can select the posts that share a label. The way I can customise the layout of my blog is so much more WYSIWYG. :)

Really, thanks!

Sunday, January 14, 2007

Some homegrown statistics

OmegaWiki grows; the statistics provided by Kipcool show that quite clearly. Both the number of DefinedMeanings and the number of Expressions show a similar rate of growth. The current ratio between them is 14.49 Expressions for every DefinedMeaning, and at the moment it is slowly decreasing.

In a way this is counter-intuitive; the number of languages is steadily growing, and often more translations are added to existing DefinedMeanings. It shows that many concepts are added with only a few translations. These translations are likely to be added at some stage.

I expect the DM curve in particular to flatten soon; realistically there is a limit to the number of concepts out there. With the biomedical data that we want to include, I expect the ratio to take a beating, but it will slowly but surely creep up again. After all, ISO 639-6 expects to include at least some 25,000 linguistic entities...


Saturday, January 13, 2007

Two friends changing their operating system

Two friends of mine changed the operating system on their laptops this week.

One changed from top-of-the-range Windows Vista with all the trimmings to Windows XP. He was not willing to pay for new applications; Vista is incompatible with previous Windows software, I was told. This is very much a non-event; it is just interesting.

The other changed from XP to Gentoo Linux. To me this is a major pain: he is not able to get Skype to work. What makes it worse is that, as a consequence, I cannot conveniently call him any more. His reaction is that I should not use proprietary software and should use SIP-compliant software myself. I already have two clients, Skype and Google Talk, and I am sad that neither of them supports SIP.

I got into a bit of a fight about the subject. To me, there is no such thing as "a Linux". There are hundreds of distributions, and all of them have their own requirements. I hope that the LSB (Linux Standard Base) will provide the necessary integration so that applications will work regardless of the distribution. At this moment Linux is still very much an operating system for developers. Where that is damnation when I utter it, it is high praise for my friend..

Really, to me a computer is first of all there to be used. It has to be easy..

At this moment I would like to have a SIP client under Windows.. What I would like best is for Google Talk to finally support SIP..


Wednesday, January 10, 2007

Elder Futhark

Elder Futhark is the oldest form of the runic alphabet, used by Germanic tribes for Proto-Norse and other Migration Period Germanic dialects from the 2nd to the 8th centuries. It has 24 runes, and these and others are included in Unicode.

I can easily imagine that people will want to include Proto-Norse words in OmegaWiki. There is probably not much actual knowledge about this language. There is no ISO 639 code for it.. Well, when we really want to include it, we will. It will look funny on many people's computers; who has fonts for runes??


Sunday, January 07, 2007

A toe, a finger or a "prst"

In OmegaWiki we have a problem in that we are busy beefing up the content. And sure, I am as much to blame for it as anyone else. In order to reach a substantial number of translations, and thereby some relevance, it is easy to forget about the basic underpinnings of OmegaWiki. The problem is that several languages lack words as precise as those in others. In Croatian, and some other languages, prst is the word for both the finger and the toe. The consequence is that there should be another DefinedMeaning for prst, describing it as "One of the five extremities that can be found on a hand or a foot".

Creating such a DM is not a problem; however, I cannot define a DM in Croatian. I can define it in English or Dutch. The issue is that a DM is defined as a combination of an Expression and a Definition in the same language. It does not stop me from creating a DM.. but it does mean that we need to consider this scenario as well.
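The constraint can be made concrete with a small data model. This is a hypothetical simplification of the rule as described above, not OmegaWiki's actual database schema; the class names, fields, the `try_define` helper and the language codes ("hr" for Croatian, "en", "nl") are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expression:
    spelling: str
    language: str  # an ISO 639-1 style code, e.g. "hr", "en", "nl"


@dataclass
class DefinedMeaning:
    """Hypothetical model of a DefinedMeaning (DM): a defining Expression plus a
    Definition, with translations and synonyms in other languages attached."""
    defining_expression: Expression
    definition: str
    definition_language: str
    translations: list[Expression] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The rule described in the post: the defining Expression and the
        # Definition must be in the same language.
        if self.defining_expression.language != self.definition_language:
            raise ValueError("defining expression and definition must share a language")


def try_define(expression: Expression, definition: str,
               definition_language: str) -> Optional[DefinedMeaning]:
    """Return a new DM, or None when the same-language rule rejects it."""
    try:
        return DefinedMeaning(expression, definition, definition_language)
    except ValueError:
        return None


# Accepted: an English Expression with an English Definition.
finger = try_define(Expression("finger", "en"), "One of the five digits of the hand.", "en")
if finger is not None:
    finger.translations.append(Expression("vinger", "nl"))

# Rejected: a Croatian Expression with an English Definition.
prst = try_define(Expression("prst", "hr"),
                  "One of the five extremities that can be found on a hand or a foot.", "en")

print(finger is not None, prst is None)  # True True
```

Under this model, an editor who cannot write a Croatian definition has no way to anchor the prst concept on the Croatian word, which is exactly the scenario that needs considering.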


Friday, January 05, 2007

The OmegaWiki mantra

According to Guy Kawasaki, every start-up organisation should have a mantra. It should be short and to the point. Sabine owns the perfect domain, and that domain is the perfect mantra for OmegaWiki: "Words and more".

I think it is the best way of describing what we aim to do with OmegaWiki. I am sure that Sabine agrees..