Monday, January 09, 2006

What words do we want in WiktionaryZ?

WiktionaryZ wants to have all words in all languages. So which words should we concentrate on? Brian0918 created a big list of 306,390 English expressions. This list is a compilation of the entries in the "OED and AHD". To me there are several issues that should be addressed:
  • The way copyright infringement is detected in lexicographical content
  • Do we want to emulate what others have done, or are we masters of our own destiny?
  • Where are the French, Swahili and Kannada words, and the words of all the other languages that equally deserve attention?
People who compile lexicons include information that would not normally make it into a dictionary. They do this because it functions as a "watermark": when the watermark turns up in someone else's work, this is considered proof of copyright infringement. When you combine the lists of both the "OED and AHD", the result would carry the watermarks of each, so the prevailing theory would be that infringement is proven for both bodies of work.

With WiktionaryZ we have the opportunity to use Wikipedia as our corpus. In Wikipedia we find the words as they are used today. When we concentrate our effort on these words, we add value to the information contained in Wikipedia, and at the same time Wikipedia adds value to WiktionaryZ because it allows us to show words in context. Wikipedia has contributors from many countries, and they use the words that are normal in their locale. With the "OED and AHD" we do not necessarily get these words.
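To make the corpus idea a little more concrete, here is a minimal sketch in Python of how one might find words that are used in article text but absent from an existing word list. Everything in it is an assumption for illustration: the function names, the two sample sentences standing in for Wikipedia articles, and the tiny word list standing in for a compiled list. The tokenizer is deliberately naive (runs of letters, lowercased) and would need replacing for many scripts.

```python
import re
from collections import Counter


def corpus_word_counts(texts):
    """Count how often each lowercased word occurs across the given texts."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer: runs of letters only (no digits, no underscores).
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts


def missing_from_list(counts, known_words, min_count=2):
    """Return (word, count) pairs seen at least min_count times but not in known_words."""
    known = {w.lower() for w in known_words}
    return [(w, n) for w, n in counts.most_common()
            if n >= min_count and w not in known]


# Hypothetical sample data, standing in for article text and a compiled word list.
articles = [
    "The lorry stopped at the kerb.",
    "A lorry and a truck passed the kerb.",
]
word_list = ["the", "a", "and", "at", "stopped", "passed", "truck"]

candidates = missing_from_list(corpus_word_counts(articles), word_list)
print(candidates)
```

The words that come out are the ones people actually write but the list does not cover, which is where effort would add the most value to both projects.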

It is not a bad thing that people interested in English concentrate on English. As such I welcome this effort. However, in my opinion there is too much emphasis on major languages like English. In these languages it is hard for WiktionaryZ to become relevant. To become relevant we have to do things that others do not. Relevance can be gained in many ways; translation to minor languages is a way for some, counting the characters in a word is a way for others.

From my perspective, we become relevant by harbouring communities and special interest groups and by allowing them to make WiktionaryZ their project. When we maintain our core values of freedom, of inclusiveness, of non-discriminatory access and of open standards, we will be relevant to some if not to all.
