Thursday, February 28, 2013

A challenge for #Wikidata

Not only #Wikipedia but also #Wikisource and #Wikiquote are in need of the #Wikidata treatment. OmegaWiki can demonstrate why. Julius Caesar has an article in all the languages that have a translation. All that is needed is a link to Wikidata, where Julius is known as "Q1048".

Linking to the quotes of Caesar on Wikiquote is not so easy; it has to be done per language, for each Wikiquote that has quotes by Caesar. The same is needed on Wikisource for books or other publications. You may notice that Julius Caesar is a writer and a person. A writer publishes, and any person may produce quotable quotes.

The challenge for Wikidata is to go the extra mile and rid us of those pesky interwiki links on all the Wikimedia projects. These links can in turn be used on all the projects. It also makes searching more interesting. Julius Caesar: what do you want? An article about him, his sources, his quotes?

What we at OmegaWiki would like is for Wikidata to go even further and incorporate our data. This would be a big benefit because it would allow people to search in Nepali for जुलियस सिजर and get a result similar to the one in English at no extra cost.
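To make the idea concrete, here is a minimal sketch of one item that carries labels in several languages and links to several projects. It is not how Wikidata or OmegaWiki actually store their data; the `Item` class, the `find_by_label` and `links_for` helpers and the sample page titles are all assumptions made up for illustration. Only "Q1048" and the Nepali label come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical record: one concept, many labels, many project links."""
    qid: str
    # language code -> label in that language
    labels: dict[str, str] = field(default_factory=dict)
    # (project, language code) -> page title; sample values are illustrative
    sitelinks: dict[tuple[str, str], str] = field(default_factory=dict)


def find_by_label(items: list[Item], text: str) -> list[Item]:
    """Return every item that has `text` as a label in any language."""
    needle = text.casefold()
    return [
        item for item in items
        if any(label.casefold() == needle for label in item.labels.values())
    ]


def links_for(item: Item, project: str) -> dict[str, str]:
    """Return the language -> page title links of an item for one project."""
    return {
        lang: title
        for (proj, lang), title in item.sitelinks.items()
        if proj == project
    }


caesar = Item(
    qid="Q1048",
    labels={"en": "Julius Caesar", "ne": "जुलियस सिजर"},
    sitelinks={
        ("wikipedia", "en"): "Julius Caesar",   # illustrative title
        ("wikiquote", "en"): "Julius Caesar",   # illustrative title
    },
)

# A search in Nepali and a search in English end up at the same item.
by_nepali = find_by_label([caesar], "जुलियस सिजर")
by_english = find_by_label([caesar], "julius caesar")
quote_links = links_for(caesar, "wikiquote")
```

Because the links hang off the item and not off each article, adding a label in one more language makes every project link reachable from that language as well.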

Denny, I do not agree

#Wikidata is a wiki and consequently Caesar can have a capital, according to Denny's blogpost. While I applaud the sentiment, it is not really practical.

Denny is the big man for Wikidata, and he is human; he can be both right and wrong. So let me suggest why I do not support him in this, and let me suggest a compromise.

The one thing that will help is when there are classes and when those classes have attributes. For instance, a country has a national anthem, a capital, a currency, a head of state, etcetera. When you state that Caesar is a country, it then follows that he has a capital, a currency, a head of state. For Wikipedia this is really powerful. When all the countries have been labelled as such, it is easy and obvious to fill in the missing values for use in an info-box. The most valuable part is that these values can be translated for use in info-boxes in other languages. Yes, the values can have their own Wikipedia articles as well as a label to use in an info-box.

When a label is used in an info-box, in any language, it is obviously of higher value than the labels that are not used. So Denny can have his way, and as long as labels are valued when they are used in a Wikipedia, all the extra labels will not be very much in the way. Hard-drives are cheap. It is possible to have classes with associated labels. These classes can be associated with a particular info-box.
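A rough sketch of what classes with attributes could buy us, under stated assumptions: the `CLASS_ATTRIBUTES` table, the helper names, the sample country "Exampleland" and the Dutch attribute labels are all invented for this illustration and are not part of Wikidata.

```python
# Hypothetical table: which attributes an item of a given class is expected to have.
CLASS_ATTRIBUTES = {
    "country": ["national anthem", "capital", "currency", "head of state"],
}


def missing_values(item_class, statements):
    """List the expected attributes of a class that have no value yet."""
    expected = CLASS_ATTRIBUTES.get(item_class, [])
    return [attribute for attribute in expected if attribute not in statements]


def infobox_rows(item_class, statements, attribute_labels, language):
    """Build (label, value) rows for an info-box in the requested language.

    Falls back to the attribute's own name when no translated label exists.
    """
    rows = []
    for attribute in CLASS_ATTRIBUTES.get(item_class, []):
        if attribute in statements:
            label = attribute_labels.get(attribute, {}).get(language, attribute)
            rows.append((label, statements[attribute]))
    return rows


# Made-up sample data for illustration only.
exampleland = {"capital": "Example City"}
labels = {"capital": {"nl": "hoofdstad"}, "currency": {"nl": "munteenheid"}}

todo = missing_values("country", exampleland)
dutch_rows = infobox_rows("country", exampleland, labels, "nl")
```

The point of the class is the `todo` list: once something is labelled a country, the gaps are obvious, and the same statements can fill an info-box in any language that has the labels.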

This is what Julius Caesar looks like at OmegaWiki. As you can see, it is linked to Wikidata, it is linked to Wikipedia and it is linked to Commons. It would be so cool to have the best of both worlds in a joint project.

Please Denny, let's cross the Rubicon.

Wednesday, February 27, 2013

What statistics for #Wikipedia

The question was: "What statistic inspires you most?" My answer: it depends on what hat I am wearing.

The statistic that means a lot to me is page views. However, they do not truly show how many PEOPLE view the pages. There are the bots that spoil the view. But there is so much history there. Even with the bots you get a picture of growth.

However, wearing another hat, I want to see an aggregation of views of GLAM objects for the GLAMs we cooperate with. This is so vital for making the point that Wikipedia is making cultural heritage visible. It is the one argument everyone "understands".

Yes, there are many languages, but for these I want to know how well MediaWiki works for a language, and this can best be found in the stats at

Really Erik, there is no one statistic that is best. There are horses for courses. What is vital is that the statistics can be trusted. If anything, improved reliability of the data is what would help us most.

Wednesday, February 06, 2013

More traffic for the smaller #Wikipedia projects

The biggest Wikipedia in pageviews is the English language Wikipedia. Last month it generated 49.36% of all the traffic. This left 40.34% for the numbers 2 to 9 and 10.3% for all the other Wikipedias.
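As a quick sanity check, the three shares quoted above can be added up. This is only a sketch of the arithmetic; the group names are my own shorthand, and `Decimal` is used so that the percentages are summed exactly.

```python
from decimal import Decimal

# Shares of total Wikipedia pageviews as quoted in the text, in percent.
shares = {
    "English": Decimal("49.36"),
    "numbers 2 to 9": Decimal("40.34"),
    "all others": Decimal("10.3"),
}

total = sum(shares.values())              # should account for all traffic
non_english = total - shares["English"]   # everything that is not English

print(f"total: {total}%  non-English: {non_english}%")
```

The shares add up to exactly 100%, so slightly more than half of the traffic (50.64%) goes to Wikipedias other than the English one.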

Traffic is growing really well: 19% year on year for all the Wikipedias and 20% for the English Wikipedia. What the chart to the left shows is that slowly but surely the "other" languages are doing better than the top 10.

The current picture will change as it did in the past. Consider that the Dutch Wikipedia used to be bigger than the Chinese Wikipedia. The Dutch Wikipedia is growing at 16% while the Chinese Wikipedia is powering up the ladder with a healthy 46%.

There are many languages that have the potential to move up in the traffic rankings. This may be because the content in those languages improves or because the infrastructure in a country changes. I have reliable information that in India mobile traffic went from 55m to 85m in a matter of 6 months, although it is too soon to attribute this to Wikipedia Zero. This is not obvious from the statistics published for India.

In a way I have been cheating; the numbers in the charts above exclude mobile traffic. This is because these numbers make it more obvious how well the "other" languages are doing. However, as you can see, something similar is happening when mobile traffic is included.

With Wikipedia Zero growing in relevance, it seems obvious that content that is relevant to the people who use this service will grow in demand. As this is addressed in the languages people read, it will mean that the growth of popularity of Wikipedia articles will continue for some time. I also expect that the "other" languages will become more relevant and assertive.

Sunday, February 03, 2013

#Wikivoyage #statistics II

The first month of Wikivoyage statistics is in. The page views statistics are now regularly updated. There is now a baseline to compare future months with. Last week I blogged about Wikivoyage statistics and at that time traffic was expected to reach 12.8 M. As you can see in the screen dump, it ended up at 17.3 M.
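How far off was the expectation? A small sketch of the arithmetic, using only the two figures quoted above (the variable names are mine):

```python
expected_views_m = 12.8   # expected monthly traffic in millions, from last week's post
actual_views_m = 17.3     # actual monthly traffic in millions, from the screen dump

difference_m = actual_views_m - expected_views_m
overshoot_pct = difference_m / expected_views_m * 100

print(f"actual traffic exceeded the expectation by {overshoot_pct:.1f}%")
```

In other words, the first month came in roughly a third higher than expected.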

Over time Wikivoyage will become increasingly Wikivoyage. It will slowly but surely morph into something that can be distinguished from Wikitravel. Once that process is well under way, I am sure that there will be a growing public for this Wikimedia project.

Saturday, February 02, 2013

#CLDR gets the sorting right

When you sort, order is created in a predetermined way. Another name for such a predetermined way is the "collation order". When you sort tea, you make sure that only the tea leaves of sufficient quality are left. The characters in the words determine where a word can be found in a sorted list.

The collation order is a standard, and the CLDR is the name of the standard. Unicode, the organisation behind the CLDR, has made a big change in the order. From now on, the characters of the script of the language take precedence over the Latin script.
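The effect of "own script before Latin" can be illustrated with a toy sort key. This is a deliberate simplification and not the CLDR collation algorithm: it only looks at the first character of each word, and the helper names and the word list are assumptions for illustration.

```python
import unicodedata


def starts_with_latin(word):
    """True when the first character's Unicode name marks it as Latin."""
    if not word:
        return False
    return unicodedata.name(word[0], "").startswith("LATIN")


def native_first_key(word):
    """Sort key: non-Latin words first, then Latin; code point order within each group."""
    return (1 if starts_with_latin(word) else 0, word)


words = ["Caesar", "जुलियस", "Rubicon", "सिजर"]

# Plain code point order puts the Latin words first.
latin_first = sorted(words)

# With the toy key, the words in the language's own script come first.
native_first = sorted(words, key=native_first_key)
```

With plain sorting the list starts with "Caesar" and "Rubicon"; with the toy key it starts with "जुलियस" and "सिजर". A real implementation would rely on a collation library that uses the CLDR data, not on a hand-written key like this.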

This change affects many languages, and there is a document mentioning them all.