Saturday, August 31, 2013

A #chicken and #egg issue for #Wikidata

Many a #Wikipedia uses bots to generate articles. The same data is often used to create articles or to enrich articles on multiple Wikipedias. When this is first uploaded to Wikidata, there are already Wikipedias that do not need any update. With some clever templates, the information will appear automagically. Only when an item does not have an article, the article needs to be created with an appropriate template. The article needs to be linked to Wikidata for the data to be found.

There are issues with the execution of this model but the most important one is a cultural one; when you work with data to generate stubs, there are benefits to concentrating first on Wikidata. The biggest benefit is that the data is integrated from the start and a close second is the ability to easily use it in another Wikipedia and maybe in Wikivoyage as well.

At this stage of the Wikidata game there are important issues:
  • many datatypes are not yet supported
  • many datatypes cannot be imported yet with a bot
This means that we cannot import all the data to Wikidata yet and at this time the import will be a combination of a traditional import to a project and the use of data from Wikidata.
You may find that the data already exists in other projects and you are likely to find differences. Finding these differences and curating them is crucial to all of these projects. The trick is to make sure that this is done only once, and this is where Wikidata shines.

Mapping from #DBpedia

Mapping DBpedia properties to Wikidata may be difficult because a Vice President of the United States is defined by the same property as a Vice President of ....

"Obviously" you may deduce information from the context. Obvious when you have the technology.

Mariano Baptista, a president of #Bolivia

At the #Occitan #Wikipedia, minimal information is required. It gets most of its data from Wikidata. As an experiment I added information for Mariano Baptista, who was president of Bolivia from 1892 to 1896. The choice for Mr Baptista was made because I changed his profession on the Spanish Wikipedia in order for my bot to pick this up.

The most relevant thing to Mr Baptista is that he was "president of Bolivia" so I selected the Wikidata item of that name for the property "Office held". I added the usual qualifiers and had another look at the effect. It shows as "Lista dels presidents de Bolívia".

It is probably the nearest Wikipedia article to the office of President of Bolivia but for the purpose of identifying the office it is not that good. There are two options: change the label of the item or create a new item for the office of the president of Bolivia.

PS While I was writing this blogpost, Capsot had expanded the article with some text :) There is also no script error for this article.

About the presidents of #Argentina

Most if not all #Wikipedia articles of the Argentine presidents have an infobox informing us of the occupation of that president. Many of them are lawyers, some are military and there are some more occupations mentioned.

Leopoldo Galtieri was president for less than a year and he was a military man. I know this because my bot is running on the Argentine presidents and it is extracting their professions for use in Wikidata. I had to edit one president by hand for my bot to complete all of them.

The opportunity is for other Wikipedias to learn about this new bit of information. The Occitan Wikipedia now shows this information twice. It is one of the articles with a script error.

Thursday, August 29, 2013

Reviving my bot for #Wikidata IV

An appropriate response to a fixed bug is ... more testing. A bug did get fixed, so testing the latest and greatest was in order. I am grateful, so it was time to attempt to import more data from Wikipedia and do some more testing at the same time.

At this time I am importing political party information from templates of obvious politicians; people who are for instance in a category like "Presidents of Argentina". I found new bugs:
  • When there is no party indicated, the bot aborts
  • When the party is a "red link", the bot aborts
  • When there are multiple parties indicated, it only adds the first one
I also found that there are Wikipedia articles without a Wikidata entry; they are ignored.
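The three bugs above all come down to how one infobox field is read. As a minimal sketch, and not the pywikipedia code itself, the function below reads the wikitext of a "party" field and returns every linked target instead of aborting. The function name, the sample party names and the `existing_titles` set (a stand-in for a real check whether a page exists) are assumptions made for this illustration.

```python
import re

# Matches [[Target]] and [[Target|label]] wikilinks; captures the target only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def party_targets(field_value, existing_titles):
    """Return (usable, skipped) link targets found in one infobox field.

    field_value: raw wikitext of the "party" parameter, possibly empty or None.
    existing_titles: titles known to exist (hypothetical stand-in for a real
        page-existence check against the wiki).
    """
    usable, skipped = [], []
    # An empty or missing field simply yields no matches, so nothing aborts.
    for match in WIKILINK.finditer(field_value or ""):
        title = match.group(1).strip()
        if not title:
            continue
        # A "red link" points at a page that does not exist: skip it, keep going.
        if title in existing_titles:
            usable.append(title)
        else:
            skipped.append(title)
    return usable, skipped


# Illustrative data only.
known = {"Justicialist Party", "Radical Civic Union"}
print(party_targets("[[Radical Civic Union|UCR]]<br>[[Justicialist Party]]", known))
```

With this shape, a missing party and a red link both end up as "nothing to add" for that article, and multiple parties are all returned rather than only the first one.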

I am really happy with the help I get in running the pywikipedia software. It becomes however obvious that importing data from Wikipedia into Wikidata in this way is not mature.

PS I imported some 400+ records while testing :)

The next billion #Internet users III

At the end of the year the number of Internet users is expected to reach 2.7 billion. The expectation is that in 2017 there will be 3.7 billion users. This means that in 2017 most people will still not be connected to the Internet.

Recently Mr Zuckerberg of Facebook fame presented a new project. It aims to connect the people who are waiting for the "last mile" to be bridged. In an interview in Wired, you can read about the vision behind this project.

The numbers of the Internet Service Providers are from before the announcement of this new project. When people are getting on the Internet even faster, it will certainly change the dynamics of the Internet even more. It is not obvious what data will be most sought after by the people who will be new on the Internet.

With all its encyclopaedic information, Wikipedia seems to be well placed to serve many more people the sum of all knowledge. However, much of that information will not be accessible to the many people because it is not available in a language they know. With Wikidata it will be possible to maintain data and present it in the many languages it supports.

We do want well written articles but presentations of the core data of subjects may fill in the gap.

The next billion #Internet users II

#China, #India and #Indonesia are estimated to have 3 billion connections in 2017. Consequently the opportunity to attract people is very much in these countries.

When you look at the Wikipedia statistics, you will find that Chinese and Indonesian Wikipedia are growing 262% and 122% respectively on an annual basis for mobile traffic and 49% and 34% for non-mobile traffic. This does not make them the fastest growers as you would expect from the prediction; the Russian and Arabic Wikipedias are growing faster at this time.

The story for India is more complicated; it is not clear from our statistics what part of the traffic of the English Wikipedia is coming from India. Traffic for the many other Indian languages, mobile and non-mobile, indicates that mobile is really taking off. When you consider the absolute size of the mobile traffic from India however, you have to appreciate that only the latest smartphones have a chance of rendering Indian scripts successfully. It will probably need an intervention from the Indian government to get the latest language technology implemented on existing smartphones.

The next billion #Internet users

The aim of the #Wikimedia Foundation is "to share the sum of all knowledge".  We are reasonably successful at that as we are one of the largest websites in the world. The Internet Service Providers have looked into their crystal ball and made a nice presentation of where the next billion Internet users will come from.

They asked me if I would be interested to share it with my readers because of a previous blogpost of mine. I am, so enjoy.

The Next Billion Internet Users: What Will They Look Like?

Wednesday, August 28, 2013

The #Catalan #Wikipedia provides great information

Adding information to #Wikidata means that you want to use the existing items with interwiki information. When you find red links in the English Wikipedia, it is easy to think that there is no information in Wikipedia. It really pays to check the interwiki links of the article that contains all the red links.

The Catalan Wikipedia, I find, is really good at providing information on the Islamic rulers that are part of Spanish and North African history. It means that when I add information I need to take their existing articles into consideration.

I got the impression that the Zayyadin dynasty were "caliphs". So I created a "Zayyadin caliph" item. My hope is that someone will change it to "emirs" when this is wrong.

Tuesday, August 27, 2013

About succession and countries

When you are adding labels to #Wikidata the infoboxes are the main inspiration and source of information. To the right you find a detail of the infobox of the Almohad caliphate. It shows really interesting information.

When you look at the size of the Almoravid dynasty, you will agree that the Almohads have absorbed other countries as well. In a similar way, the succeeding countries are unlikely to have killed off the Almohads all at the same time. It also becomes obvious that there is little relation to the countries that exist in the territories of these past domains.

There are multiple ways to bring some clarity in all this. One is to show overlaying maps and the other is to use qualifiers with the data. The succession is likely to have coincided with events, events like battles. These events have dates and coordinates. These events link back to the countries that are in conflict.

To get such data into Wikidata in a consistent manner will take a lot of effort. I am curious if there are sources that have already found a way to express it effectively. I am a fan of importing data but I am equally convinced that there is a lot where judgement and involvement are the only option.

The Hammadids of #Algeria

#Wikidata can be #fun. This time it was possible, by combining data from several Wikipedias, both to make a complete list of the succession of the "sultans" and to complete the genealogy of these people.

Completing such things is not always that straightforward; in this case I found that three rulers were cousins, so they had the same grandfather. When one of them is called Nasir ibn Alnas, you can deduce that his father was called Alnas ibn Hammad. As there is no Wikipedia article for several of the Hammadids, I added these missing links in Wikidata.
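The deduction above can be sketched in a few lines. This is only an illustration: it assumes names of the simple form "given ibn father", and both function names are made up for this example. Real names vary a lot, so the result is a candidate for a human to check, not a fact to import blindly.

```python
def split_patronymic(name, marker="ibn"):
    """Split 'Nasir ibn Alnas' into ('Nasir', 'Alnas').

    Returns (name, None) when the marker is absent or nothing follows it.
    """
    parts = name.split()
    if marker in parts[1:]:
        i = parts.index(marker, 1)
        given = " ".join(parts[:i])
        father = " ".join(parts[i + 1:]) or None
        return given, father
    return name, None


def father_full_name(name, grandfather, marker="ibn"):
    """Combine the father's given name with a known grandfather.

    Returns None when the name carries no patronymic to work from.
    """
    _, father = split_patronymic(name, marker)
    if father is None:
        return None
    return f"{father} {marker} {grandfather}"


print(father_full_name("Nasir ibn Alnas", "Hammad"))
```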

The Hammadid dynasty occupied their territory and at some stage this territory was absorbed by the Almohad caliphate. Something that does not become clear in the English Wikipedia. The growth and decline of many of these African territories can be nicely seen in this animated map. It would be nice if maps like these were linked to the rulers in Wikidata. It would provide the key to data in many languages.

The tragedy in a #DBpedia announcement

There was an announcement of achieved goals in the Google Summer of Code for a DBpedia project.
Wikidata integration inside DBpedia we are happy to announce that an initial RDF DBpedia Dumps for Wikidata Data is now available.
Enough reason to read again what this integration is about. The sad thing is in the Additional goals section; "Data quality assurance and tests to improve data quality for academia and enterprises (not for wikipedia infoboxes) would be nice".

The improved data quality is not intended to improve the quality of the data in the Wikipedia infoboxes. This is really sad because as the cooperation with the Deutsche National Bibliothek and the German Wikipedia proves, improvement is beneficial to both parties.

My understanding of why the Wikipedia infoboxes are out of scope is that there has to be a willingness to accept outside "interference" and we have not learned to cooperate, not even among Wikipedia communities. Wikidata however is the new community and it has a much higher standard to comply with. The data has to be superior to the data in any Wikipedia because it aims for its data to be used on any Wikipedia.

The best thing to do is sidestep this issue and make sure that all the questionable data in Wikidata is flagged so that we can start to find out what is best after all.

Monday, August 26, 2013

Reviving my bot for #Wikidata III

The objective of running a bot is to achieve an objective. It used to be adding interwiki links, now it is adding data to Wikidata.

It is a good idea to start small; a small group of records and only one value. If only to be able to fix things by hand without too much trouble. One bug found, fixed and up to the next challenge.

The presidents of Argentina are probably also members of a party, so they were next. No record changed, so I checked Juan Perón to find that he was a member of the Justicialist party.

I found another bug. This bug is now being fixed as well; people are really helpful. Probably they want more data in Wikidata as well.

PS I have also done some work on the "kings of Nekor"

Reviving my bot for #Wikidata II

It did not work as designed. In the end there were several mysteries. To solve it, I deleted the and it was re-created by the software.  That made a difference. In the end I also got the order of the parameter and the property right.
python  -family:wikipedia -lang:en  "-cat:Presidents of Uruguay" "-template:Infobox Officeholder" -namespace=0 party p102
What my bot did not get right is the attribution. It is said to have been imported from the Northern Sami Wikipedia ...
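The point about the order of the parameter and the property can be turned into a small check. This sketch is not part of the pywikipedia framework; `pair_fields` is a hypothetical helper that pairs the trailing arguments as (template field, property) the way the working command above has them, and complains when a pair looks swapped instead of quietly doing nothing.

```python
import re

# A property id is the letter P followed by digits, e.g. "p102" or "P102".
PROPERTY_ID = re.compile(r"^[Pp]\d+$")


def pair_fields(args):
    """Pair positional arguments as (template_field, property_id).

    Options starting with '-' are ignored. Raises ValueError when a pair is
    incomplete or looks swapped.
    """
    positional = [a for a in args if not a.startswith("-")]
    if len(positional) % 2:
        raise ValueError("expected field/property pairs, got an odd count")
    pairs = []
    for field, prop in zip(positional[0::2], positional[1::2]):
        if PROPERTY_ID.match(field) and not PROPERTY_ID.match(prop):
            raise ValueError(f"looks swapped: {field!r} before {prop!r}")
        if not PROPERTY_ID.match(prop):
            raise ValueError(f"{prop!r} is not a property id like 'P102'")
        pairs.append((field, prop.upper()))
    return pairs


args = ["-cat:Presidents of Uruguay", "-template:Infobox Officeholder",
        "-namespace=0", "party", "p102"]
print(pair_fields(args))
```

Run against the arguments of the earlier attempt ("p102 party"), the same helper raises a ValueError, which would have been a clearer signal than a run that changes no records.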

PS thanks again to so many helpful people :)

Reviving my bot for #Wikidata

Wikidata needs more data and, to eat that dog food, I am in the process of reviving my bot, RobotGMwikt. I used it a lot in the past to create interwiki links for Wiktionary. This is a function Wikidata could serve for Wiktionary as well, as a start.

Things have changed; I have a new laptop and the Pywikipedia framework is now developed using Git. There is documentation but it deserves an update. It seemed for instance that there are four things to install but actually there are three. I do not know why there is something about a "clone".

The good news is that at the pywikipedia IRC channel there are people quite happy to help. You have to start for instance in the folder with the "core" software and, this is where the has to be as well. And, make sure that you have all the quotes at the right places. How did you copy and paste again from a terminal session ... I am grateful for their patience.

I ended up with the following command:
python "-cat:Presidents of Uruguay" "-template:Infobox Officeholder" -namespace=0 p102 party
It did exactly nothing and I have no clue why. I saw all the presidents of Uruguay being processed and I know it is in the template and p102 is the party ...

Please help because Wikidata needs more data.

Sunday, August 25, 2013

Presidents of #Uruguay

At #Wikidata there are "items" for all the presidents of Uruguay. However, when you want to add simple information like "start date", "preceded by", "end date" and "succeeded by" you find that it is "complicated".

This may mean that the information in Wikipedia is wrong. When you retrieve information by inferring the data from the text, the lists, the infoboxes you will get the best information as available at that time.

The problems found so far from reading the texts are:
  • there was a civil war with two opposing Uruguayan presidents and governments
  • for a long time there was no presidential representative democratic republic
  • when both the president and the vice president are out of the country, there is an interim president
Obviously there is a solution possible for all that. What that solution should be requires some thought. Thinking about such a solution however is important because it needs to reflect the "Neutral Point Of View" that is dear to the hearts and minds of Wikimedians. Also relevant is the consideration that similar problems exist for other countries past and present.
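One way to find where it gets "complicated" is to store each term of office with its start and end date and look for terms that overlap, as they would during a civil war with two opposing presidents. The sketch below is only that, a sketch: the `Term` class is a made-up stand-in for a Wikidata statement with qualifiers, and the holders and dates are invented placeholders, not Uruguayan history.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Term:
    holder: str
    start: date
    end: Optional[date] = None  # None means the end date is not recorded


def overlapping_terms(terms):
    """Return (earlier holder, later holder) pairs whose terms overlap."""
    ordered = sorted(terms, key=lambda t: t.start)
    clashes = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # A term without an end date is treated as overlapping every
            # term that starts after it.
            if first.end is None or second.start < first.end:
                clashes.append((first.holder, second.holder))
    return clashes


# Invented placeholder data.
terms = [
    Term("President A", date(1900, 1, 1), date(1904, 1, 1)),
    Term("President B", date(1902, 6, 1), date(1903, 1, 1)),
    Term("President C", date(1904, 1, 1)),
]
print(overlapping_terms(terms))
```

An overlap found this way is not an error by definition; it is a place where someone has to decide how to describe the situation in a neutral way.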

Issues like these do not mean that we should not mass import data. It means that we need to actively curate the data. Lessons learned should be shared wherever there is contrary information.

Friday, August 23, 2013

Paper cuts for #Wikidata

Quite a while ago Ubuntu did a paper cuts initiative; the idea is to come up with those things that would make the user experience so much better. Those things that make people happy.

I love the idea and I will think hard about the things for a Wikidata paper cuts list. I wonder if external tools like geneawiki and reasonator will be in scope. They are really motivating when working on Wikidata.

What would you like to see improved in the Wikidata functionality ???

I am very likely to blog more about the "paper cuts".

#Wikidata: Infoboxes task force

At #Wikidata you can only add those properties that are accepted by the Wikidata community. This means that much data that exists in many infoboxes does not make it into Wikidata .. yet. This is what the "Infoboxes task force" is there to facilitate.

When you look at their page, you find how they are organised; they are organised around the P107, the main type (GND). As has been reported before, that property is about to be deleted and consequently this approach is obsolete.

So another approach is needed. Given that there is no longer the fixed structure that has burdened progress so far, something more flexible is possible. At the same time we need something that has been given a lot of thought and is workable. Waiting however for the thinking to end is not really an option given how fast Wikidata is growing.

DBpedia has done nothing but try to make sense out of the infoboxes of the many Wikipedias. If anything I would like the Infoboxes task force to accept the work done at DBpedia and consider the properties that they came up with.

#Wikidata needs more data to be useful II

The proposal to require bots to provide sources has been closed. I am really appreciative that it has been rejected.

The stage is set for bots to add much more data. Data from sources that are deemed reliable. There is a lot of data out there and as it finds its way into Wikidata, it finds its way into many Wikipedias like the Occitan Wikipedia.

Even with bots adding the bulk of the data to Wikidata, there is so much more data in Wikipedia that they just cannot parse. Check out the article about the Kingdom of Nekor and particularly the part of the rulers. It is a mess, I can hardly make sense out of it. But it is relevant as part of the history of Morocco.

When data has been entered, it can be sourced later and it can be compared to other repositories of data. The German Wikipedia community has a project with the Deutsche National Bibliothek where they collaborate on dates of birth and dates of death. This is the kind of cooperation we need our community to concentrate on. This is how we make an even bigger impact.
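Comparing two repositories on dates of birth and death can be as simple as the sketch below. It is an illustration under stated assumptions: both sides are plain dictionaries keyed by a shared identifier, the identifiers and dates are invented, and the function name is made up. How the real project between the German Wikipedia and the Deutsche National Bibliothek does its comparison is not described here.

```python
def compare_dates(wikidata, external):
    """Classify shared identifiers by whether their recorded dates agree.

    Both arguments map an identifier to a dict such as
    {"born": "1850-01-01", "died": "1900-01-01"}; missing keys are allowed.
    Identifiers present on only one side are ignored.
    """
    report = {"agree": [], "differ": [], "missing": []}
    for person_id in sorted(wikidata.keys() & external.keys()):
        ours, theirs = wikidata[person_id], external[person_id]
        status = "agree"
        for field in ("born", "died"):
            a, b = ours.get(field), theirs.get(field)
            if a is None or b is None:
                # A gap only matters if no real difference was found.
                if status == "agree":
                    status = "missing"
            elif a != b:
                status = "differ"
        report[status].append(person_id)
    return report


# Invented placeholder records.
wd = {
    "id-1": {"born": "1850-01-01", "died": "1900-01-01"},
    "id-2": {"born": "1860-05-05", "died": "1910-01-01"},
    "id-3": {"born": "1870-01-01"},
}
ext = {
    "id-1": {"born": "1850-01-01", "died": "1900-01-01"},
    "id-2": {"born": "1860-05-06", "died": "1910-01-01"},
    "id-3": {"born": "1870-01-01", "died": "1930-01-01"},
}
print(compare_dates(wd, ext))
```

The "differ" list is where human curation pays off, and the "missing" list shows where one side can help the other.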

Anyway, the proposal has been defeated and Wikidata can now move on and increase its relevance.

Thursday, August 22, 2013

#Wikidata needs more data to be useful

In case you did not know it, Mr José Mujica is the current president of Uruguay. When you read his Wikipedia articles you will know it. Only recently this "statement" was added to the Wikidata item about Mr Mujica.

When you check out the infobox on the English Wikipedia, there are many more statements to be made about Mr Mujica. Statements that any competent bot operator can either get from Wikipedia or from DBpedia. Mr Mujica is not the first Uruguayan president; many presidents preceded him. And for all of them there is a date that they started as president, there are predecessors, successors and end dates. Most of them are not in there.

For Wikidata to be useful, it needs data. A lot more data IS available and is waiting to be added. Magnus Manske has information on 300.000 people waiting to be added. It is not included because some people want the information added by bots to be sourced. As you know, Wikipedia is not a source...

When you consider how many people have an entry in Wikidata, you will realise that 300.000 people is a nice effort but also a drop in the ocean. When you consider that many of these people have links to external sources like VIAF and GND, you will realise that much of this "unsourced" information can be compared with information elsewhere.

Wikidata is not yet at the tipping point where it is actually useful. Until it is, artificial restraints on adding credible statements will only move this point even further into the future.

#DBpedia and #Wikidata revisited

Many people add "statements" to Wikidata. One obvious resource is Wikipedia because they are already intimately linked through the interwikilinks. Another fine reason is because with statements in Wikidata the same infobox can be served in many more Wikipedias.

Sadly it is to a large extent a waste of time. DBpedia has already extracted a huge amount of data. When the data flows into Wikidata, in effect it will flow back into Wikipedia. There are two possible objections:
  • maybe some people prefer to do it by hand
  • DBpedia is licensed under CC-by-sa and Wikidata uses CC-0
The people who prefer to do everything yet again, I prefer them to go elsewhere. What folly; our aim is to provide information, not to keep them occupied.

The license issue is in my opinion a non-issue as well. All it takes to make a DBpedia under a new license is to run the process of extraction again with the intention to license it under CC-0. If you think this is a bit too much, I agree, so let's agree to make the data available for use in Wikidata. Motivation: DBpedia wants to give back to Wikipedia and this is the way to do it.

When statements are added from DBpedia with the specific Wikipedia as the source, the people who want to be occupied can have a field day comparing and curating the data. If they want to look really smart, they can add the sources in Wikidata known in Wikipedia.

The new #Vietnamese #Wikivoyage

Recently a new wiki was created for the Vietnamese Wikivoyage. At that point the language committee is informed that its process is done because the relevant Bugzilla bug is resolved.

Once a project is created, there are always these small details that still have to be sorted out. The most important thing is that the new project can be found in the many places where it should be added. The most obvious one is at another is the statistics page for Wikivoyage. Finally the Vietnamese articles have to become known in Wikidata.

Now that the interwiki links of Wikivoyage are maintained at Wikidata, it is not clear to me when the links are picked up by a bot. I do not have the skills to change the template for the main page of Wikivoyage. The one thing I have done is inform the relevant person to add the Vietnamese Wikivoyage to the statistics page; that is something that takes its time.

At some stage all these things get resolved. It would help if it is part of the process to create a new Wiki.

Follow up of a #Wikipedia editathon

Hypatia Bradlaugh Bonner is notable enough, if nowadays a bit obscure, to be worthy of her own Wikipedia article. You may have seen the picture in a blog post of Wikimedia UK.

I was curious whether the article was already known in Wikidata. To my amazement the only article about her was in the Dutch Wikipedia. I added the English article and from the text I added information that is pertinent about Mrs Bradlaugh Bonner. The reasonator makes a nicely formatted view of the available information for you.

As Mrs Bradlaugh Bonner is an author, I was sure that there would be a record in one of the standard resources like VIAF. There is and from there a wealth of other information is available.

One thing I learned in the process is that according to Wikidata, "atheism" is a label used together with "religion".

What I am interested in is if a bot will pick up on the data that I gleaned from the Wikipedia articles and bring us something we do not know yet.

Tuesday, August 20, 2013

The #Wikipedia toolbar refers to #Wikidata

Quietly an option has been added to the Toolbar; it is the "Data item". It refers to the data item for a Wikipedia or Wikivoyage article in Wikidata.

When you add content to Wikidata, this functionality is really valuable. With one click you find the corresponding Wikidata item even when there is only one article on the subject. It is particularly helpful when there is no label in the language you selected in Wikidata to be the default.

The bug 47911 is largely redundant as a result. It was created because at that time this option was not available. What is left but not that relevant is the inconsistent behaviour of the "Edit links" at the bottom of all the interlanguage links.

Anyway, the "Data item" deserves recognition as a "Featured Wikidata effort".

Monday, August 19, 2013

Information for the English Wikipedia.

The Dulkarid dynasty is in the English Wikipedia best known for the women who married Ottoman emperors. There are however articles on the Turkish Wikipedia about most of these beys.

With red links on the English Wikipedia and information in Wikidata, it makes sense to link the red links to Wikidata. You would believe that none of these Dulkarid beys has an article; this is however not the case. Dulkadiroğlu Ali Bey has an article under another name.

The information you find in the list in the image is now available in Wikidata. When it is possible to present this succession in a nice way, it could be linked from articles on these people or from the page on this dynasty.

#Wikidata gadgets - mouse over

I am tickled pink when writing on my blog has an effect. Yesterday I wrote about the "mouse over" gadget. Check out this bugzilla bug. It is now very likely that it will become part of standard Wikidata functionality.

Sunday, August 18, 2013

#Wikidata gadgets

Wikidata has gadgets; several add functionality that makes sense to have. One of them shows the description of a label or a property.

It improves the user experience a lot, so much so that it might as well be turned on by default.

Saturday, August 17, 2013

Be careful what you ask for ...

In a previous blogpost and on #Wikidata I asked for the deletion of P107, the infamous "main type (GND)". As you can read in the closing statement, P107 will be deleted.

For persons it will likely be replaced by "instance of" "person". It will be more complicated for the other GND types. Many of the data items will not easily fit in any way and those entries will probably be deleted.

In all this it is really important to understand that this does not reflect at all on the work done at the Deutsche National Bibliothek. We will still refer as many items as we can to their database. If anything I hope the cooperation with the DNB will continue to improve.

The one thing I am anxious about in all this is how long it will take to fix tools like the reasonator and the geneawiki, which rely on the existence of the "main type GND".

A challenge: 20.248 #Wikipedia articles with script errors

When you add references to #Wikidata of relevant images, by magic they will appear on the Occitan Wikipedia. I did that by adding a picture to the Wikidata entry for Ahmad Tajuddin, the 27th Sultan of Brunei.

It was a random selection of a person mentioned on a list with 20.248 articles with script errors. The number is big but it is probably only one template that is generating all those errors. The edit page says:
I checked out a small number of other articles on this list and they all include this same template.

Some say that Wikidata is about big numbers and I am impressed that all these articles probably refer to Wikidata. The one thing I am curious about is how long it will take for the list to clear once the script has been fixed.

First things first; who is going to fix the script?

Friday, August 16, 2013

How about featured #Wikidata effort

Both #Wikipedia and #Commons feature their best efforts. As a consequence there are people working really hard to raise the game of articles and images.  The requirements for a successful featured picture or article are quite steep. There are even workshops that teach people how to fulfil all the requirements. It does have an effect if only to raise awareness of the need for quality.

Wikidata is a young project. Some people say it is a project where meta-guys and girls work on large data sets connecting to other resources and blah blah blah. There is that and I find it hard to follow all the blah blah blah because what they do is not explained in words that I understand. It is not shown in a way that I can see and understand.

At the same time there is a lot of interesting and relevant work done. When this can be explained and visualised it will make it more appealing to work on Wikidata. The trick will be to come up with criteria for a "Featured Wikidata effort".
  • new functionality in Wikidata itself that enlarges what you can express in Wikidata
  • new or improved functionality to visualise related data
  • new data entered into Wikidata manually of related information
  • new or improved functionality in a Wikimedia project that makes use of Wikidata data
  • new or improved functionality that improves the multi lingual use of data
  • new or improved functionality to use Wikidata to curate information
  • an effort in a WM project that will positively impact Wikidata
These are the things I am looking for in Wikidata. When something ticks one of these boxes I am likely to blog about it. I would love to have more reasons to blog but it would be much better when even more eyes learn about the wonderful effort that is taking place in the Wikidata community.

What to do when #Wikipedia does not provide the information II

King Abd al-Rahman of #Morocco was a nephew of the king that preceded him. His father or mother was probably a sibling of king Slimane. As long as I do not know exactly how they are related, the genealogy tool made by Magnus will not show all the rulers of the Alaouite dynasty.

So I did all the usual things to find out:

  • I googled
  • I went to the library
  • I blogged about it
  • I asked on Facebook in a SIG with many people from Morocco
All this did not work. So I asked a friend at the Tropenmuseum. He asked Mohamed Saadouni who told me that "Moulay Abderrahman is the son of Moulay Hicham, a brother of Moulay Slimane". He even provided me with a reference.

When you are adding statements to Wikidata, it helps when you can get satisfaction out of the visualisation of the facts. Moulay Hicham exists at this time only in Wikidata; it would be good if he were shown as a "red link", as there is no article in any Wikipedia about him.

When you add data on subjects that have their basis in Africa, you will find that the information in Wikipedia is spotty and that it is very much seen through Western eyes. I am for instance not convinced that Morocco was a sultanate. He is called Moulay Hicham and Wikipedia makes "moulay" equal to "Prince". The current "King" is actually a "Malik". I do not know enough so I remain puzzled.

The #Occitan #Wikipedia and #Wikidata

The Modèl:Infobox or the Infobox template on the Occitan Wikipedia relies heavily on Wikidata. The information shown depends heavily on the statements made for a subject.

I checked out a random person, Bernat de Ventadorn, from the list of articles that make use of this template and there was no picture. I checked out an article in another language and found one that had a picture in its infobox. I added the picture in Wikidata. With a refresh of the screen the picture now shows for everyone to see.

It is fun to contribute to a Wikipedia without understanding a word :)

I have since found out that the date of birth and the date of death are not part of what the current template is interested in. Adding this information now is an investment for the moment when the software is improved.

Thursday, August 15, 2013

I #spy with my little eye

When people go to events, they take pictures. They post them to Facebook for the whole world to see. This picture was taken by Ziko and everyone in the picture can be identified by the people who know them.

There is effective software that makes an educated guess based on previous identifications. I wonder how effective it would be on historic pictures. At Commons we have many pictures without the names of the people in the portrait.

These three young ladies are known to be the daughters of the Sultan of Jogjakarta. It is likely that there are other pictures of his court with these young ladies. When they are identified in one picture, it may be possible to identify them in this picture as well.

A lot of money has been invested in such spying software by many organisations. It would be cool if the resulting technology was put to use for something uncontroversial.

It is all about #presentation

The edit screen of #Wikipedia is similar to the standard screen of #Wikidata: it is ugly and it does not present the data well. Look for instance at the two screen shots. The same data about the "office held" by Mohammed V of Morocco is available in both screen shots.

The screen shot above is the standard screen of Wikidata for Mohammed V. The screen shot below is the same data formatted by "reasonator".

There are issues with the reasonator. When you compare the data you will miss the second term in office as Sultan of Morocco. There are other bits of information that are formatted in a weird way. But that is not really important. What is important is that proof of concept tools like the reasonator and the geneawiki motivate me to enrich an item with statements.

PS I really got into the reasonator when I learned how to add a picture.

Wednesday, August 14, 2013

Let us have page view #Statistics for #Wikidata

The statistics of Wikidata are fantastic. It grows like wildfire. Sadly I cannot find a clue as to how relevant it is.

One traditional clue comes from comparing the growth in page views. This is how often a page is requested for viewing. I am probably one of the few who enjoys browsing the information that is provided and adds the statements that take my fancy.

I am convinced that Wikidata needs a large community of people making statements. Traditionally, growing page view statistics have been really stimulating. With all the available technology it should not be that hard to provide statistics for Wikidata as well.

When page view statistics are available, the success of any and all measures intended to make Wikidata popular can be measured. It is quite likely that other statistics will prove to be just as relevant in finding out what needs doing to make Wikidata popular and relevant. But that is another story.

The #Chinese #Wikipedia is not only #Simplified

#Wikimania 2013 has come and gone. China and the Chinese Wikimedia projects have had their day under the sun. The statistics show the Chinese Wikipedia is doing exceedingly well; it is growing 62% annually at the moment.

What we know as the Chinese language is defined by what is written because there are multiple languages that use the same script. To make it even more complicated this written language exists in two scripts: the traditional and the simplified script.

The Chinese Wikipedia can be read in both the simplified and the traditional script. The default script is the simplified script and, for those who want it, there is an option for readers to choose the traditional script.

I wonder what Google finds when its bots crawl the Chinese Wikipedia. They will obviously find the simplified content, but do they find the same content in the traditional script? When Google cannot find the content in the traditional script, people will not find information in Wikipedia when they google for information.

It is probably easy to find the answer to this question when you know Chinese. Please let me know :)

Tuesday, August 13, 2013

#Multilingual #Wikipedia

Providing information in many languages will be a challenge. The idea is to define language rules that will make sentences out of available information. Obviously, the languages that are well known and well researched are most likely to get the best results. They are languages like English, German, French, Japanese.

But before all this language technology can be applied, it is important to get the facts. Fact is that all the subjects of all the Wikipedias can be found on Wikidata. Many statements have already been made about these subjects. It is good fun to spend time adding statements.

With the "reasonator" as seen above you find a nice presentation of what these statements indicate about a person. Providing a nice presentation that is available in participating languages is probably the first development challenge. Getting "all" the facts is probably the first community challenge.

Monday, August 12, 2013

Similarities between #Commons and #Wikidata

There are many similarities between #Commons in its early days and Wikidata. At first there were the big dreams, the wonderful opportunities on the horizon. Slowly but surely many of these opportunities are realised.

The first major goal for Commons was for it to be used for displaying images in all the Wikipedias. Once this became possible it took quite some time to move most of the images from the Wikipedias to Commons.

The first major goal for Wikidata was to replace the old interwiki-links. This has been accomplished and many new opportunities are undertaken by a community that is both old and new. The next big thing will be to move as many assertions as possible to Wikidata. For instance assertions like the date of birth and the date of death of a person.

Many resources, like the GND of the Deutsche Nationalbibliothek, contain many such assertions, and the German Wikipedia community actively compares them with the assertions in the German Wikipedia. The challenge will be not only to move these facts to Wikidata but also to continue this great undertaking with Wikidata as its basis.

When assertions are increasingly served from Wikidata, corrections will flow to wherever they are used. When the labels of the statements are usable in multiple languages, these assertions can be served in a multilingual Wikipedia.

A #multilingual #Wikipedia supported by #Wikidata

A proposal has been made for a multilingual Wikipedia. To make it work it will use available information in Wikidata.

Above you find genealogical information of Nintoku, a Japanese emperor. As you can read in the screen shot, 1990 people are included, and you may notice that many of these people do not have a label in English.

As there are labels in at least one language, it is possible to transliterate them into other scripts and/or other languages. There are standards for transliterations and consequently it is technically possible to create the missing labels.
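To make the idea a bit more tangible, here is a small sketch in Python of how a missing label could be filled in from an existing one. It is only an illustration under assumptions: the table covers just the four kana needed for the example, the label written in hiragana is hypothetical, and the names `transliterate` and `fill_missing_label` are mine. A real transliteration standard covers the whole script and has many more rules.

```python
# Toy transliteration table: only the kana needed for this example.
# A real standard covers the whole script; this table is an assumption.
KANA_TO_LATIN = {"に": "ni", "ん": "n", "と": "to", "く": "ku"}


def transliterate(text, table=KANA_TO_LATIN):
    """Replace each character found in the table; keep the others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def fill_missing_label(labels, source_lang, target_lang):
    """Return a copy of labels where target_lang is filled in from source_lang.

    An existing label in target_lang is never overwritten.
    """
    result = dict(labels)
    if target_lang not in result and source_lang in result:
        result[target_lang] = transliterate(result[source_lang]).capitalize()
    return result


labels = {"ja": "にんとく"}  # hypothetical label written in hiragana
print(fill_missing_label(labels, "ja", "en"))
```

Labels that already exist are left alone, so a label entered by a person always wins over a generated one.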