Wednesday, July 30, 2014

#Wikidata - Sheik Umar Khan, a physician from Sierra Leone

Do not be mistaken. Mr Khan is a hero of our times. Mr Khan died of Ebola. He was in charge of the fight to contain this awful disease.

There is a category of people who died from Ebola; with currently three entries it is mercifully almost empty. Then again, that man who died at Lagos airport is not in there. Probably more people who became notable because of Ebola are missing as well.

It is important to recognise Ebola for the threat it represents. One of the things you cannot do is run away from it. The only thing that achieves is spreading the disease even further.

Tuesday, July 29, 2014

#Wikidata - #Badges are here

All articles are equal and some prove to be more equal than others. They are, for instance, the featured articles, and they can show up among the interwiki links thanks to the new badges functionality. These badges are quite special because they are in effect an attribute of an interwiki link and as such they are in a class of their own.

Bene developed this new functionality as a volunteer, and it proves that Wikidata is very much open for collaboration. It may be a member of our community, it may be students interested in a project that does make a difference. The great thing is that all these collaborations make a difference.

It makes Wikidata a true Wiki; not only in its data but also in its software.

Sunday, July 27, 2014

#Wikidata - Juri Pelivan; prime minister of #Bosnia Herzegovina

Mr Pelivan was the first prime minister of his country. He was a Croat and a Bosnian. There are Wikipedia articles only in Bosnian and in Croatian for him. This is a likely moment for activity about Mr Pelivan as he died in Split on the 18th of July.

There was no item for the function of "Prime Minister of Bosnia Herzegovina". His successor is not known to Wikidata. Bosnia, and with it so many other countries, is not well represented in either Wikipedia or Wikidata.

Friday, July 25, 2014

#Wikidata #statistics - waiting for storage

With the Games being really popular and with the mass additions of statements using AutoList2, you would expect that it shows in the stats for Wikidata.

It does not. There is a problem. Labs ran out of storage. New hardware was ordered and with a bit of luck, the three shelves of new storage will provide ample space for the foreseeable future. Then again, there is a truism that has it that discs will fill up in half the expected time.

Yesterday they were initialising the hard drives; next will be moving and copying the data to its new location. With a bit of luck the software that generates the statistics will find it, and we will have something to ponder again.

Thursday, July 24, 2014

#Wikipedia - the death of Eric Garner

Mr Garner died in New York. For whatever reason he was held in a choke hold by a police officer. Because of the stress or whatever, Mr Garner died of a cardiac arrest. Sadly there are many such incidents in the United States. It is not that strange given how much violence in all kinds of forms is celebrated. It is not strange when many police officers think they are invulnerable to any critique.

It is a sad story, it is a newsworthy story but it is not a story worthy of an encyclopedia. At best it is worthy of a footnote in an article on police behaviour in 2014. Given that there is a Wikipedia article, there is a Wikidata item. Given that Wikidata has the aspiration to include Wikinews, it may not be that bad but still.

#Wikidata - Henri-Guy Caillavet; a French MEP

At #Wikidata several people are adding information about former and present Members of the European Parliament. In the "Mix-n-Match" tool we link information from the European Parliament to Wikidata and, by inference, to many Wikimedia projects.

It involves identifying people who were also an MEP. Often items can be merged because multiple articles exist for the same person, or it turns out that Wikidata does not know that a person was an MEP.

What is surprising is that the demise of Mr Caillavet is not known on the website of the European Parliament. Linking data will make it easier for them and for us to know about such things in a more timely manner.

Wednesday, July 23, 2014

#Wikidata - let us get rid of fixed descriptions

In Wikidata, there are "descriptions" for each item. They may have been a good idea at the time.. everybody does them. But really what helps most:
When there are two Mr Gerokostopoulos to choose from, which one to choose.. or you do not know English and prefer another language.. Even more adventurous, new information is added and the generated text gets updated automagically while the fixed texts weigh us down even more.

Really.. Why have them? What is the added value? What stops Wikidata from getting rid of all that junk? The best argument is that it frees up time to add more statements!

#Wikidata - a letter from Mr #Modi

I received a letter on behalf of Mr Modi, the prime minister of India. It is an invitation to give input to the PM for the transformation of India. The idea is to connect people and their elected representatives effectively.

I am not from India but I am pleased with this invite. My suggestion is obvious; I want the people of India and the world to know about all the members of the Lok Sabha past and present.

In all the Wikipedias and Wikidata we know about many of them. For some we know their political affiliation, for others we do not. For some we know their gender and for a few we do not. For most of the representatives we do not know if they studied and where.

Mr Modi, having this information in Wikidata makes it easy to learn about the elected representatives of India. To find them, their name as written in any language needs to be provided only once. Mr Modi, India is relatively well served in this but it would be appreciated if more facts were available.

You will be inundated with ideas that may transform India. Having information in all the official languages of India about politicians past and present will bring them closer to the people they represent. It will be appreciated when you give our project your blessing.

#Mediawiki - the #Media viewer

The #Wikimedia Foundation has a problem with people accepting new functionality. The reasons why are often irrational and steeped in conservatism but that is another story. A blog post does not help much at that.

What may help is the assessment of bugs. In bug 68372 it has been identified that in certain browsers a name like MilutinDostanić.jpg will show up properly in the URL when seen from Commons and not from within the Mediaviewer. The Mediaviewer will show it like MilutinDostani%C4%87.jpg.
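
As an illustration of why both forms name the same file, here is a minimal Python sketch using the standard library's `urllib.parse`. It is not MediaWiki or Media viewer code; it only shows that the readable name and the percent-encoded name convert into each other, so showing the readable form is a presentation choice.

```python
from urllib.parse import quote, unquote

# File name as quoted in the bug report above.
readable = "MilutinDostanić.jpg"

# Percent-encode the name; "ć" takes two bytes in UTF-8 (0xC4 0x87).
encoded = quote(readable)

# Decoding gives back the form a reader expects to see.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

The encoded form is what travels over the wire; the decoded form is what a person recognises as the name of the file.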

Technically, there is nothing wrong with that. From a user perspective it looks like shit. When a bug is closed because technically there is nothing wrong and a difference in behaviour is not considered as being of enough relevance, a user gets pissed off.

When bugs are reported and when user acceptance is important, differences between expected behaviour and actual behaviour become important because they are often what prevents acceptance of new functionality.

#Pywikibot - wants you to peek under the hood

For most #Wikimedia projects, Pywikibot has proven itself to be a trusted, hard-working tool. Literally millions and millions of edits were only possible because of the people operating the bot.

Before Wikidata, the interlanguage links were maintained by what was called "pywikipedia" bot. Now Wikipedia has been harvested for information with this bot to enrich Wikidata.

As time goes by, the architecture of MediaWiki has changed a lot. Consequently Pywikibot had to evolve as well. Its architecture changed as a result; it aims to use the latest API and other MediaWiki functionality.

From July 24th, 2014 until Sunday, July 27th there will be a big online event to learn what more needs to be done for Pywikibot. Which bugs need an urgent fix? Which features are missing or incomplete? Obviously, it is also a time to look at the code and look for "bit rot".

It would be awesome if more people who care for the Wikimedia projects and know their Python got involved and ensured that Pywikibot remains the meanest and most powerful bot platform around.

Tuesday, July 22, 2014

#Wikidata - Awards of Vienna

There is Vienna the city and there is Vienna the state. The Wikipedia article merges them into one, although they are not the same thing.

A person who was awarded the "Goldenes Verdienstzeichen des Landes Wien" died recently and as information was available in a category it was possible to include all the people who were awarded in this way.

One slight problem; there was no item for the award because all the "Verdienstzeichen des Landes Wien" are in one article. It is easy to create an item. I did. Doing the same for the state of Vienna is for another day or for someone else.

#Wikidata - for some balance; survivors of KZ #Dachau

With some regularity I referred on my blog to people known for their part in Nazi Germany. It always leaves me with a bad feeling and as a result I blogged several times about the victims.

A survivor of KZ Dachau died recently. It not only said so in the text, there was also a category indicating he was once a prisoner in Dachau.

The total number of people who were prisoners at Dachau is significantly higher in Wikidata than in any Wikipedia. This is the consequence of each Wikipedia knowing about a different subset of humans.

I am sure that when the references to Wikipedia articles for KZ Dachau are analysed many more people will be known to have been imprisoned in Dachau.

Monday, July 21, 2014

#Wikipedia & Wikidata - set theory and categories

According to many Wikipedias, Mr H. is a German. When this fact was introduced in Wikidata it was reverted; Mr H. was to be considered a national of "Nazi-Germany".

This raises an issue; when Mr H. is not a German all his victims are not German either. Arguably even the people who lived in the territory of Nazi Germany and were judged by its laws, are not necessarily Dutch, Belgian, French either.

This example is stark. However, the same issue exists in so many other contexts as well. Are the people who died before the breakup of the Netherlands Dutch or Belgian? How to consider the people who lived in colonial times and lived in the colonies? What about the people who are only notable because of their actions in the USSR and now live in Russia, Armenia, Estonia, Ukraine... ?

The categories of the Wikipedias are used to provide specific information for Wikidata items. For over 400 categories, queries have been defined showing what Wikidata recognises as its content. All of them consist of several parts; they are all about "humans", and items will only show when the subsequent statements are true as well.

When "nationality" is involved, it follows that both the Wikipedia categories and consequently items in Wikidata suffer from the complexities indicated above. When for instance Spanish governors of Cuba are part of the category tree of Cuban people, it is arguably wrong. However the argument also has it in for the people who lived their whole life on the island that is Cuba..

Sunday, July 20, 2014

#Wikimania - a visa denied

Meeting Amir at a conference or a hackathon is a pleasure. Having him around is a sure way of getting all kinds of problems solved. Given his intimate knowledge of pywikibot and our projects he makes it seem easy to resolve issues. The biggest issue is often that it takes time for his bots to complete.

Amir has been to many conferences but we will have to miss him in London.. The British embassy took several weeks, with all kinds of excuses for the delay, to decide that they are afraid that he will not go back.


The success of Wikimania 2014 will not be as great because one of our best and brightest will not be among us.

Friday, July 18, 2014

#Wikidata - Joep Lange, HIV researcher

Mr Lange was on his way to a conference in Melbourne about HIV. He and several other HIV researchers died because their airplane was shot out of the air over Ukraine.

Mr Lange was a professor at the University of Amsterdam. He was considered to be a "World's Top AIDS Researcher".

I am appalled that Mr Putin considers Ukraine responsible because "it happened over Ukrainian territory".. That is a bit too simplistic and obvious an excuse.

Thursday, July 17, 2014

#Wikidata - many more edits

It is so easy to add a lot of information to Wikidata. Mr Wekwerth died, for instance. You read his article and you find a category indicating that he was awarded the order of Karl Marx. That came to 437 edits for me because I added all the people in the category using AutoList2.

There are many more categories on the profile of Mr Wekwerth and arguably each one establishes Mr Wekwerth more in who he was and the people, places and occurrences he was connected to.

Typically I do only one category for one person who died. When that someone was a bishop, I know him to be a priest. That is some 777 edits I am adding at the moment. It could have been a diplomat, or Wikidata does not even know that the person is "human"..

Adding an additional 100K edits is not that hard. It does enrich Wikidata and the results are obvious when you regularly wander using the Reasonator or when you add the dates of death to those who died like I do.

Tuesday, July 15, 2014

#Wikidata - one million edits

Thanks to Magnus's tools, making a million edits is feasible. It has helped me that I have a plan. The plan is to gain functionality from the data that is included. Functionality that is available today, functionality that is not a mirage of what the future may hold.

The most important tool is Reasonator. It shows best what information exists for an item and it includes up to 500 items that refer to an item. It is so important because it provides me with instant gratification; you see things grow as they happen. The automagic is great; maps, timelines and the higher "classes" pop up when they become applicable.

Very important are Autolist [1] and Autolist2 [2]. They are tools that add loads and loads of statements one at a time for me. It is important to restrict updates to the subclass or instance they should operate on. For instance when adding "female", the item must be "human".
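
The restriction can be pictured with a small Python sketch. This is not how Autolist works internally; the in-memory dictionary and the item ids in the sample are hypothetical stand-ins for Wikidata items. Only the identifiers P31 (instance of), P21 (sex or gender), Q5 (human) and Q6581072 (female) are real Wikidata ids.

```python
# Real Wikidata identifiers for the properties and values involved.
INSTANCE_OF = "P31"
SEX_OR_GENDER = "P21"
HUMAN = "Q5"
FEMALE = "Q6581072"


def add_female_where_human(items):
    """Add a 'female' statement, but only to items known to be human.

    `items` maps an item id to a dict of property id -> list of values.
    This structure is a hypothetical simplification of Wikidata statements.
    Returns the ids of the items that were updated.
    """
    updated = []
    for item_id, claims in items.items():
        # The restriction: skip anything that is not an instance of human.
        if HUMAN not in claims.get(INSTANCE_OF, []):
            continue
        # Do not touch items that already have a sex or gender statement.
        if claims.get(SEX_OR_GENDER):
            continue
        claims[SEX_OR_GENDER] = [FEMALE]
        updated.append(item_id)
    return updated


# Hypothetical sample: one human, one non-human, one item with no statements.
sample = {
    "item_person": {INSTANCE_OF: [HUMAN]},
    "item_other": {INSTANCE_OF: ["some_other_class"]},
    "item_empty": {},
}
print(add_female_where_human(sample))
```

Without the first check, a mass update would happily mark a ship or a village as "female"; with it, items that are not yet known to be human are simply left for later.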

One aim of Wikidata is to be able to have information available for info-boxes of a Wikipedia. To make this possible it is a requirement that for each article there is an item. Creator is the tool that can create these missing items.

Obviously, all the harvesting needs to be done again for all those new items.  Toolscript makes this possible. I just have to figure out how to do this.

To put things in perspective, with one million edits only the surface has been scratched. There are some 15,259,555 items to operate on.. However, the effect is noticeable in the auto descriptions and in the Wikidata search results.. There are fewer items that have nothing to describe them.

[1] Categories with human that show what Wikidata thinks should be in them. Currently 337 of them
[2] Females on the Chinese Wikipedia that are not known as such. There were 1873 of them...

Sunday, July 13, 2014

#Wikidata guided tours

#Wikidata has guided tours. They are nice. They help newbies understand what it is all about and yet..

I find that knowing too much does not help. There are all those details that I want people to know about.. The #Babel effect on the number of languages with labels shown for instance.

In the guided tour descriptions are explained.. I HATE descriptions, they are vastly inferior to auto descriptions.. Check out the screen print; no description in sight but these auto descriptions do translate to all other languages.

Really, it pains me that I find fault with these very much needed guided tours. The truth is that I would not do it differently.

#Wikimetrics - What is in it for me?

When you are into #statistics, Wikimedia project statistics, Wikimetrics is a big thing. It is an open environment where you can pore over the collected data to your heart's content.

To make it even better, there will be training in three sessions introducing the tools and necessary skills.  Sweet.

However.. Is all the data in there?

There is a long-standing request for information that shows where Wikipedia fails to deliver; what are our readers looking for that they cannot find? When such information is collected, it will be easy enough to use Wikimetrics for this as well. After many years the people who could know, the WMF statisticians, have not been able to say one way or another.

The official statistics for Wikidata do not include page reads at all. The motivation given at the time was that nobody was using Wikidata. Maybe.. However, projects have started using Wikidata in templates. There are even categories for such templates.. Tools external to Wikidata like the Reasonator have their own statistics, so it may be interesting to know how often which tools access Wikidata. For the Wikidata crowd it is nice to know what impact their work has.

The best thing about Wikimetrics is that it is there. It is wonderful that it gets support and even when more data could be added it is wonderful to see how the Wikimedia Foundation is opening up its data for further perusal by all comers.

Saturday, July 12, 2014

#Wikidata - Ahmed Sheikh Jama, Minister of Information of #Puntland

Mr Jama died on July 9th in #Garoowe. According to his article Garoowe is in Somalia, and according to the article about Garoowe it is in Puntland.

Puntland is a breakaway region in the north of Somalia and Mr Jama was its minister of Information.

Puntland has all the trappings of a country but it is not recognised as a country. It is one of those subjects that can do with a lot of TLC so that people may know about it.

Wednesday, July 09, 2014

#Wikidata - items with no statements

More than 50% of the Wikidata items have no statement or only one. To put it bluntly, we do not know what they are about.. They could be a human, a settlement, a meteor, anything really. What we do know is that there are articles associated with them.

Things are improving; statements are added all the time and the improvement shows both in percentages and in absolute numbers.

Statements are important; they help identify items that can be merged, they provide connections to external sources and they provide the information for use in Wikimedia infoboxes. Currently Wikidata has some 15 million items and 4,437,324 items have no statement. Once items are identified as humans, Magnus's games include four games that help identify subsequent statements.

The trick is to find categories that give a clue about all the items in them; for instance, "born in 1905" indicates that an item is likely to be about a human. Once an item is known to be "human", it becomes easy to link it to all kinds of other statements.
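
As a rough sketch of that idea in Python: a category name is matched against a pattern that suggests a person. The two name patterns below are assumptions chosen for illustration (real category names differ per Wikipedia and per language), and the function name is made up.

```python
import re

# Assumed patterns: "born in 1905" or "1905 births" hint at a person.
BIRTH_CATEGORY = re.compile(
    r"^(?:born in \d{3,4}|\d{3,4} births)$", re.IGNORECASE
)


def suggests_human(category_names):
    """Return True when at least one category name hints at a person."""
    return any(BIRTH_CATEGORY.match(name.strip()) for name in category_names)


print(suggests_human(["born in 1905", "Austrian painters"]))
print(suggests_human(["Meteorites found in 1905"]))
```

A match is only a clue, not proof, which is why it is worth checking a sample of the items before adding "human" to a whole category.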

Tuesday, July 08, 2014

#Wikidata - Mr Amitabh #Bachchan, #politician

Who says that politicians cannot be sexy? Mr Bachchan, a hero of the Indian cinema, used to be a politician. He is one of 2313 people known to be or have been a member of the Lok Sabha.

When you read the Wikipedia article, there is a wealth of information that could be added to Wikidata as well. It is for instance suggested that Mr Bachchan was a member of the Rajya Sabha as well. There have been all kinds of issues that made Mr Bachchan leave the Lok Sabha in disgust. Issues that could have / should have their own articles.

When you consider all the people who have been members of the Lok Sabha, it is obvious that they either represented political parties or were elected as an independent. From the article it is not clear what party Mr Bachchan represented.

You can imagine that when one politician can raise so many questions, 2313 politicians represent a bigger challenge. Having such information available is important. It is how people can inform themselves about who represents or has represented them politically in their country.

#Wikipedia: Mr K. Kunhambu and #Orkut

Mr Kunhambu was a member of the Lok Sabha, the parliament of India. He was of interest to me because Wikidata did not know what party if any he represented. It turns out that the Wikipedia article does not provide clarity either.

The thing that caught my eye was a reference to an Orkut community.. It is not unreasonable to expect that many more references to Orkut exist. Orkut is a Google product that is "at its end of life"; its communities will have to find another home elsewhere.

I wonder what we will do with all the references in our projects.

Monday, July 07, 2014

#Wikidata - Erhard Niedenthal; member of the German Bundestag

Mr Niedenthal died recently. It is not known on what date exactly but he died in July 2014. According to the article about him he was a member of the German Bundestag.

As I am adding information about the people who died in 2014, it is possible to add statements for one person or start adding one statement for the many people who are in the same category. When a fellow party member of Mr Niedenthal died recently, membership of his party was added for many people including Mr Niedenthal.

Given that the lack of data is holding back Wikidata the most, it is a strategy that seems to be working. If there is one drawback it is that those languages where people are categorised by the year they died benefit the most. The Dutch Wikipedia for instance does not and consequently the items represented there do not get the same treatment.

#Wikidata - Frederick II of Prussia

At this time, Frederick is according to Wikidata a composer. Based on its information the following text was generated:
Frederick II of Prussia was a composer. He was born on January 24, 1712 in Berlin to Frederick William I of Prussia and Sophia Dorothea of Hanover. He was a member of Prussian Academy of Sciences. He married Elisabeth Christine of Brunswick-Wolfenbüttel-Bevern. He died on August 17, 1786 in Potsdam.
When you read the Wikipedia article, you will learn that he was also a philosopher, a patron of the arts, a flautist, a general and a brilliant administrator who overhauled both administrative and military processes. He elevated himself to King in Prussia and because of all this he is known as Frederick the Great.
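
How such a text can follow the data is easy to sketch in Python. This is not the code of Reasonator or any other tool; the function, the dictionary keys and the sentence templates are all made up for illustration. The point is only that each sentence appears when the matching fact is present.

```python
def generate_text(facts):
    """Build sentences from whichever facts are present (hypothetical keys)."""
    name = facts["label"]
    # Fall back to repeating the name when no pronoun is given.
    pronoun = facts.get("pronoun", name)
    sentences = []
    if "occupation" in facts:
        sentences.append(f"{name} was a {facts['occupation']}.")
    if "born" in facts and "birthplace" in facts:
        sentence = f"{pronoun} was born on {facts['born']} in {facts['birthplace']}"
        if "parents" in facts:
            sentence += " to " + " and ".join(facts["parents"])
        sentences.append(sentence + ".")
    if "spouse" in facts:
        sentences.append(f"{pronoun} married {facts['spouse']}.")
    if "died" in facts and "deathplace" in facts:
        sentences.append(
            f"{pronoun} died on {facts['died']} in {facts['deathplace']}."
        )
    return " ".join(sentences)


# Facts taken from the generated text quoted above.
frederick = {
    "label": "Frederick II of Prussia",
    "pronoun": "He",
    "occupation": "composer",
    "born": "January 24, 1712",
    "birthplace": "Berlin",
    "parents": ["Frederick William I of Prussia", "Sophia Dorothea of Hanover"],
    "spouse": "Elisabeth Christine of Brunswick-Wolfenbüttel-Bevern",
    "died": "August 17, 1786",
    "deathplace": "Potsdam",
}
print(generate_text(frederick))
```

Add or correct a fact and the text changes with it; nothing has to be rewritten by hand.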

It is fairly easy to add statements identifying all these roles to Wikidata. The problem though is that when he is to be identified as a conductor, a flautist, a philosopher, it is done by recording each of these as an occupation, a profession. That is OK when you consider this to be a convention. It is also a convention that a king is a "politician" and that the period when a king rules is seen as a publicly held position.

The problem is very much that all these concepts are modern and are often ill-fitting. They are like a house of cards and many of these properties are best hidden because they confuse. Often terminology is used that is insulting by definition.. For instance Frederick the Great was not replaced by Frederick William II; Frederick William succeeded him.

Sunday, July 06, 2014

#Wikidata - #European #parliament

The European parliament is in many ways a funny institution. Its parliamentarians represent the people of Europe, but the way they are chosen can be ever so different. It reflects the electoral system of the country they are from and as a consequence it is a bit like in Animal Farm where they are all thought to be equal as well.

Compare for instance the UK with the Netherlands; the UK knows electoral districts while the Netherlands has party representation. Arguably a British MEP represents only their district while a Dutch MEP represents a percentage of the population that voted for a specific party.

Wikipedia and by inference Wikidata knows about many European parliamentarians past and present. All of them have a career before and after their term in the European parliament and, to understand them and who and what they represent, it is important to know about this. You may find soccer stars, professors and any other occupation imaginable. You may find career politicians and "self made men/women".

On Wikidata a group of people has an interest in adding information about all MEPs. Once information has been added, it is in principle available in all the languages of the European Union. When your language is not one of them, there is nothing stopping you from adding labels for your language and enjoying as much information as we can bring to you.

Friday, July 04, 2014

#Wikidata - Bill Sinegal, a US rhythm & blues musician

Mr Sinegal died in April. This was recently registered on his one article on the German Wikipedia. Apparently they like rhythm and blues in Germany because many R&B musicians have an article only there.

For this reason it makes sense to harvest information about musicians from the German Wikipedia. There were 9,882 US musicians known there who were not known as musicians in Wikidata.

By harvesting information like this for all of them, the information becomes known for all of them. Currently we know about 8,820 US musicians. Mr Sinegal will be included when it is registered that he is a musician, because it is known that he is an American. The others will be added to the 28,722 known musicians..

Thursday, July 03, 2014

#Wikipedia - Ahmed Mohamud Hayd does not even have an article

The news has it that Mr Hayd, a Somali MP and a former minister, was killed by al-Shabaab. Wikipedia has no article about Mr Hayd. Wikidata has no item for Mr Hayd. There is not much about Somali members of parliament in the first place. Only six members of the Somali parliament have been categorised as such.

When sources like Wikipedia and Wikidata do not know about democracy in a country, it is safe to say that people cannot inform themselves about those countries. When Mr Hayd gets a moment of fame only when he is killed, we have to trust the BBC to get its facts right. It probably does.

With a list of all the current members of parliament, it would not be hard to create Wikidata items for all of them. Add information like date of birth, the political party they are a member of and other relevant information, and we can present this in many languages using Reasonator.

Democracy is served by having information about democratic institutions and the people involved available for people to find.

Tuesday, July 01, 2014

#Wikidata - the order of the Red Star

Mr Anatoly Kornukov died. He was a Soviet and Russian general and a fighter pilot. His story is quite interesting; it includes that he was in charge of the downing of Korean Air Lines Flight 007.

Mr Kornukov was highly decorated and among the awards he received was the "order of the Red Star". It is easy to add it for Mr Kornukov but that leaves out the 9,242 other recipients of this award.

As you can imagine, the category on the Russian Wikipedia for recipients of the order of the Red Star knows the most. It takes a little more effort to have AutoList2 add all the missing recipients.

It takes a while for all of them. You can see how many are already done here.