Friday, June 27, 2014

#Wikidata - Howard Baker; White House chief of staff

Mr Baker died; according to the Wikipedia article about him, he fought in the second world war, he became a US senator, senate minority leader, senate majority leader, White House chief of staff and finally ambassador to Japan.

Obviously Mr Baker is notable. He is notable because of the positions he held. For Mr Baker, they are events in his life. They have a start date, an end date and when he moved on, the position was taken over by someone else.

Reasonator will show these events in a timeline when dates have been used as qualifiers for these events. Wikidata knows implicitly that Mr Baker is associated with all these roles. This is not made visible and consequently the data in Wikidata do not become information.
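
To make this concrete, here is a minimal sketch in Python of how positions with start and end qualifiers can be put in order for a timeline. It is not Reasonator's code; `Position` and `timeline` are hypothetical names, and the years are given from memory for illustration only, so check them against the Wikidata item.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Position:
    """One 'position held' statement with its date qualifiers."""
    title: str
    start: Optional[int] = None  # start year qualifier, None when missing
    end: Optional[int] = None    # end year qualifier, None when missing


def timeline(positions: List[Position]) -> List[str]:
    """Return one line per dated position, ordered by start year.

    Positions without a start qualifier cannot be placed and are left out,
    which is why undated statements never show up in a timeline.
    """
    dated = [p for p in positions if p.start is not None]
    dated.sort(key=lambda p: (p.start, p.end if p.end is not None else p.start))
    return [f"{p.start}-{p.end if p.end is not None else '?'}: {p.title}" for p in dated]


# Illustrative data only; the years are assumptions recalled from memory.
baker = [
    Position("White House chief of staff", 1987, 1988),
    Position("United States senator", 1967, 1985),
    Position("ambassador to Japan", 2001, 2005),
    Position("senate majority leader", 1981, 1985),
    Position("senate minority leader"),  # no qualifiers: not shown
]

for line in timeline(baker):
    print(line)
```

The position without qualifiers is known to the data but silently drops out of the result, which is the gap between data and information described above.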

Wikidata is pondering an update of its user interface. It does not take much to improve on the existing interface. What I would like to see realised is that the necessity for Reasonator will go away. When this is realised, Wikidata becomes a powerhouse of information instead of mostly a data store.

Tuesday, June 24, 2014

#Wikidata - Please splitters sort out the mess you made

Splitters in Wikidata have carved up the property of "preceded by" and "succeeded by".

The result is hilarious if not a bit sad. When you read the text in Dutch for the predecessors and successors of the President of Venezuela, they are called in translation "predecessor of the work" and "successor of the work".

I am a lumper; as far as I am concerned the best way to rectify this mess is to merge things back together.

#Wikidata - Mr Oskar-Hubert Dennhardt; a German officer and politician

Mr Dennhardt died recently. As with so many people, his life had several notable parts. He was a decorated officer and after the second world war, he became an officer in the Bundeswehr and then a politician for the CDU.

When you read the English article, it reads as if his main achievements were all in the second world war. However, becoming a Brigadegeneral amounts to more than just a footnote.

He was also decorated after the war. This is sadly something the English Wikipedia does not cover.

When you look at the many English articles about German soldiers decorated in the second world war, the point of view is quite marked. They are not about the people involved, because if they were, the article about Mr Dennhardt would be more balanced.

Sunday, June 22, 2014

#Wikidata - Mr Ashot R. Tonoyan, a MP from #Armenia

Mr Tonoyan died today according to the article about him on the Russian Wikipedia. The article has it that he was a member of the National Assembly of Armenia from 2008 until 2012.

There is even a category with many of these Armenian politicians. Surprisingly, this Russian category is aware of more of them than the Armenian category.

After a bit of tinkering, the Reasonator will know about all of them. Most of them will become known as Armenian, politician and all of them will become a member of the National Assembly.

#Wikidata - #Russia; member of the State Duma

There is great information about the current members of the State Duma. It is available on the website of the Duma. It is available in Russian and English. The only thing in the way of a claimed copyright is that you refer back to the website.

At this time Wikidata did not know that members of the Duma are politicians. A process is underway to make them members of the Duma in the first place (357 and counting).

It is important to know who represents the power that is inherent in a parliament, and such information should not only be available for the usual countries like those in the "West". We should know about the Chinese, Russian, Iraqi, Indonesian, Filipino, Libyan .. parliamentarians. These people make a difference; they get in the news for good and bad reasons.

They are all notable.

#Wikidata - Places in #Iran

Bon Chenar is the first entry on a list prepared by Amir. The list is about places in Iran where there is an article on the English Wikipedia and an article on the Persian Wikipedia. Obviously the two items need to be merged. There are thousands and thousands of places in Iran .. all with double entries in Wikidata and all in need of merging.

Amir produces reports, Magnus made a game out of it but in the end it needs people to mark all of these places as being the same or not. People who know English and Persian..

When these items are merged, the statements are combined. It is the best of times to curate the statements that have been made. However, when do you start improving things when it is so obvious that it will need to be done again and again?
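
As a rough illustration of what "the statements are combined" means, here is a small Python sketch. It is not how Wikidata implements a merge; `merge_statements` and `needs_review` are hypothetical helpers, and the statement values are invented for the example.

```python
from typing import Dict, List

Statements = Dict[str, List[str]]  # property label -> list of values


def merge_statements(a: Statements, b: Statements) -> Statements:
    """Combine the statements of two items, keeping each value once."""
    merged: Statements = {}
    for item in (a, b):
        for prop, values in item.items():
            target = merged.setdefault(prop, [])
            for value in values:
                if value not in target:
                    target.append(value)
    return merged


def needs_review(merged: Statements, single_valued: List[str]) -> List[str]:
    """List properties that should have one value but ended up with several."""
    return [p for p in single_valued if len(merged.get(p, [])) > 1]


# Invented values, standing in for the English and the Persian item.
en_item = {"country": ["Iran"], "instance of": ["village"], "population": ["120"]}
fa_item = {"country": ["Iran"], "population": ["135"]}

merged = merge_statements(en_item, fa_item)
print(merged)
print(needs_review(merged, ["country", "population"]))
```

Identical values collapse into one, while conflicting values sit side by side after the merge. Those are the statements that ask for curation.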

Saturday, June 21, 2014

#Wikipedia - Joe Dorsey, an #American #boxer

When you read an article like this one about Joe Dorsey, you wonder what the Wikipedia article will be like. For me the article ticks all the boxes of a person who is notable enough, if not for an article about the man himself then at least for one about the big fight he had with the state of Louisiana.

When you consider emancipation, this is the kind of story that is begging for an article.

#Wikidata - #Iraq has a parliament too

Given the #USA involvement in Iraq and given its traditional aim of bringing democracy to the world, you would expect both Wikipedia and Wikidata to have a wealth of information about Iraqi democracy and politicians.

The last list of Iraqi parliamentarians includes many red links and new elections have been held twice. The recent elections have a result and ayatollah Al-Sistani has urged the new parliament to convene. That makes sense in the light of the current fighting going on in Iraq.

Typically Wikipedia is great at providing background information. For Iraq there is room for improvement. I wonder where to find the Iraqi 2014 election results with all 325 elected members of the council of representatives. In the meantime Wikidata learned about more Iraqi politicians.

#Wikidata - #Suriname; its National Assembly

#Suriname has like so many other countries a parliament. Its National Assembly has 51 members and, its website provides basic information for all of them.

Dutch and Sranan are languages spoken in Suriname that have a Wikipedia, and neither has articles for all the members of the National Assembly. In that, Suriname is not special.

For all the current members of the National Assembly a Wikidata item has been created. For all of them the electoral district they represent has been added. The Suriname electoral districts are different in that multiple people represent the same district.

With basic information about members of a parliament, it becomes possible to at least find them in Wikimedia Foundation projects. When people take an interest and flesh out the information about them, people may become informed about the people that represent them. What has been done for Suriname is basic but, it is a start.

Friday, June 20, 2014

#Wikidata - Who is Howard N. Potts

Stephanie Kwolek, the inventor of Kevlar, died. The death of someone notable is often a reason to flesh out the Wikidata item with relevant statements. For Mrs Kwolek none of the awards she received were mentioned.

Once these statements were made for Mrs Kwolek, the available information about those awards was added. The category "National Medal of Technology recipients" contained 125 recipients Wikidata did not know about. The "Lemelson–MIT Prize" was named after Jerome H. Lemelson. Similar information for the "Howard N. Potts Medal" is not available.. Who is Howard N. Potts?

Tools like AutoList2 and Reasonator make it easy to add information and to have the instant gratification of well presented information.

It will be wonderful when Wikidata adopts these features in the user interface that is being designed.

Thursday, June 19, 2014

#Wikidata - a bug busted

Automated descriptions are the single most important productivity enhancement for Wikidata. Nothing else comes close. Most items do not have descriptions and many of the fixed descriptions that do exist are not available in my language. Automated descriptions are not fixed and improve as more relevant statements become available.
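
The principle can be shown with a toy example in Python. This is only a sketch of the idea, not the code Magnus wrote; the function `describe` and the way it picks its words are my own assumptions.

```python
from typing import Dict, List


def describe(statements: Dict[str, List[str]]) -> str:
    """Build a short description from whatever statements are present."""
    kinds = statements.get("instance of", [])
    jobs = statements.get("occupation", [])
    countries = statements.get("country", []) or statements.get("country of citizenship", [])

    if jobs:
        # People are best described by what they do.
        head, link = ", ".join(jobs), " from "
    elif kinds:
        # Everything else falls back on what kind of thing it is.
        head, link = ", ".join(kinds), " in "
    else:
        return "no description available"
    return head + (link + countries[0] if countries else "")


# Two items that share a label are told apart by their statements.
print(describe({"instance of": ["city"], "country": ["India"]}))
print(describe({"instance of": ["Lok Sabha constituency"], "country": ["India"]}))

# The description improves as statements are added.
print(describe({"instance of": ["human"]}))
print(describe({"instance of": ["human"], "occupation": ["pediatrician"],
                "country of citizenship": ["Norway"]}))
```

Because the text is generated from statements, it needs no translation of a fixed description; only the labels of the properties and values have to exist in a language.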

I was getting depressed when they became absent after an update of Wikidata. How would I know what "Jodhpur" to choose when I want the Lok Sabha electoral district? I found how much I rely on this feature and how crappy Wikidata is without it.

Magnus came to the rescue and I have updated my "common.js".  It now says:
It works again for me. It can work for you as well and it will work for all of us when this functionality is available to all of us.

Please make it so!

#Wikidata - Dagfinn Aarskog, a Norwegian pediatrician

Mr Aarskog is or was a Norwegian pediatrician. One Wikipedia article suggests that he died in 2014 but does not give a precise date; there is no source for Mr Aarskog's demise either.

When you come across a person like Mr Aarskog, you read his article and find that a syndrome is named after him and after a Mr Charles I. Scott. Mr Scott, an American, did not have a label in English but he did have an article in German.

The German article has it that another syndrome, the "Hecht-Scott-Syndrom" is also named after Mr Scott. The other half in that name is Jacqueline T. Hecht. When you google for her, notability is quite obvious.. and she is a scientist and a female.

An awful lot of information is hidden in the nooks and crannies of many Wikipedias. Some of it filters through in Wikidata and as it does, it becomes available for us to share as it became "available knowledge".

Tuesday, June 17, 2014

#Wikidata - about splitters and lumpers

It would be really good if people who propose to split a property or to merge properties would read the Wikipedia article about splitters and lumpers.

When you consider Wikidata, it has its splitters and lumpers; a recent splitter drama wants to do away with "is part of". The idea is that because so many different things can be "part of" something else it is relevant to distinguish in the property used as well.

There are 299+1 arguments why this is not the best of ideas.

Wikidata is very much intended as a multilingual project; all these nuanced versions of "is part of" assume that all other 299 languages are able to express these same "finer points". It should be obvious that this is not the case, and having the same labels for properties that are meant to be different is an extraordinarily bad idea. To give you a clue, in some languages a verb does not have a present or past tense..

The other argument negates the need for all this precision. Wikidata is proving to be really good at connecting to external sources. These sources typically have the same or similar properties. The wish to map these properties between sources has been expressed often before. When this has been done, it follows that those who are so eager to have these finer points can replace the Wikidata properties with equivalent properties as maintained elsewhere.

Monday, June 16, 2014

#Wikidata - କବି ପ୍ରସାଦ ମିଶ୍ର

In my ongoing project to document the deaths of notable people in 2014, a friend helped me with Mr କବି ପ୍ରସାଦ ମିଶ୍ର. He knows Odia, a language that Google translate cannot help me with.

It was obvious that when you only know someone as Mr କବି ପ୍ରସାଦ ମିଶ୍ର, nobody will find him if they do not know that this can be transliterated to Kabi Prasad Mishra. As this transliteration was added, we can now Google for this gentleman and find more information about him.

Subhashish asked my help in turn for a transliteration for Mr Герич Ігор Дионізович. This person is Ukrainian and to be honest, I do not know how to transliterate his name into English.

Plenty of opportunities left in Wikidata :)

Sunday, June 15, 2014

#Wikipedia - To bot or not to bot

The controversy of bot created articles in Wikipedia is old. The arguments have been similar for a long time. The most famous bot is Rambot; it created stubs for all the places in the United States. This time the "controversy" is about Lsjbot. It created articles on "species" in several Wikipedias. A species is a recognised animal or plant or insect in its nomenclature.

The Wall Street Journal has taken an interest; Achim Raschka was asked for an opinion and this can be found on his blog. Achim suggests that as a compromise information may be added to Wikidata; in this way we can "provide the data to the authors for example when they start an article to choose if they want to use it".

Wikidata has its own problems with bot generated articles. Articles about species are possibly among the most problematic. The biggest problem is that typically Wikidata already knows about at least some of the items on a list. When this is not considered by the bot operator, it may result in a lot of not very interesting work merging items and deleting duplicates. In addition to this, the conventions in Wikidata for "taxons" are quite different. Taxonomy is not static and consequently the hierarchy in Wikidata is inferred and not explicit. This is a major innovation that hides some of the problems of the taxonomy that is in use.

It is quite clear in the opinion of Achim that Wikidata should only be an aid to those people who write Wikipedia articles. The suggestion is made that Wikidata could serve as a source of information for instance for the Red and white giant flying squirrel, a subject he wrote about. Reasonator includes access to the "Concept cloud" tool, it shows all the articles referred to in any of the Wikipedia articles about the Petaurista alborufus.

Achim's opinion is truly Wikipedia centric. Wikipedia articles written by humans are his holy grail. The objective of the Wikimedia Foundation however is "to share in the sum of all knowledge". We have information about subjects like this giant flying squirrel available and we can share it with the users of the 265 Wikipedias that do not have an article yet.

We can share in the sum of all available knowledge and we should as long as it is our aim to share our knowledge with everyone and not only with potential Wikipedia article writers.

Wednesday, June 11, 2014

#MediaWiki Talk pages

#Wikidata uses talk pages. The #Wikimedia Foundation is working on a tool called "Flow". As is usual, there are those people who want to keep everything the same. They love their system.

It is fine and dandy. When I added a comment on the Wikidata "Chat" I could not click on the appropriate edit button and get the right section. As it is, Talk is flaky as well and it is not as easy and obvious as it is made out to be.

Tuesday, June 10, 2014

#Wikidata - P. Ramdas, a film director from #India

Mr Ramdas died on March 28. At the time it was duly noted on the English article and, that information found its way into Wikidata.

Currently there are not that many deaths left to process. There was one Wikidata item with an article in Hindi and Malayalam. As this was about the same Mr Ramdas, the two Wikidata items were merged. The sources indicated that he had been awarded the J. C. Daniel Award in 2007.

Some more people who were awarded the same prize were added as well. It is a sign of quality when more people are known to have a similar significance.

Most of the people who are still on the list are problematic. Often they are from languages that Google translate does not cover or from cultures that use a different calendar, Iran, Arabia, Thailand for instance.

There are also the Wikipedias that do not give special attention to the recently departed. Their articles are underrepresented in this list.

Saturday, June 07, 2014

#Wikidata - Those who died in 2014

Magnus's "No date" game registers dates of birth and death. It shows that we need information for 1,258,038 of the 2,073,886 humans Wikidata knows. It is a challenge; however, if Magnus proves anything it is that for a community such challenges can be met.

For the people who died in 2014, 5001 have been registered so far, it is a bit different. When people die, their articles need some attention if only to register their passing. Many of the people had their day of sporting glory a long time ago. Others like the Emir of Kano remained relevant until their end and their passing may influence many more articles.

Quality is often seen as how quickly new information finds its way in.  It is wonderful that Wikidata has the potential to flag the passing of all those known to have died in 2014 to the projects who have an article about them.

#Dutch (National #Library, #DBpedia, #Wikimedia chapter) cooperation

So far #Wikidata has not had much in the way of GLAM attention. If the Koninklijke Bibliotheek and the Dutch DBpedia and Wikimedia chapters have anything to say about it, this will change.

Their aim is to bring the sum of all Dutch authors to Wikidata. You can imagine that a national library knows about these authors. You can imagine that the Dutch Wikipedia knows many of them and so does the Dutch DBpedia. The great thing of this cooperation is that DBpedia will identify those authors Wikipedia knows so that a clean list of new items can be added to completely include all these authors.

There have been several meetings about it, an RFC happened in Wikidata and importing all this data is the first objective. The mapping of attributes for authors to both DBpedia and Wikidata will be published and when this meets approval the import will happen.

Once it is all imported, DBpedia will be aware of all the changes to both Wikidata and the Dutch Wikipedia. It will keep its data up to date and when this finds approval, it can do this for Wikidata as well. There is no license issue as the Dutch DBpedia considers any updates "real time" and all updates happen under the license of the project involved in their own process.

When this project is completed, when all the details have been worked out, other collaborations between these four parties are possible. For instance, many authors received awards for their works or for a specific book and there may be more books..

Wednesday, June 04, 2014

Lila, a personal request

What would you do when you find that Wil, your partner, finds the experience of reading a Wikipedia article much enhanced by a MediaWiki feature that is really well hidden? Somewhere, there is this option that makes the Wikipedia experience much easier for people with dyslexia. Do you know it exists? Could you find and enable it for him? Please do try and find it..

Yesterday we had friends for dinner. We got to talk and I found that one of them was helped with this feature. Interestingly enough he does not suffer from dyslexia. We talked about perception of text and my wife mentioned that she has to read one letter at a time to make up a word. For her this feature proved to work as well, she found that she could now recognise some words at a glance..

The feature is hidden really well. So much so that I have to search for it every time. I know it exists, I have written about it before. It is "only" 7 to 10% of a population that is dyslexic. The Wikimedia Foundation wants more readers, editors. I really wonder how much effort it takes to make this feature sufficiently prominent. It does not cost any development; the software exists. All it takes is the realisation that we can open up to so many more people.

Tuesday, June 03, 2014

#Sex ratios in #Wikidata

When you consider sex ratios in Wikidata, it is good to appreciate that there is more than just male and female. It means that every human can be associated in one way or other that is appropriate. What is often forgotten is that for many of the humans in Wikidata there is no information about gender.

When Max Klein came back from the Zurich hackathon, he revisited his work on Sex ratios in Wikidata. In his stats you find the ratio for many Wikipedias but I would like to have "none of the above" included as well.

It may give us an insight into which Wikipedias provide us with the most gender information for known humans. When the data is available for May 2013 as well, we may learn from which Wikipedia we gained the most information. The number of links has grown but not by as much as the number of statements in the same period.

Wikidata currently knows about 2,062,461 "humans"; 1,544,137 are male and 277,830 are female. We know that many more humans will be identified and thanks to the gender game you can help reduce the number of humans without gender information.
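
A few lines of Python show what including "none of the above" would add to these numbers. The counts are the ones quoted here; the remainder is simply what is left after male and female, so it mixes other gender values with items that have no gender statement at all, which is an assumption on my part.

```python
# Counts quoted in the post (June 2014).
humans = 2_062_461
male = 1_544_137
female = 277_830

# Everything else: other gender values plus items without any gender statement.
# The post does not split these, so they are lumped together here.
none_of_the_above = humans - male - female


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for label, count in (("male", male), ("female", female),
                     ("none of the above", none_of_the_above)):
    print(f"{label:>17}: {count:>9,} ({share(count, humans):.1f}%)")
```

With these figures the remainder comes to 240,494 items, a bit under 12% of all known humans.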

Monday, June 02, 2014

"Have no feed reader"

Sjoerd has no feed reader, he uses something else. Nice !!

#Wikidata - Rejoicing for new rollback functionality

When Wikidata is to be relevant as a source for info-boxes, it has to include substantially more information. For humans it starts with really basic information.. human, gender, date of birth/death, nationality, spouse, child, occupation, office held.

It may sound odd, but once you have data, you can manipulate it. Without data you are stuck. Take for instance the Olympics, originally all the athletes were amateurs and the current practice is to make all of them professionals at their game.

You can imagine that when you get LOTS of data in, things go wrong occasionally. Consider for instance the category "Brazilian people by occupation" or "American politicians"; with a name like that you expect all the humans mentioned in those categories to be Brazilians or politicians.. They are not.

I am really happy that Autolist2 allows for rollback; it took care of expatriate footballers in Brazil. I am really happy with friends like Amir; he took care of the problem with all those card-carrying Americans. He even promised to make it a pywikibot program once he finished his exams.

Those expatriate footballers live or lived as residents in Brazil, those Americans may be Republican or Democrat. That is another opportunity for data inclusion. However, when you enter data wholesale, chances for lots of problematic edits are a reality. Statistically maybe not that much but enough for them to be noticed and, that is a good thing.

Rollback functionality is essential to maintain the quality we all seek. It allows us to make mistakes and make amends.

Sunday, June 01, 2014

Thank you Sue

Thank you Sue for a job well done. Thank you for moving us forward to where we are today. I expect that tons of private messages are going your way.

It is great that you are able and willing to move to the role of advisor. It must be hard. I can imagine it feels like your baby has grown up and is moving on. I am happy to say that we have been blessed with many wonderful women like you who all feature in the story of what is your and our success.

The substance of #Wikipedia

Ting Chen said it well in a recent post on the mailing list: "Don't bother with things that are too complicated, it is the content that counts".

I could not agree more. Writing a Wikipedia article is not something I do. It is not that I am not able to, for me there is too much that is in the way. There is the arcane user interface that I hope will be soon replaced and then there are the vagaries that are the Wikipedia policies and separate from them, their interpretation.
"Nowadays Wikipedia articles (across all major languages) are highly biased in style and in content to academic thesis. How references are used and put, the criteria for references as valid, are almost one-by-one copied by the standards from academic thesis. Content without references are by itself considered as delete candidates. Both of these strongly put up constraints on who can put new content in Wikipedia and what content is considered as viable."
To make matters worse, as a result many articles are hard to read. They assume an academic understanding. The prose is often meant to appease the deletionists and is not meant for reading. A half finished article gets deleted, not improved. What made Wikipedia a reality was the ability to contribute, to be bold, to find a cooperative community.

People wrangle with the question what is happening with the community.. If you ask me it is because where we once were bold, we now find caretakers and people insisting on their "academic" ways. For me it is why we do not attract new people, it is why you will not find me editing Wikipedia much.

What to do about too many American politicians

#Wikidata knows a disproportionate number of US-Americans and of the politicians among them. Many more politicians are waiting to be identified for what they are. When you add it all up, there should be as many as 47,872 politicians from the USA.

When you compare the number of US-politicians with all the Brazilians, all 30,123 of them, systemic bias is obvious. At this time we know in Wikidata about 36,384 politicians for the rest of the world, proving the same point in another way.
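
To put a number on that bias, a small Python calculation with the figures quoted here. Mind that the US figure is the expected total once everyone is identified while the rest-of-world figure is what is known today, so the outcome is an indication and nothing more.

```python
us_politicians = 47_872      # expected US total quoted in the post
other_politicians = 36_384   # politicians known for the rest of the world
brazilians = 30_123          # all Brazilians known, whatever their occupation

total_politicians = us_politicians + other_politicians
us_share = 100.0 * us_politicians / total_politicians
us_per_brazilian = us_politicians / brazilians

print(f"US share of politicians: {us_share:.1f}%")
print(f"US politicians per known Brazilian: {us_per_brazilian:.2f}")
```

On these figures more than half of all politicians would be from one country, and there would be more US politicians than Brazilians of any occupation.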

What would be neat is when Wikidata knew about all the politicians who are currently a member of a parliament anywhere in the world.

No individual #WMF grant for a #Wikidata economics project

One of our best and brightest is from #Iran. He is a very prolific contributor both to Pywikipedia and to Wikidata. His bot has the highest number of edits as a bot and, he is trusted to run his bot as admin.

Amir is part of a team that hoped to bring economic data to Wikidata. Their plan was sadly denied. The reasons given are imho a bit upsetting. Amir is from Iran and we cannot pay money into Iran it is said. Really? We do not have to pay into Iran, what is asked is to make funds available for a project. It does not follow that the WMF itself has to transfer money into Iran.

Another reason is that currently Wikidata does not yet support units properly.. That is in this instance not really much of an issue either because before the data is ready for upload, a lot of preparatory work is to be done. Getting the data, proposing missing properties, analysing the data you name it. Oh and yes, preparing the software that can handle this type of data.

In my opinion the decision not to grant this proposal is a mistake.

#Wikidata - Brazilians who died in 2014 II

As an effort was made to know which "humans" are "Brazilian", it became less difficult to know how many Brazilians died in 2014.

Currently Wikidata knows about 30,123 Brazilians, more than 10% that are "known" on the Portuguese Wikipedia. Thanks to this effort, the Wikidata number of dead Brazilians at this time is 57.

It is obvious that more notable Brazilians died in 2014. Sadly, Wikipedia does not know about them or maybe it does and Wikidata does not know that a Wikipedia does.

Relevant is that Wikidata knows about more Brazilians than any Wikipedia.