Monday, September 30, 2013

#Wikidata qualifiers

When you state that someone is a "Fatimid Caliph", it helps when you can qualify such a statement. The Fatimid Caliphate ended centuries ago so it helps when you can state the start date, the predecessor, the end date and the successor of that Fatimid Caliph.

Qualifiers are particularly relevant when there are multiple "offices held" by a person. It is customary for Islamic nobility to gain experience as a governor of a part of the realm. This ensured relevant experience; it was also the start of a power base and frequently led to insurrections.

The current iteration of the qualifier functionality is powerful. It could do with several improvements:
  • every qualifier deserves its own source. 
  • when standard qualifiers are identified, it makes sense to have them in a particular order
  • the software harvesting information for Wikidata is not yet qualified to enter information properly
  • external data resources have to learn how to deal with Wikidata's qualifiers
At this time the only option is entering data manually. As you read the articles, you learn about so many factoids. You also learn how much work can be done to make Wikidata shine even brighter.
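For readers curious what a qualified statement looks like under the hood, here is a minimal sketch of the claim structure the Wikidata API's JSON model uses: one main snak, a set of qualifier snaks, and per-statement references. The property IDs follow Wikidata's conventions (P39 position held, P580 start time, P582 end time), but the values and the source shown here are hypothetical, for illustration only.

```python
# Sketch of a Wikidata-style statement with qualifiers and its own source.
# Property IDs follow Wikidata conventions; the values are hypothetical.

def make_statement(prop, value, qualifiers=None, references=None):
    """Build a claim-like dict: main value, qualifier snaks, per-statement sources."""
    return {
        "mainsnak": {"property": prop, "datavalue": value},
        "qualifiers": qualifiers or {},
        "references": references or [],
    }

caliph = make_statement(
    "P39",                      # position held
    "Fatimid Caliph",
    qualifiers={
        "P580": "+0953-00-00",  # start time (hypothetical value)
        "P582": "+0975-00-00",  # end time (hypothetical value)
    },
    references=[{"stated in": "a hypothetical source"}],
)

print(caliph["qualifiers"]["P580"])
```

Because the qualifiers and the references sit on the statement itself, each qualified claim can carry its own source, which is exactly the first improvement asked for above.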

Sunday, September 29, 2013

Red links on the #Arabic #Wikipedia; the sultans of #Sennar

On #Wikidata, you will find a succession of the sultans of Sennar. The information provided is limited because the articles about them are limited. There is no date of birth or death information, and only for a few is it known who the fathers and sons are.

On the Arabic Wikipedia, the article about Sennar has red-links for the Sultans. When you include them on the articles at Wikidata, do they become "infra red"-links?

Sennar is an important part of the history of Sudan, and consequently it is relevant to have information in Arabic. Copying in the red links as labels seems obvious. The question is how to make it more obvious for the Arabic Wikipedians that Wikidata has information relevant to them.

Saturday, September 28, 2013

Amara Dungas, a sultan of #Sennar

Sennar is geographically part of what is now #Sudan. The history of the Sudan seems to be one of conflict with the sultanate of Sennar providing some continuity for over three centuries.

When you read about it, Sennar was a backwater and this allowed it to develop its own, distinct culture. As so few people are interested in this culture, you will not find much information about it in Wikipedia.

The information you do find about the Sennar sultans in the English Wikipedia is problematic when you want to enter the data in Wikidata. The problem is not only that there is no article for many sultans; the biggest problem is with the dates.

The date when the reign of Amara Dungas ended is given as AH 940. This date is provided in the Islamic calendar and, as the illustration indicates, this is either 1533 or 1534 CE.

Technically an acceptable solution is not that hard; if it were possible to say "1533s", meaning the date could be up to one year earlier or later, that would already be a big improvement. A proper solution means that dates can be entered in the original calendar. This is a lot more complicated but it will allow people to enter more accurate information.
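The "up to one year earlier or later" point can be shown with a rough conversion sketch: the Islamic (Hijri) year is about 354 days, roughly 0.97 Gregorian years, counted from 622 CE. The factor below is an approximation for illustration, not a proper calendrical conversion.

```python
def ah_to_gregorian_range(ah_year):
    """Approximate the Gregorian years overlapping a given AH (Hijri) year.

    The Hijri year is ~354.37 days, i.e. ~0.9702 Gregorian years, and the
    era starts in 622 CE.  Good only to about a year either way.
    """
    start = 622 + (ah_year - 1) * 0.970224
    return int(start), int(start) + 1

print(ah_to_gregorian_range(940))  # AH 940 spans roughly 1533-1534 CE
```

This is exactly why AH 940 maps to "1533 or 1534" and why a precision flag like "1533s" would already capture the available knowledge honestly.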

Friday, September 27, 2013


Importing data from the #Polish #Wikipedia

All new information in #Wikidata has an origin. It can come from many sources and the quality varies. When Matmarex mentioned his source as a Polish project about information about persons, I wanted to learn more.

This is the kind of project that we should welcome at Wikidata. Please have a read and be happy with great undertakings like this.

What is the data you are importing based on?
The data is based on the index of biographies maintained by hand by a few dedicated Polish wikipedians at Noty_biograficzne. The nature of created-by-hand data is that not all of it can be automatically parsed, but it was surprisingly consistent – accepting just several common variants resulted in over 60 000 items the bot could understand and only several hundred that could not be used (I am hoping to sort these out by hand). Some typos in the source are unavoidable, but overall the quality seems to be very high.
Getting quality personal information has been a project on the pl.wp for quite some time; can you explain what it means?
I am not sure myself what the index was intended to be, not yet being a wikipedian when it was started in 2004 – possibly a crossover between a category system (the concept of a category was only introduced in that year, I don't know which came first) and a list of articles needing creation. Currently it serves as, well, an index – a list of all biographies on the Polish Wikipedia, ordered alphabetically by last name (or, in some cases, by pseudonym). It's easier to find what you're looking for if you only remember the last name of a person and possibly their occupation than by using the built-in search system (for example, search suggestions are ordered by article title – thus first name – and it's not possible to limit results to only biographies).
You are running a bot adding descriptions in Polish; what software are you using?
Unlike most bot operators I'm not using the Pywikibot framework – I opted for my own custom-written library in Ruby called Sunflower and a set of scripts using it.
Does it use the Wikidata API and why is this important?
Yes, both for uploading and using the information. The API is basically what made the project possible.
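As a sketch of what "using the Wikidata API" looks like in practice, the read side can be as simple as one wbgetentities call. The code below builds such a request and pulls a Polish description out of a response; the response shown is a hand-made sample rather than live data, and Q254 is used purely as an example item id.

```python
# Minimal sketch of the read side of the Wikidata API: build a wbgetentities
# request URL and extract a Polish description from its JSON response.
# The `sample` response below is hand-made, not fetched from the live site.
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def getentities_url(ids, props=("labels", "descriptions"), languages=("pl",)):
    """Build a wbgetentities request URL for the given item ids."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "|".join(props),
        "languages": "|".join(languages),
        "format": "json",
    }
    return API + "?" + urlencode(params)

def polish_description(response, qid):
    """Pull the Polish description out of a wbgetentities response."""
    return response["entities"][qid]["descriptions"]["pl"]["value"]

sample = {"entities": {"Q254": {"descriptions": {"pl": {"value": "kompozytor austriacki"}}}}}
print(getentities_url(["Q254"]))
print(polish_description(sample, "Q254"))
```

The write side goes through wbeditentity and friends in the same API, which is what makes a project like this one possible at all.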
What other data do you have about all these people?
The old index contains birth and death years in addition to the descriptions. I didn't upload it because it's basically unsourced (and unsourced data is seemingly as frowned-upon on Wikidata as it is on Wikipedia, if not more) and because, when I tried comparing them with birth and death categories on the biographies themselves, I found over 1500 conflicts.
Can you use your data to compare against the data on Wikidata?
Not really; there isn't much to compare in this area, especially since I was uploading basically free-form text. Only several hundred items out of the 60 thousand my bot edited already had a description in Polish; a couple of other editors and I reviewed them all in a few days.
The birth/death data could be compared, though; I haven't looked into it yet. Any help would be welcome!
Can you add your data where Wikidata has none?
There are a few things the index uses that are not yet present on Wikidata. The birth and death dates are the biggest one; real names of people using pseudonyms (such as Sting or Madonna) would be a valuable piece of information as well. I didn't try to upload either – the dates would need better sources (the 1500 conflicts are a strong indicator that information from Wikipedia might not be good enough) and there are currently no properties defined for first / last names because of how complicated the topic is (there is currently a discussion under way).
Did you know that this type of data becomes available on several Wikipedias in stub articles?
I considered parsing the articles themselves to extract the descriptions, but decided that this would be too error-prone to automate entirely. Instead I developed a gadget that helps users write short descriptions for biographic articles by extracting the information from the lead-in paragraph and presenting it on the index pages – they can be adjusted by a human and saved to Wikidata with one click! This benefits both projects at once and I think is a good example of how they can work together.
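The extraction idea can be sketched roughly as follows: biography leads usually open with a bolded name, a bracket with dates, a dash and then the occupation, so a candidate description can be cut from between the dash and the first full stop. This is an illustration of the approach only, not the gadget's actual code, and its fragility is exactly why a human reviews each suggestion before saving.

```python
import re

# Rough sketch of suggesting a description from a biography's lead sentence.
# Assumes the common "'''Name''' (dates) - occupation." shape; real leads
# are messier, which is why a human reviews the suggestion before saving.

def suggest_description(lead_wikitext):
    lead = re.sub(r"'''?", "", lead_wikitext)          # drop bold markup
    lead = re.sub(r"\([^)]*\)", "", lead)              # drop the (dates) bracket
    m = re.search(r"[-\u2013\u2014]\s*(.+?)\.", lead)  # text between dash and first full stop
    return m.group(1).strip() if m else None

lead = "'''Jan Kowalski''' (ur. 1930) - polski malarz i grafik."
print(suggest_description(lead))  # -> "polski malarz i grafik"
```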
Is it possible to transfer this biography project from the pl.wp to Wikidata?
It could be done entirely on Wikidata and using Wikidata information, but there are two preconditions – presence of the required information on Wikidata (birth/death dates, last names for correct sorting) and ability to generate lists from the data (so-called "phase 3" could accomplish this).

Thursday, September 26, 2013

Thank you Denny

Denny has been the main man at #Wikidata. He has done a splendid job. I have known Denny for many years and I admire his achievements. One fond memory was meeting in Rome because we both happened to be there :)

Denny will join this big search company and I am sure we will have a friend in Denny at Google.
Thank you,

#MathJax is localised at #Translatewiki

#Wikipedia potentially has articles on any subject. When it is about other countries, MediaWiki supports fonts when needed. When it includes mathematics, there is another need: the need to display formulae properly. MathJax has as its slogan "beautiful math in all browsers" and it does do a great job.

The one thing that was holding back adoption of MathJax was its lack of support for other languages.

Now that the community of localisers at translatewiki.net has adopted MathJax, it is a different story. The lack of a localisation can be fixed in the time-honoured way: you can do it. There is not only support for localisation in potentially 300+ languages, there is also strong internationalisation support available when needed.

When your Open Source software needs an international audience, consider translatewiki.

Wednesday, September 25, 2013

Phillipe du Plessis, Grand Master of the Knights Templar, not a citizen of #France

When Mr du Plessis was born in #Anjou in 1165, Anjou was a county held by the English king. It was in 1204 that Anjou became a part of France.

In Wikidata, it is said that Mr du Plessis was a citizen of France. This is obviously wrong; the notion of the modern French state did not exist at the time.

The whole notion of French citizenship is complicated as it is. Many people from France are registered in Wikidata as "country of citizenship" "French nationality law". This may be technically correct but it does not cover what is asked in the label. It should be France.

#Wikipedia zero makes more of an impact

The aim of the #Wikimedia Foundation is to share the sum of all knowledge. It has often been stated that we can only reach those people that have the necessary technology available to them. There is one other hurdle: the cost of the data coming to that technology.

Wikipedia zero is an initiative of the Wikimedia Foundation to enable mobile access, free of data charges, to Wikipedia in developing countries. The objective of the program is to reduce barriers to accessing free knowledge—one of the largest barriers being cost of data usage.

The WMF aims to convince mobile phone operators in the "global south" to provide the high quality and free information of Wikipedia. This is increasingly successful; currently Wikipedia Zero has been rolled out in 17 countries with more countries to follow.

Traffic from Wikipedia Zero is now growing rapidly, not only from new partners but also from autonomous growth at the existing partners.

With more people enjoying Wikipedia as readers, a next question is how to convert some of them to editors. The good news is that Wikipedia recently added functionality for editing from a mobile phone. Combine this with the language technology that allows Firefox and Chrome to support many more scripts and it is clear that the WMF does enable people to truly share in the sum of all knowledge.

The #Vietnamese #Wikivoyage is joining #Wikidata

The Vietnamese Wikivoyage has been live for some time now and many of its articles were linked in the classical way to articles in the English Wikivoyage. Thanks to Dexot, all the articles are getting the royal treatment: the old links are deleted and they are linked in Wikidata.

Tuesday, September 24, 2013

As it is all about #DATA, let's use #Wiki principles to make it truly #Wikidata

#Wikidata is very much like a toddler; it is learning to stand on its feet. The #W3C did not put it in the centre of the data universe like it did DBpedia. It has however a few enormous advantages over DBpedia; Wikidata exists to be used.

The first use of Wikidata was to solve a data problem for Wikipedia: the interwikis. They are used and it is a success. Because of the Lua integration, Wikipedias are also using Wikidata to provide information in their info-boxes.

What people seem to forget is that Wikidata is also a wiki. This means that the data may be wrong and, still, it is the best information we have. As a wiki, it is important that we aim to work collaboratively towards perfection. We identify Pablo Neruda in Wikidata with seven external data sources, 100 Wikipedias and Commons. With all these external sources, it should not be hard to corroborate all the "other" information like his date of birth, the year of his Nobel Prize etc.

Really all this academic "ontology" stuff is absolutely relevant and important in its own domain. The use of Wikidata however is not academic. Its use is to help us bring the sum of all knowledge to whoever is interested. It will be wonderful when we can use the data and the expertise from other resources as initial information or as corroboration of what has already been stated.

We know how quality improves over time. We know that a community makes light work of what is academically speaking not feasible. So let us ditch academic preconceptions and let us concentrate on the work at hand: getting the sum of all knowledge in good order for us to serve it to whoever is in need of knowledge.

Pablo Neruda, a #poet and #politician from #Chile

#Wikidata has a new feature: #Commons can now be linked as well as Wikipedia and Wikivoyage. It is a first step in a process that will get the information that exists about all the 18,599,261 freely usable media files into a more usable format.

The most obvious problem with all those media files is that it is hard to find anything. Another is that you have to know English, and it is not possible to easily find all the pictures of famous photographers.

Anyway, I experimented with Pablo Neruda because there was something about him on Facebook today. I added the most relevant link on Commons for him and then checked out the other information as well. I qualified that his Nobel Prize was given in 1971. From the article I learned that he was a diplomat and a politician as well as a poet. I added the political party he was a member of.

Someone as multifaceted as Pablo Neruda deserves good coverage on Wikidata. It is known for instance that he had two spouses and at least one child. Currently there is no way to indicate this without creating more Wikidata items. They may not be notable in their own right, but given their association with Pablo Neruda they may be. What do you think?

Monday, September 23, 2013

Te Rata Mahuta, a Māori king of New Zealand

Te Rata Mahuta was the fourth Māori king. When you google for him, you can find several pictures of him. The same is true for the three monarchs that succeeded him.

I am pretty sure that there are Wikimedians who have pictures of king Te Rata Mahuta and queen Te Atairangikaahu in their personal albums. It would be cool to have them, because at this time we do not have freely licensed illustrations of either.

When you consider the history of a country like New Zealand, the Māori are an integral and essential part of it. It is important to have proper information and illustrations about the people and the culture of the Māori.

With material about subjects like the Māori in Wikidata and Commons, basic information exists for a stub in any language. Check out for instance what we have about the first Māori king.

Saturday, September 21, 2013

Arthur Woodburn, a #politician from #Scotland

Mr Woodburn was the minister for #Scotland from 1947 to 1950. The party he called his own was the Labour Party. The funny thing is that this fact is not easy for a bot to find on the English Wikipedia.

That fact was easily found on the Polish Wikipedia: they did have an info-box for Mr Woodburn, and it was easy to retrieve this bit of information from there.

Mr Woodburn is one of a succession of ministers for Scotland. At this time it is not possible to retrieve all the facts of the many political offices people held and hold. There are several reasons for this:

  • the link of the office is to a "list of ... "
  • it is not possible (yet) to have the harvested information inserted as qualifiers to statements
  • often there are red links to predecessors or successors
As always, there is a need for more information, because the information will be used in places where you do not expect it. So, people from Scotland, why not ensure that there is great information about the people important to your history?

Friday, September 20, 2013

Maria Trzcińska-Fajfrowska, a #politician from #Poland

The #Polish Wikipedia has an article about her. As my bot runs, I look at some of its edits to check if the information is correct, and I like to add some other useful information. In this case it is obvious that she is a person and female.

When you read an article, it is quite obvious if the subject is male or female. For a bot however, it is not that obvious. The sex of a person is typically not stated in an info-box.

Adding the sex in Wikidata is the right thing to do; Wikidata will be heavily used by automated processes and they work best when something like sex or nationality is explicitly stated.
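To illustrate why an explicit statement beats guessing, here is a deliberately crude heuristic a bot might otherwise fall back on: counting gendered pronouns in the article text and only deciding when the signal is lopsided. This is an illustrative sketch, not the method any actual bot uses, and its obvious failure modes are exactly why an explicit Wikidata statement is so much better.

```python
import re

# Crude, illustrative fallback: guess a biography subject's sex from the
# ratio of gendered pronouns, abstaining when the signal is weak.

FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}

def guess_sex(text, ratio=3.0):
    words = re.findall(r"[a-z]+", text.lower())
    f = sum(w in FEMALE for w in words)
    m = sum(w in MALE for w in words)
    if f > ratio * max(m, 1):
        return "female"
    if m > ratio * max(f, 1):
        return "male"
    return None  # too ambiguous: leave it to a human

text = "She was elected in 1989. Her work on care made her known and she retired in 2005."
print(guess_sex(text))  # -> "female"
```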

With information about sex, we can research to what extent women are present in the different language versions of Wikipedia. We can also find which nationalities are underrepresented in our Wikipedias.

NB: there are currently 1,337,293 persons in Wikidata, of whom 783,746 are known to be male and 149,890 female.

Article not found in #Wikipedia? Check #Wikidata!

When you #search for #information, it may not exist. However, it may already exist on Wikidata. It is even possible that several homonyms on Wikidata exist without there being a Wikipedia article.

The suggestion to "create the page on this wiki" has always been a good one. It would be even better when the page is linked from the start with the information known to exist on Wikidata.

Thursday, September 19, 2013

Ata'ollah Mohajerani, a #politician from #Iran

According to the English #Wikipedia, Mr Mohajerani was born in 1954 in Arak, Iran. According to the Farsi Wikipedia, he was born on 2 Mordad 1333 in Arak, Iran. The Esperanto Wikipedia has it that he was born in 1953.

Dates of birth of people from Iran are often incorrectly stated. Typically the exact date is known; it is just in a format we are not comfortable with. It would be good if the software used by Wikidata enabled the use of Persian dates.

There are two parts to this: dates in the Persian format are used on the Farsi Wikipedia, and other Wikipedias use different date formats as well. When Wikidata information is to be used in these languages, it has to be able to show dates according to these different calendars. The information made available in these date formats is important in every project, and therefore it should be possible to enter dates in the different calendar systems.
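The year part of such a conversion is simple enough to sketch: the Solar Hijri year starts at Nowruz, about 21 March, so the offset to the Gregorian year is 621 or 622 depending on the month. This is an approximation for the year only; proper day-level conversion needs a real calendar implementation.

```python
# Sketch of mapping a Persian (Solar Hijri) date to a Gregorian year.
# The Solar Hijri year starts at Nowruz (~21 March), so months 1-9 fall
# before 1 January and the last two months fall after it; Dey straddles
# the Gregorian new year.  Year-level approximation only.

def solar_hijri_to_gregorian_year(sh_year, sh_month):
    if sh_month <= 9:           # Farvardin .. Azar
        return sh_year + 621
    if sh_month >= 11:          # Bahman, Esfand
        return sh_year + 622
    return (sh_year + 621, sh_year + 622)  # Dey: ambiguous without the day

# 2 Mordad 1333 (month 5) -> 1954, matching the English Wikipedia's date
print(solar_hijri_to_gregorian_year(1333, 5))
```

Even this tiny sketch resolves the 1953/1954 discrepancy above: Mordad 1333 can only fall in 1954.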

Tuesday, September 17, 2013

Jane Naana Opoku-Agyemang, a #politician from #Ghana

Mrs Opoku-Agyemang is the current minister for Education of Ghana. Her Wikipedia article was a mess; there were two of them. A long time ago it was suggested to merge the two. Neither article was linked to a Wikidata item and, as a result, my bot complained.

As I had decided to create the Wikidata item, I had to get the information right. I merged the articles and added many statements. I am sure that many people can write a better article; I hope you will find fault and improve the article even more.

I am sure that Mrs Opoku-Agyemang is notable and relevant.

Dick Cheney, a #politician of the #USA

Surprises like this are why #Wikidata is such good fun to work on. You have a sense of endless opportunities. I got to Mr Cheney because I noticed the father of Al Gore; I wondered if it was known that they were father and son. To my amazement I noticed that it was not even known that Al Gore was a Vice President of the United States. As a consequence of adding this information, I noticed that Mr Cheney and Mr Quayle were not known as Vice Presidents either.

I am sure there are enough people to lavish some TLC on these gentlemen in Wikidata.

When information like this is lacking, it is obvious that information about ministers from Sweden or Madagascar may not exist at all. It probably takes people from those countries to lavish their TLC to the people that matter to them.

Some answers about the heady stuff of #Wikidata

I asked Emw questions as a result of his email about the migration away from the "GND main type". I am happy with the answers I received and I hope you will enjoy reading them.

At Wikidata, I contribute to discussions about properties, where I espouse using W3C recommendations and conventions from the wider Semantic Web. I'm also active in discussions about how to model molecular biology data.  Outside of Wikidata, I've been an active contributor to Wikipedia and Commons for several years.

I only had time to answer three of your questions, but I did that much pretty extensively.  The remaining questions are mostly beyond my knowledge and I don't have any well-formed opinion on them.  If you'd like, I can try to answer those questions or others next week.

My answers to your questions:

1) The GND system has been ditched. Can you explain why this is a good thing?
The GND main type property has several major problems. Deprecating that property helps us focus on better solutions for classifying knowledge on Wikidata.
Major issues with P107:
  1. With the GND main type property, "person" can mean things well beyond the common understanding of that word.  It can mean things like Coco Chanel -- i.e. 'person' as conventionally understood -- or it can mean a god, literary character, pseudonym, collective pseudonym or spirit.  The standard response to this glaring issue is "'person' is meant to generalize, don't take the term literally". That is not a sufficient solution.  If a classification system for all human knowledge considers Vishnu and Coco Chanel to both be 'persons', that's a big problem.  Beyond giving users bizarrely unexpected query results, it means properties that should be safe to assume for any given 'person' item simply cannot be.
  2. Any item that is not a person, place, event, organization or work is classified as a "term", which contains virtually no information.  We need to be able to classify things like gravity, carbon, DNA, cancer, clarinet, Twelver Shia Islam, fashion boot, dog and potato as more than simply "terms".  One sixth of the property is cruft.
  3. Not even the GND directly uses GND main types.  The GND Ontology has a hierarchical class system and the Deutsche Nationalbibliothek -- which developed it -- uses the lowest-level, most specific GND class available for a subject.  This indicates that even the GND senses that its main types are not appropriate to use as they are in P107.
  4. The nature of P107 implies that the property is only for the highest level of classification, and that additional properties would be needed for each level in the hierarchy of classification for lower-level types. This would entail lots of unnecessary work to create and update classifications. For example, want to specifically classify Nauru? If property P107 were to persist, then you would need to add something to the effect of "main type: Place" and "subtype: Administrative unit". The problem gets drastically worse for subjects with more levels of classification, like organisms, instruments, molecules, diseases, towns, etc.
The GND system itself -- the GND Ontology -- is not the real problem.  The real problem is that P107 is a "main type" property.  In a project to structure all knowledge -- which Wikidata is -- restricting all items into a small set of types will inevitably lead to many, many classifications that are either A) too broad to be useful or B) simply incorrect.
2.  You sent an email where you asked for attention for what is to be next. Why should there be something next?
Because -- although it is complex -- the world has structure, and classes or types are a useful way to express that structure.  The lopsided debates in the Primary sorting property RFC indicate that so-called "main type" properties (sometimes also called "principal group" or "primary sorting" properties) are a bad idea.  However, that does not mean that the basic notion of grouping things into "types" or "classes" is also a bad idea.
A much better solution for classifying things is to use "type" properties recommended for the Semantic Web by the W3C -- that is, use rdf:type and rdfs:subClassOf. These properties exist in Wikidata as instance of (P31) and subclass of (P279). These properties have been part of W3C recommendations for the Semantic Web for almost a decade. They are fundamental properties used in large controlled vocabularies to structure data into knowledge.  They facilitate classification at an arbitrary granularity.  Together 'instance of' and 'subclass of' can classify all subjects and be used to determine precisely where each subject exists in the hierarchy of knowledge -- or, perhaps -- a collection of hierarchies of knowledge.

Not only do they solve those structural problems of P107 and other "main type" properties, but by being based on W3C recommendations, instance of (P31) and subclass of (P279) also make Wikidata more interoperable with the rest of the Semantic Web.
That said, deciding on properties like P31 and P279 is only the beginning of forming a better way to do classification on Wikidata.  We need a way to map the information in P107 to use P31 and P279.  That's a topic of active discussion on Wikidata.
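The difference between a flat "main type" and chained classes can be shown in a few lines: with instance of (P31) and subclass of (P279), one walk up the hierarchy yields every level of classification, so no separate "subtype" property is ever needed. The tiny hierarchy below is illustrative, not actual Wikidata content.

```python
# Sketch of classification via instance of (P31) + subclass of (P279):
# classes chain, so one item can be classified at every level at once.
# The miniature hierarchy below is illustrative, not real Wikidata data.

SUBCLASS_OF = {                      # child class -> parent class (P279)
    "administrative unit": "place",
    "island country": "administrative unit",
}
INSTANCE_OF = {"Nauru": "island country"}  # item -> class (P31)

def all_classes(item):
    """Walk P31 then the P279 chain to get every class an item belongs to."""
    classes = []
    cls = INSTANCE_OF.get(item)
    while cls is not None:
        classes.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return classes

print(all_classes("Nauru"))  # ['island country', 'administrative unit', 'place']
```

Contrast this with P107, where Nauru would carry only "main type: place" and each finer level would need its own extra property.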
3.  The GND is a library system then you mention upper ontologies. What is the difference, and how are they practical in the Wikidata context?
The GND (Gemeinsame Normdatei) authority file is used as a library classification system, but it's based on the GND Ontology.  The ontology has a hierarchy of high-level entities and sub-classes.  The P107 property is based on those so-called "high-level entities", which were called "main types" in Wikidata as shorthand.  The main GND types are person, place, event, organization, work, term or "undifferentiated person".  These main types are fine as a way to classify items of general interest in a large library, but they're much too small to form a sound basis for a classification system for all human knowledge.

That's what upper ontologies are for.  An upper ontology is a way to have standard vocabulary about high-level entities in our world.  The idea is to formalize these very general concepts in a way that captures the richness of human language while also being precise enough to be machine-understandable. 

For example, the Suggested Upper Merged Ontology (SUMO) sets a class "entity" as the most general type of thing -- everything is an "entity".  From there, SUMO classifies things in the world as either "physical" or "abstract".  "Physical" things can be "objects" or "processes".  "Abstract" things include so-called "set-classes", "propositions", "quantities" and "attributes".  (More information on SUMO is available in Towards a standard upper ontology.)
There are several other upper ontologies available, like BFO and UMBEL.  I am not an expert in ontologies, and I have not learned enough about each of them to make an informed statement on their advantages and disadvantages.  However, because they seem to offer unifying terminology for different domains of knowledge, upper ontologies strike me as something worth consideration by the Wikidata community.

Monday, September 16, 2013

Why the statement of #political #party is relevant in #Wikidata

The #Occitan #Wikipedia uses Wikidata to populate information in many stub articles. There are several Wikipedias that use information in this way. Running a bot that reads "political party" information from the English Wikipedia will enrich all of them slowly but surely.

This does not add any value to the English Wikipedia. When you want them to adopt the information from Wikidata, there has to be a strategy that will demonstrate the added value of Wikidata. The most obvious method is to run a bot on another Wikipedia that collects "political party" information. As this adds information lacking in the English Wikipedia, it makes a strong point for using this information from Wikidata.
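The harvesting step itself can be sketched in a few lines: find the first "party" parameter in an Infobox officeholder template and take the linked article name. The wikitext below is a made-up miniature, and a production bot would need far more defensive parsing.

```python
import re

# Sketch of harvesting the first "party" value from an infobox.
# The wikitext is a made-up miniature; real infoboxes are far messier.

def first_party(wikitext):
    m = re.search(r"\|\s*party\s*=\s*\[\[([^\]|]+)", wikitext)
    return m.group(1).strip() if m else None

infobox = """{{Infobox officeholder
| name  = Jane Doe
| party = [[Labour Party (UK)|Labour]]
}}"""
print(first_party(infobox))  # -> "Labour Party (UK)"
```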

Somewhere in the future a tipping point will be reached. This happens when people from multiple communities start updating their information in Wikidata. This tipping point will be different for each project. Arguably the projects that have little data of their own will be the first to reach this point. Their first challenge is to add labels in their language for all the items they use.

There are two ways of helping this process on the way; adding information by bot or by hand. Reaching these tipping points will happen but it will take a lot of hands to make it happen sooner and not later.

Mary Burke, a might-be #politician from the #USA

The #Wikipedia article on Mary Burke is the kind of article where I seriously wonder why the hell this person is considered notable.

She MAY run in an election and, when you read about her, you will find she is buying her way into the election. She did study at some universities; maybe this is what makes her qualify...

Sunday, September 15, 2013

The #bankruptcy of #Detroit is of international relevance

When this bankruptcy happens, a lot of money invested by people for their old age will be vaporised. These people live in many countries, including the Netherlands. Some people suggest that this is a victimless crime, but I suppose it is not their money that gets lost.

Anyway, this article informs you about Detroit. It prominently features the mayors of Detroit, who bear much of the responsibility.

The story of the bankruptcy of Detroit has not come to a close. It is likely that more Wikipedias will become interested in these gentlemen. Wikidata has a lot of information on them as a start.

Simma Holt, a #politician from #Canada

When you are interested in the message of Mrs Holt, you can read her books. When you read about her, it is bound to be relevant and surely of interest.

There is a Wikipedia article in English about her and, there is now a Wikidata item about her. Given that she is a Canadian politician, I would expect that there will be an article about her in the French language in the future.

When you run a bot adding the party affiliation to persons, you find that the overwhelming majority are American or British. There are hardly any politicians of other countries. When you see a name that seems not to be American, you find that the context of these people is lacking.

On the other hand, it is fun to work on the information that is lacking. Mrs Holt is easy; she does have her VIAF registration. People like Mohammad Nabavi may have a VIAF registration. He does not have a Wikipedia article yet.

When you look at criteria for notability, it seems to me that anyone who has ever held a political appointment has an English Wikipedia article. I wonder if the same criteria are used for politicians from other countries.

Saturday, September 14, 2013

Amr Moussa, a #politician from #Egypt

Mr Amr Moussa is an Egyptian politician and diplomat. When you look at his infobox on the English Wikipedia, you will agree that he is quite a distinguished politician. The question is what does it take to enter data for Mr Moussa in Wikidata.

I had to add two new Wikidata items:
  • Ambassador to the United Nations
  • Secretary General of the Arab League
I added the text "of Egypt" to the item "Minister of Foreign Affairs".

The label of "Ambassador to the United Nations" in the infobox links to "List of current Permanent Representatives to the United Nations". Mr Moussa became Minister of Foreign Affairs after his term representing Egypt at the UN ended.

I did not create items for his wife and daughter.

About #political parties in #Wikidata

Adding data to Wikidata does not seem to have much priority. Yes, some people run bots, but when running bots and relying on the quality of their programming is the only option, it is not a comfortable position.

Within the current environment, scripts have the most potential. At this moment I am running a script that adds one political party to people who have the template "Infobox Officeholder" on their Wikipedia article. For people like Nancy Pelosi or Moritz Leuenberger, I am adding the first political party mentioned.
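The harvesting step can be sketched roughly as below: take the wikitext of an "Infobox Officeholder" and pick out the first political party mentioned. The regexes and function name are illustrative, not the actual pywikipedia code.

```python
import re

def first_party(infobox_wikitext):
    """Return the first wiki-linked value of the party parameter, or None."""
    match = re.search(r"\|\s*party\s*=\s*(.+)", infobox_wikitext)
    if not match:
        return None
    # Prefer the target of the first [[link]] in the parameter value.
    link = re.search(r"\[\[([^\]|]+)", match.group(1))
    return link.group(1).strip() if link else match.group(1).strip()

wikitext = """{{Infobox Officeholder
| name  = Nancy Pelosi
| party = [[Democratic Party (United States)|Democratic]]
}}"""
print(first_party(wikitext))  # Democratic Party (United States)
```

Taking only the first party is exactly the simplification the script makes; everything after it in the parameter is ignored.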

I am really happy that I am running a hacked version of the software; two bugs that are show stoppers for the autonomous running of the software have been fixed in my version of the software. I can only hope that some real work will be done. A list of all the problematic content would be nice.

Friday, September 13, 2013

An un-#American #Hero ?

I received an e-mail about Antoinette Tuff. According to some Wikipedians she is not notable enough because, as the text says, "nobody died here". What is notable, and what made me sit up at the time, is that she prevented another school massacre by reasoning with another armed loon.

When someone can only be considered a hero once shots are fired, something is very wrong. These values alienate me.

For me this issue has more relevance than PFC Manning deciding what sex (s)he has.

Wednesday, September 11, 2013

#SignWriting organisation adopts #CC-by-sa

The problem with "#Free content" is that some people do not appreciate its meaning. Free means: you can do whatever you like with this. The problem these people face is that they apply their own understanding of copyright and insist on a license.

It is only for these people that the CC-by-sa license has been added. It makes sure it is understood that all the information is there to be used, and that you can and are encouraged to make it your own and adapt it as you see fit.

For everyone else, there is no way the SignWriting Foundation will enforce the legalities of the license. Yes, it is nice to be acknowledged, but please do share SignWriting as widely as possible, because it is a gift to be able to write the language you sign. You are *FREE* to write without any thought or any further consideration. Please do write your sign language.

#Wikimedia Foundation has an "Official #Website"

#Wikidata has a new property type; the "URL datatype". It is always fun to add the more obvious examples first.
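For illustration, this is roughly what a statement using the new URL datatype looks like in an item's JSON; P856 ("official website") is the obvious property for this example, and the serialization is simplified.

```python
# A trimmed sketch of a statement with the URL datatype: the value is stored
# as a plain string, and the datatype on the snak marks it as a URL.
statement = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P856",  # official website
        "datatype": "url",
        "datavalue": {"type": "string", "value": "https://wikimediafoundation.org/"},
    },
    "type": "statement",
}
```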

Monday, September 09, 2013

#Wikipedia #Featured list: 11 Presidents of #Pakistan

There is a new president of Pakistan and consequently, it is fitting to feature a list with all Pakistan's presidents.

The list was feature-worthy for Wikipedia, so I had a look at the presidents at Wikidata.
  • only one President was marked as such
  • it was obvious for one President that his party affiliation was out of date
  • the new President was not marked as a "person" or "male"
  • President Zulfiqar Ali Bhutto has some well known family members
Maybe the Presidents of Pakistan can become a featured effort for Wikidata when more people give it some tender loving care.

The history of #Islam in #Wikidata

You would expect the #Arabic #Wikipedia to be the richest resource on the history of the Muslim world. The reality is a bit different. I have been adding information for several Muslim dynasties, particularly those that had their territory in Northern Africa. To my amazement I find that the Catalan Wikipedia does a better job at providing information.

To some extent it makes sense; Spain and Portugal were under Islamic rule for a long time and consequently these dynasties are part of their history as well.

The genealogy of the Zayyanid dynasty is complicated. What you see in the picture are Zayyanid emirs. The problem is that you have to trust the Wikipedia information it is based on. Abu Hammu III, for instance, ruled from 1517 to 1527 and is said to be the son of Abu Abbas Ahmad, who ruled from 1430 to 1461. This means that he was at least 55 when he became emir. It is possible. His brother is said to be his successor for another 13 years; this I find unlikely.

To create this image, it was necessary to create several Wikidata items, many English labels and all of the genealogy information. It would be really cool if this information were verified and completed with source information. For now it is only a stub of the information that could be in Wikidata.

Saturday, September 07, 2013

#Wikidata supports 280+ Wikipedias and 10+ Wikivoyages

Every #Wikipedia needs facts of great quality. Every Wikipedia faces the challenge of maintaining the facts in its articles and in its infoboxes. It is like the Herculean task of cleaning the Augean stables; it never stops and you never know where and when another fact has to be cleaned up. It is even worse: the same fact has the potential to be crappy in 280+ Wikipedias.

With Wikidata there is a new way of making the most of a bad situation. It is still necessary to update facts whenever they change but this change is made at Wikidata. Every project that uses this fact will be updated. Changes go to the recent changes of those projects and to the watch lists that include the articles involved.

There is only one thing needed to make this happen: people need to trust Wikidata and the existing mechanisms to fight abuse. There is no real alternative, as there are no mechanisms to flag changes in Wikidata for articles whose data is not served by Wikidata. The most compelling reason to serve data from Wikidata is that we simply do not have the capacity to fix every change everywhere by hand. The fire-hose of changing facts will prove too much.

Improving #privacy and #security

There is little privacy left and there is a lot of FUD about the extent to which our privacy and security have been compromised. On a mailing list there is even talk about the possibility that the Wikimedia Foundation may be forced to divulge information to one of the American secret security organisations.

The good news is that the security and privacy for most users of any of the projects has been improved substantially. As our commitment is first and foremost to bring knowledge to all the people of the world, the only exceptions have been made for people accessing our projects from China and Iran.

There is talk about implementing OpenID for use at the Wikimedia Foundation. Making OpenID available for all of our users would be really welcome, given that single sign-on has largely been usurped by companies like Google and Facebook. The Wikimedia Foundation is probably the only non-commercial organisation with a fighting chance of being accepted as a provider of single sign-on. My current OpenID provider announced that it will end its services in 2014. I would dearly love to have a replacement in the Wikimedia Foundation as my provider of a single sign-on service.

Thursday, September 05, 2013

10 Questions about #VIAF, #Wikidata and the #World

Max Klein is the Wikipedian in residence at the OCLC, a worldwide library cooperative, owned, governed and sustained by its members since 1967. Max runs VIAFbot, the bot that has added VIAF identifiers to people who have a record in Wikidata. Who would be better placed to explain what this is all about? So enjoy.

Please describe what VIAF is and why it is relevant
We've all joked about what it would be like if people had numbers instead of names. The funny thing is, though, that it would be much more convenient to organize our information about people if we did have numbers as well as names. National Libraries have already done this behind the scenes to make the reader's life easier. It's just that each National Library did it independently, so VIAF (the Virtual International Authority File) matches the National Libraries to each other.
In Wikidata, links to VIAF and other repositories are included; why is this done?
Wikidata, however great and revolutionary it will be, is not the first big online database. What's cooler than any one big online database is connecting their records up with "same-as" and other relations, so we can use the databases together. Putting VIAF IDs in Wikidata allows operations between the two databases.
What kind of information do people get when they click on a VIAF number in Wikipedia or Wikidata?
When you open a VIAF link you see all the information that National Libraries have about that person. Usually this includes details of their life, like dates, and the titles they wrote.
Wikidata, Wikipedia, VIAF etc. all have their own data; how do differences get reconciled?
One constant annoyance I have is that in algorithmically matching records there is always a small error rate. That means I'm constantly getting messages of records to correct. Luckily with Wikidata or Wikipedia, the user can just edit the record to make it right. It's my challenge to watch those changes and see if I need to correct another source like VIAF.
How relevant is Wikidata as a data repository?
Wikidata is going to be vastly, hugely, unimaginably important. We gave up on trying to give a structured representation of the world some years ago because it was just too big a task. That was before Wikipedia proved a method of doing big tasks. A fantasy could be realized with Wikidata.
What does it take to gain relevance for Wikidata?
Wikidata will become not just relevant but crucial in a subtle way when researchers and programmers start using it as a knowledge base for Artificial Intelligence.
How important is it that Wikidata serves so many languages?
That's always difficult for me to think of as a Native English speaker. But when I recognize the danger of assuming an English-only world, I realize I'm the most important person to be multilingual-sensitive, since I have the least interest in being so.
How extensive is information about the "third world" in VIAF?
I don't know precisely. My research shows that VIAF is only slightly less sexist than Wikipedia, but basically just as sexist. It's a hypothesis, then, that it is just as biased when it comes to not including "third world" information.

When Wikidata has information missing in VIAF, is it interested?
I would hope so, and I am working on it. In some areas Wikidata has more information that VIAF is missing than VIAF has information that Wikidata is missing.
Can we trust VIAF to keep its information?
If it is good enough for the Library of Congress, the Deutsche Nationalbibliothek, the Bibliothèque nationale de France, and about 20 more, I'd hope it would be trustworthy enough for the Wikimedia community.

Wednesday, September 04, 2013

Paul von Hindenburg. #German President because the text says so.

The #Wikidata information for Paul von Hindenburg says in the English description that he was the President of Germany. Given what Wikidata is about, such a statement should exist as a "statement" and it could be accompanied by "qualifiers" like the start and end date and a predecessor and a successor.

My bot aborted on Mr von Hindenburg because the article said in plain text that he was independent of any political party. There is now a link to the [[Independent politician]] article. Wikidata now knows for all the German Presidents what party they were affiliated with.

Information lost in #VIAF

In a blogpost someone was really happy to find a link to VIAF for Aziz Ali al-Misri in the Wikipedia article. Today the information given at VIAF is:
This VIAF Cluster has been deleted. It is no longer part of VIAF.
The good news is that you can still see that the information existed. The bad news is that you cannot rely on VIAF to persist in keeping all the information it collects.

Finding information about people who are significant in the history of countries like Egypt or Turkey is not easy. Probably there are other repositories that include people like Mr al-Misri. When they exist, they should be referenced as well in Wikidata.

I did add a few statements to Wikidata; many more could be added based on the information in the Wikipedia articles. I did not do this because it would sidetrack me too much from other things.

Add data to #Wikidata

When #Commons started, people added pictures. At first these could not be used because the MediaWiki software did not support it. Once Commons pictures could be displayed in Wikipedia, people started to remove images from the Wikipedias because there was no longer a need to store them locally.

When the values for statements are the same in Wikidata and a Wikipedia, there is no longer a need to keep that information locally. The argument for removing local information is that when data needs an update, it only needs to be updated in Wikidata.

This is probably the least disruptive way forward.
  • Wikidata needs to include all the data of the data types it supports
  • we need functionality that compares data in a Wikipedia and removes it when it is the same
The functionality needed does not exist at this time. But we can add data by hand. We can build templates that support data from Wikidata. Smart people will concentrate on including data that a bot will not be able to retrieve. Bot operators will run their bots as far as they will go. The Wikidata developers will continue developing support for the data types that we are waiting for.
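The comparison functionality called for above could look something like this minimal sketch: drop a local infobox parameter when it holds exactly the value Wikidata serves. The function name, parameter names and the flat value representation are assumptions for illustration.

```python
def strip_redundant(infobox_params, wikidata_values):
    """Drop local parameters whose value matches the value Wikidata holds."""
    kept = {}
    for name, value in infobox_params.items():
        if wikidata_values.get(name) != value:
            kept[name] = value  # keep: it differs from, or is absent in, Wikidata
    return kept

local = {"birth_date": "1847-10-02", "party": "Independent"}
central = {"birth_date": "1847-10-02"}
print(strip_redundant(local, central))  # {'party': 'Independent'}
```

In practice matching will be fuzzier than string equality (dates, links, formatting), which is exactly why this functionality still needs to be built carefully.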

There is something for everyone to do.

Tuesday, September 03, 2013

The best #news from the #Wikimedia #Language #Engineering Team II

Sometimes the best news gets better when you are corrected: Santhosh indicated that Firefox also supports the Wikimedia Input Tools. It follows that both Mozilla and Google have an extra incentive to make sure that their script support at least meets the standard set by the Wikimedia Foundation.

A modern browser fit to support the whole of the Internet has to be inclusive. The Wikimedia Input Tools prove to be an important part of the solution.

The best #news from the #Wikimedia #Language #Engineering Team

The #Internet is the #Browser. Consequently, when the browser experience becomes more inclusive thanks to the language support the Wikimedia Foundation provides, it is quite something. The Chrome browser is now able to support 62 languages with input tools. Supporting additional scripts starts with providing the Wikimedia Language engineers with another keyboard mapping.

Once language communities find out that their language is supported as well, the use of this functionality will surely skyrocket.

The Wikimedia Input Tools will enable more people to communicate on the Internet in their own language. It is an obvious precursor for the provision of knowledge in that language, that script.

Monday, September 02, 2013


The president of Latvia
#DBpedia indicates the current political party for Mr Andris Bērziņš; his previous party affiliations are indicated separately.

Wikidata uses qualifiers; it is possible to indicate start and end dates for the different affiliations. This is superior; however, including all the existing information from DBpedia in Wikidata would boost its usefulness a lot. Sorting out all the qualifiers is something we cannot do with the current software anyway.

The need for a mass merge in Wikidata
Many of the newly created records have been identified as duplicates and have already been merged. The example I used in my blogpost, the genus Hersilia, has been given the label "long-spinnered bark spider". A genus is a group and consequently it should be a plural. Because of renames like this, it has become more difficult to identify candidates for merges.

The President of #Latvia

#Wikidata needs more data to gain relevance. At issue is where to get the data from. Some argue that it is preferable to retrieve data from the many #Wikipedias. This would be acceptable if the technology existed to retrieve the data.

The current pywikipedia functionality breaks easily and it does not take in all the available data. Consider Mr Andris Bērziņš, the current president of Latvia. As you can read in the infobox, he was a member of the Communist Party before 1990. The bot will only identify him as a member of the Communist Party of the Soviet Union; it does not add the qualifier "(Before 1990)". Mr Bērziņš is currently not affiliated with any political party.

The best way to present this information in Wikipedia is probably to have the most relevant information at the top; that would have helped the bot. It would also help if the bot accepted all the information up to the next label. When information that cannot be parsed goes into an "error file", the qualifiers can be added later by hand.
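A minimal sketch of such a parsing step, assuming the simple convention of a parenthesised qualifier after the party name; the function name and the error list standing in for an "error file" are illustrative, not pywikipedia code.

```python
import re

def parse_party(value, errors):
    """Return (party, qualifier) from e.g. "X (Before 1990)"; log failures."""
    match = re.match(r"\s*([^(]+?)\s*(?:\(([^)]+)\))?\s*$", value)
    if not match:
        errors.append(value)  # cannot be parsed: set aside for manual work
        return None
    return match.group(1), match.group(2)  # qualifier is None when absent

errors = []
print(parse_party("Communist Party of the Soviet Union (Before 1990)", errors))
# ('Communist Party of the Soviet Union', 'Before 1990')
```

Anything that does not fit the pattern ends up in `errors`, so the bot can keep going and a human can add the qualifiers later.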

As it is not possible to reliably retrieve data from the Wikipedias, the arguments against using data from DBpedia lose their relevance. This data is available for use, warts and all, and is a big improvement over the current lack of data and our inability to get the data out of the Wikipedias.

The need for a mass merge in #Wikidata

At the Swedish #Wikipedia they created many new articles about animal and insect species by bot. According to a mail reporting on the progress, close to a million articles were created.

This has been a huge undertaking and this success has been repeated by running the bot on the Cebuano and Waray-Waray Wikipedias as well.

The problem is that for instance on the English Wikipedia many of these taxons already exist and already have their own Wikidata items.

Two Wikidata items have just been merged; they are about Hersilia, a genus of spiders. As an item has been created for each "taxon", I am quietly confident that over 100,000 duplicates exist in Wikidata. Probably more.

This probably means that the best approach to creating new articles with a bot is by first introducing the data to Wikidata. It is not nice to have to merge so many items. In this case the data in infoboxes can be compared. This will likely indicate when the subject of the items is identical.
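A first pass at finding merge candidates could simply group items by the taxon name they carry and flag names that occur more than once; the item ids below are made up for the example, and real matching would also compare the other infobox data as suggested above.

```python
from collections import defaultdict

def merge_candidates(items):
    """Map each taxon name held by more than one item to those item ids."""
    by_name = defaultdict(list)
    for item_id, taxon_name in items:
        by_name[taxon_name].append(item_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}

items = [
    ("Q1001", "Hersilia"),        # e.g. the item linked from English Wikipedia
    ("Q2002", "Hersilia"),        # e.g. the item created for the bot article
    ("Q3003", "Panthera leo"),    # no duplicate, so not a candidate
]
print(merge_candidates(items))  # {'Hersilia': ['Q1001', 'Q2002']}
```

Name collisions alone are not proof of identity (homonyms exist in taxonomy), which is why comparing the rest of the data remains necessary before merging.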

This is a nice puzzle involving a lot of data.