Wednesday, October 29, 2014

#Wikimedia - Men at work; preparing a #presentation IV - #WCN2014

The Dutch community has one question to answer: what do we do with the information that is available in Dutch, and how will we make it available? Currently there are 3,054,955 items [1] with a Dutch label and 1,890,905 items [1] that link to the Dutch Wikipedia. It follows that there are 62% more items with a Dutch label than there are articles in Dutch.

This is a substantial amount of information that can be presented in Dutch. Similar figures can be calculated for any language; for English the figure is 39% and for German 121%.
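
The arithmetic behind these percentages can be sketched in a few lines of Python. This is a minimal sketch that assumes the percentage means "items with a label but no article, divided by items that do have an article"; that is the only reading under which the German figure of 121% makes sense. It also assumes that every item linking to the Dutch Wikipedia has a Dutch label. The function name is hypothetical.

```python
def extra_label_ratio(items_with_label: int, items_with_article: int) -> float:
    """Items that have a label but no article, as a fraction of the items
    that do have an article (assumed reading of the percentages above)."""
    if items_with_article <= 0:
        raise ValueError("items_with_article must be positive")
    # Assumes every item linked to the Wikipedia is also among the labelled items.
    return (items_with_label - items_with_article) / items_with_article


# Dutch figures quoted above.
labelled_nl = 3_054_955
linked_nl = 1_890_905

print(f"{labelled_nl - linked_nl:,} labelled items without a Dutch article")
print(f"{extra_label_ratio(labelled_nl, linked_nl):.0%} more labelled items than Dutch articles")
```

Under this reading, a value above 100%, as for German, means that more labelled items lack an article than have one.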

Arguably, these items fulfill notability requirements somewhere. Arguably the Swedes have demonstrated that having more information available revitalised their community. Arguably, allowing for search results from Wikidata is an easy first step towards opening up all our available knowledge.

[1] These links take a few minutes to load; they provide real-time information.

Tuesday, October 28, 2014

#Wikidata - #algorithm for updating labels

Amir is the #pywikibot guru; he runs Dexbot, the only bot with more than 20,000,000 edits. Amir regularly tinkers with the routines he uses. Sometimes he gets better performance, sometimes a better result.

The algorithm for adding labels has changed several times, and the result of the latest change can be seen in the statistics below. You may notice several spikes; the last one is captured in the latest dump, and it resulted in many more labels for items that already had one label.

It is people like Amir who make a real difference. One of his bot requests for Commons will help the Commoners see that Wikidata knows about the people mentioned in the Creator templates. Jobs like this are essential if the wikidatification of media files is to succeed.

#Wikimedia - Men at work; preparing a #presentation III - #WCN2014

The bane of every live demonstration is software that just does not work. My intention is to show #Wikidata in action and to demonstrate Reasonator and AutoList2. If the experience of the last few weeks is anything to go by, I have a 50% chance of a reasonable result on the day.

Many things can play up. Time-outs at Wikidata are not unusual at the moment, and when Wikidata does not play ball, everything downstream from it suffers as a consequence. It means that I may not have a list of recent deaths because ToolScript does not function.

AutoList2 relies on WIDaR, which in turn relies on being able to contact Wikidata reliably. Without this, AutoList2 does not run.

The subject of my presentation is firmly solution-oriented. I can always fall back on screenshots, but that feels like cheating.

Monday, October 27, 2014

#Wikidata - #dead in #2014

A milestone is often a reason for celebration. Wikidata now knows about more than 10,000 people who died in 2014. This is more than Wikidata knows for all of 2013, but we "know" about 4,292 more people who died in 2013. For 2014 the deaths of some 329 humans are waiting to be registered, and obviously there are two more months to go.

People wonder what the attraction is of killing people off. Registering a death is not nice; it is only worthwhile because of the potential it has:
  • Reasonator displays the latest information
  • Wikipedias can compare what they know and what Wikidata knows
  • External sources can compare what they know and what we know
  • It can trigger attention for the people who died
It takes time for such effects to be realised.

#Wikimedia - Men at work; preparing a #presentation II - #WCN2014

Mr van Asselt was a prominent professor at Utrecht University. He is one of many professors known to Wikipedia. Given that I regularly harvest data from categories, it made sense for me to use the English Wikipedia, as it has an article about Mr van Asselt.

The equivalent category on the Dutch Wikipedia knows about more faculty members, and there are equivalent categories in several other languages as well. All of them may know about even more faculty members.
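
Combining the categories of several Wikipedias is essentially a set union. This is a minimal sketch, assuming each category's members have already been resolved to Wikidata items; the function names and the placeholder items below are hypothetical and do not reflect real category contents.

```python
def wikis_listing(members_by_wiki: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each item to the set of wikis whose category lists it."""
    seen_in: dict[str, set[str]] = {}
    for wiki, members in members_by_wiki.items():
        for item in members:
            seen_in.setdefault(item, set()).add(wiki)
    return seen_in


def missing_from(wiki: str, members_by_wiki: dict[str, set[str]]) -> set[str]:
    """Items that some other wiki's category knows about, but this wiki's does not."""
    everyone = set().union(*members_by_wiki.values())
    return everyone - members_by_wiki.get(wiki, set())


# Hypothetical placeholder data: one faculty category per Wikipedia.
members = {
    "enwiki": {"item-1", "item-2"},
    "nlwiki": {"item-1", "item-3", "item-4"},
    "dewiki": {"item-2", "item-5"},
}

print(sorted(missing_from("enwiki", members)))  # candidates to harvest from other wikis
print(sorted(wikis_listing(members)["item-2"]))
```

In this toy example the union holds five items while no single category holds more than three, which is the point made about Mr van Asselt.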

As we aim to share the "sum of all available knowledge" with our readers, Mr van Asselt is a timely reminder to the audience of the Dutch Wikimedia conference that no Wikipedia knows it all.

#Wikimedia - Men at work; preparing a #presentation I - #WCN2014

This Saturday I will give a presentation about #Wikidata at the annual conference of the Dutch Wikimedia chapter. As I have a day job too, I have started preparing. I want my presentation to be factual, challenging and inspiring.

The facts are simple: Wikidata is almost two years old. It started by incorporating all interwiki links. The development team is really small; it does an awesome job, and Wikidata is typically available, responsive and up to the job. The ambitions are huge; the challenge is to add to the existing workload while keeping the ship afloat.

If there is to be a challenge in my presentation, it will be this: our aim is "to share in the sum of all knowledge", so our aim should be to share all the knowledge available to us with our readers. At this time only a few Wikipedias go the extra mile and inform their readers that information is available in one of our other projects. They do this by adding the results of a Wikidata search to their search results and showing as much text as is available in the local language.

One challenge is to do this for the Dutch Wikipedia as well.
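
Showing "as much text as is available in the local language" amounts to a language fallback per search hit. This is a minimal sketch that does not call any Wikidata API: the shape of the `hit` dictionary and the fallback order are assumptions, and the item shown is a placeholder.

```python
from typing import Optional


def first_available(texts: dict[str, str], chain: list[str]) -> Optional[tuple[str, str]]:
    """Return (language, text) for the first language in `chain` that has text."""
    for lang in chain:
        text = texts.get(lang)
        if text:
            return lang, text
    return None


def render_hit(hit: dict, chain: list[str]) -> str:
    """One line for a Wikidata search hit, preferring the reader's language.

    `chain` is a non-empty, hypothetical fallback order; chain[0] is the local language.
    """
    local = chain[0]
    parts = [hit["id"]]
    for key in ("labels", "descriptions"):
        found = first_available(hit.get(key, {}), chain)
        if found:
            lang, text = found
            # Mark text that is not in the local language, so readers know why.
            parts.append(text if lang == local else f"{text} [{lang}]")
    return " - ".join(parts)


# Hypothetical search hit, shaped as language -> text dictionaries.
hit = {"id": "Q-example", "labels": {"nl": "Voorbeeld"}, "descriptions": {"en": "an example item"}}
print(render_hit(hit, ["nl", "en"]))
```

Marking fallback text with its language keeps the result honest about what is and is not yet available in Dutch.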

Saturday, October 25, 2014

#Wikipedia - The Manley-O.-Hudson medal

One of the recipients of the Manley-O.-Hudson medal has died. The article prominently mentions that Mr Lowenfeld was a recipient, and it refers to the article about the award, where all the recipients are listed. Both articles exist only in German.

The wonderful news is that Magnus did it again; his Linked Items tool allowed me to associate many humans with this award.

If you consider international law important, all the recipients of this award are important. That is a great reason to have at least the basic information available in every language, including English.

#Wikidata - #vaccines

For Wikidata, those items that are not known to be "something" are the worst. There are many of them; the last processable dump had some 3,758,186 items without any statement. Injecting them with a healthy dose of substance makes it easier to process them.

As people increasingly read about Ebola, vaccines developed against Ebola gain attention as well. In Wikidata they are now known as vaccines. I have no clue how to indicate that a vaccine is intended for a specific condition like whooping cough, measles or Ebola.

PS I loved the cartoon produced by the "Anti Vaccine Society". Do note the golden cow in the picture. :)

Thursday, October 23, 2014

#Wikipedia - One size does not fit all

In Wikipedia we are used to seeing our readers as one big group. They all read the same article, they all get the same info-boxes and they all get the same categories. It is a reasonable approach when Wikipedia is only a pile of text, without data to separate out potential differences in interest.

One obvious consequence is that reasonable expectations decide what is shown and what it looks like. When there are too many categories, they no longer get attention. So what categories should be shown? The problem is that this "one size fits all" approach shows too much for some and too little for others.

Thanks to Wikidata it is possible to allow for preferences. For many categories Wikidata knows what they are about; they group humans, for instance, by their alma mater, their sports club, their gender... When our readers have the option to choose which kinds of categories they are interested in, there is no longer a "need" to choose which categories to keep. It is just a matter of choosing which categories to show by default.

All other kinds of categories can then be selected by the reader.
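
Choosing categories by what they are about can be sketched as a simple filter. This is a minimal sketch, assuming each category has already been tagged with its kind from Wikidata; the default set, the kind names and the categories below are all hypothetical.

```python
from typing import Optional

# Hypothetical default: the kinds of category shown when a reader has not chosen.
DEFAULT_KINDS = frozenset({"occupation", "alma mater"})


def visible_categories(categories: dict[str, Optional[str]],
                       chosen: Optional[set[str]] = None) -> list[str]:
    """Categories to show, given what each category is about (None if unknown)."""
    kinds = DEFAULT_KINDS if chosen is None else chosen
    return sorted(
        name for name, kind in categories.items()
        # When Wikidata does not know what a category is about, we cannot filter it.
        if kind is None or kind in kinds
    )


# Hypothetical categories of one article, with the kind Wikidata would report.
cats = {
    "Category:Alumni of Example University": "alma mater",
    "Category:Example F.C. players": "sports club",
    "Category:People from Example": "place of birth",
    "Category:Example topic": None,
}
print(visible_categories(cats))                   # the default view
print(visible_categories(cats, {"sports club"}))  # a reader's own choice
```

Categories whose subject Wikidata does not know are always shown, so the filter never hides more than it can justify.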

Tuesday, October 21, 2014

#Wikidata - Thank you Magnus

Mr A. H. Halsey is the first person who can be put to rest now that ToolScript works again. Mr Halsey was a sociologist; he died on 14 October 2014.

Thank you Magnus, you are wonderful.

Monday, October 20, 2014

#Charkop - a Vidhan Sabha constituency

Data about politics and politicians regularly finds its way to Wikidata. When an item gets my attention, I often add all associated items to Wikidata as well. Charkop is a constituency in Maharashtra; according to an associated category, there are many more.

Given that the software I use is broken at this time, I can blog about one dilemma.

Charkop is a Vidhan Sabha constituency; it is part of the Mumbai North Lok Sabha constituency. The question is whether Charkop "is in the administrative territorial entity" Mumbai North or Maharashtra.

#Google - Let us #share in the sum of all #knowledge

Dear Google, in our own ways, we share the aspiration to share in the sum of all knowledge. We are really happy to share everything we have with you. Our licenses are designed to share widely.

Dear Google, could you please help us make sure that our Labs web services survive your bots? We do not want your bots to stop running. What we want is for our web servers to serve our own needs first and to use all the spare capacity for you. As it is, our software dies.

We really want you to have our data, and there are several other ways in which you can get all our data anyway. For this reason, please help us with our software so that we can continue to share the sum of all our available knowledge with you.
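
"Serve our own needs first and use the spare capacity for crawlers" can be expressed as a load-aware admission check. This is a minimal, single-threaded sketch, not how Labs is actually configured: the class, the capacity figure and the crawler share are hypothetical, and recognising a crawler request is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Hypothetical load-aware gate in front of a web service."""
    capacity: int               # concurrent requests the service can handle (assumed known)
    crawler_share: float = 0.5  # crawlers are only admitted while load is below this share
    in_flight: int = 0

    def try_start(self, is_crawler: bool) -> bool:
        # Our own users may use the full capacity; crawlers only get what is spare.
        limit = int(self.capacity * self.crawler_share) if is_crawler else self.capacity
        if self.in_flight >= limit:
            return False  # caller would answer e.g. HTTP 503 with a Retry-After header
        self.in_flight += 1
        return True

    def finish(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)


gate = Admission(capacity=4)
print([gate.try_start(is_crawler=True) for _ in range(3)])   # crawlers fill half, then wait
print([gate.try_start(is_crawler=False) for _ in range(3)])  # users still get the rest
```

Refusing politely rather than falling over keeps the service up for everyone, and the crawler simply comes back later.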

Sunday, October 19, 2014

#Wikidata - P1472, the #Commons #Creator #Template

The work of many artists is represented in Commons. Having great information available for all of them is a Herculean job. Having all that information and more available in all the languages supported by the Wikimedia Foundation is very much an aspiration. Once Commons is wikidatified, all information needs to be understood in all our languages.

France Prešeren is one of 13,481 people who currently have a Creator template and are known as such in Wikidata. All the data in those templates can be harvested and included in a Wikidata item. For all the templates NOT known in Wikidata, an item can be found or created to make them known in Wikidata as well.
A lot is already known about Mr Prešeren in Wikidata and much of that data can be expressed in multiple languages. The same can be said for the Creator template itself; as you can see, the template already shows its labels in multiple languages. With Wikidata we can show the information in all our languages as well.
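
The harvesting and matching described for the Creator templates can be sketched as follows. This is a minimal sketch under stated assumptions: the template parameter names, the mapping to P569 (date of birth) and P570 (date of death), the value format for P1472 and the candidate list are assumptions, and the regular expression only handles flat `| key = value` pairs, not nested templates or links.

```python
import re
from typing import Optional

# Assumed mapping from Creator template parameters to Wikidata properties.
FIELD_TO_PROPERTY = {
    "Birthdate": "P569",  # date of birth
    "Deathdate": "P570",  # date of death
}

# Matches "| key = value" pairs; nested templates or links inside values are NOT handled.
PARAM = re.compile(r"\|\s*([^=|{}]+?)\s*=\s*([^|{}]*)")


def parse_creator(wikitext: str) -> dict[str, str]:
    """Extract simple key/value parameters from a Creator template."""
    return {key: value.strip() for key, value in PARAM.findall(wikitext)}


def harvest(page_title: str, wikitext: str) -> dict[str, str]:
    """Turn a Creator page into candidate statements for its Wikidata item."""
    params = parse_creator(wikitext)
    # P1472 (Commons Creator page) is assumed to hold the page name without namespace.
    claims = {"P1472": page_title.removeprefix("Creator:")}
    for field, prop in FIELD_TO_PROPERTY.items():
        if params.get(field):
            claims[prop] = params[field]
    return claims


def match_item(params: dict[str, str], candidates: list[tuple[str, str, str]]) -> Optional[str]:
    """Pick the one candidate (item id, label, birth year) that fits, if any.

    None means no match or an ambiguous one: a human should decide,
    or a new item should be created.
    """
    name = params.get("Name", "")
    born = params.get("Birthdate", "")[:4]
    hits = [qid for qid, label, year in candidates if label == name and year == born]
    return hits[0] if len(hits) == 1 else None


# Hypothetical Creator page and hypothetical candidates from a label search.
text = "{{Creator\n| Name = Example Person\n| Birthdate = 1800\n| Deathdate = 1870\n}}"
print(harvest("Creator:Example Person", text))
print(match_item(parse_creator(text), [("Q-a", "Example Person", "1800"),
                                       ("Q-b", "Example Person", "1901")]))
```

Returning None on an ambiguous match leaves the decision with a person, which matters when two creators share a name.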

Realising this will introduce Wikidata to the Commons community in a positive way and remove one obstacle that needs to be overcome during the wikidatification of Commons.

Saturday, October 18, 2014

Bringing #Wikidata to #Commons, one step at a time

There is a big project to bring structured data to the 23,422,581 media files that make up Commons, one of the biggest resources of freely usable media files.

It is meant to bring many different benefits to the users of Commons. To accomplish this, many steps have to be taken. Many of these steps can be taken already, and they will show why this project is being done and what its benefits are.

Take for instance Mr Daniel Havell. He was an English engraver born in Reading. There is no Wikipedia article about him, but there is information about him in Wikidata. It includes all the information that is in his "Creator" template and in the category about him on Commons.

Having such information on Wikidata for all the "Creators" is easy and obvious. Having all those templates refer to Wikidata builds anticipation of things to come. The next steps are making sure that the information looks good on Wikidata and is informative. Currently the best we can offer is to show the information in Reasonator.

Using tools like Reasonator for now establishes that the WMF and the Wikidata team appreciate all the efforts that promote the use of Wikidata, and that they accept such tools as indicative of the type of information Wikidata will have to bring.

This can all be done today. No waiting is necessary, and it makes data from Commons available in multiple languages. This is Mr Havell in Russian. Bringing the benefits of Wikidata to Commons today helps. It makes our public aware of the inherent benefits. It allows them to comment and get involved, slowly but surely. It will prevent a "big bang" announcement of "this is it, take it or leave it". It will even bring more information in more languages to Commons sooner rather than later.

Sunday, October 12, 2014

#MediaWiki is about sharing the sum of all #knowledge

The organisational structure of the Wikimedia Foundation has been completed with the hiring of Mr Damon Sicore. In his first IRC #Wikimedia-Office chat, Wikipedia-centrism reared its ugly head and was found to be alive and well.

Mr Sicore made some important statements: "The most urgent issue seems to be software quality and shipping what we say we are going to ship, on time." and also "this urgency is compounded by the fact that we must be able to compete in mobile".

Wikidata is firmly part of us sharing in the sum of all knowledge, and it is increasingly important at that. So far Wikidata has mostly been about linking Wikipedia articles about the same subject. Increasingly, its data is used in info-boxes. Once the wikidatification of multimedia files happens, Wikidata needs to become editable from mobile phones, and editing it needs to be easy and obvious in any and all languages.

Currently it is neither easy nor obvious in any language.

This is not to say that it cannot be made increasingly easy and obvious in all languages. It is important because it is a requirement if the wikidatification of multimedia files is to succeed. This is, however, only one use case where improved usability of Wikidata is essential for us to continue to share the sum of all the data we have available to us.

The extent to which Wikidata will make a difference is only one of the challenges for Mr Sicore; he faces many more. I wish him well, because his success is our success.

#Wikidata - the maintenance of #awards

Mrs Kizer died. She won several awards. One of the awards she won was the Robert Frost Medal; another was the Theodore Roethke Memorial Poetry Prize. Two other awards, the John Masefield Memorial Award and the Borestone Award, are not linked in the article yet.

The funny thing about awards is that they have a habit of being awarded regularly. This has several consequences:
  • you can predict how many winners there may have been
  • you can predict when the next winner is likely to be known
Many awards are not maintained as well as, for instance, the Nobel Prize or the Pulitzer Prize for Poetry, and it should not be that hard to produce something that lists all the awards that have no winner yet for a given year. Wikidata already provides most of the main elements: it knows, for instance, all these awards, and it shows how many Wikipedias have an article about each of them.

By adding a statement about the frequency of an award, it becomes possible to find the awards that were not awarded in a given year. This will stimulate adding awards, it can be the basis for a tool that shows lists of winners on Wikipedias, and it would stimulate me to indicate that Mrs Kizer won the Pulitzer Prize for Poetry in 1985.
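
With a frequency statement in place, finding the gaps is simple arithmetic over years. This is a minimal sketch, assuming the frequency is expressed as "every N years from a first year" and that one winner per occasion is expected; the `Award` class, its fields and the example awards are hypothetical, not an existing Wikidata property or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Award:
    name: str
    first_year: int
    every_n_years: int  # the hypothetical "frequency" statement
    winners: dict[int, list[str]] = field(default_factory=dict)

    def is_expected(self, year: int) -> bool:
        """True if, by its frequency, the award should have been given in `year`."""
        return year >= self.first_year and (year - self.first_year) % self.every_n_years == 0

    def expected_years(self, until: int) -> list[int]:
        """All years up to `until` in which a winner is expected."""
        return list(range(self.first_year, until + 1, self.every_n_years))

    def years_without_winner(self, until: int) -> list[int]:
        return [y for y in self.expected_years(until) if not self.winners.get(y)]

    def next_expected(self, after: int) -> int:
        """First year after `after` in which a new winner is likely to be known."""
        if after < self.first_year:
            return self.first_year
        steps = (after - self.first_year) // self.every_n_years + 1
        return self.first_year + steps * self.every_n_years


def awards_missing_winner(awards: list[Award], year: int) -> list[str]:
    """Awards that should have been given in `year` but have no winner recorded."""
    return [a.name for a in awards if a.is_expected(year) and not a.winners.get(year)]


# Hypothetical awards with hypothetical winners.
annual = Award("Example Annual Prize", 2000, 1, {2000: ["A"], 2001: ["B"], 2003: ["C"]})
biennial = Award("Example Biennial Medal", 2001, 2, {2001: ["D"], 2003: ["E"]})

print(len(annual.expected_years(2004)))          # how many winners there may have been
print(annual.years_without_winner(2004))
print(biennial.next_expected(2004))
print(awards_missing_winner([annual, biennial], 2004))
```

The same two facts, a first year and a frequency, answer both predictions from the list above: how many winners there may have been and when the next one is due.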

Thursday, October 09, 2014

#Wikidata - #Statistics are a #data game

The Wikidata statistics are a marvel. They exist in their own little corner of the Wikiverse and rely on the dumps that are regularly produced. When everything is fine, a refresh is generated automatically. Some crazy people find them of interest and go over the numbers trying to understand what is happening. Every now and again, they are amazed or appalled.

Recently the dumps that are available in JSON changed their format in the middle of a dump. The resulting hodgepodge of data made the statistics unrealistic. Magnus was on holiday. Yes, he has a real life, so it took a bit of time before he reasoned his way out of the mess.

It is wonderful that our community has people like Erik Zachte and Magnus Manske. They spend so much time and effort providing us with meaningful statistics. It is important to remember that they rely on underlying data, and that it is their skill that ensures the data remains comparable over time.

NB Currently 56.83% of the Wikidata items have 0, 1 or 2 statements. :)

Wednesday, October 08, 2014

#Wikidata - Does Mr Ulibarri live and, if he does, then what?

According to the #Portuguese #Wikipedia, Mr Ulibarri died. The date of his demise was given as June 1, 2014. He was added to a category of people who died; this was picked up by tools, and consequently Mr Ulibarri was marked as dead in Wikidata.

According to some, unsourced facts should not be in Wikidata, and a Wikipedia is not a source. It is part of a blame game; I was accused of entering wrong information.

I prefer to live by the motto that I am proud of the mistakes I make; they prove that I am productive. Realistically, Wikidata has hardly any sources when you remove all the Wikipedias from the equation. Errors will be included all the time by me and by countless others. There is no helping that.

For those Wikipedias that expect sources for all statements: tough. It won't happen any time soon. The best that can be expected is that comparisons are made. Differences will be found that way, and they can be fixed where needed. In the case of Mr Ulibarri, it is suggested that it is a case of mistaken identity. A Mr Marinho Chagas died; he was also a soccer star. Mr Ulibarri's full name, however, is Mario Peres Ulibarri, and he is also known as Marinho Peres.
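
Such a comparison can be sketched as a diff of death dates between two sources. This is a minimal sketch, assuming both sources have already been reduced to "item: date of death or None"; the function name and the placeholder people are hypothetical, with one of them mirroring the Ulibarri case.

```python
from datetime import date
from typing import Optional

DeathDate = Optional[date]  # None means "alive, as far as this source knows"


def compare_deaths(ours: dict[str, DeathDate],
                   theirs: dict[str, DeathDate]) -> dict[str, tuple[DeathDate, DeathDate]]:
    """Items both sources describe, but whose death dates disagree.

    The result is a worklist for a human to check; neither side is assumed right.
    """
    return {
        item: (ours[item], theirs[item])
        for item in sorted(ours.keys() & theirs.keys())
        if ours[item] != theirs[item]
    }


# Hypothetical data: Wikidata still has a death that the Wikipedia has since removed.
wikidata_deaths = {"person-a": date(2014, 6, 1), "person-b": None, "person-c": date(2014, 10, 14)}
wikipedia_deaths = {"person-a": None, "person-b": None, "person-c": date(2014, 10, 14)}

print(compare_deaths(wikidata_deaths, wikipedia_deaths))
```

Only the disagreement is reported; deciding which source is right remains a human job.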

An anonymous user edited the Portuguese Wikipedia and made Mr Ulibarri live again. It was commented that there are no sources for his demise. I am happy for Mr Ulibarri that it turned out all right for him.

Sunday, October 05, 2014

#Wikipedia - Ümit Yaşar Toprak commander of al #Nusra and #NPOV

The "Neutral Point of View" is one of the guiding principles of Wikipedia. In science it is defined as:
the concept of a position formed without incorporating one's own prejudice
According to the article about him, Mr Toprak died in an air strike inside Syria. The problem with the article, however, lies in several of its categorisations: 20th-century criminals, 21st-century criminals, War crimes committed by Islamist militant groups. They imply that Mr Toprak was a criminal and that he was personally responsible for war crimes. The article does not support this in any way.

There is no need to appreciate Mr Toprak, but the arguments for including him in such categories are obviously partisan. As these claims are not supported in the text, Wikipedia becomes partisan as a consequence. It undermines Wikipedia's validity as a source on this conflict, and it removes the legitimacy of NPOV claims in other domains as well.

Thursday, October 02, 2014

#Wikidata - Francesca Morvillo assassinated by the #mafia

Mrs Morvillo and her husband were assassinated by the mafia in 1992. It is important to remind ourselves that organisations that consider themselves a law unto themselves are lethal and parasitic by nature.

All kinds of everything are researched, and the "gender divide" is a favourite uncontroversial subject in the Wikimedia world. FYI, there are over 1,907,292 men and 352,006 women known to Wikidata. Given these high numbers, the ratio between them is likely to remain the same, even though there are still some 275,400 known humans whose gender is yet to be recorded.
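
How much the unrecorded humans could move the ratio can be checked with the numbers quoted above. This is a minimal sketch that only considers two genders, as the figures do, and computes the share of women now and in the two extreme cases where every unrecorded human turns out to be a man, or a woman; the function name is hypothetical.

```python
def share_of_women(men: int, women: int) -> float:
    return women / (men + women)


men, women, unrecorded = 1_907_292, 352_006, 275_400  # figures quoted above

current = share_of_women(men, women)
# Extreme cases: every unrecorded human turns out to be a man, or a woman.
lowest = share_of_women(men + unrecorded, women)
highest = share_of_women(men, women + unrecorded)

print(f"men per woman: {men / women:.2f}")
print(f"women now: {current:.1%}, range once all are recorded: {lowest:.1%} to {highest:.1%}")
```

This gives about 5.42 men per woman and a current share of 15.6% women; if the unrecorded humans are split like the recorded ones the share stays there, and even the extremes keep it between 13.9% and 24.8%.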

As I was adding men, I found it peculiar that a Mr Morvillo whose first name is Francesca was considered to be male. A picture was associated with this person on the Italian Wikipedia, and the associated text left no doubt: he was a she.

When you are adding information all the time, there are bound to be numerous errors. It was a fluke that I caught this one. There is no doubt that importing information wholesale from the Wikipedias introduces many factual errors into Wikidata. That cannot be helped. Comparing information with other sources will indicate likely errors. Such comparisons are how we can ensure a quality that is at least as good as all the rest.