Wednesday, May 25, 2016

#Wikidata - Kerala MLA constituencies

Kerala is one of the states of India and, like all the others, has its own legislative assembly. As in Great Britain, politicians are elected from constituencies. There are many, as you can see on the map.

When there are elections, things change. New people become representatives, some remain representatives and others no longer have relevance in that way. At Wikidata, the current list of people who are "Member of the Kerala Legislative Assembly" is a bit of a mess. There are many items without a name in English, there are people who are only known in English and there are probably a lot of duplicates.

There are even representatives who are known to have an article on the English Wikipedia but do not (yet) have an item. This is all because of this big push to write articles on Indian representatives.

As more work is done for this big push to get the data complete, the data will become more informative. What we hope to achieve is:
  • associate MLAs with constituencies
  • have labels in both English and Malayalam for all of them
  • merge all the possible duplicates
Obviously there is more that might be done. We could add the dates when people became an MLA. This will allow us to create queries that show who was an MLA at what time. When all this is done for Kerala, there are 28 other Indian states and there are many other countries that could do with a little bit of TLC.
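To make the idea of "who was an MLA at what time" concrete, here is a minimal sketch in Python. It assumes each term of office is recorded with a start date and an optional end date. The `Term` class, the `members_on` function, and all names and dates in the sample data are made up for illustration; none of them come from Wikidata.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Term:
    member: str            # placeholder name, not a real MLA
    constituency: str      # placeholder constituency
    start: date            # day the term started
    end: Optional[date]    # None means the term is still running


def members_on(terms: List[Term], day: date) -> List[Tuple[str, str]]:
    """Return the (member, constituency) pairs in office on the given day."""
    return sorted(
        (t.member, t.constituency)
        for t in terms
        if t.start <= day and (t.end is None or day < t.end)
    )


# Illustrative data only: the dates are invented for this example.
terms = [
    Term("Member A", "Constituency 1", date(2011, 5, 14), date(2016, 5, 20)),
    Term("Member B", "Constituency 1", date(2016, 5, 20), None),
    Term("Member C", "Constituency 2", date(2011, 5, 14), None),
]

print(members_on(terms, date(2014, 1, 1)))
print(members_on(terms, date(2016, 6, 1)))
```

The same question can only be answered from Wikidata once the start and end dates are actually recorded for each representative, which is why adding them matters.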

Wednesday, May 04, 2016

#Wikimedia - [[citation needed]]

Our articles in any #Wikipedia can be trusted when an effort has gone into providing sources. Sources or citations are very much needed because they help us distinguish fact from fiction. Finding sources exposes an origin and it helps us debunk fiction. The result of this continued effort is content that can be trusted as a sincere attempt to achieve a neutral point of view.

There are very practical problems. Sources are not always easy to find and they do not exist in every language. Sources are often behind a "pay wall", so access to the body of knowledge is very much restricted. Sources, particularly sources on the web, do not exist forever. The consequence is that sources are problematic and not everybody is equally able to help us with sources for the content we have.

If we are to improve the current, unsatisfactory situation, we have to address multiple problems.
  • Once sources are lost, we rely on the Internet Archive for a historical view. It has policies that allow for the removal of content; this is often content that is controversial, and removal is often intended to rewrite history. What to do?
  • Access to restricted sources is provided to the privileged few who have access to libraries. The WMF has a program that gives some of our editors access to a few pay-walled sources.
  • When this proves insufficient, it is great to know that Sci-Hub among others provides “illegal” access to any and all sources.
Open access to sources is very much what we as a community care for. One of our own died in the struggle for this access so I do not think we should be deferential to an industry that is despicable. We should teach people how to find sources and ignore licensing as much as possible.

#Wikipedia / #Commons - Brigadier General Loree K. Sutton

Mrs Sutton is a psychiatrist who specialises in PTSD. When you read her CV, it is impressive. She no longer works for the US Army; she works for the City of New York.

When you read the article on Wikipedia, you find her picture. It is marked as Public Domain and it is not on Commons. Given that Wikidata is working towards the point where copyright and license information is included, one can only hope that images like this can be easily shared based on their license.

When Commons started, it was intended as a repository that prevented the same file from being uploaded to each of the Wikipedias. As such it served its purpose remarkably well. With Wikidata it becomes trivial to share images like the one of Mrs Sutton.

I fear that for some this reads as frightening. It undermines the one thing they love. It does not actually remove the need for Commons as a platform. Quite the opposite; it will bring new tools to finally leverage all the data on images. It may bring this image of Mrs Sutton to Wikidata for starters.

Saturday, April 23, 2016

#Wikidata - its sex ratio II

In April 2014 I blogged about the sex ratio at Wikidata. At the time there were 1,332,383 "humans"; 760,616 were male and 154,455 were female. Now in April 2016, the numbers are different: there are 3,135,792 humans; 2,442,444 are male and 466,748 are female.

The percentages were: 57% males, 12% females and 31% unknowns. This time they are 78% male, 15% female and 7% unknown.
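For anyone who wants to check the arithmetic, the percentages follow directly from the counts quoted above. This is a small sketch; the function name `shares` is only an illustration, and "unknown" is taken to be everyone not counted as male or female.

```python
def shares(total, male, female):
    """Return whole-number percentages for male, female and unknown."""
    # Assumption: "unknown" is whatever is left after male and female.
    unknown = total - male - female
    counts = {"male": male, "female": female, "unknown": unknown}
    return {label: round(100 * count / total) for label, count in counts.items()}


# Counts as quoted in the post for April 2014 and April 2016.
print(shares(1_332_383, 760_616, 154_455))
print(shares(3_135_792, 2_442_444, 466_748))
```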

Based on these Wikidata numbers, the gap between men and women has substantially increased. On the other hand, the number of humans that were not identified as male or female has substantially decreased.

This does not mean at all that the movement to chip away at the gender gap is a bust. Far from it. Numbers only expose realities. What can easily be achieved in Wikidata is more focus on the females in any group. The subject I focus on is mental health and I concentrate on female psychiatrists or psychologists. I add statements for them and, where possible, add the data from categories to Wikidata. In this way they become better connected and more information becomes available. In this way the subject I care for gains quality and relevance, and it is women who benefit most.

Numbers provide an indicator, but when numbers are this big they should not have our focus. At best they move glacially. It is more relevant to know whether they, as a group, gain more readers over time. Such numbers reflect an increase in the quality of articles and data. That is an approach that has potential.

Thursday, April 21, 2016

#Wikidata - #YLE, the #Goldman environmental Prize and the #ArticlePlaceHolder

#YLE is a Finnish public broadcaster that announced that it will use Wikidata to label its articles and news items. This is really cool because it means that they have an interest in supplying missing labels in Finnish and as a consequence we actually benefit from them.

So let us consider what we can do to make both their and our life more pleasant.

When something happens that is "notable", for instance the latest announcement of the Goldman Environmental Prize awardees, we can add the winners. One of the winners is from Cambodia; this can trigger a request for Mr Leng Ouch's article to be written in Cambodian. We can update lists of winners of the award. We can link to articles in the Finnish press for each and all of them.

Once more news organisations use Wikidata, new use of labels can indicate breaking news or renewed interest. This may help journalists worldwide to stay on top of what is current. This may all happen, but the most important benefit is that it ensures that Wikidata remains up to date.

Wednesday, April 13, 2016

#Wikimedia - Jimmy Wales is not a constitutional monarch

Thank you Durova
Having known Jimmy / Jimbo Wales for a long time, I appreciate him for the many things he does. Particularly the many things we do not really hear about. Jimbo either has the ultimate conflict of interest, or is best positioned to do well for the Wikimedia Foundation and its projects.

At the start Jimmy was the founder and financier of Wikipedia and, as it became a bigger success, he could no longer afford his hobby. He was apprehensive about letting go and slowly but surely handed over more power to what is now the board of the Wikimedia Foundation.

His role changed and he became more of an ambassador at large. Jimmy is not a constitutional monarch with an entourage that prevents him from being "political" or personal. I have personally experienced several occasions where Jimbo was instrumental in bringing people together. It is why I am more than happy to express my happiness that he is who he is and does what he does.

The only question I have for his detractors is: if not Jimmy, who else can perform the role that is uniquely his?

Saturday, April 09, 2016

#Wikidata - the Panama Papers

The Panama Papers bring to light how the rich and famous hide their wealth. Arguably this is typically a criminal activity because it prevents them from being held liable for their possessions according to the law of the country they live in. Liability comes in many ways; it is recognition of what you own, what you are taxed on and also what conflicts of interest exist. In an interview for Amnesty International, Mr Snowden says it well: "Privacy is for the powerless. Transparency is for the powerful."

The Panama Papers bring much-needed transparency, and there is one big difference from the unwanted intrusions on their privacy that the rich and famous suffer: this time they are in the limelight because of the lack of transparency in their own actions and the resulting negative effect on society. It would be good if organisations that spy published any and all of these transgressions. A change of focus like this would ensure a much more equitable society and it is easy to argue that it will make us all more secure.

The English Wikipedia has a list of people who are implicated by these Panama Papers. The information has been included in Wikidata. It is therefore easy to have the information in any Wikipedia. The ListeriaBot will perform updates on a regular basis. The only manual maintenance is including the necessary labels.

Friday, April 08, 2016

#Wikidata - labelling and the ArticlePlaceholder

The ArticlePlaceholder is a MediaWiki extension. It has huge potential to rapidly increase the available information in any and all Wikipedias because Wikidata has more data than any Wikipedia has articles. The big initial issue: how to translate the labels into the language of a Wikipedia.

There are many approaches and the initial one is to concentrate on "red links" first. With "red links" linked to Wikidata items, the ArticlePlaceholder may express the data as information for that Wikipedia. That is one big incentive to write the missing article :). Missing labels associated with properties may be added to localise all the information available on an item.

It then becomes interesting: what to do next? There are the categories and the lists associated with articles. Linking those would work; it would rapidly expand the number of items linked through ArticlePlaceholders. It would also rather quickly expand the number of items that seek a translation of their label. These can be found and sorted in order of prevalence.
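Sorting by prevalence could look something like the sketch below. It assumes we have a list of the items that a Wikipedia links to (one entry per link) and a mapping of the labels each item already has. The function name, the item identifiers and the sample data are all hypothetical; the language code "ml" for Malayalam is only an example.

```python
from collections import Counter


def missing_labels_by_prevalence(links, labels, language):
    """Rank the items that lack a label in `language` by how often they are linked."""
    counts = Counter(
        item for item in links
        if language not in labels.get(item, {})
    )
    # most_common() returns (item, count) pairs, most frequently linked first.
    return counts.most_common()


# Hypothetical data: each entry in `links` is one link to an item.
links = ["item-1", "item-2", "item-1", "item-3", "item-1", "item-3"]
labels = {
    "item-1": {"en": "example one"},
    "item-2": {"en": "example two", "ml": "example two, localised"},
    "item-3": {},
}

print(missing_labels_by_prevalence(links, labels, "ml"))
```

The items at the top of such a list are the ones where a single new label helps the most readers.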

Another approach would be to add as many labels as possible in Wikidata. When this results in terms that can be found, ArticlePlaceholders may be created when they are requested. When we keep track of the requests for ArticlePlaceholders, we can even prioritise the writing of missing articles.

Adding labels can be done in multiple ways; we can use dictionaries, we can transliterate. We can even use bots to do all this automagically for us. The biggest difference will be made when we seek and find collaboration. For most languages there are schools where students need to write papers. Particularly in the smaller projects this may make a huge difference. As they research their topic, they can localise all the missing labels starting from a "Concept cloud" for a subject.

Once this work gets underway, small Wikipedias will rapidly increase the number of subjects that are being served. The big question is not if it will be worthwhile but how it will affect Wikipedia article writing.