Thursday, December 28, 2017

#Wikidata - Cyrus J. Colter and G. B. Lancaster - #diversity

Cyrus J. Colter was inducted into the Black Literary Hall Of Fame in 1999. He has a Wikipedia article in Japanese and his Wikidata item has been expanded with a link to Open Library and VIAF.

According to a tweet, G. B. Lancaster was once one of New Zealand's most popular writers. When you google her, you find that G.B. Lancaster is a pseudonym for Edith Joan Lyttleton. For Mrs Lyttleton there is now a link to Open Library as well.

From a diversity point of view, both Mr Colter and Mrs Lyttleton represent minorities. Giving attention to either increases the diversity of Wikidata. Linking both authors to the Open Library has the most inclusive effect. There is now a bigger public for the books they have written.

Monday, December 25, 2017

#Wikidata - #Reebok Human Rights award

Once upon a time, there was a company called Reebok that presented an award to human rights activists under the age of 30. Every year four or five people received $50,000. Every year attention was given to human rights. That matters, because an award like this gives additional relevance and resonance to an extremely good cause. It may even provide some extra protection by making people more visible.

The award is no longer presented. Some people who were recognised refused the award because in their opinion Reebok itself should attend to its own human rights record. Some people took action and they were successful; the last award ceremony was in 2007. That is all; a lot less attention for human rights, defeat in victory.

The best information about the Reebok Human Rights Award is at the Internet Archive's Wayback Machine. There is nothing wrong with the credentials, as stated, of the people who were awarded. When you compare this with the linked people at Wikipedia, you will miss the Chilean soccer player and the Nigerian business magnate, and you will find what are Wikipedia red links.

Sunday, December 24, 2017

#Wikimedia - #diversity and #inclusion requires #trust

Each Wikimedia project has its own culture, and there is an overall stated ambition that diversity, and the inclusion needed to make it work, is alive and well.

We have our diversity conferences and the best result is how the "gender gap" is approached. It gets a lot of attention and the positive effects are noticeable. There is however more to diversity, and some of the beliefs we hold so dear prevent the inclusion of those who are on the outside looking in.

One of the Wikimedia traditions is that we do not trust; trust is in the citations, the sources. We do not trust each other, why should we? When for diversity's sake people who receive the "Harriët Freezerring" are added, it is accepted because there is a Wikipedia article that mentions them. But when people are added because they are "artists from the African diaspora" there is a problem. There are no articles yet, and the point of adding the artists first is that Wikidata enables managing projects in multiple languages.

There are many people who are targeted for attention in Wikipedia editathons. There have been editathons in the past, so there is an established track record for the Black Lunch Table. That did not bring trust, the trust needed to accept that the BLT will manage the people on the list. The trust that bare-bones items will get sufficient statements eventually.

The problem with trust is that when it is not given, it cannot be assumed for other, similar situations either. Take the trust that retractions of scientific papers will be included so that we know which Wikipedia articles are inherently wrong. Retractions are absent at this time and while I trust the people involved in the inclusion of citations, why trust at all when equally worthy causes are not trusted? Why include all these scientific papers without similar quality control?

#Belief - Black Pete, Rudolph and Christmas

It is Christmas time and a good time is had by all. As everyone knows, Santa Claus has reindeer and one of them has a red nose. The notion of Santa is based on a Dutch tradition, "Sinterklaas", and everybody knows that he arrives by steam boat from Madrid accompanied by "Zwarte Piet" and comes loaded with presents for all the children who have been good. It is all part of winter celebrations; Santa Claus is firmly associated with Christmas and it is well documented that Jesus was not born on this day so many centuries ago.

When you start to evaluate belief and find things to criticise you can and may do so. However, it is easily understood why this is not appreciated at all. People want to believe in a Jesus who did not look at all like the way he is usually depicted. The fact that Santa comes from the North Pole found a lot of cheer thanks to NORAD, and I read an amusing story that Rudolph's red nose is due to bioluminescence.

There is a lack of appreciation for "black pete" as some consider it an example of "Blackface". Do read the Wikipedia article; its origin is in a USA where slavery was alive and well. The Netherlands has a different culture; Zwarte Piet is clothed in seventeenth century garb, he brings presents through the chimney but is always spotless. He is a smart, hardworking guy and only thanks to Zwarte Piet can Sinterklaas bring presents to all the children of the Netherlands. Remember, the Netherlands were Spanish until the seventeenth century.

As in any belief system, those who truly believe benefit the most. When children are of an age when they will start to suspect that Sinterklaas is a ruse, they will be informed about the awful truth. When they no longer believe, when they are "gortig", they are expected to make surprises for their peers. They may share in the fun of a truly Dutch tradition. For those who object, the German term "hineininterpretieren" fits the application of blackface to Zwarte Piet.

Monday, December 18, 2017

Francesco Redi and the #BHL - A purpose for #Wikisource

Mr Redi is of such stature that his statue is in the Uffizi Gallery. His books are available in the Internet Archive, thanks to the Biodiversity Heritage Library. Wikisourcers worked their magic on several of his books and the result is a superior output for, for instance, "Esperienze intorno alla generazione degl'insetti".

Mr Redi has four books that received the Wikisource treatment... and then what? In response to a tweet, I was told of the existence of these books in Wikisource. I checked them on Wikidata and added Mr Redi as the author. There is nothing in Wikisource to indicate where the books came from (the BHL provided them with a DOI).

In a tweet, the BHL indicated that they are interested in books that received the Wikisource treatment. So let's consider where we are:
  • Wikisource has many great books transcribed and available as an ebook
  • It is not known outside of Wikisource what books are available in what quality and where they came from
  • We could have this information in Wikidata. It will give a clue what is available; we can query for the books when they are in Wikidata
  • What is the purpose of Wikisource if it is not for people to read all these fine books?
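As an illustration of "we can query for the books when they are in Wikidata", here is a minimal sketch in Python that only builds the text of such a query. It assumes the books carry an author statement (P50) and, where available, a DOI (P356); the function name and the item id passed in are placeholders of mine, not something taken from Wikisource or the BHL.

```python
def build_books_by_author_query(author_qid: str, limit: int = 50) -> str:
    """Return a SPARQL query listing works whose author (P50) is the given item."""
    # Guard against anything that is not shaped like a Wikidata item id.
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {author_qid!r}")
    return f"""
SELECT ?work ?workLabel ?doi WHERE {{
  ?work wdt:P50 wd:{author_qid} .          # P50 = author
  OPTIONAL {{ ?work wdt:P356 ?doi . }}      # P356 = DOI, when one is recorded
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,it". }}
}}
LIMIT {limit}
""".strip()
```

The resulting text can be pasted into the Wikidata Query Service once the placeholder id is replaced with the item of the author you are interested in.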

Thursday, December 14, 2017

A purposeful #strategy for #Wikidata

A strategy for Wikidata? Obviously, it is all about having a purpose. It is not about policies, it is not about what we need or expect of others, but it is about the purpose you, I and others have for us to collaborate on in an inclusive Wiki and data project.

The implications of making the purposes of our community rule supreme are huge. Purpose, like so many other things, can be measured. When people have a purpose for Wikidata and actually use it, their need for quality is self evident. They will invest their time and effort in fulfilling their purpose. The one question is how to fit in the many purposes that exist for Wikidata.

Take for instance the objective of Lsjbot: a rich Wikipedia in the Cebuano language. It uses data from an external database to create articles. Data from these articles is later imported through the Cebuano Wikipedia into Wikidata. This is seen by some as controversial because of the need to integrate data that often already exists. The purpose is obvious; rich information in the Cebuano language. The solution is obvious as well; let Lsjbot use the data at Wikidata to generate the information for the Cebuano Wikipedia. GeoNames is happy to collaborate with us on this, so when we care to collaborate and welcome its data at the front door, we can mix'n'match the data into Wikidata, curate the data where necessary and share improved quality widely, not only on the ceb.wp.

The Biodiversity Heritage Library Consortium is working extremely hard to expose their work to the general public. Over a million illustrations found their way to Flickr. Fae imported many of these to Commons and most if not all the associated publications can be read on the Internet Archive or on the BHL website. Their content is awesome, check for instance their Twitter account. We can import all the BHL books in Wikidata, we are importing all associated authors using Mix'n'Match. The images are in Commons but how is this brought together? How do we add value for the BHL and, as important, for our shared public?

The Internet Archive is a Wikimedia partner. It provides essential services for us with its "Wayback Machine". It is how we can still refer to references that used to be online. One other venture of the Internet Archive is its Open Library. What we already do for the Open Library is linking their authors, and by inference their books, to the libraries of the world through VIAF. We could share this information with the Wikipedias so that their readers may find books they can read. (Talk about sharing the sum of all knowledge).

Both the IA and the BHL want people to read. They (also) provide scientific publications that may be read to prove the points Wikipedia authors make in articles. Both can be big players strengthening the value of citations in WikiCite. At this time its strength is particularly in the biomedical field and it is already attracting bright people to Wikidata. As data from other fields finds its way, people like Egon and Siobhan will find their way. This will make Wikidata even more inclusive.

To make this future work, to become more inclusive, we should trust people more, particularly when they indicate why they use Wikidata. The Black Lunch Table is a great example. The description at Wikidata says: "visual artists of the African diaspora initiative that includes Wikipedia editathons and outreach". One way of knowing how effective this initiative is, is the history page of its Listeria list. It shows a steady growth of information added. When you analyse it further you find artists added and selected for new editathons. Truly a great example of Wikidata having a purpose.

A strategy based on purpose is a strategy based on trust. Not blind trust, but the kind of trust where it is seen that people are committed to improve the quantity, quality and usefulness of the data they identify with.

Sunday, December 10, 2017

When #Wikidata is good for something

When #Wikidata is good for something, it shines. It does not take much prodding to find people to improve on what it does so well and consequently when Wikidata is useful, quality follows easily.

The promise of a useful Wikidata was delivered at its start by having it replace the native interwiki links of Wikipedia. Within a month the quality of Wikipedia links had improved dramatically, and at this time corner cases are still being worked on, improving quality even more.

The WikiCite project is really important in many respects and it has so much more to offer. It is useful because it brings many initiatives and projects together under one roof. It is why scientific papers are included, including their authors. We find that more and more authors are included as well and they are often linked to ORCID, VIAF and the other external identifiers of this world. This has great value because it allows Wikipedia articles and information maintained elsewhere to be linked. What it can be used for is limitless. End users will find new and interesting ways to use the data and make it into information.

When Wikidata is to be good for Wikimedia projects, the information brought to Wikidata because of WikiCite has great potential. It largely reflects the citations in all the Wikipedias and consequently, through the external sources linked in this way, we could know which sources are problematic, retracted or bought by interested parties. We could, we don't. If we did, we would provide weight against propaganda and fake news.

The big thing holding us back is trust. Wikipedians need to consider a Wikidata that is not only used for links and that can be trusted for high level maintenance of its citations. Wikidata is to appreciate its use and trust that its information will be used and that this will increase its value and quality. WikiCiters have to understand that Wikidata is not a stamp collection only including publication data. It must include information about retractions, about papers considered problematic for political or scientific reasons (or both).

When Wikidata is to be good for something, we should expand our collaboration with Cochrane, Retraction Watch and organisations like them. There is everything to gain; quality, contributors and relevance.

Saturday, December 09, 2017

#Wikipedia #NPOV - When there is no neutral point of view

Mr Jacobson, a climatologist at Stanford University, wrote a paper. Its findings were disputed in another paper. Jacobson maintains that the energy needs of the USA can be served exclusively with green energy. The contrarians have it that there must be a mix of conventional and green energy.

There are several issues with the latter paper; it is a paper supported by the conventional energy industry. The results of the paper are in the best interest of this industry and the paper is considered by many not to be the result of a scientific process. So much so that Jacobson went to court.

There is a big difference between an opinion piece and a scientific paper. The critique of the contrarians is that Mr Jacobson does not consider nuclear, fuel and bio fuel solutions at all. They argue that this could make the transition more difficult or expensive. But that is not the point. The point is that it can be done, and that green energy is getting cheaper.

When a paper is bought by industry and the premise of the original paper is ignored, it is no longer scientific but becomes an opinion piece. Mr Jacobson is not the first to predict the demise of "big" energy; Greenpeace has been doing it for decades.

There is no middle ground. It is why Mr Jacobson is going to court because the paper of the contrarians only serves one purpose; postponing the inevitable. It is not a scientific critique in any acceptable way.

Tuesday, November 28, 2017

#Wikidata - Disambiguating for the Biodiversity Heritage Library

Tatiana Carneiro is an entomologist. Her work is known at the Biodiversity Heritage Library. When you check the "authors page", there are two other identifiers known and, for "Tatiana R. Carneiro", the same two identifiers are shown as well.

When you google for Mrs Carneiro all kinds of information may be found but you do not want to do this for all the 177,271 BHL authors that are waiting in Mix'n'Match. It is no fun and only a few people take up a task like this.

So the question is: how do we make it more rewarding, and how do we bring the many Brazilian papers to Wikidata as well? What is there to achieve and how does it benefit all the people reading Wikimedia content?

For readers of our content, there is little merit in the fact that all these authors published papers. Many of the papers have been published with a DOI and many of them are freely available to read. For readers it is the papers that are important. So contrary to a more normal database approach it is not the authors we should concentrate on but their publications. In addition to this, the BHL actively promotes the use of illustrations and publishes them on Flickr. Thanks to the fine work of people like Fae these illustrations end up on Commons as well. It will be a challenge to link them to all this metadata.

There are millions of illustrations, there are far fewer publications and many authors are known for not one but multiple publications. To complicate it even further, an illustration has an illustrator and many publications are exclusively found in archives. Many publishers are no longer active and all this information is or may be considered relevant.

So what to do? First import all the publications that are freely readable, the publications with a DOI, and include the author information as "author name string". When an author is known to Wikidata, we can always add the author information as well. The benefit of this approach? People can read now.

To make it interesting we can run a bot using the APIs of the BHL. We add missing books for authors and add the authors to the books where this information is missing. Running this regularly will make it interesting for anyone interested in the work of the BHL. But most importantly, people can read now.
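The rule described above (use "author name string" until an author is known to Wikidata) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Publication` record and the `known_authors` lookup are hypothetical stand-ins for whatever a bot would obtain from the BHL and from Mix'n'Match, and no BHL API call is shown. P50 (author) and P2093 (author name string) are the Wikidata properties assumed here.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical record for one freely readable publication."""
    title: str
    doi: str
    author_names: list


def author_statements(pub, known_authors):
    """Return (property, value) pairs for the authors of one publication.

    known_authors maps an author name to a Wikidata item id; it is an
    assumed lookup, for example the result of earlier matching work.
    """
    statements = []
    for name in pub.author_names:
        qid = known_authors.get(name)
        if qid is not None:
            statements.append(("P50", qid))     # author, linked to an item
        else:
            statements.append(("P2093", name))  # author name string
    return statements
```

Running this again later, with a larger `known_authors` lookup, is what would turn name strings into linked authors over time.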

Sunday, November 26, 2017

#Wikidata - a "sand engine" for the UK?

The Netherlands are largely below sea level and given climate change, keeping our feet dry is not at all obvious. There are many ways to defend a coast line and a "sand engine" or "sand motor" is one.

According to EcoShape, there will be a sand motor protecting the Bacton Gas Terminal and the surrounding Norfolk coast.

This does have an impact on the existing Wikipedia articles about the sand motor. It is no longer about the Netherlands and as there are more coastal areas that need protection, more sand motors are to be expected. For Wikidata, all the sand motors can have their own item. It will become possible to query where they are, where they are planned and what areas are protected in this way. The original sand motor will have its own place in history but it will not be unique. That is good.

#Wikidata - I fucking love #science

When I am on Facebook, information from "I Fucking love Science" is always a nice read. Mrs Andrew received the Stamford Raffles Award. It is why I found her article.

When you read the article, it is heavy on Mrs Andrew's problem of being taken seriously because she is a woman, and there is also a lot about the accusations of plagiarism. The problem is that plagiarism and the unlicensed use of intellectual property are quite distinct. Given that IFLS is about reporting on science, there should be no argument; they do not claim the ideas they report on as their own.

It is better to split the information about Mrs Andrew and IFLS; it brings clarity and it invites additional information about the reach of IFLS and the reason why she was awarded the Stamford Raffles Award.

Sunday, November 19, 2017

#Wikidata vs #Wikipedia - Rukmini Maria Callimachi

Mrs Callimachi did not only win the Polk Award; she is both a journalist and a poet and did not only win journalism awards. One of the awards, the Michael Kelly Award, is hidden in the Wikipedia article about Michael Kelly.

This article is about how Wikidata and English Wikipedia can help each other. The Wikipedia article lists seven awards and this makes it easy to add other award winners for them as well.

Thanks to Magnus' Awarder, this is fairly easy, but some awards hide out as part of an article and then the award has to be added in Wikidata. It may be one reason why later awards are missing. The religious award she is said to have won is a different award with a similar name. The award and the organisation that confers it had to be created.

The point: we can compare data at a Wikipedia with what we have on Wikidata. They should match. When they do not, there is an issue. Copying the data from Wikipedia is easy and it is the obvious thing to do. When Wikipedians decry the quality of Wikidata, they should reflect on why this is the case. When we collaborate, we will slowly but surely improve our quality. In the final analysis our aim is the same; share in the sum of all knowledge.
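The comparison itself is simple set arithmetic. A minimal sketch, assuming the awards from a Wikipedia article and the awards on the Wikidata item have already been collected as plain lists of names (how they are collected is not shown, and the function name is mine):

```python
def compare_awards(wikipedia_awards, wikidata_awards):
    """Split two lists of award names into matches and differences."""
    wp, wd = set(wikipedia_awards), set(wikidata_awards)
    return {
        "only_in_wikipedia": sorted(wp - wd),  # candidates to add to Wikidata
        "only_in_wikidata": sorted(wd - wp),   # candidates to check in the article
        "in_both": sorted(wp & wd),
    }
```

Anything that ends up in one of the two "only" lists is the issue the paragraph above talks about; it needs a human to decide which side is right.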

Saturday, November 18, 2017

#Wikipedia vs #Wikidata - the George Polk Awards

Some Wikipedians consider Wikidata inferior, so much so that they agitate towards a policy that bans Wikidata in "their" Wikipedia. They are welcome to their opinion.

I do bulk imports from Wikipedia and all the time I suffer the consequences. Some three to four percent of their data is wrong for all kinds of reasons, reasons that are manageable with proper tooling.

The George Polk Award is an award for journalism and it got my attention again because the International Consortium of Investigative Journalists received it for their work on the Panama Papers. I noticed that many people listed as having been awarded the Polk Award did not have articles in Wikipedia, that many of the links in the list of award winners pointed to the wrong person and that many award winners did not even have a "red link".

I am in the process of checking all the links and adding the date for the award. I found many issues, among them a civil war general and many other false friends. I am adding items for the people who do not have an English article and I have to check each of them because several do have articles in other languages. It is a lot of work and it is not as useful as it could be because Wikipedia hates Wikidata and we do not collaborate, we do not work together.

There is a Listeria list of winners and slowly but surely it will contain information that is similar to the English Wikipedia list article. Similar but not the same:
  • the false friends will not be there
  • there will be no red or black links
  • people who won the award twice will be missing
Why do this, why spend so much time on one big list? Well, in this day and age of "fake news" we should celebrate journalism, but having all this information in Wikidata allows for all kinds of tools as well. We can check for false friends, we can check if the articles on the award winners include the award but also if there are "winners" who are not known in this list and in the source available for the George Polk winners.
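One check for false friends that such data allows can be sketched like this. It is only an illustration under my own assumptions: each winner is a plain dict with a name, the award year and, where known, a year of birth and death, and anyone who was born after or died before the award year is flagged for a human to look at. A flag is a hint, not a verdict; posthumous awards, for instance, would be flagged too.

```python
def suspicious_winners(winners):
    """Return the winners whose life dates do not fit the award year."""
    flagged = []
    for winner in winners:
        year = winner["award_year"]
        born = winner.get("birth_year")
        died = winner.get("death_year")
        born_too_late = born is not None and born > year
        died_too_early = died is not None and died < year
        if born_too_late or died_too_early:
            flagged.append(winner)
    return flagged
```

A pitcher born decades after the award, or a general who died long before it, would both turn up in the result.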

I am not a Wikipedian and truthfully I hate the endless and senseless bickering that is going on. So let me work on the data, make it available to tools. Now you Wikipedians, you may choose not to show Wikidata data in your infoboxes but you will not make your errors go away without collaboration. Yes, you can quote a source but when your data is not in line with what the source states, having a source does not do you good, effectively you provide fake information.

My request to the reasonable people at Wikipedia and Wikidata: let us work together and see how we can improve quality. Let's link wiki links (blue, red and black) to Wikidata and improve the quality of what is on offer first.

Thursday, November 16, 2017

#Wikidata - women in red - May Wright Sewall

On Twitter, it was mentioned that archival material of Mrs May Wright Sewall was being worked on. When you read the Wikipedia article, it becomes all too obvious how notable she was. She founded multiple organisations and was known for her suffragist ideas.

The article introduces these organisations and consequently, to indicate the relations, new items have to be created in Wikidata. I only did two and I added her husbands, men who supported her in her undertakings.

By adding these new organisations, it becomes possible to link more people to them. They thereby gain notability and it becomes more likely that at some stage they will get their article as well. At the very least, the new people and organisations added to Wikidata complete the tapestry of information of an age gone by.

Wednesday, November 15, 2017

#Wikipedia - #Retraction exposing big issues in #science

When a scientific paper is published, it is read and cited by other scientists to further science. It is read and cited by Wikimedians to write articles and share the sum of all knowledge. The WikiCite project provides better tooling for using these papers as a source in Wikipedia articles; it is one of the more relevant developments in combatting fake news in Wikipedia.

However, there is an issue with a substantial number of papers; they were retracted. There are all kinds of reasons possible but the bottom line is; they are not to be used as a source in Wikipedia because their findings are false.

The challenge: what papers are retracted, how are retractions and the reasons for retractions modelled, and how will we find these papers in the Wikipedia sources? Knowing retractions and acting on them will be a fine art; one publisher in South Africa for instance was pressed to retract a book exposing the president. There will be so many issues exposed once retractions become part of the Wikipedia work flow. Failing to do so will be the worst we can do. We will not be sharing the sum of all knowledge, we will be sharing the sum of what we are told.
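Finding retracted papers among the sources of Wikipedia articles could start as simply as matching DOIs. A minimal sketch, assuming two inputs that are not specified anywhere in this post: a mapping from article title to the DOIs it cites, and a list of DOIs known to be retracted. The function names are mine and where the retraction list comes from is left open.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common prefixes so that variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def retracted_citations(citations_by_article, retracted_dois):
    """Return, per article, the cited DOIs that appear in the retraction list."""
    retracted = {normalise_doi(d) for d in retracted_dois}
    report = {}
    for article, dois in citations_by_article.items():
        hits = sorted({normalise_doi(d) for d in dois} & retracted)
        if hits:
            report[article] = hits
    return report
```

The reasons for a retraction, the "fine art" part, are deliberately not modelled here; this only answers the question of where to look.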

Thursday, November 09, 2017

Judith Butler in #Brazil - a reaction in the #Wiki way

When the news has it that an effigy of Mrs Judith Butler is burned in Brazil, it is time to give some attention to Mrs Butler. There is information about her and about the papers she published, and one way of adding to the relevance of Mrs Butler is by increasing the number of people she is connected to.

In 2012 she was awarded the Lyssenko award. Adding that date and the other award winners works in two ways; Mrs Butler is better connected but the other award winners are better connected as well.

There is an article for Mrs Butler in English Wikipedia, but given that it is a French think tank that conferred this award, chances are that not everyone on this list has an English article. There are projects that suggest articles to write. Adding awards in this way may feed those projects. I hope so. For me that would be the best outcome that could be achieved.

#Wikipedia - Ischia International Journalism Award & the Polk Award

When people win awards, they often win multiple awards. Harrison Salisbury won several awards, not only the Polk Award. The Ischia award did not have a date associated with it. I used Awarder and the data from the Italian Wikipedia because that was most convenient.

There was no article for Mr Salisbury in Italian and consequently there was no date associated with him. Mr Salisbury is represented with a red link. It indicated 1990 and it was an easy manual edit.

As you can imagine, that red link could link to the information about Mr Salisbury on Wikidata. Showing this information to those who are interested in writing a Wikipedia article in Italian does provide pertinent information, information that should coincide with the new article. By comparing the information in Wikidata and in existing Wikipedia articles you know that the article is likely to be correct.

Wednesday, November 08, 2017

#Wikidata as a Wiki versus the data consumers’ perspective

Wikidata is a Wiki. It follows that many people with many agendas add data to Wikidata. It is a continuous process and, as is usual in a Wiki, all contributions that fit the notability requirements of the project are welcome.

The consumers' perspective seen from a Wiki point of view is a bit awkward. There is nothing but active contributors that work towards any of the quality considerations. Even when there is a reasonable quality for some, it may not be enough for others.

Both Wikipedia and Wikidata are Wikis. Both have issues from a consumers' perspective. They are already explicitly integrated through the interwiki links and implicitly through the Wiki links. One of Magnus's tools makes this visible.

When you then consider George Polk and the George Polk Award it becomes obvious that Wikis have an issue from a data consumer's perspective. In some Wikipedia articles the two are conflated. In others there is a separate list of award winners. Many of the award winners do not have an article and some of the award winners refer to the wrong person. Wikidata could do with more data; the data was imported from Wikipedia and several of the wrong persons are still wrong in Wikidata.

Both Wikipedia and Wikidata consume each other's data. Both are Wikis. There is no superiority in either project but they could compare their data and curate the differences.

Tuesday, November 07, 2017

#Wikipedia; Héctor Rondón did not win the #Polk Award

This is Héctor Rondón, he pitches for the Cubs. He did not win the George Polk Award; Héctor Rondón Lovera did.

This is a common mistake, it happens all the time and it is where Wikidata may make a positive difference to Wikipedia. It just requires a different mindset to see why this is the right solution at this time. There are some loud Wikipedians who abhor Wikidata. This is an easy and obvious method that will improve Wikipedia and there is no sane argument why this would not work.

These Wikipedians do not even have to notice that this is done; we can hide it from them and still do a world of good. Not just for English Wikipedia but for all Wikipedias.. Ehm, for the readers of all Wikipedias.

Sunday, November 05, 2017

#Wikidata - There is no such thing as a free lunch

Mrs Adriane Fugh-Berman wrote a paper called "Why lunch matters: Assessing physicians' perceptions about industry relationships". There is no such thing as a free lunch and arguably this is exactly what Wikidata is offering to the bio-medical industry.

All the bio-medical papers find their home in Wikidata and there is no mechanism, there is nothing to indicate the many erroneous papers, there is nothing to indicate that specific substances have been banned from use as a medical substance. When Wikipedia is to use Wikidata for information it will be so bad.

Mr Martin Keller is a psychiatrist whose reputation was for sale. "His" paper "Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial" has been thoroughly debunked.

At Wikidata there seems to be the notion that facts like this are an affront to its neutrality. It is why there is no mention on the item for Mr Keller; "significant event" "ghostwriting author" was removed.

The problem is that without sufficient debunking potential for ghostwriting authors, their products and their ill effect, there is no possibility to establish the veracity of the bio-medical facts that have been imported in Wikidata. It is vital to the integrity of the Wikidata project that the Mr Kellers of this world are seen for what they are: frauds.

#Wikimedia - I endorse having a #strategy as it is good to have one

Having a strategy is great. There are objectives and there is an idea how to get there. As the Wikimedia Foundation formulates its strategy, it is complicated. Complicated by necessity because it involves so many interests, people who invested so much of themselves in their project(s), people who speak so many different languages, languages that define them, people with different backgrounds because they define them as well. The strategy must be complicated because it aims to reconcile all these people and the organisations that represent them.

When you are a Wikimedian, it helps when your vision coincides with the vision implicit in this big strategy. I was asked to present at the Wikimedia Nederland conference; I presented a historic view on information gathering and sharing. The presentation was given in English because it was the one common language in the room.

I love presentations but talking with people I love even more. I was asked for the strategies behind the things that I do, the things I value. The Luc Hoffman award is an example. It does not have a Wikipedia article but the subject, the science, is of real relevance in this time of climate change. The idea of associating links (blue, red and black) with Wikidata is a non confrontational way to bring Wikidata value to Wikipedia. Adding all the USAmerican alumni from en.wp categories will allow us to keep up with what they hold and know about even more USAmerican alumni. There is method behind the madness.

Now that the Wikimedia strategy goes to the next phase, I hope for many user stories; stories explaining what we are going to do and for whom. I also hope that technical considerations will not prevent innovation and improvements. In the end that is not what a strategy is. It is the hope for the bright future we deserve in our Wikimedia movement.

Tuesday, October 24, 2017

#Wikipedia - Student or Athlete or both ?

College football, soccer, basketball, whatever, is a USA phenomenon where young people attend a college or a university on a sports scholarship.

Wikipedia has categories for the different sports, and the categories for the members of such teams are typically a subcategory of the alumni category of a college or university.

For Wikidata the alumni are typically harvested from the specific alumni categories and as a consequence it is as if all the athletes did not have an education.

My question, how can we best associate these college/university categories with the alumni categories?
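
One possible answer, as a minimal sketch only: when harvesting alumni, also walk the subcategories of the alumni category so that the team categories are included. The category names, the article names and the two dictionaries below are made-up stand-ins for what a category tool would return; they are assumptions, not real data.

```python
from collections import deque

# Hypothetical category tree: category -> its direct subcategories.
SUBCATEGORIES = {
    "Vanderbilt University alumni": ["Vanderbilt Commodores football players"],
    "Vanderbilt Commodores football players": [],
}

# Hypothetical membership: category -> articles placed directly in it.
MEMBERS = {
    "Vanderbilt University alumni": ["Alumna A"],
    "Vanderbilt Commodores football players": ["Athlete B"],
}


def alumni_including_athletes(root, subcategories, members, max_depth=2):
    """Collect the members of an alumni category and of its subcategories.

    max_depth limits how far down the tree we walk, so that loosely
    related categories further down are not pulled in by accident.
    """
    found = set()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        found.update(members.get(category, []))
        if depth < max_depth:
            for child in subcategories.get(category, []):
                if child not in seen:  # guard against category loops
                    seen.add(child)
                    queue.append((child, depth + 1))
    return found
```

With a depth of 0 only the people directly in the alumni category are found; with a depth of 1 or more the athletes are found as well, and they can then receive the same "educated at" statement.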

Sunday, October 22, 2017

#Katherine - When the Cebuano issue is no longer about #Wikipedia

Dear Katherine, I loved your presentation at the Berkman Klein Center for Internet and Society. It gives much to think about and <grin> it is great that you answer the question you want to answer </grin>.

You address questions like "will we let external organisations use our data for their own purposes". My suggestion to you, us all, is why not use our own data for our own purposes.

The Cebuano Wikipedia is seen as problematic on many levels. It is one of the biggest Wikipedias in number of articles and one of the smallest in the size of its community. Like any Wikipedia, its articles are harvested for use in Wikidata and that brings us to several problems but more importantly in the light of your presentation, opportunities.

Problem: the data the Cebuano articles are based on is problematic
Opportunity: import the data into Wikidata first and do some curation there

Problem: the data is licensed under a CC-by-sa license and Wikidata is CC-0
Opportunity: collaborate with the copyright holder and ask their permission to include the data in Wikidata

Problem: when text is generated by a bot, the text when saved in an article is fixed
Opportunity: do not save it as an article but generate the text and maybe cache the text

Problem: other organisations use our data to generate information
Opportunity: we generate information in all the 300 languages where Wikipedia does not have an article

Problem: we have information that has no article in any language
Opportunity: we generate the text and maybe cache the text

Problem: Wikimedia officials indicated that issues like the Cebuano Wikipedia are not relevant
No opportunity; opportunities for all our projects are missed

Katherine, we already generate texts using bots, we already cache our data, we do it for English, we do it for Swedish, Cebuano. Why leave it for the companies of our world to generate text where there is already so much? We can do better, do the same and do it for all our languages as well.

Saturday, October 21, 2017

#Wikidata - just an award winner: Mr Shuming Nie

Mr Shuming Nie is the 2007 winner of the Heinrich Emanuel Merck Prize. As such he was notable for inclusion in Wikidata.

A Wikipedia stub article was created. The article makes it plain that Mr Nie was a serial awardee and when you google Mr Nie, you find for instance the picture you see above. Mr Nie is one of many award winners that are "waiting for the recognition" of a Wikipedia article. By having these award winners in Wikidata, it becomes easier to find people, like someone you care for, who are waiting for an article.

Sunday, October 15, 2017

#Wikidata - motivation; thank you #Magnus

I added a Baratunde A. Cola to Wikidata because he won the Alan T. Waterman Award. This month a Wikipedia article was written and I wanted to add some data to the item.

I did not because functionality that is key to me was broken. A new property was added and all the work that I had done on categories no longer showed in Reasonator. There was no willingness to consider the consequential loss of functionality and the result was a dip in my motivation.

Wikidata is important to me and I asked Magnus if he would help out and change Reasonator. He did.

Now I have added information to Mr Cola based on his categories. It matters that a category like this one reflects all the people known to have played in the Vanderbilt Commodores football team.

The issue is that at Wikidata, we have lost sight of these collaborative aspects. Everybody does his own thing and we hardly consider why. It is why user stories are so important; they tell you why something is done and what the benefit is.  In the end without a benefit there is no reason to do it.

Thursday, October 12, 2017

#Wikisource - the proof of the pudding

A user story for Wikisource could be: As Wikisourcerers we transcribe and format books so that our public may read these books electronically.

The proof of the pudding is therefore in the people who actually read the finished books.  To future proof the effort of the Wikisourcerers, it is vital to know all the books that are ready for reading. It is vital to know this for books in any and all languages supported.

There are two issues:
  • The status of the books is not sufficiently maintained in all the Wikisources
  • There is no tool that advertises finished books
To come to a solution, existing information could be maintained in Wikidata for all Wikisources in a similar way as is done for badges. With the information in Wikidata, queries can be formulated that show the books in whatever language, by whatever author.

Currently there are Wikisources that do not register this information at all. This does not prevent us from taking the necessary steps towards a queryable solution. After all, adding missing badges at a later date only adds to the size of the pudding, not to the proof of the pudding.

Tuesday, October 10, 2017

#Wikipedia discovers #OpenLibrary

On Facebook, Dumisani Ndubane posted his discovery of Open Library:
I just discovered that The Internet Archive has a book loan system, which gives me access up to 5 books for 14 days. So I have a library on my laptop!!! This is awesomest!!!
And it is. Anybody can borrow books from the Open Library (it is part of the Internet Archive). What Dumisani did not know at the time is that there are books in other languages to be found as well.

Dumisani found out by accident; he googled for an ebook called "Heart of Darkness" by Joseph Conrad. What Dumisani did not know at the time is that the Open Library includes books in many languages. His next challenge: find the books in Xitsonga, and tell his fellow Wikipedians about it.

Wednesday, October 04, 2017

#Wikimedia - A user story for libraries

The primary user story for libraries is something like: As a library we maintain a collection of publications so that the public may read them in the library or at home.

Whatever else is done, it is to serve this primary purpose. In the English Wikipedia you will find at the bottom for many authors a reference to WorldCat. WorldCat is to entice people to come to their library.

It does not work for me.

My library is in Almere. I have stated in my WorldCat profile that I live in Almere and I have indicated that my local library is my favourite. WorldCat indicates that the Peace Palace Library is nearby. It isn't.

When it does not work for me, it does not work for other people reading Wikipedia articles and consequently it needs to be fixed. So what does it take to fix WorldCat for the Netherlands, for me? WorldCat is used by a worldwide public and all the libraries of the world may benefit when WorldCat gets some TLC.

Monday, October 02, 2017

#Wikipedia - A user story for WikipediaXL: an end to the Cebuano issue

The user story for #Wikimedia is something like: As a Wikimedia community we share the sum of all knowledge so that all people have this available to them. 

As an achievable objective it sucks. The sum of all knowledge is not available to us either. To reflect this, the following is more realistic: As a Wikimedia community we share the sum of all knowledge available to us so that all people have this available to them.

When all people are to be served with the sum of all knowledge that is available to us, it is obvious that what we do serve depends very much on the language people are seeking knowledge in. What we offer is whatever a Wikipedia holds and this is often not nearly enough.

To counter the lack of information, bots add articles on subjects like "all the lakes in Finland". This information is not really helpful for people living in the Philippines but it does add to the sum of available information in Cebuano.

The process is as follows: an external database is selected. A script is created to build text and an infobox for each item in the database. This text is saved as an article in the Wikipedia. From the article information is harvested and it is included in Wikidata. One issue is that when the data is not "good enough", subsequent changes in Wikidata are not reflected in the Wikipedia article.

Turning the process around makes a key difference. An external database is selected. Selected data is merged into Wikidata. This data is used to generate only new article texts that are cached in all languages that have an applicable script. As the quality of the data in Wikidata improves, the cached articles improve.
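
The reversed process can be sketched in a few lines. This is only an illustration under stated assumptions: the templates, the item ID "Q0" and the data dictionary are made up, and a real script per language would be far richer than one sentence. What it shows is the key difference: the text is never saved as an article, it is generated from the data and cached, and the cache is refreshed as soon as the data changes.

```python
import hashlib
import json

# Hypothetical per-language scripts, reduced here to one-sentence templates.
TEMPLATES = {
    "en": "{label} is a lake in {country}.",
    "nl": "{label} is een meer in {country}.",
}


class ArticleCache:
    """Generate article text from item data and cache it per language."""

    def __init__(self, templates):
        self.templates = templates
        self._cache = {}    # (item_id, language) -> (fingerprint, text)
        self.generated = 0  # how often text was actually (re)generated

    @staticmethod
    def _fingerprint(data):
        # A stable hash of the data; it changes whenever the data changes.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_text(self, item_id, language, data):
        template = self.templates.get(language)
        if template is None:
            return None  # no applicable script for this language
        fingerprint = self._fingerprint(data)
        cached = self._cache.get((item_id, language))
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # data unchanged: serve the cached text
        text = template.format(**data)
        self._cache[(item_id, language)] = (fingerprint, text)
        self.generated += 1
        return text
```

Asking twice for the same item with the same data generates the text once; after a correction to the data, the next request produces the improved text in every language that has a script.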

With Wikipedia extended in this way, WikipediaXL, we become more adept at sharing the sum of our available knowledge. With caching enabled in this way, any language may benefit from all the data in Wikidata. It is important to consider the quality of new data. Data may come from a reputable source or from a source we collaborate with on the maintenance of the data. What is to be preferred is for another blogpost.

Saturday, September 30, 2017

#Wikipedia - #Wikidata user stories

User stories are important. They indicate why a certain functionality exists or the purpose of a project. A "user story" has a fixed format:
As a <insert a role> I would like to <insert an activity> so that I <insert a purpose>.
One user story is: As a Wikipedia editor, I can link an article to articles in other language(s) so that a Wikipedia reader can find an article in a language he or she can read.

Another user story:  As a Wikidata editor, I can maintain statements on Wikidata items so that Wikipedia readers always have the latest information available to them.

The first user story has been a resounding success. It is why Wikidata was relevant from the start. The second is very much a work in progress and it depends very much on how the current state of affairs is evaluated. There are dependencies for the efforts of so many to have an effect:
  • Readers of a Wikipedia can only see the result when the information has been included in Wikidata
  • Wikipedia readers will only see the result when the editors of their Wikipedia allow them to see it
The first dependency is with Wikidata editors but the second dependency is outside of the influence of Wikidata editors. For this reason it makes sense to formulate a different user story: As a Wikidata editor I can maintain statements on Wikidata items so that Wikipedia editors can take the responsibility to inform their public.

To help these Wikipedia gatekeepers there is a need for tools that make them aware of the information they do not provide.

Sunday, September 17, 2017

#Wikimedia and its #BLP approach

There is a huge controversy about the policies about the "Biographies of Living People". Central in all this is that there is no such policy at Wikidata. Many seasoned Wikipedians are of the opinion that using data in Wikipedia is a violation of its BLP policy as a consequence. At the same time there are seasoned Wikidatans who oppose a BLP policy similar to the one at Wikipedia. The problem is that Wikidata does need a BLP policy but it needs to be different for various reasons.

  • An item in Wikidata can be really rudimentary; Marian Latour, a Dutch author, was created because she won an award. This is allowed in Wikidata but the limited information is probably a violation of the English BLP policy. This information came from the Dutch Wikipedia.
  • The initial data of Wikidata were the interwiki links. This was a huge improvement for the Wikipedias and there are still many items that have no statements. This is used as an argument not to accept information from Wikidata.
  • Wikidata data is retrieved from a Wikipedia, information like "who won an award". Given the BLP policy of that Wikipedia it should be faultless but it often is not due to disambiguation issues.
The first issue refers to a red link on the Dutch Wikipedia. When the red link is associated with the Wikidata item, there will not be a new disambiguation issue when a different Marian Latour is introduced. Currently there is only one Marian Latour known to Wikidata.
The second issue is one where Wikidata statistics indicate that statements are slowly but surely being added. They also prove that there is still so much to do...
The third issue is the main one. When an article is linked to Wikidata, articles in other languages should link to the same item or to a red link. Solving these issues requires coexistence and preferably collaboration. 

What we need in a Wikipedia is the ability to link a blue or red link to a Wikidata item. Obviously changing links is either blatantly obvious like for Manuel Echeverria or it requires a source. Technically the necessary change in the MediaWiki software may be "opt in" so that only people who care about this approach to quality make use of it. 

As far as I am concerned, when some Wikipedians find fault elsewhere and do not reflect on this proposal and the improvements it brings them, that is fine. What is relevant is that this approach allows for the best Wikidata practices and at the same time improves the BLP quality in all Wikimedia projects.

Saturday, September 09, 2017

The Manuel Echeverría "revenge"

When there are mistakes in a Wikipedia, it follows that once information is copied from that Wikipedia these mistakes find their way into Wikidata. So Manuel Echeverria did not receive the Xavier Villaurrutia Award; Manuel Echeverría did.

So the edit that made Mr Echeverria a recipient of the award was reverted. I fixed things by using the Spanish Wikipedia as a resource instead. The dates were added when people received the award and a few missing people in Wikidata are now known as well.

I cannot be bothered to fix the English Wikipedia. There is no structural solution at this time and as far as I am concerned, there is no interest in one that has been proposed.

There is one additional reason why a solution would be advantageous; reverting edits is a hostile act when edits are made with the best intentions. By actively linking red links and black links to Wikidata, such reversions will become unnecessary.

The problem is that Wikipedians need to understand a problem that as far as they are concerned is elsewhere, and is only caused by the lack of quality of their project. It is with grim satisfaction that I know it serves them well.

Saturday, September 02, 2017

#Wikimedia - Where I make a stand / where I stand for

I was told that my priorities are not the shared priorities of our movement; this by a pivotal person in the WMF. I consider this a personal affront and I will spell out what I stand for and where I make a stand. When you want to personally verify the veracity of my commitment; read my blog and check out my involvement. I have blogged for over 10 years and the basics/citations are all there to find. I consider my position very much in line with what our movement is there for.

==Share in the sum of all knowledge==
This is the overarching aim of our movement. At this time we are congratulating ourselves with what we have achieved so far. There is a lot to celebrate particularly for the English reading world.

===Everything but English===
Given that only 40% of the world population can read English, our successes need to be measured for what we do for all the people in the world. I do not care for good intentions, I care for what can be observed. Financially there is no breakdown available on the amount spent on English versus the amount spent on all the rest. This is imho a diversity issue as potent as the gender gap. All the arguments why "English first" are structurally no different from any other "my group first" arguments. Just compare the amounts given to US American chapters versus the Indian chapter. In addition you may or may not consider the cost of the software that is developed with English Wikipedia in mind.

===Internationalisation and localisation===
I have searched briefly for "internationalisation" in the 2030 strategy papers. Could not find it. It is however the bedrock of Wikipedia. It is vital for any and all of the individual features of MediaWiki.

When you consider Wikimedia partners like the Internet Archive and their Open Library, we do not even consider how much we will achieve when together we reach out to the other 60% as well. Our internationalisation platform is open to our open source partners and is in my opinion a strategic resource.

The successes of our GLAM partnerships prove collaboration serves mutual interests. There are plans to improve Commons; a key part is the Wikidatification that will open up Commons, not only in English but also in any and all other languages. Where we could make more of a difference is to help where our partners indicate what is relevant to them. We can show them the effect of the cooperation in any language. At this time what we show is limited to images. This is something we should expand on.

====Internet Archive====
The Internet Archive provides a vital service to our Wikipedias. Its Wayback Machine allows us to prove that references that used to be on the Internet existed. Effectively it is an important tool when the aim is to prevent misinformation. Its Open Library has two parts. The part I am interested in is making free e-books available to readers. We would do better when we collaborate just a bit more and help them with their internationalisation and localisation.

The libraries of this world collaborate in the OCLC and share their links in one system; the Virtual International Authority File. In its WorldCat system, the idea is that people can find books in the library near to them. Thanks to the references to local libraries, it is always possible to know if a book, an author is known in whatever country. Important is for us to improve cooperation and the visibility of this collaboration for our readers and editors.

===Bringing things together===
I have helped bring data from Wikidata, OCLC and Open Library together. I am seeking the disambiguation of Open Library content using existing links to the Library of Congress to the VIAF and consequently to Wikidata. I am adding award winners because they provide arguments what articles to write or improve. Currently I am adding Dutch literature awards to show the Dutch National Library that this information exists and can be used. Recently I added botanical awards to show a group of botanists how small tasks like this add relevance.

===Outspoken stuff===
  • I am not a Wikipedian and consequently arguments specific to any Wikipedia are problematic, mostly irresponsible.
  • I care about diversity; issues around the gender gap do get extra attention from me but it is a secondary consideration.
  • I care about usability and use Reasonator and tools like Petscan and Awarder. The necessity to use Reasonator for so many years is proof perfect that usability does not have much of a priority. Having seen previous attempts at usability, I will consider it once it is available.
  • I expect that there will be more use for our data. Quality is key and collaboration on a meta scale is what will make this possible.
  • Wikidata is particularly useful in English. Theoretically other languages may profit from its multilingual nature. Institutional (WMF) interest is needed to improve this use of Wikidata. 
  • While I respect many efforts of the WMF, I find that its concentration on English Wikipedia has a very negative effect on a micro scale. It is not all bad but it is this division of labour and money that prevents us from having the most bang for our buck.

PS I resent that I felt the need to write this blogpost.

Sunday, August 27, 2017

#Wikidata - surge of new items

Lately there has been a surge of new items coming into Wikidata. They must be quite good when you consider the number of statements. The items with no statements are mainly part of the original load, the Wikipedia articles, and their number is slowly but surely decreasing (1.35% the last month).

With more items in Wikidata, there is more data to support, to edit. As it is, limits are put on the amount of edits. This can be appreciated because of the current performance problems but it is obvious that as this upward trend continues, more people and more data will come to Wikidata to edit as well as to query.

There is plenty of data waiting in the wings to be added. The big challenge is promoting the data that is of use and will enable more collaboration both with people and with organisations.

Saturday, August 26, 2017

#OpenLibrary - Charles Horn and its other volunteers

There are several reasons why Open Library and Internet Archive deserve attention. They provide downloadable books in many languages and their Wayback machine comes to the rescue when links in references in Wikipedia go stale. Have a look at the presentation from Wikimania 2017 (from 11:46).

The Internet Archive is officially one of the partners of the Wikimedia Foundation. When you ask who in the Wikimedia Foundation is the go-to person for contacts with Internet Archive, there is no answer. It is as if there is no structure in contacts with our partners even when it pays dividends to collaborate in a more structured way. When you consider the "Coleman Boat" it is just as if the macro elements are totally missing and it is left for the micro elements to make the difference.

Macro effects of collaboration with the Open Library would be:
  • references are made to downloadable eBooks from Wikipedia - People read books
  • localisation are made at - People read books in "other" languages 
  • books at Open Library are in Wikidata - links to eBooks are available
  • identifiers are widely shared and widely curated -  work of volunteers has the biggest impact
At a micro level, collaboration is happening. Charles Horn, a volunteer at Open Library is a stellar example. Charles added identifiers to Wikidata and VIAF in the Open Library database. He provided us with a large file of redirects and was instrumental in removing multiple identifiers to Open Library for authors.  He recently produced a Wikidata query to find duplicates and the Wikidata community was made aware of this maintenance work. 

Many of the macro opportunities become possible when conditions at Open Library are met. One big issue is the need for disambiguation and de-duplication. This is not helped by the massive amounts of data involved and the lack of data on the individual author level. While individuals like Charles have an immense effect, it is in the collaboration on a macro level where even bigger differences can be made. Consider; many books include identifiers like an ISBN or a link to the Library of Congress. So it is possible to leverage a tool developed at the Wikimedia Foundation to retrieve associated meta data or to find associated data at the OCLC.

It takes just a bit of friendly prodding from the macro people at the associated organisations, some reassurance that there is support for these efforts and there will be a lot of talent at the micro level making a big difference. Cooperation and coordination are what the organisations are to provide and we will share more of the knowledge that is available to all who come looking.

Sunday, August 20, 2017

#Wikidata - Martin Reints and {{Authority control}}

Martin Reints received the Herman Gorter Award in 1993. There is a Wikipedia article about him and consequently he was known in Wikidata. There was no "authority control" information for Mr Reints in Wikidata yet and this was quickly remedied.

The most interesting part is that the VIAF registration for Mr Reints already included a link to Wikidata. Proof perfect that librarians are actively working on keeping their house in order. There was an Open Library entry for Mr Reints and the Dutch article had a link to the DBNL-website for Dutch language authors.

Open Library I found is very much about books. Their data on the books they have is great; identifiers like ISBN-10 or ISBN-13 and links to the online catalog of the Library of Congress. This makes a lookup at the OCLC for identifiers of all the authors easy and disambiguation becomes more effective.

Wikidata is very much about data. You can query Wikidata for all the winners of the Herman Gorter Award and to the results you can add the links to VIAF or to the Open Library. This ability to query makes all kinds of applications possible like: "what books written by authors who won the Nobel Prize are available in your library?"
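
As an illustration of such a query, here is a minimal sketch in Python that only builds the SPARQL text; pasting the result into the Wikidata Query Service is left to the reader. P166 (award received), P214 (VIAF ID) and P648 (Open Library ID) are the Wikidata properties involved. The item ID of the award is not given in this post, so the function takes it as a parameter and the "Q0" in the usage note is a placeholder, not the real item.

```python
def award_winners_query(award_qid, languages="nl,en"):
    """Build a SPARQL query for the winners of an award.

    VIAF and Open Library identifiers are optional, so winners that
    lack them still show up and the gaps become visible.
    """
    if not (award_qid.startswith("Q") and award_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {award_qid!r}")
    return f"""SELECT ?winner ?winnerLabel ?viaf ?openLibrary WHERE {{
  ?winner wdt:P166 wd:{award_qid} .
  OPTIONAL {{ ?winner wdt:P214 ?viaf . }}
  OPTIONAL {{ ?winner wdt:P648 ?openLibrary . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{languages}" . }}
}}"""
```

Calling `print(award_winners_query("Q0"))` with the placeholder replaced by the item for the Herman Gorter Award gives a query whose rows with an empty Open Library column are the authors still waiting for a link.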

Saturday, August 19, 2017

#OpenLibrary and winners of the Herman Gorter Award

If you want to know if the Open Library is of relevance in other languages, you have to do some research. I wanted to find out if there are publications by the authors who won the prestigious Herman Gorter Award.

This award was conferred from 1945 to 2002 often to multiple authors. The first author not known to Open Library is H. C. ten Berge. He received the Herman Gorter award in 1964. There were several authors where Wikidata did not have a link yet for Open Library.

Now consider this: what if we could query Wikidata for all the authors and their publications in Open Library? 

Just a little bit more metadata about books, publications is what we need. It is not really a big deal, only a few million additional records.

Many if not most of the books at Open Library have links to authorities like the Library of Congress. This makes it possible to link these books through the OCLC to "your library system". It knows about authors and that is what makes it possible to use tools instead of people to enrich Wikidata and open up all that is in the Open Library for all of us.

Wednesday, August 16, 2017

#Wikipedia - #BlackLunchTable / Brooklyn Hip Hop

The Black Lunch Table project has an editathon on August 20th. It will focus on important but underrepresented New York Hip Hop/rap artists.

In preparation they have created entries in Wikidata for artists with and without a Wikipedia article. In this way they can prepare information for the editors to use in their articles.

Magnus created a new tool and it shows who edited Wikidata. As a result we can create a query for the edits for the New York Hip hop event for the month of August.

It shows who has been doing all the work.