Thursday, May 28, 2020

@WikiCommons - Sarah T. Roberts versus Sarah T. Roberts

I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata there are two distinct people named Sarah T. Roberts. One is an epidemiologist; the other works in information and media studies.

At Commons it was a mess: the picture of one Sarah was used to illustrate the info box of the other Sarah. How I fixed it is not that interesting. What is relevant is that I did. I did because you will find things when there is a label for them in "your" language.

Given that we do not research the use of Commons, or of Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it.

Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons, among others, for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi, and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and, as so often, it actually worked after a fashion. The first iteration was, in true Wikimedia tradition, English only. The proof of concept got Dutch as its second language; Hay Kranen, the developer, is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you look for pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.
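For the technically curious, the idea can be sketched in two steps: first find the Wikidata item for a label in your language, then ask Commons for files that "depict" that item. This is a minimal sketch and not necessarily how the proof of concept itself is built. The function names are my own, and Q42 is only a placeholder item id; the sketch builds the request URLs without sending them.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def label_search_url(label: str, language: str) -> str:
    """URL asking Wikidata for items whose label or alias matches `label`."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,  # language of the label being searched
        "type": "item",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def depicts_search_url(qid: str, limit: int = 20) -> str:
    """URL asking Commons for files with a "depicts" (P180) statement for `qid`."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:P180={qid}",
        "srnamespace": 6,  # the File: namespace
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Step 1: look up the item id for a Japanese label.
    print(label_search_url("モクウィツィ・マシシ", "ja"))
    # Step 2: use the id from step 1; Q42 is a placeholder, not the real id.
    print(depicts_search_url("Q42"))
```

The point of the two steps is that the label only has to exist in Wikidata; the pictures themselves never need a description in your language.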

What can you do to make it better? Add labels in your language for the things you want to find, and find media files that depict what you are looking for. If nobody has translated the software into your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to put it politely. Commons is the repository of the media files that illustrate all the Wikipedias so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many, many years. The beauty is that the way forward has been expressed in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as to take the "proof of concept" and rewrite it in the preferred language of the WMF, internationalise and localise it, and keep it stand-alone for now. The people who know about it will use it and they will be the first to point out what more they want to be done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game, and finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein; there was also a Jonathan H. Epstein. Because they share characteristics, the two items could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein", so that name is added as an alias.
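The merge decision can be sketched as a simple comparison of authority identifiers. This is only an illustration of the reasoning, not a tool I use: the function names are hypothetical, the identifier values are made up, and a real decision also weighs papers, affiliations and co-authors. P496 is the Wikidata property for ORCID iD and P214 the one for VIAF ID.

```python
def merge_evidence(a: dict[str, str], b: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (matching, conflicting) identifier properties present on both items."""
    shared = sorted(set(a) & set(b))
    matching = [prop for prop in shared if a[prop] == b[prop]]
    conflicting = [prop for prop in shared if a[prop] != b[prop]]
    return matching, conflicting


def looks_like_same_person(a: dict[str, str], b: dict[str, str]) -> bool:
    """True when at least one identifier matches and none contradict each other."""
    matching, conflicting = merge_evidence(a, b)
    return bool(matching) and not conflicting


if __name__ == "__main__":
    # Made-up identifier values, for illustration only.
    item_one = {"P496": "0000-0000-0000-0000"}
    item_two = {"P496": "0000-0000-0000-0000", "P214": "000000000"}
    print(looks_like_same_person(item_one, item_two))
```

A single conflicting identifier is treated as a reason not to merge, because undoing a wrong merge is more work than leaving two items apart a little longer.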

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information is digested into Wikidata statements, information like the alma maters. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.
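To make the difference between an "author name string" and a linked author concrete, here is a minimal sketch that builds two SPARQL queries for the Wikidata Query Service. P2093 is the property for author name string and P50 the property for author. The function names are my own, and the exact spelling of the name on each paper is an assumption; in practice it varies per paper.

```python
def _label_service() -> str:
    # Standard label service clause of the Wikidata Query Service.
    return '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'


def author_name_string_query(name: str, limit: int = 50) -> str:
    """Papers where the author is only recorded as text (P2093)."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?paper ?paperLabel WHERE {\n"
        f'  ?paper wdt:P2093 "{escaped}" .\n'
        + _label_service()
        + "}\n"
        f"LIMIT {limit}"
    )


def linked_papers_query(qid: str, limit: int = 50) -> str:
    """Papers where the author is linked to a Wikidata item (P50)."""
    return (
        "SELECT ?paper ?paperLabel WHERE {\n"
        f"  ?paper wdt:P50 wd:{qid} .\n"
        + _label_service()
        + "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # The name spelling is an assumption; papers may use other variants.
    print(author_name_string_query("Jonathan H. Epstein"))
    print(linked_papers_query("Q88406948"))
```

Every paper that moves from the first query's results to the second's is one more paper that is attributed to a person rather than to a piece of text.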

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.