Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons, among others, for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and, as so often, it actually worked after a fashion. The first iteration was, in true Wikimedia tradition, English only. The proof of concept got Dutch as its second language; Hay Kranen, the developer, is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you are looking for available pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.
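The lookup can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code of the proof of concept: the in-memory data, the placeholder item id `Q_MASISI`, the file names and the function name `find_media` are all made up for illustration. Only the three labels come from this blog.

```python
# Labels per Wikidata item, keyed by language code.
# "Q_MASISI" is a placeholder, not a real Wikidata identifier.
LABELS = {
    "Q_MASISI": {
        "en": "Mokgweetsi Masisi",
        "hi": "मोकेगसेसी मासी",
        "ja": "モクウィツィ・マシシ",
    },
}

# "Depicts" statements per media file (hypothetical file names).
DEPICTS = {
    "Masisi_portrait.jpg": {"Q_MASISI"},
    "Masisi_at_summit.jpg": {"Q_MASISI"},
    "Gaborone_skyline.jpg": set(),
}


def find_media(term, language):
    """Return files depicting any item whose label in `language` equals `term`."""
    wanted = term.casefold()
    if not wanted:
        return []
    # Step 1: find the items that carry this label in the requested language.
    items = {
        item_id
        for item_id, labels in LABELS.items()
        if labels.get(language, "").casefold() == wanted
    }
    # Step 2: find the files that depict at least one of those items.
    return sorted(name for name, depicted in DEPICTS.items() if depicted & items)
```

A search in Japanese finds both pictures, while a search in a language without a label finds nothing; that is why adding labels matters.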

What can you do to make it better? Add labels in your language for the things you want to find and find media files that depict what you are looking for. When nobody has translated the software into your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to be polite. Commons is the repository of the media files that illustrate all the Wikipedias so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many, many years. The beauty is that the way forward has been expressed in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as to take the "proof of concept" and rewrite it in the preferred language of the WMF, internationalise and localise it and keep it stand-alone for now. The people who know about it will use it and they will be the first to point out what more they want to be done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game, and finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein; there was also a Jonathan H. Epstein. Because they share characteristics, they could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein", so it is added as an alias.

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information is digested into Wikidata statements, information like the alma maters. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.
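Finding the papers that may belong to an author starts with comparing name strings. The following is a minimal sketch of that first step only, not the tooling actually used: the function names, the paper records and the field name `author_name_strings` are assumptions. A name match only yields candidates; authority identifiers and co-authors still have to confirm that it is the same person.

```python
import re


def normalise(name):
    """Lower-case a name and drop punctuation so 'H.' and 'H' compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.casefold()).split())


def candidate_papers(papers, known_names):
    """Return titles of papers with an author name string matching a known variant."""
    wanted = {normalise(n) for n in known_names}
    return [
        paper["title"]
        for paper in papers
        if any(normalise(a) in wanted for a in paper["author_name_strings"])
    ]


# Hypothetical paper records; only the name variants come from this post.
PAPERS = [
    {"title": "Paper A", "author_name_strings": ["Jonathan H. Epstein", "A. Author"]},
    {"title": "Paper B", "author_name_strings": ["B. Writer"]},
    {"title": "Paper C", "author_name_strings": ["jonathan h epstein"]},
]

VARIANTS = ["Jonathan H Epstein", "Jonathan Epstein"]
```

Calling `candidate_papers(PAPERS, VARIANTS)` returns the titles of the candidates that deserve a closer look before they are linked to the item.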

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.

Sunday, April 19, 2020

@Wikimedia interconnection, what it looks like for me

On Twitter a reference was made to an article in the Sunday Times. The article is about the response of the UK government to the COVID-19 pandemic. It mentions many people and mentions their roles.

It is up to you to have your own opinion, but most if not all people are known in Wikidata, some have a Wikipedia article and all of them are in the spotlight. So when you get an edited sound bite, when you want to know if someone is "for real", it helps when you can turn to Wikimedia and find what there is to know.

This sound bite about "herd immunity" is too short to be properly understood. The argument made is that herd immunity is all that we have now that the genie is out of the bottle, and who can argue with that? Read the article as well. After some tinkering, the Scholia for Prof Edmunds shows some 235 papers and many co-authors, and still even more co-authors are missing. The subjects he covered are extensive; check out that Scholia. Prof Edmunds takes, or took, part in UK government deliberations; it is mentioned in that Sunday Times article. He is asked to explain epidemiology to the public.

Wikimedia interconnection for me is to enrich our existing knowledge in cases like this. Tweeting about it, blogging about it may lead to even more and better information, like a Wikipedia article. What we as Wikimedians do does not happen in a vacuum; connecting to what happens and who the players are helps us and our readers understand who they are in these early days of the COVID-19 pandemic.

Monday, April 13, 2020

The CDC and its National Center for Immunization and Respiratory Diseases

Because of the COVID-19 pandemic, there is so much attention to every aspect of it: the epidemiology, virology, vaccination, co-morbidity. Add a heady mix of economics, profiteering and graft, and what are you to think of it all? What is fact and what is not?

When I read that there is an "Outbreak Management Team" in the Netherlands, an advisory body to the Dutch government, I had a look. I added all the known scientists to Wikidata, looked for "authority identifiers" and attributed some of the papers that are likely theirs to them. It generated a really nice Scholia for them and the team as well.

At first I wanted to do the same for similar European organisations, but it takes quite some effort to find them. So I took the easy route and went for the CDC. Its organisational chart contains a wealth of smaller orgs, among them the NCIRD, which has its own organisational chart. I followed the same routine: adding the obvious scientists to Wikidata, looking for their authority identifiers and attributing papers.

The best bit? While adding people one at a time, you see how the Scholia evolves. Authors are reordered based on their number of papers, and you find the ones that are co-authors and colleagues. The latest papers are shown first. It is nice. However, this is management only; I cannot wait to see it evolve as staff find their place in the Scholia as well.

Sunday, April 12, 2020

False friends and ListeriaBot - finding a way out of an impasse

ListeriaBot is a bot that maintains lists based on information in Wikidata. In this blogpost I will explain what a Listeria list is and what it is used for. I will point out its qualitative benefits and explain how Listeria can be instrumental to limit bias, stimulate collaboration and help us share in the sum of the knowledge available to us.

The heart of a Listeria list is a query. In this query it is defined what data is retrieved from Wikidata, it includes the order of presentation and shows this information in a language depending on the availability of labels.

Listeria lists are defined only once, and every day a job run by the ListeriaBot updates all lists with the latest data from Wikidata. In this way available information is provided even when articles are still to be written. When there is an article to read, the label is shown upright; when there is not, it is shown in italics.
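How one entry of such a list ends up on the page can be sketched as follows. This is a minimal sketch and not ListeriaBot's actual code: the function names, the shape of the item records, the placeholder item ids and the language fallback order are assumptions. The wiki markup used is the standard one: `[[Title|label]]` for a link and `''label''` for italics.

```python
def pick_label(labels, languages):
    """Return the first available label, trying the languages in order."""
    for code in languages:
        if code in labels:
            return labels[code]
    return None


def render_row(item, languages, wiki):
    """Render one list entry: a link when an article exists, italics otherwise."""
    # Fall back to the item id when no label exists in any requested language.
    label = pick_label(item["labels"], languages) or item["id"]
    title = item["sitelinks"].get(wiki)
    if title:
        return f"[[{title}|{label}]]"
    return f"''{label}''"


# Hypothetical items with placeholder ids.
WITH_ARTICLE = {
    "id": "Q_A",
    "labels": {"en": "Example Person"},
    "sitelinks": {"enwiki": "Example Person"},
}
WITHOUT_ARTICLE = {
    "id": "Q_B",
    "labels": {"nl": "Voorbeeld Persoon", "en": "Example Two"},
    "sitelinks": {},
}
```

With the languages `["nl", "en"]` the second item is rendered with its Dutch label in italics, because there is no article yet; the first item falls back to English and becomes a link.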

The biggest difference between a Wikipedia list and a Listeria list? No false friends. When you seek a specific "Rebecca Cunningham", it is really powerful to know that your Prof Cunningham will always be known as Q77527827 and is also authoritatively known by other identifiers. From a qualitative point of view, particularly in lists with red links and even blue links, such disambiguation is a big thing. At this time a typical Wikipedia list has an error rate of around 4% because of disambiguation issues. I frequently blogged about this; the Listeria list I often referred to is the one for the George Polk award.

Maintenance is another reason to choose Listeria lists. This was documented by Magnus: a list was maintained up to a point in time as a Listeria list and, for all the wrong reasons, human qualities were to prevail. Magnus compared the results after some time and the human-maintained list proved to be the poorly maintained list.

Categories are lists of a kind, for many categories it is defined what they contain. Consequently Wikidata is easily updated from Wikipedias and can serve as a source for updating categories as well.

Ok, the impasse. ListeriaBot is blocked because of a false friend issue. The objective is to find a resolution that will benefit us all. The false friend issue is that images can have the same name in both Wikimedia Commons and in English Wikipedia. The existing algorithm for showing pictures is that local pictures take precedence. When ListeriaBot is to do things differently, it can. Thanks to the wikidatification at Commons, we can indicate with a Wikidata identifier what a picture "depicts". Wikidatification of images can also be introduced for pictures at English Wikipedia, and it then becomes easy to always show what Commons has unless a preference is given to show a specific image for a particular project.
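The name clash is easy to show in a few lines. This is a minimal sketch of the precedence rule as described here, not MediaWiki's or ListeriaBot's actual code; the function name `resolve_image`, the `prefer_local` switch and the file names are assumptions.

```python
def resolve_image(name, local_files, commons_files, prefer_local=True):
    """Return which repository serves `name`: 'local', 'commons' or None."""
    in_local = name in local_files
    in_commons = name in commons_files
    if in_local and in_commons:
        # The false friend: same file name, possibly a different picture.
        return "local" if prefer_local else "commons"
    if in_local:
        return "local"
    if in_commons:
        return "commons"
    return None


# Hypothetical file names.
LOCAL = {"Portrait.jpg"}
COMMONS = {"Portrait.jpg", "Landscape.jpg"}
```

With the default rule `Portrait.jpg` comes from the local wiki even when the list meant the Commons picture; with `prefer_local=False` the Commons picture is shown instead.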

I have been told that I do not assume good faith. When I see the extent to which people care to go to resolve this issue, I am only amused. The objective of what we do is to share in the sum of all knowledge and to do this in a collaborative way.

English Wikipedia fails spectacularly by assuming that their perceived consensus is in the best interest of what we aim to achieve. There is no reflection on the quality brought by Listeria, there is no reflection on how its quality can substantially be improved. I fail to understand what they achieve except for feeling safe by insisting on dated practices and dated points of view.

I wish we could be one community that is known by a best of breed effort with one common goal; sharing the sum of all the knowledge that is available to us.

Friday, April 10, 2020

When crossing the street in the days of Corona, look left, right and left again

Many of us are at home, waiting to go out. We are all obsessed with the latest statistics and read what pundits have to say.  It is likely that you are cognizant of the statistics for your country, state or county.

I learned that Jonathan P. Tennant died in a traffic accident. When you care for statistics, you will wonder: what are my chances of dying in a traffic accident at this time? Deduct it from your chances of dying of Corona and things look up.

Not so much for Protohedgehog, he met with an accident. It is sad, he was young, full of promise; just became a member of the Global Young Academy. If anything, it serves as a reminder for us to look left, right and left again to not become a bus factor.