Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners for a Tanzanian Region. The Tabora Region has a population of 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner, and only for the Arusha Region do we know "them all".

I have been adding information about these Regional Commissioners and this is, from a quality point of view, a step in the right direction. Slowly but surely we know the structures and politicians of more African countries.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that there are lists of the data that is currently missing. These lists help us provide the bare bones of what it takes to know about African countries. 

So who are the data wizards who show where our data is lacking? Where are the lists that enable the people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article for a Mr Mwanri is illustrated?
Thanks,
       GerardM

Sunday, July 26, 2020

Data in Red - A holistic view on the bias for the English language and for AngloAmerican subjects

First a definition: "When data is biased, we mean that the sample is not representative of the entire population". This approach successfully underpins the Women in Red project; currently a percentage of 18.51% women in English Wikipedia has been achieved. Compare the coverage of Anglo-American politicians with that of politicians from the whole of Africa: the bias in the data at Wikidata is already obvious, and it will then have numbers attached to it.

This is not a problem for Wikidata alone and yes, we can have a project and include a lot of data to get to a growth percentage as we did for the Women in Red. Worthwhile in its own right but in this way we do not forge a closer relation with its "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. This is done in self-selecting Wikipedias through global info boxes and lists. Interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own.

The functionality to make this happen implements an existing idea with additional twists.
  • Pictures for the subject are linked to courtesy of Special:MediaSearch.
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used and it is to be replaced with improved descriptions provided by Abstract Wikipedia.
  • A Reasonator-like display is provided to inform on the data we have on an item.
  • Suggestions for the inclusion in categories and lists are provided based on Wikidata definitions for categories and lists.
  • To help people find sources and alternate sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource.
In essence this is a toolset that you can opt into as an individual and/or it is the standard for a project. Particularly for the smaller projects this will prove to be really valuable; it will prevent false friends, it indicates heavily linked items that do not have an article. It stimulates the addition of labels because it is beneficial in finding illustrations. 

This proposal is relatively low tech and it will bring our many communities together by providing widely the information that is available to us.
Thanks,
     GerardM

Thursday, July 23, 2020

What to love in English Wikipedia

This list of commissioners of the Arusha Region is great; it provides the basic information that enables me to include this information in Wikidata. It can be assumed that they are all from Tanzania, politicians and human as well.

What I love in English Wikipedia are lists like this. It is more than likely that for every Tanzanian region there will be a similar list and as a consequence we can include all these fine politicians in Wikidata and list them in whatever Wikipedia.

As more politicians for Tanzania or any other African country are added, politicians will pop up who have held multiple offices. This will be explicit in Wikidata and in Wikipedia you could use Special:WhatLinksHere.

Technically there is not much stopping us from associating red links with Wikidata items. This is the same guy used in the "WhatLinksHere" and you find him in this list that is a work in progress as well. 

Think this through. With lists like this in any Wikipedia, these people are findable, linkable. It will be possible to state in text what a given commissioner did, and there will be no ambiguity because of the link.

So I love English Wikipedia for the rich resource of information it is. I love its editors who provide us with the information that enables the reuse of data. I will rejoice when it is recognised that we can do much more. When we accept that together, as an ecosystem, we are in a position where we actually share the sum of all knowledge that is available to us.
Thanks,
        GerardM

Tuesday, July 21, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 2)

Our aim is to share the sum of the knowledge available to us with everyone, everywhere, in every language. That is what we are to achieve.

As we establish what we, as a movement, are to do, it follows that we need to measure how well we do. When a community does not play an active part for a particular goal, that too will show in the numbers.

Commons does not need to work in English only. The "Special:MediaSearch" works in all the languages we support. With this search engine enabled on every Wikipedia, we will learn how well it gets adopted in all our languages. We will know if new Wikidata labels are used in searches on Commons. We will know if more diversity is realised in the pictures used in Wikipedia. We will know how many pictures are downloaded and from what languages.

Only the Portuguese Wikipedia has the governors of Mozambican provinces, and only in text. We can include them in Wikidata, make Listeria lists for them, but how do we disambiguate these politicians? What does it take to make the information for them usable for "Abstract Wikipedia"? How do we assemble information about countries like Mozambique and how do we get it to the quality level that some expect? As important, how do we get people from Mozambique interested and involved?

Some Wikipedians opine that the Wikimedia Foundation does not need to raise funding for their project. Arguably this is correct, but we can raise funds for other projects, other languages elsewhere because we have more and other ambitions to realise. As we raise more money outside of the USA, more people will gain a sense of ownership. 

When we are to overcome our bias for English and our bias for Wikipedia, we need to market our other languages, our other projects. We need key performance indicators. For Wikisource, how many books were downloaded? For Commons, how many media files were downloaded and from what language?

Results need to be objective and measurable. As our research proves to have been about English Wikipedia, we have a problem. We seriously need to consider to what extent it is applicable.
Thanks,
      GerardM

NB While the bias is real and the relationship with English Wikipedians is often antagonistic, it is important to recognise English Wikipedia as the source for much of the information that ends up in other projects. When we collaborate more, our available data will reach more people in an informative way.

Saturday, July 18, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 1)

The bias for Wikipedia as a project is strong, the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this and consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article it is only negative. Dictionaries give more room; an example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding and it explicitly states that it seeks a closer relation with its premier brand Wikipedia.

This is a published bias. It follows that other projects do not receive the same attention, do not get the same priority. For me it is obvious that as a consequence the WMF could do better when it intends to "share in the sum of all available knowledge" let alone the knowledge that is available to it.

Arguably another more insidious bias is the bias for English, particularly the bias for the English Wikipedia. Given that the proof of the pudding is in the eating, we have a world wide public and the use for our information hardly grows. Research is done on English Wikipedia so in effect we arguably do not even know what we are talking about.

When we are to do better, it means that we need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states in the context of diversity in tech and media that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument. When the bonuses for executives of the WMF rely on the growth in all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported other projects; it does not market them, it does not support them, they exist because the MediaWiki software allows for the functionality.

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask for the WMF to accept the premise and consider suggestions for change. This proves to be an issue and makes our biases even more intractable.
Thanks,
       GerardM

Sunday, July 12, 2020

Telling the story of governors of Mozambique

As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces et al because a South African minister of health of a province was considered to be not notable enough.

With Wikilambda, or if you wish "Abstract Wikipedia", being a thing it is important to consider how the story is told. The bare bones of a story already show in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of "governor of their state", a start and end date and as applicable a predecessor and a successor. Obviously they are politicians and Mozambican.

This time I had to go for the Portuguese Wikipedia for a source. There is a list mixed with colonial governors and they need to fit a different mold. They are Portuguese and arguably they are not politicians but administrators. 

What I am eager to learn is how Wikilambda will be able to tell these stories. How it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.
Thanks,
      GerardM

Sunday, July 05, 2020

The quality of all the Nigerian governors at @Wikidata

There are lists for all the governors of all the current Nigerian states. They exist on many Wikipedias. The information was known to be incomplete and, based on lists on the English Wikipedia, I added information on Wikidata; as a result these lists may update with better data.

Obviously, when you copy data across to another platform, errors will occur. Sometimes it is me, sometimes it is in the data. I have only indicated when a governor was in office and predecessors and successors. 

The data is provided in a way that makes it easy to query; no information on elections (many governors were not elected) but proper start and end dates. The dates are as provided on the Wikipedia lists; articles for a governor are often more precise. People from Nigeria often are known by different names, so I did add labels where I needed them for my disambiguation.

When you want to know how many of these fine gentlemen are still alive, it will take some effort to kill off those who are still walking around according to Wikidata. It is relevant to know if a governor was elected or not. To do that properly you want to include election data elsewhere; there is no one-to-one relation between a position, elected officials and them being in office.
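As an illustration of what "easy to query" can look like, here is a minimal sketch that only builds a SPARQL query string for the holders of one position, with their start and end dates. It assumes the usual Wikidata modelling with "position held" (P39) and the qualifiers "start time" (P580) and "end time" (P582); the function name and the item id in the usage note are hypothetical, and the query is not run against any service here.

```python
from textwrap import dedent

# Wikidata properties assumed by this sketch:
#   P39 = position held, P580 = start time, P582 = end time
POSITION_HELD = "P39"
START_TIME = "P580"
END_TIME = "P582"


def build_office_holder_query(position_qid: str) -> str:
    """Return a SPARQL query listing the holders of one position with dates.

    `position_qid` is the Wikidata item id of the position, e.g. the item
    for the governor of one particular state.
    """
    if not (position_qid.startswith("Q") and position_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {position_qid!r}")
    return dedent(f"""\
        SELECT ?person ?personLabel ?start ?end WHERE {{
          ?person p:{POSITION_HELD} ?statement .
          ?statement ps:{POSITION_HELD} wd:{position_qid} .
          OPTIONAL {{ ?statement pq:{START_TIME} ?start . }}
          OPTIONAL {{ ?statement pq:{END_TIME} ?end . }}
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
        }}
        ORDER BY ?start
        """)
```

Calling `build_office_holder_query("Q123")` (a placeholder id, not the real item for any governorship) gives a query text that can be pasted into the Wikidata Query Service. The dates are optional in the query because not every statement has both.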

There is plenty to improve on the data. When people do, Listeria lists will update. Maybe someone will consider updating the English Wikipedia lists.
Thanks,
        GerardM

Saturday, July 04, 2020

Abstract Wikipedia, telling a story from available data

For me Reasonator is the best tool for Wikidata. It shows the data for a Wikidata item in an informative way. In my approach I am "deficit focused"; I add information for subjects that are not well represented. Additional information such as dates and successors make the information for Nigerian state governors more complete and it shows in Reasonator and Listeria lists.

Abstract Wikipedia, the new Wikimedia project, is possible because of all the data in Wikidata. People who know the structure of a language will build constructs that present information in natural language. This is awesome because it will help us share widely in the sum of all available knowledge.

The objective of the Wikipedia projects has always been to share in the sum of all available knowledge. As more languages support the constructs needed for "Abstract Wikipedia", what we have in Wikidata will mushroom and evolve. It is because the data gets a purpose, and the data will be made to fit this purpose.

The best part, Wikipedians want to tell stories and it only takes one person to add a bit of information to make a difference in the constructs for every language. My expectation is that as constructs become available for the languages of Nigeria, it will no longer be me who adds information on Nigerian politicians. It will be people from Nigeria. For them it will be Abstract Wikipedia that will show the data in an informative way.
Thanks,
      GerardM

Friday, July 03, 2020

Black representation matters, the Congressional Black Caucus

A friend asked me to help bolster the notability of black scientists. I was told of a "black caucus" with chairs and a list would help. I googled and found a black caucus with chairs and we did not know them at Wikidata. They were the chairs of the Congressional Black Caucus. Maybe not the caucus intended but of such a prominence that I added them all.

These are only the leaders and obviously over time the membership of the Congressional Black Caucus changed with the different elections. Someone else may add the data. 

The information I used could be found on English Wikipedia and is part of the article about the Congressional Black Caucus. Typically, when a position is considered important enough, it has its own article. When it does, it has more relevance and more information is available about the relevance and the history of such a position.

When Black representation matters, you want substantial lists and articles both on Wikidata and Wikipedia.
Thanks,
     GerardM

Sunday, June 28, 2020

@Wikipedia and freedom of speech

When you disagree on Wikipedia with current practices, you have to use stilted language to prevent administrators taking offence and blocking your account. 

At this time many articles of black female scientists have been marked for deletion. It is an organised effort because there are lists subdividing these articles on criteria. For the record, for many of these fine scientists I added content on Wikidata, added all kinds of information including awards.

When I learned that the article for Ayana Jordan was marked for deletion, I added the following protest: "Keep I want to stress that those !@##$ who make these proposals should be ashamed. Thanks, GerardM (talk) 05:10, 27 June 2020 (UTC)". The response came quick: "@GerardM: unlike some others here, yours is not a new account. So you should need no reminding that personal attacks and assumptions of bad faith are forbidden here. —" I replied with: " I did not use any swear words, I did express my opinion of the people who are so detrimental to what Wikipedia should stand for. That is not bad faith that is not a personal attack that is expressing revulsion. Thanks, GerardM (talk) 06:01, 27 June 2020 (UTC)". The conversation was taken elsewhere, I was blocked for a day.

For a Wikipedia administrator, it should be no news that these people who are repressive of what is not their cup of tea are widely resented. Marking articles for deletion is a form of harassment. I do not care who proposed the deletion, I do not know the person who marked Ayana's article for deletion and I do not care to know him and his ilk. We have a situation where harassment is allowed and calling out such travesties is considered a personal attack and an assumption of bad faith.

So I have been blocked for a day. I am proud to stand up against such bullies. I consider the process of deletion as rigged. These !@##$ are free to do as they wish because "we should assume good faith". Hell no.
Thanks,
     GerardM

Saturday, June 27, 2020

Hey @Wikimedia let's move the needle

The Wikimedia projects are biased. They favour only one language, the English language. When you look at Wikipedia traffic, English Wikipedia is something like 50% and it does not represent 50% of our intended public.

The objective is to improve the usefulness of the other projects and thereby increase their traffic. That is, more articles and books are read, more pictures are seen and downloaded.

Let's pick one language, Yoruba, as an example. There are currently 32,624 pages in its Wikipedia. There are some 40 million people speaking the language. So what can we do for Yoruba editors and readers? How can we track what makes a difference, and what can the WMF do to achieve this?

* We can improve list support.
Currently the best tool for supporting lists in a Wikipedia is "Listeria". It is supported by Magnus. Listeria lists have been shown to be more up to date than manual lists on English Wikipedia; for less resourced projects this will be even more true. When existing lists can be easily included in an article, it will expand available information hugely. Here an example of Listeria lists on the Yoruba Wikipedia. Content of these lists shows in Yoruba. Lists are better supported and adopted when it is WMF supported functionality.

* Choosing pictures for illustration
When people look for a picture, they have to go to Commons or they visit Wikipedia articles on the same subject and use these same pictures. When the Special:MediaSearch is available as a tool from every Wikipedia article, a much richer palette of pictures becomes available to choose from. (The search is for "Agbègbè Ìjọba Ìbílẹ̀ Mushin").

The cool thing is, when this tool is available when writing an article, it is easy to more pro-actively add labels to Wikidata. This will improve the performance for the Special:MediaSearch even more.

What would truly support Special:MediaSearch is disambiguation. It is unreasonable to expect that we get descriptions in all the 300+ languages we support. What Reasonator supports are automated descriptions. It makes it easy and obvious to choose the right item in any language.

For the Wikimedia Foundation to support other languages, for it to move the needle on any and all languages, we need to measure what is meaningful. The number of searches by Special:MediaSearch and what language was used. The number of pictures used in each Wikipedia. The effect lists have on the writing of new articles.

If we have not measured such numbers so far, measuring them is what we should do to move the needle. One needle is the total number of reads; quite another is the number of reads for each project. The same goes for the use of Wikidata and Commons.
Thanks,
     GerardM

Sunday, June 21, 2020

Marketing @Wikimedia but first some SWOT analysis

The Wikimedia Foundation has a 2030 strategy; it intends to increase its reach, increase its budget and rename projects into "Wikipedia something" in order to improve its visibility.

Wikipedia is one of the most visited websites on the Internet, its quality is good and it is mostly edited by older white males in the first world. Typically when people mention Wikipedia they refer to the English version but it is only 50% of Wikimedia traffic. From a marketing point of view the English market is saturated; growth can be expected from Wikipedias in other languages and from other projects.

The Wikimedia Foundation is very much tied to the United States. Given the current regime and the possibility that it will prevail in November, this reliance is an existential threat. It is likely that the US government will want to intervene in Wikimedia content after 2020. I doubt it is possible, given the current hardware configuration, to move away from the US and still serve the rest of the world with a NPOV.

At this time the Wikimedia Foundation is centrally led, there are satellite organisations in many countries who are limited in what they can do; their budgets are centrally managed. Fundraising is mostly done from the USA and most of it is raised in the USA. That is problematic in its own right because many "Wikipedians" feel that too much money is raised, money not needed to support their project and people in other countries do not get to feel that it is "their" project because of "their" contributions. As a professional fundraiser, I am convinced contributions from the Netherlands could increase at least tenfold within a year.

The bias for English is huge and it is compounded by the bias for English Wikipedia. At a conference a Dutch professor stated that research not about or linked to the English Wikipedia is unlikely to get published. It follows that the data used for the 2030 strategy includes this same bias. The MediaWiki software is developed first and foremost for English Wikipedia and it is expected to work for other languages and for other projects. There used to be a development team specialised in language technology; it was dissolved.

There was a time when English Wikipedia did support the other projects. Because of an anti-Wikidata stance by some this changed. There is no solution for false friends and lists are not as well maintained as they could be. When we link to the Wikidata item for an article and no longer to a title for that same article this will change. It is easy enough to build functionality that allows for both and by opt-in projects will understand the benefit and choose to adopt.

When marketing is the reason for changing the name of projects, it is important to consider the ramifications. The "Wikipedians" among us claim ownership of the Foundation, insist on actions in their image. They represent a staid community representing a saturated market. With a strategy in place it is possible to disregard them. This only makes sense when the WMF tackles its bias for English as a priority. This is what is needed to realise the 2030 strategy.
Thanks,
       GerardM

Sunday, June 14, 2020

@Wikipedia is old news, it could point to new sources

Wikipedia provides the best text on many subjects. It being static is both a blessing and a curse. It is a blessing when it is a topic that is very much in the public eye, it attracts many people willing to edit and come to a neutral point of view.

It is a curse when the topic is no longer popular. No longer is there an interest to maintain the information, new publications are not integrated in what used to be a neutral point of view.

In the references section of an article you find the underpinnings of what is stated in an article. It may be newspaper articles or science papers. Both newspapers and science have a hard time attracting attention and this endangers the availability of quality sources for future updates.

In Scholia information is continuously updated about the latest papers by authors and/or about subjects. As time goes by, papers become available dated later than the latest reference. When such papers are clearly marked, it is an invitation for the Wikipedia community to revisit a subject and learn if what was a neutral point of view survives as such vis-à-vis the latest information.

Every subject should have its own Scholia.
Thanks,
       GerardM

Friday, June 12, 2020

Professor Vassie Ware - an early recipient of an early career award

On the page of the "GlobalYoungAcademyTeam", you find many young academies. You also find "early career awards". These young academies and these awards are represented by "Listeria lists"; when something changes in Wikidata it is reflected in them.

The WICB Junior Award is an early career award for women conferred by the American Society for Cell Biology. An English Wikipedia article provided the initial content for this award and given that there are many people interested in what this award is about, I included all recipients of the award. Professor Ware is the earliest recipient I added by hand.

The standardised Listeria lists show the people who are included, their occupation, identifiers for ORCiD, Google Scholar and VIAF, and the number of publications known for them. The approach is a wiki approach and it is therefore fine that we only have two publications for Professor Ware, that we do not have a freely licensed picture of Professor Ware yet and that there is no Wikipedia article either.

Once a list is reasonably complete, new information is added all the time. It follows that the Scholia pages for the award and for a scientist like Prof Ware evolve. In the true wiki spirit, a structure is provided and anyone who cares to can make a difference. A difference for the understanding of science and for the people who make science what it is.
Thanks,
       GerardM

Thursday, June 04, 2020

@Wikimedia and languages - @WikiCommons search, the most relevant development since @Wikidata

The Wikimedia Foundation is important for the support of languages on the Internet. The localisation of its software is done at translatewiki.net, it is done in over 300 languages.

The milestones for multilingual support are:
These milestones have been very much technology driven. For me the one reason why Wikidata became the success it is, is because it was from the start linked to every subject covered by Wikipedia and the solution was so overwhelmingly superior that nobody could reasonably object.

To make a success of this latest milestone, institutional support is needed. It is for the Wikimedia Foundation, its movement to reduce its bias for English and make room for improved language support.

My way of phrasing this as an essential objective: "All of is available to every single person on the planet". As we adopt this as our objective, it is first and foremost about making Special:MediaSearch useful in any and all of our languages and make it available from any and all of our Wikipedias.

As we adopt this, it is essential that priority is given to multilingual search over special interests including GLAM, Open Data, SPARQL and what have you. Priority when we are to open up in multiple languages first. Special interests only gain relevance when it is made obvious how they help open up Commons in Swahili, Hindi, German or Vietnamese.

Special:MediaSearch is possible because of everything that went before. Its functionality is part of MediaWiki and localised at translatewiki.net. The existing search engine is now linked to the labels for items in Wikidata and it was made public after Hay Kranen brought us his proof of concept. It became available warts and all and while finding منصور اعجاز in Punjabi is huge, it is not great when you do not find cats because a user is called Kočka.

The challenge to us as an organisation, a movement: are we willing to work on our existing bias, open up Commons in all the languages we are said to support and accept that our hobby horses will get attention not in the next but in a future iteration?
Thanks,
       GerardM

Thursday, May 28, 2020

@WikiCommons - Sarah T. Roberts versus Sarah T. Roberts

I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata there are two distinct Sarah T. Roberts. One is an epidemiologist, the other is into information & media studies.

At Commons it was a mess; the picture of one Sarah was used to illustrate an info box of the other Sarah. It is not that interesting to tell you how I did what. Relevant is that I did. I did because you will find things when there is a label for whatever in "your" language.

Given that we do not research the use of Commons or Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it.
Thanks,
      GerardM

Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures so he searched Wikimedia Commons among others for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...
Thanks,
       GerardM

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and as so often it actually worked after a fashion. The first iteration was in true Wikimedia tradition English only. The proof of concept got its second language in Dutch; Hay Kranen the developer is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you are looking for available pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.

What can you do to make it better? Add labels in your language for the things you want to find and find media files that depict what you are looking for. When nobody translated the software in your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to be polite. Commons is the repository of the media files that illustrate all the Wikipedias so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many many years. The beauty is that the way forward has been expressed in something that already works.
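Why one added label makes a difference can be shown with a toy model. The sketch below is not how Special:MediaSearch or the proof of concept is implemented; it only assumes that a file is tagged with the items it "depicts" and that an item is found through its label in the language of the search. The class names, the item id and the file name are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A stand-in for a Wikidata item: an id plus one label per language code."""
    qid: str
    labels: dict = field(default_factory=dict)


@dataclass
class MediaFile:
    """A stand-in for a Commons file tagged with the items it depicts."""
    name: str
    depicts: set = field(default_factory=set)


def search(term, language, items, files):
    """Return names of files depicting an item whose label in `language` equals `term`."""
    wanted = term.casefold()
    if not wanted:
        return []
    matching = {
        item.qid for item in items
        if item.labels.get(language, "").casefold() == wanted
    }
    return [f.name for f in files if f.depicts & matching]


# Hypothetical data: the id and the file name are invented for this sketch.
cat = Item("Q_CAT_EXAMPLE", {"en": "cat", "nl": "kat"})
files = [MediaFile("Example_cat.jpg", {"Q_CAT_EXAMPLE"})]

found_in_dutch = search("kat", "nl", [cat], files)      # a Dutch label exists
found_in_swahili = search("paka", "sw", [cat], files)   # no Swahili label yet
cat.labels["sw"] = "paka"                               # someone adds one label
found_after_label = search("paka", "sw", [cat], files)  # now the file is found
```

In this model the search in Swahili finds nothing until the label is added, after which every file that depicts the item becomes findable in that language at once.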

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as to take the "proof of concept" and rewrite it in the preferred language of the WMF, internationalise and localise it and keep it stand alone for now. The people who know about it will use it and they will be the first to point out what more they want to be done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.
Thanks,
      GerardM

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game and, finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein, there was also a Jonathan H. Epstein. Because of sharing characteristics they could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein" so it is added as an alias.

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information is digested into Wikidata statements, information like the alma maters. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.
Thanks,
      GerardM

Sunday, April 19, 2020

@Wikimedia interconnection, what it looks like for me

On twitter a reference was made to an article in the Sunday Times. The article is about the response of the UK government to the COVID-19 pandemic. It mentions many people and mentions their roles.

It is up to you to have your own opinion, but most if not all people are known in Wikidata, some have a Wikipedia article and all of them are in the spotlight. So when you get an edited sound bite, when you want to know if someone is "for real", it helps when you can turn to Wikimedia and find what there is to know.

This sound bite about "herd immunity" is too short to be properly understood. The argument made is that herd immunity is all that we have now that the genie is out of the bottle and, who can argue with that? Read the article as well. After some tinkering, the Scholia for Prof Edmunds shows some 235 papers, many co-authors and still, even more co-authors are missing. The subjects he covered are extensive; check out that Scholia. Prof Edmunds takes/took part in UK government deliberations; it is mentioned in that Sunday Times article. He is asked to explain epidemiology to the public.

Wikimedia interconnection for me is to enrich our existing knowledge in cases like this. Tweeting about it, blogging about it may lead to even more and better information like a Wikipedia article. What we as Wikimedians do does not happen in a vacuum; connecting to what happens and who the players are helps us and our readers understand who they are in these early days of the COVID-19 pandemic.
Thanks,
      GerardM

Monday, April 13, 2020

The CDC and its National Center for Immunization and Respiratory Diseases

Because of the COVID-19 pandemic, there is so much attention to every aspect of it: the epidemiology, virology, vaccination, co-morbidity. Mix it with a heady mix of economics, profiteering and graft and what are you to think of it all? What is fact and what is not?

When I read that there is an "Outbreak Management Team" in the Netherlands, an advisory body to the Dutch government, I had a look. I added all the known scientists to Wikidata, looked for "authority identifiers" and attributed some of the papers that are likely theirs to them. It generated a really nice Scholia for them and the team as well.

At first I wanted to do the same for similar European organisations but it takes quite some effort to find them. So I took the easy route and went for the CDC. Its organisational chart contains a wealth of smaller organisations, among them the NCIRD, which has its own organisational chart. I did the same routine: adding the obvious scientists to Wikidata, looking for their authority identifiers and attributing papers.

The best bit? While adding people one at a time, you see how the Scholia evolves. Authors are reordered based on their number of papers, you find the ones that are co-authors and colleagues. The latest papers are shown first.. It is nice. However, this is management only, I cannot wait and see it evolve as staff finds its place in the Scholia as well.
Thanks,
     GerardM

Sunday, April 12, 2020

False friends and ListeriaBot - finding a way out of an impasse

ListeriaBot is a bot that maintains lists based on information in Wikidata. In this blogpost I will explain what a Listeria list is, what it is used for. I will point out its qualitative benefits and explain how Listeria can be instrumental to limit bias, stimulate collaboration and help us share in the sum of the knowledge available for us.

The heart of a Listeria list is a query. In this query it is defined what data is retrieved from Wikidata, it includes the order of presentation and shows this information in a language depending on the availability of labels.

Listeria lists are defined only once and every day a job run by the ListeriaBot updates all lists with the latest data from Wikidata. In this way available information is provided even when articles are still to be written. When there is an article to read, the label is shown in the upright position; when there is not, it shows in italics.
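The upright versus italic presentation can be sketched in a few lines. This is not ListeriaBot's actual code; the function names and the row format are assumptions, only the wiki markup (double square brackets for a link, two apostrophes for italics) is standard.

```python
def render_row(label, article_title=None):
    """Render one list entry as wikitext.

    A label with a local article becomes a link (shown upright);
    a label without an article is shown in italics.
    """
    if article_title:
        if article_title == label:
            return f"[[{label}]]"
        return f"[[{article_title}|{label}]]"
    return f"''{label}''"


def render_list(rows):
    """Render all rows as a bulleted wikitext list, in the order given."""
    return "\n".join(
        "* " + render_row(row["label"], row.get("article")) for row in rows
    )


# Hypothetical query result: one person with an article, one without.
rows = [
    {"label": "Tebello Nyokong", "article": "Tebello Nyokong"},
    {"label": "Salimata Wade"},
]
print(render_list(rows))
```

Because the rows come from a query, running this again tomorrow with fresh data gives an updated list without anyone editing it by hand.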

The biggest difference between a Wikipedia list and a Listeria list? No false friends. When you seek a specific "Rebecca Cunningham", it is really powerful to know that your Prof Cunningham will always be known as Q77527827 and is also authoritatively known by other identifiers. From a qualitative point of view such disambiguation is a big thing, particularly in lists, for red links and even for blue links. At this time a typical Wikipedia list has an error rate of around 4% because of disambiguation issues. I frequently blogged about this; the Listeria list I often referred to is for the George Polk award.
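A small sketch of why an identifier beats a name. Only Q77527827 comes from this post; the namesake and her placeholder identifier are invented for illustration, as are the function names.

```python
# Two people who share a name. The second entry is an invented namesake
# with a placeholder identifier; it does not exist in Wikidata.
PEOPLE = [
    {"id": "Q77527827", "name": "Rebecca Cunningham", "note": "the professor we mean"},
    {"id": "Q-HYPOTHETICAL", "name": "Rebecca Cunningham", "note": "an invented namesake"},
]


def find_by_name(name):
    """A name can match several people: this is where false friends creep in."""
    return [person for person in PEOPLE if person["name"] == name]


def find_by_id(qid):
    """An identifier matches at most one person."""
    matches = [person for person in PEOPLE if person["id"] == qid]
    return matches[0] if matches else None


print(len(find_by_name("Rebecca Cunningham")))  # 2 candidates, one is a false friend
print(find_by_id("Q77527827")["note"])          # the professor we mean
```

A link in a list that stores the name has to guess between the candidates; a link that stores the identifier does not.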

Maintenance is another reason to choose Listeria lists. This was documented by Magnus: a list was maintained up to a point in time as a Listeria list and, for all the wrong reasons, human qualities were to prevail. Magnus compared the results after some time and the human maintained list proved to be the poorly maintained list.

Categories are lists of a kind, for many categories it is defined what they contain. Consequently Wikidata is easily updated from Wikipedias and can serve as a source for updating categories as well.

Ok, the impasse. ListeriaBot is blocked because of a false friend issue. The objective is to find a resolution that will benefit us all. The false friend issue is that images can have the same name in both Wikimedia Commons and in English Wikipedia. The existing algorithm for showing pictures is that local pictures take precedence. When ListeriaBot is to do things differently, it can. Thanks to the wikidatification at Commons, we can indicate with a Wikidata identifier what a picture "depicts". Wikidatification of images can also be introduced for pictures at English Wikipedia and it then becomes easy to always show what Commons has unless a preference is given to show a specific image for a particular project.

I have been told that I do not assume good faith. When I see the extent to which people care to go to resolve this issue I am only amused. The objective of what we do is to share in the sum of all knowledge and do this in a collaborative way.

English Wikipedia fails spectacularly by assuming that their perceived consensus is in the best interest of what we aim to achieve. There is no reflection on the quality brought by Listeria, there is no reflection on how its quality can substantially be improved. I fail to understand what they achieve except for feeling safe by insisting on dated practices and dated points of view.

I wish we could be one community that is known by a best of breed effort with one common goal; sharing the sum of all the knowledge that is available to us.
Thanks,
        GerardM

Friday, April 10, 2020

When crossing the street in the days of Corona, look left, right and left again

Many of us are at home, waiting to go out. We are all obsessed with the latest statistics and read what pundits have to say.  It is likely that you are cognizant of the statistics for your country, state or county.

I learned that Jonathan P. Tennant died in a traffic accident. When you care for statistics, you will wonder what are my chances of dying in a traffic accident at this time. Deduct it from your chances of dying of Corona and things look up.

Not so much for Protohedgehog, he met with an accident. It is sad, he was young, full of promise; just became a member of the Global Young Academy. If anything, it serves as a reminder for us to look left, right and left again to not become a bus factor.
Thanks,
     GerardM

Sunday, April 05, 2020

Edwin G. Abel aka Ed Abel

Professor E.G. Abel came on my radar because he is a recipient of the Daniel X. Freedman Award. He has a Wikipedia article as "Ed Abel" and the information of the award has him as "Edwin G. Abel".

I looked into the Freedman award because of a criticism of the Wikipedia article of Professor Montegia. The superior article of Prof Montegia is criticised because it is an orphan. It now has a Scholia template and that links the 105 scholarly papers known in Wikidata. Its timeline does include the Freedman award linking the Professors Abel and Montegia.

I doubt it is considered enough to remove the orphan template. I have added a redirect for the Freedman award to the issuing organisation. Maintaining a Wikipedia list is not one of my ambitions.. It could be a Listeria list like this one..
Thanks,
      GerardM

Sunday, March 15, 2020

#SwineFlue management with #wolves

A lot is being said about viruses and pandemics; they do not only exist in humans but also in animals, particularly in kept animals. One knee jerk reaction is that with an outbreak of a disease, animals in nature are blamed.

A good example is swine flu and African swine flu. It is a tradition to call for the culling, the extermination of wild boar and, traditionally the result is an increase in boar being killed.

A real solution may be found in an ecological solution: wolves who predate on boar prefer a sickly animal over a healthy animal that is better able to fight back. There is documentation of wolves determining the extent of outbreaks of swine flu. Areas with wolves do better.

Wolves are apex hunters and their effects on their ecology are profound. There are all kinds of arguments why people oppose the reintroduction of animals that are essential for a functional ecology; animals like wild boar, beaver and wolf are extinct in places. We argue that we need more trees to offset climate change but this will not work when those trees are not placed in a functioning ecology. In Scotland trees will not grow because they will be eaten by overabundant elk. Scotland has no functioning ecology; it lacks predators like wolves and lynx to keep the elk in check.

When we consider pandemics, viral diseases, our ecology it is important to consider our own effects. We will do better when we enable ecological functionality and consider building with nature for more sustainable results.
Thanks,
        GerardM

Wednesday, March 04, 2020

@Wikipedia; the dread that is one identity that binds us all

On Twitter Janeen Uzzell praised a blogpost that is the Wikimedia Foundation All Hands: 2020 Sketchbook and indeed it informs about current thinking, most of it is great and still, I find it absolutely terrifying.

There are several great sketches in there. Katherine Maher gave an aspirational talk; I love it for Wikimedia to be seen as infrastructural, inclusive and even that what we do does not have to be in our projects. Important is that she mentions "support systems" because they provide the input for much of our processes.

Important is the page on security and risk. All the important concepts are mentioned among them; likelihood, relative impact and management preparedness but also "plan for and mitigate risks".

What truly makes me uneasy is when it is said that we aim to clarify who we are in the world in one brand, Wikipedia. The idea is that when we are all branded as Wikipedia, things are likely to become easier. When you check out the website brandingwikipedia.org there is no argument; Wikipedia is free knowledge. When you check out what it is to do:
  • project and improve our reputation
  • support our movement/growth
  • be opt-in
In the abstract Wikipedia IS wonderful, in reality the concept of what Wikipedia is, is largely determined by the English Wikipedia. It is fiercely independent, it is hardly inclusive and it has largely determined the maneuvering space the Wikimedia Foundation has. In order to "plan for and mitigate risks", I will mention several reasons why I am anxious because of this branding initiative.
  • In the Commons OTRS they use English Wikipedia notions to determine if pictures can stay or are to be removed. Commons provides a service to all Wikimedia projects
  • The query functionality for Commons is maintained by people from the Foundation. For more than half a year it puts a strain on the growth and usefulness of Wikidata. Tools have become glacially slow and often malfunction because an edit is not available when needed in further processing. It is not known what the position of the WMF director is in this
  • This is about marketing and we have never done much marketing for any of our projects. What we have done was reactive and has been all about the English Wikipedia. Now consider this:
    • Wikisource, we do not know what is available at what quality, it is all about editing and not about having people read the finished article, consequently we do not value Wikisource nor fulfill its potential.
    • So far Commons has always been English only. With the support of the "depicts" functionality, there is room to enable and market a multilingual search engine. In the spirit of "it is a Wiki", it serves as an open invite to add labels in any and all of our languages and open up what Commons has to offer. It is how to market free content the Wiki way.
    • In Wikidata we know many more concepts than what we know in any individual Wikipedia. We could use our data and inform as we have done for years in multilingual tools like Reasonator. This is an example in English Russian Chinese and Kannada. NB it takes additional labels to improve results and consequently this is the inclusive approach.
    • When Wikipedians were willing to reflect on their own performance, we could help them solve their false friends issues.
One sketch in the sketchbook is a presentation by Jess Wade. It says that even Academia is biased. As the Wikimedia community we do not need to be subservient to any bias and most certainly not the bias that Wikipedia has brought us.

Tuesday, March 03, 2020

"Building with Nature" .. a case for a beaver solution

The Markermeer is a lake with an ecological problem; the water is cloudy, plants and mussels do not grow. In order to alleviate that problem, the Marker Wadden was developed and in order to future proof the Houtribdijk the same "building with nature" concepts are used; the extensive water features will enable the growth of plants and the intended result is not only that the water will be clear again but also that the dyke will better withstand future storms.

With ecology part of the solution, it is relevant to appreciate ecology as part of a solution for open issues. There are two open issues: geese and willows. So far, geese are kept at bay in some areas with fences and young willows are being rooted out by volunteers.

When willows are allowed to grow, they will mature quickly and enable the next ecological succession. The wood and bark provide food and building material for beavers and this makes for an even more robust defense against storm damage. Some trees will mature anyway and this provides natural nesting places for white tailed eagles. Given that the wels catfish is endemic in the Markermeer, it will find its place among the Marker Wadden and it may even predate on the overabundant geese.

So given that Natuurmonumenten, the organisation looking after the Marker Wadden is happy about beavers in its terrains, maybe it is the "building with nature" engineers who have to consider succession in their deliberations.
Thanks,
      GerardM

Thursday, February 27, 2020

Balancing arguments - Gender and the #Wikimedia projects

Some say, gender is important because there is a serious imbalance in the reporting on people in Wikipedia. There are many people who dedicate their time to bring some balance by writing Wikipedia articles. At the same time it is important to be cognizant of the fact that gender is not binary; the point it brings is that when you write an article you need a source to know what gender a person identifies with.

So far so good. At Wikidata other things are at play. It is vital to understand that Wikidata items are not so much about an individual, an item. When recipients of an award are included, as for "Member of the Hassan II Academy of Sciences and Technologies", there is often nothing more known than that these are Moroccans who received an award because a source says so. Determining a gender relies on googling for images of the person and when the name is decidedly male like Omar, Hakim, Mustapha the gender is implied.

Why include a gender? Because projects like Women in Red rely on prospects to write articles about. Because tools like Scholia do express what we know about all the recipients of an award.. It tells us that there are currently two ladies known and 22 gentlemen. We know nothing of their work because the bias against Africa is staggering and because performance for inclusion at Wikidata is abysmal.

The argument why we should not include gender is often based on what people expect; "Wikidata contains large sets of data and consider that it makes no statistical difference one way or the other". The reality however is that when you consider the use of data in for instance Scholia, the subsets are small. One more fine lady makes a statistical difference.
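The arithmetic is easy to check. The sketch below uses the numbers mentioned above for the award (two women, 22 men); the large set is an invented example and the function name is my own.

```python
def share_of_women(women, men):
    """Percentage of women in a set of people with a known gender."""
    return 100 * women / (women + men)


# The award recipients as currently known: 2 women and 22 men.
before = share_of_women(2, 22)  # about 8.3 percent
after = share_of_women(3, 22)   # 12.0 percent once one more woman is added
print(round(before, 1), round(after, 1))

# In an invented set of a million people, one more woman is invisible.
big_before = share_of_women(200_000, 800_000)
big_after = share_of_women(200_001, 800_000)
print(round(big_after - big_before, 5))
```

In the small set a single addition moves the share by more than three and a half percentage points; in the large set the change disappears in the rounding.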

When people write about a person for a Wikipedia, they do get to know the person, they have multiple sources at hand. At Wikidata not so much. One purpose of adding people is to nibble away at our bias.

Requiring sources to indicate gender is what takes away the usefulness of the data and is counter productive when we are talking bias. For me it is a Wikipedia argument, an article based argument and it is counter productive to translate it to the set based approach of Wikidata.
Thanks,
       GerardM

Saturday, February 15, 2020

Wikipedia consensus? - It is who you ask but what are the facts

An article in VICE starts as follows: 'Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing'. This article is problematic in so many ways, it starts with this premise because the Cebuano Wikipedia does not contain machine translation. It contains machine generated text and, to add insult to injury this same article states: 'the majority (generated articles) are surprisingly well constructed'.

An article like this can be sanity checked. Principles come first;
  • This is about a Wiki in contrast to the Nupedia approach. 
  • Wikipedia’s founding goal is to make knowledge freely available online in as many languages as possible.
  • There is a difference between opinions and facts
It is important how arguments are made. When "highly trusted users who specialize in combating vandalism" are introduced and comment that "many articles are created by bots", it does not follow that the quality is low nor that this is to be considered vandalism but the implication is made.

It is a fact that the Cebuano Wikipedia has 5,378,563 articles and also that there are some 16.5 million people who understand Cebuano. There is however no relation between these two facts. More relevant is that the wife of Sverker Johansson has Cebuano as her mother tongue and his two kids learn from their maternal cultural heritage also thanks to the work he does for the Cebuano Wikipedia. That is very much a classic Wiki approach.

In contrast the English Wikipedia has its bot policy preventing the use of bots for generating content. These notions should be local to the English Wikipedia and need not have relevance elsewhere. These highly trusted users can be expected to proselytise this point of view and thanks to this POV they take away a source of information without offering any credible alternative for the existing lack of information available to the rest of the world. At the same time the English Wikipedia is biased in the information it provides and does not provide the same quality of service for the domains selected for the Cebuano Wikipedia.

Sadly, it is said that the Wikimedia Foundation itself makes no effective difference in support of the "other" languages. An alternative to the LSJbot was introduced and it may be able to make a difference, but as it does not provide a public facing service it is very much a paper tiger. Even worse are the Nupedia notions in the combination of two things: "Due to its heavy reliance on Wikidata entries, the quality of content produced is heavily influenced by the quality of the Wikidata available." and "It can discredit other Wikipedia entries related to automatic creation of content or even the Wikipedia quality." These notions are problematic for several reasons.
  • No information is preferred over little information when our service to an end user is considered
  • Quality of information is framed in the light of existing Wikipedia entries. Whose Wikipedia entries are we considering? They are however irrelevant as our aim is to inform our end users; they do not cover the same subject.
  • When the quality is considered of Wikidata .. Why, it is a wiki and its quality is improving particularly as so many eyes shine their light on it.
  • We can inform, in any and all languages, and we do not even have to call it Wikipedia, we do not even have to save it in a Wikipedia when we only cache the results from the automated text generation.
  • When we cache results of automated text generation, texts can be generated again when the data is expanded or changed.
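The caching idea in the last two points can be sketched as follows. This is an illustration only: the sentence template, the class `TextCache` and the data layout are assumptions, not an existing Wikimedia service.

```python
import hashlib
import json


def fingerprint(data):
    """Stable hash of the data a text was generated from."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def generate_text(data):
    """Hypothetical text generator: one templated sentence."""
    return f"{data['label']} is a {data['description']}."


class TextCache:
    """Keep generated text until the underlying data changes."""

    def __init__(self):
        self._store = {}    # item id -> (fingerprint, text)
        self.generated = 0  # how often text was actually generated

    def get(self, item_id, data):
        current = fingerprint(data)
        cached = self._store.get(item_id)
        if cached is not None and cached[0] == current:
            return cached[1]
        text = generate_text(data)
        self.generated += 1
        self._store[item_id] = (current, text)
        return text


cache = TextCache()
item = {"label": "Tabora Region", "description": "region of Tanzania"}
print(cache.get("item-1", item))  # generated
print(cache.get("item-1", item))  # served from the cache
item["description"] = "region of Tanzania with a regional commissioner"
print(cache.get("item-1", item))  # data changed, so generated again
print(cache.generated)            # 2
```

Nothing is saved as an article; the text is a view on the data and it improves whenever the data does.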
So far the critique of the VICE article, but then again does English not have its own problems?
  • Its 1,143 administrators and 137,368 active users are struggling to keep up, when you compare it with the 6 administrators and 14 active users for the Cebuano Wikipedia it is understandable that, as they grow, the English have to rely more and more on bots and artificial intelligence.
  • Magnus has demonstrated that the maintenance of lists is better served not by editors but by using the data from Wikidata
  • The Wikipedia technology has a problem with false friends. Arguably some 4% of list entries are wrong because the wrong article is linked to. When links are solidified by using Wikidata identifiers instead, this problem disappears in the same way as the problems with interwiki links disappeared.
The biggest problem "Wikipedia consensus" has is that it was formulated in the past by a tiny in-crowd making up the "accepted" big words for the rest of us and, worse, they cannot be swayed from their POV by facts.
Thanks,
      GerardM

Sunday, February 09, 2020

Dear @krmaher @Wikipedia is not the #flagship to win our war

The virtues of Wikipedia have been expressed in millions of words, at many conferences and in many interviews by you, Jimmy and countless others. Nothing wrong with that. Wikipedia has been extremely useful, it has a dedicated following and it is going exactly nowhere new. What it is expected to bring is more of the same old, same old.

Wikipedia pundits use their own idiom, have their own values and easily dismiss what does not comply with their notions. Notions not based in actual facts but in opinions.

Our aim was to share in the sum of all knowledge, what we have is a domineering English Wikipedia expecting everything to be shaped in its image. The result is many malfunctioning sister projects that do not get attention "because what is good for the goose is good for the gander". It is not. I can find a picture of the Vasa, a former flagship but not in Commons (it uses the same technology as Wikipedia). There are many books in Wikisource but we do not know what is completed and we do not market these books to a public. When Wikidata was created its first achievement was taking inter wiki links away from Wikipedia providing a functional platform and removing millions of edits from all Wikipedias. So far functionality that does improve on what Wikipedia has is dismissed while facts show how Wikipedia under performs.

The question is if Wikimedia as an organisation is beholden to Wikipedia. If its aspirations are more than only that, it has an obligation to the other projects. It is to find a public for the finished content of Wikisource. It has to find a public for the biggest open content resource of images, making it actually easy and obvious to find pictures of for instance the Vasa. Finally there is Wikidata that is crippled by its own success and hampered by what seems to me to be a lack of organisational attention.

Dear Katherine, I am happy when a technician expresses his plans to mitigate a disaster. He does this within the restrictions he is under. It is however for you, in your capacity as director of the Wikimedia Foundation to express what relevance is given to Wikidata. We have a war chest, we are challenged to take up a new role in the war for factual and balanced information. With only English Wikipedia we have already lost the rest of the world and with English Wikipedia we also have a very biased world view. Never mind, nothing new here.

My question to you, are you aware that Wikidata has no room for growth? Is that acceptable to the Foundation? How are we going to share the sum of the knowledge that is available to us when our flagship is about to sink while sailing out of the harbor?
Thanks,
      GerardM

Saturday, February 08, 2020

The performance of Wikidata - Denis Karuhize Byarugaba

Professor Denis Karuhize Byarugaba is one of the Fellows of the Uganda National Academy of Sciences. At Wikidata we know about papers that he wrote, we know this because of the author strings that point to him.

One of the Scholia tools allows for the disambiguation of these papers by linking to the Wikidata item. It is important that we do because in this way we build on the existence of African scientists on Wikidata.

That is the theory; the practice is that it is increasingly cumbersome to even try to add papers because Wikidata more often than not responds with "Too many requests". When this happens occasionally it is fine but when only 10 percent of the requests is honoured, the tool is effectively dead.

Wikidata is the most promising tool of the Wikimedia Foundation and there is as far as I know no path forward. Obviously it affects people in what they do, and it affects the projects that are not progressing as fast as they could or should. Even when there is a notion of improved performance it is easily missed because of the pent up demand for much more power. Power to query and power to edit the data. We are not sharing in the sum of all knowledge when it is this hard to make it available.
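What a 10 percent success rate means in practice can be sketched like this. I assume here that the message corresponds to the standard HTTP status 429 ("Too Many Requests"); the function names, the delays and the simulated server are made up and are not how the Scholia tool works.

```python
import time


def expected_attempts(success_rate):
    """Average number of tries per edit when each try succeeds independently."""
    return 1 / success_rate


def submit_with_retry(send, max_attempts=20, base_delay=1.0, sleep=time.sleep):
    """Call `send` until it stops answering 429, waiting longer each time.

    `send` is any function returning an HTTP-like status code.
    Returns the number of attempts that were needed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if send() != 429:
            return attempt
        sleep(delay)                   # back off before trying again
        delay = min(delay * 2, 60.0)   # double the wait, capped at a minute
    raise RuntimeError("gave up after too many refused requests")


print(expected_attempts(0.1))  # 10.0 tries on average for every single edit

# Simulated server: refuses twice, then accepts.
answers = iter([429, 429, 200])
waits = []
print(submit_with_retry(lambda: next(answers), sleep=waits.append))  # 3
print(waits)  # [1.0, 2.0]
```

Retrying politely keeps a tool alive when refusals are occasional; at ten tries per edit the waiting is most of the work.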
Thanks,
      GerardM

Saturday, February 01, 2020

Prof Salimata WADE - some thoughts

This picture of professor Wade implies that she received multiple awards, her dress is particular to members of the National Academy of Sciences and Techniques of Senegal and multiple medals show. I added her to Wikidata but the data is sparse, it is better than what is there for most members of this academy of sciences.

In Wikidata we standardise names by having surnames at the end, with only the first letter capitalised. The result is Salimata Wade, not Salimata WADE as you may find on many African websites.

When you google for professor Wade, it is easy to realise that she is quite notable.. It is easy even when you don't get much from French. There is work of professor Wade to find in Wikidata but attributing her work takes too much effort. She does not have an ORCiD id nor a Google Scholar ID. It is only because of googled texts that you feel safe to use quickstatements for what you find. It is super slow going but it is what you do when you expose what is possible.

Adding her papers should effect a change in two places on my African Science scaffolds, Wikipedia administrators permitting; the Listeria bot seems to be blocked for whatever reason.. Then again, other pages using the same bot are not..

When you consider the ratio of males / females it is 64 / 8. When you consider the ratio of Wikipedia articles I expect a quite different ratio. I do not know how to effectively make up a ratio of US or UK scientists and compare that with African scientists. One reason is that I typically do not add nationality and I know the flaws in attributing a nationality to US scientists.. Whatever the approach, Africa will show to be underrepresented both in Wikipedia and Wikidata.. Without the scaffolding, the preliminary data, there is no data approach to this.. No data means no clue.

Anyway, for countries like Senegal it makes sense to add the scaffolding to the French Wikipedia..
Thanks,
        GerardM


Sunday, January 12, 2020

Science and Africa - what collaboration exists and how do we know?

As I am adding large numbers of African scientists to Wikidata, I find that I have moved into a green field. A green field as far as Wikipedia and Wikidata are concerned.

To learn about how the information about African science evolves in Wikidata, I created Listeria lists that inform about universities by country, fellows/members of academies of science and members of African young science organisations.

What I produce is a scaffolding; basic information that enables. The information that I use from the Royal Society of South Africa for its fellows includes dates, other awards, employers and even dates of death. Slowly but surely more information is being added for these people and consequently you will also find, for instance for Rhodes University, more employees and additional papers (currently only 1385 papers for its 84 scholars are known).

A scholar like Tebello Nyokong, a Rhodes scholar, has 637 papers to her name. She is a world class scientist and has four Wikipedia articles to her name. All kinds of questions may be queried for her co-authors; the gender distribution, the organisations they represent, the nationality of the co-authors.

Obviously, African science is not well represented at this time. This is a reflection of how people perceive and value African science... In essence it reflects a bias of regular Wikimedia editors. The regular Wikimedia editors are in the west, they have no reason to consider African science but this is a bias. It is highly likely that it will be hard to get Wikipedia articles accepted for African scientists because of a lack of sources and probably a lack of this perceived Western relevance.

Adding one scientist at a time does not make much of a difference. When scientists are added as part of a SourceMD process, any and all scientists who have a public ORCiD profile are likely to get included in Wikidata. This is why so many African scientists are already known. When a notable scientist is then recognised as a recipient of an award, we may already know about the papers they authored.

The SourceMD process is no longer available. It coincides with a lack of resources at Wikidata so any and all resources used for science papers are now available to something else. Understandable, but the result is that I am no longer motivated to seek ORCiD identifiers and consequently, the process is increasingly broken.
Thanks,
      GerardM