Sunday, July 26, 2020

Data in Red - A holistic view on the bias for the English language and for AngloAmerican subjects

First a definition; "When data is biased, we mean that the sample is not representative of the entire population". This approach successfully underpins the Women in Red project currently a percentage of 18.51% women in English Wikipedia has been achieved. Compare the coverage of Anglo-American politicians with the politicians from the whole of Africa, the bias in the data at Wikidata is already obvious, it will then have numbers attached to it.

This is not a problem for Wikidata alone and yes, we can have a project and include a lot of data to get to a growth percentage as we did for the Women in Red. Worthwhile in its own right but in this way we do not forge a closer relation with its "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. This is done in self selecting Wikipedias through global info boxes and lists. Interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own. 

The functionality to make this happen implements an existing idea with additional twists.
  • Pictures for the subject are linked to courtesy of Special:MediaSearch
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used and it is to be replaced with improved descriptions provided by Abstract Wikipedia
  • A Reasonator like display is provided to inform on the data we have on an item.
  • Suggestions for the inclusion in categories and lists are provided based on Wikidata definitions for categories and lists.
  • To help people find sources, alternate sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource
In essence this is a toolset that you can opt into as an individual and/or it is the standard for a project. Particularly for the smaller projects this will prove to be really valuable; it will prevent false friends, it indicates heavily linked items that do not have an article. It stimulates the addition of labels because it is beneficial in finding illustrations. 

This proposal is relatively low tech and it will bring our many communities together by providing widely the information that is available to us.

Thursday, July 23, 2020

What to love in English Wikipedia

This list of commissioners of the Arusha Region is great, it provides the basic information that enables me to include this information in Wikidata. It can be assumed that they are all from Tanzania, politicians and human as well. 

What I love in English Wikipedia are lists like this. It is more than likely that for every Tanzanian region there will be a similar list and as a consequence we can include all these fine politicians to Wikidata, list them in whatever Wikipedia.

As more politicians for Tanzania or any other African country are added, politicians will pop up who have held multiple offices. This will be explicit in Wikidata and in Wikipedia you could use Special:WhatLinksHere.

Technically there is not much stopping us from associating red links with Wikidata items. This is the same guy used in the "WhatLinksHere" and you find him in this list that is a work in progress as well. 

Think this through.. With lists like this in any Wikipedia, these people are findable, linkable. It will be possible to state in text what a given commissioner did and, there will be no ambiguity because of the link. 

So I love English Wikipedia for the rich resource of information it is. I love its editors who provide us with the information that enables the reuse of data. I will rejoice when it is recognised that we can do much more. When we accept that together, as an ecosystem, we are in a position where we actually share the sum of all knowledge that is available to us.

Tuesday, July 21, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 2)

Our aim is to share the sum of the knowledge available to us with everyone, everywhere, in every language. That is what we are to achieve.

As we establish what we, as a movement, are to do, it follows that we need to measure how well we do. When a community does not play an active part for a particular goal, that too will show in the numbers.

Commons does not need to work in English only. The "Special:MediaSearch" works in all the languages we support. With this search engine enabled on every Wikipedia, we will learn how well it gets adopted in  all our languages. We will know if new Wikidata labels are used in searches on Commons. We will know if more diversity is realised in the pictures used in Wikipedia. We will know how many pictures are downloaded and from what languages.

Only in the Portuguese Wikipedia we find the governors of Mozambican provinces only in text. We can include them in Wikidata, make Listeria lists for them, but how do we disambiguate these politicians. What does it take to make the information for them usable for "abstract Wikipedia"?  How do we assemble information about countries like Mozambique and how do we get it to the quality level that some expect? As important, how do we get people from Mozambique interested and involved? 

Some Wikipedians opine that the Wikimedia Foundation does not need to raise funding for their project. Arguably this is correct, but we can raise funds for other projects, other languages elsewhere because we have more and other ambitions to realise. As we raise more money outside of the USA, more people will gain a sense of ownership. 

When we are to overcome our bias for English and our bias for Wikipedia, we need to market our other languages, our other projects. We need key performance indicators.. For Wikisource, how many books were downloaded. For Commons how many media files were downloaded and from what language.

Results need to be objective and measurable. As our research proves to have been about English Wikipedia we have a problem. We seriously need to consider to what extend it is applicable.

NB While the bias is real and the relationship with English Wikipedians is often antagonistic, it is important to recognise  English Wikipedia as the source for much of the information that ends up in other projects. When we collaborate more, our available data will reach more people in an informative way.

Saturday, July 18, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 1)

The bias for Wikipedia as a project is strong, the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this and consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article it is only negative. Dictionaries give more room an example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding and it explicitly states that it seeks a closer relation with its premier brand Wikipedia. 

This is a published bias. It follows that other projects do not receive the same attention, do not get the same priority. For me it is obvious that as a consequence the WMF could do better when it intends to "share in the sum of all available knowledge" let alone the knowledge that is available to it.

Arguably another more insidious bias is the bias for English, particularly the bias for the English Wikipedia. Given that the proof of the pudding is in the eating, we have a world wide public and the use for our information hardly grows. Research is done on English Wikipedia so in effect we arguably do not even know what we are talking about.

When we are to do better, it means that we be need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states in the context of diversity in tech and media that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument. When the bonuses for executives of the WMF rely on the growth in all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported other projects, it does not market them, it does not support them, they exist because the MediaWiki software allows for the functionality. 

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask for the WMF to accept the premise and consider suggestions for change. This proves to be an issue and makes our biases even more intractable.

Sunday, July 12, 2020

Telling the story of governors of Mozambique

As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces et al because a South African minister of health of a province was considered to be not notable enough.

With Wikilambda or if you wish "Abstract Wikipedia" being a thing it is important to consider how the story is told. The bare bones of a story already shows in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of  "governor of their state", a start and end date and as applicable a predecessor and a successor. Obviously they are politician and Mozambican.

This time I had to go for the Portuguese Wikipedia for a source. There is a list mixed with colonial governors and they need to fit a different mold. They are Portuguese and arguably they are not politicians but administrators. 

What I am eager to learn is how Wikilambda will be able to tell these stories. How it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.

Sunday, July 05, 2020

The quality of all the Nigerian governors at @Wikidata

There are lists for all the governors of all the current Nigerian states. They exist on many Wikipedias. The information was known to be incomplete and based on lists on the English Wikipedia, I added information on Wikidata and as a result these lists may update with better data.

Obviously, when you copy data across to another platform, errors will occur. Sometimes it is me, sometimes it is in the data. I have only indicated when a governor was in office and predecessors and successors. 

The data is provided in a way that makes it easy to query; no information on elections (many governors were not elected) but proper start and end dates. The dates are as provided on the Wikipedia lists, articles for a governor are often more precise. People from Nigeria often are known by different names, I did add labels where I needed them for my disambiguation. 

When you want to know how many of these fine gentlemen are still alive, it will take some effort to kill of those who are still walking around according to Wikidata. It is relevant to know if a governor was elected or not. To do that properly you want to include election data elsewhere; there is no one on one relation between a position, elected officials and them being in office.

There is plenty to improve on the data. When people do, Listeria lists will update. Maybe someone will consider updating the English Wikipedia lists.

Saturday, July 04, 2020

Abstract Wikipedia, telling a story from available data

For me Reasonator is the best tool for Wikidata. It shows the data for a Wikidata item in an informative way. In my approach I am "deficit focused"; I add information for subjects that are not well represented. Additional information such as dates and successors make the information for Nigerian state governors more complete and it shows in Reasonator and Listeria lists.

Abstract Wikipedia, the new Wikimedia project is possible because of all the data in Wikidata. People who know the structure of a language will build constructs that present information in natural language. This is awesome because it will help us share widely in the sum of all available knowledge.

The objective of the Wikipedia projects has always been to share in the sum of all available knowledge. As more languages support the constructs needed for "Abstract Wikipedia", what we have in Wikidata will mushroom and evolve. It is because the data gets a purpose and, the data will be made to fit this purpose. 

The best part, Wikipedians want to tell stories and it only takes one person to add a bit of information to make a difference in the constructs for every language. My expectation is that as constructs become available for the languages of Nigeria, it will no longer be me who adds information on Nigerian politicians. It will be people from Nigeria. For them it will be Abstract Wikipedia that will show the data in an informative way.

Friday, July 03, 2020

Black representation matters, the Congressional Black Caucus

A friend asked me to help bolster the notability of black scientists. I was told of a "black caucus" with chairs and a list would help. I googled and found a black caucus with chairs and we did not know them at Wikidata. They were the chairs of the Congressional Black Caucus. Maybe not the caucus intended but of such a prominence that I added them all.

These are only the leaders and obviously over time the membership of the Congressional Black Caucus changed with the different elections. Someone else may add the data. 

The information I used could be found on English Wikipedia and is part of the article about the Congressional Black Caucus. Typically, when a position is considered important enough, it has its own article. When it does, it has more relevance and more information is available about the relevance and the history of such a position.

When Black representation matters, you want substantial lists and articles both on Wikidata and Wikipedia.