Sunday, August 09, 2020

Keeping it simple for "Abstract Wikipedia"

Abstract Wikipedia is confusing to me; it is said to be about "articles in a language independent way". Articles are complicated because the expression in any language has to be consistent with the grammar, the diction, the vocabulary for that language. Wikipedia articles have one additional complication; once you start reading you may end up in a rabbit hole of wonderful stuff that grabs your attention.

Abstract Wikipedia covers all of Wikidata, and that is much more than what all Wikipedias combined cover. Currently there are two Wikidata items for every item with a Wikipedia link. The first objective that seems obvious is to have something to say about each item. It can be as little as **Name** is a **human**. When we know the profession: **Name** is a **chemist**. When an award was won: "**Name** is a **chemist**. The **Award** was received in **year**." Patterns like these are similar for every language.
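The patterns above can be sketched as simple templates. This is only an illustration of the idea, not how Abstract Wikipedia or any existing tool works; the `PATTERNS` table, the `describe` function and the shape of the item dictionary are all assumptions made for this example.

```python
# Hypothetical sentence patterns per language code; only English is filled in.
# Other languages would supply their own patterns and their own labels.
PATTERNS = {
    "en": {
        "instance": "{name} is a {kind}.",
        "award": "The {award} was received in {year}.",
    },
}


def describe(item: dict, lang: str = "en") -> str:
    """Build a minimal description from whatever is known about an item."""
    patterns = PATTERNS[lang]
    # Prefer the more specific occupation over the generic class.
    kind = item.get("occupation") or item.get("instance_of", "human")
    sentences = [patterns["instance"].format(name=item["name"], kind=kind)]
    # Add one sentence per known award.
    for award, year in item.get("awards", []):
        sentences.append(patterns["award"].format(award=award, year=year))
    return " ".join(sentences)


if __name__ == "__main__":
    print(describe({"name": "Name"}))
    print(describe({"name": "Name", "occupation": "chemist",
                    "awards": [("Award", 2020)]}))
```

The more that is known about an item, the more sentences are produced. A real implementation would also have to handle grammar, such as "a" versus "an", in every language.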

This minimal approach is the basis for automated descriptions, which are vital when disambiguating. Automated descriptions are an improvement over manual descriptions because manual descriptions do not get updated when new information becomes available. Automated descriptions are not articles; they have to be descriptive and not describing.

When Wikipedia articles exist, they provide a rich source of information when new texts are to be generated. Given that Abstract Wikipedia is based on Wikidata, a tool like "Concept Cloud" is useful because it shows all the links to other articles and how often they occur in an article (Concept Cloud is part of Reasonator). The challenge will be to model such relations in Wikidata OR allow for these relations to be registered in a new way as part of Abstract Wikipedia.
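To make the idea of counting links concrete, here is a minimal sketch that tallies the wikilinks in a piece of wikitext. It is not how Concept Cloud or Reasonator is implemented; the regular expression, the `link_counts` function and the sample text are assumptions for illustration only.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# only the target title is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_counts(wikitext: str) -> Counter:
    """Count how often each article is linked from the given wikitext."""
    counts = Counter()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Crude filter: skip namespaced links such as File: or Category:.
        # This also skips article titles that contain a colon.
        if ":" in target:
            continue
        counts[target] += 1
    return counts


if __name__ == "__main__":
    sample = ("[[Tanzania]] has regions. [[Arusha Region|Arusha]] is in "
              "[[Tanzania]]. [[File:Map.png|thumb]]")
    for title, n in link_counts(sample).most_common():
        print(title, n)
```

Each counted link is a candidate relation between two items; whether and how to record it in Wikidata is the modelling question raised above.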

Once sufficient information is available, an article can be generated. That is what LSJBOT and the Cebuano Wikipedia are famous for. It follows that once the same amount of data is available for a similar subject in Wikidata, an article can be generated in, for instance, Cebuano. When we recreate these templates, we can update them for any language.

The linguists who theorise Abstract Wikipedia to death can apply their magic and find out if their pet theories hold water in the real world. In Abstract Wikipedia their function is to enable the provision of information in any language. Obviously competing theories may be implemented and as a result the underlying technology may evolve.

Thanks, GerardM

Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners for a Tanzanian Region. The Tabora Region has 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner, and only for the Arusha Region do we know "them all".

I have been adding information about these Regional Commissioners and this is, from a quality point of view, a step in the right direction. Slowly but surely we know the structures and politicians of more African countries.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that there are lists of the data that is currently missing. These lists help us provide the bare bones of what it takes to know about African countries. 

So who are the data wizards who show where our data is lacking? Where are the lists that enable the people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article for a Mr Mwanri is illustrated?
Thanks,
       GerardM

Sunday, July 26, 2020

Data in Red - A holistic view on the bias for the English language and for AngloAmerican subjects

First a definition: "When data is biased, we mean that the sample is not representative of the entire population". This approach successfully underpins the Women in Red project; currently a percentage of 18.51% women in English Wikipedia has been achieved. Compare the coverage of Anglo-American politicians with that of politicians from the whole of Africa: the bias in the data at Wikidata is already obvious, and it will then have numbers attached to it.

This is not a problem for Wikidata alone and yes, we can have a project and include a lot of data to get to a growth percentage as we did for the Women in Red. Worthwhile in its own right but in this way we do not forge a closer relation with its "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. This is done in self-selecting Wikipedias through global info boxes and lists. Interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own.

The functionality to make this happen implements an existing idea with additional twists.
  • Pictures for the subject are linked to, courtesy of Special:MediaSearch.
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used; it is to be replaced with improved descriptions provided by Abstract Wikipedia.
  • A Reasonator-like display is provided to inform on the data we have on an item.
  • Suggestions for the inclusion in categories and lists are provided based on Wikidata definitions for categories and lists.
  • To help people find sources and alternate sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource.
In essence this is a toolset that you can opt into as an individual and/or it is the standard for a project. Particularly for the smaller projects this will prove to be really valuable; it will prevent false friends, it indicates heavily linked items that do not have an article. It stimulates the addition of labels because it is beneficial in finding illustrations. 

This proposal is relatively low tech and it will bring our many communities together by widely providing the information that is available to us.
Thanks,
     GerardM

Thursday, July 23, 2020

What to love in English Wikipedia

This list of commissioners of the Arusha Region is great; it provides the basic information that enables me to include this information in Wikidata. It can be assumed that they are all from Tanzania, politicians and human as well.

What I love in English Wikipedia are lists like this. It is more than likely that for every Tanzanian region there will be a similar list and as a consequence we can include all these fine politicians in Wikidata and list them in whatever Wikipedia.

As more politicians for Tanzania or any other African country are added, politicians will pop up who have held multiple offices. This will be explicit in Wikidata and in Wikipedia you could use Special:WhatLinksHere.

Technically there is not much stopping us from associating red links with Wikidata items. This is the same guy used in the "WhatLinksHere" and you find him in this list that is a work in progress as well. 

Think this through. With lists like this in any Wikipedia, these people are findable and linkable. It will be possible to state in text what a given commissioner did, and there will be no ambiguity because of the link.

So I love English Wikipedia for the rich resource of information it is. I love its editors who provide us with the information that enables the reuse of data. I will rejoice when it is recognised that we can do much more. When we accept that together, as an ecosystem, we are in a position where we actually share the sum of all knowledge that is available to us.
Thanks,
        GerardM

Tuesday, July 21, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 2)

Our aim is to share the sum of the knowledge available to us with everyone, everywhere, in every language. That is what we are to achieve.

As we establish what we, as a movement, are to do, it follows that we need to measure how well we do. When a community does not play an active part for a particular goal, that too will show in the numbers.

Commons does not need to work in English only. The "Special:MediaSearch" works in all the languages we support. With this search engine enabled on every Wikipedia, we will learn how well it gets adopted in all our languages. We will know if new Wikidata labels are used in searches on Commons. We will know if more diversity is realised in the pictures used in Wikipedia. We will know how many pictures are downloaded and from what languages.

Only in the Portuguese Wikipedia do we find the governors of Mozambican provinces, and only in text. We can include them in Wikidata and make Listeria lists for them, but how do we disambiguate these politicians? What does it take to make the information for them usable for "Abstract Wikipedia"? How do we assemble information about countries like Mozambique and how do we get it to the quality level that some expect? As important, how do we get people from Mozambique interested and involved?

Some Wikipedians opine that the Wikimedia Foundation does not need to raise funding for their project. Arguably this is correct, but we can raise funds for other projects, other languages elsewhere because we have more and other ambitions to realise. As we raise more money outside of the USA, more people will gain a sense of ownership. 

When we are to overcome our bias for English and our bias for Wikipedia, we need to market our other languages, our other projects. We need key performance indicators. For Wikisource, how many books were downloaded. For Commons, how many media files were downloaded and from what language.

Results need to be objective and measurable. As our research proves to have been about English Wikipedia, we have a problem. We seriously need to consider to what extent it is applicable.
Thanks,
      GerardM

NB While the bias is real and the relationship with English Wikipedians is often antagonistic, it is important to recognise English Wikipedia as the source for much of the information that ends up in other projects. When we collaborate more, our available data will reach more people in an informative way.

Saturday, July 18, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 1)

The bias for Wikipedia as a project is strong; the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this, consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article it is only negative. Dictionaries give more room; an example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding and it explicitly states that it seeks a closer relation with its premier brand Wikipedia.

This is a published bias. It follows that other projects do not receive the same attention, do not get the same priority. For me it is obvious that as a consequence the WMF could do better when it intends to "share in the sum of all available knowledge" let alone the knowledge that is available to it.

Arguably another more insidious bias is the bias for English, particularly the bias for the English Wikipedia. Given that the proof of the pudding is in the eating, we have a world wide public and the use for our information hardly grows. Research is done on English Wikipedia so in effect we arguably do not even know what we are talking about.

When we are to do better, it means that we need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states in the context of diversity in tech and media that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument. When the bonuses for executives of the WMF rely on the growth in all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported other projects: it does not market them, it does not support them, they exist because the MediaWiki software allows for the functionality.

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask for the WMF to accept the premise and consider suggestions for change. This proves to be an issue and makes our biases even more intractable.
Thanks,
       GerardM

Sunday, July 12, 2020

Telling the story of governors of Mozambique

As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces et al because a South African minister of health of a province was considered to be not notable enough.

With Wikilambda or, if you wish, "Abstract Wikipedia" being a thing, it is important to consider how the story is told. The bare bones of a story already show in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of "governor of their state", a start and end date and, as applicable, a predecessor and a successor. Obviously they are politicians and Mozambican.
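As a rough sketch of how such bare bones could become a story, the data for one term of office can be turned into sentences, with more sentences as more is known. This is not how Wikilambda or Reasonator does it; the `Tenure` class, the `tell` function and the placeholder names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tenure:
    """One term of office; every field except holder and position is optional."""
    holder: str
    position: str
    start: Optional[int] = None
    end: Optional[int] = None
    predecessor: Optional[str] = None
    successor: Optional[str] = None


def tell(t: Tenure) -> str:
    """Tell as much of the story as the available data allows."""
    parts = [f"{t.holder} held the position of {t.position}"]
    if t.start is not None and t.end is not None:
        parts.append(f"from {t.start} to {t.end}")
    elif t.start is not None:
        parts.append(f"from {t.start}")
    elif t.end is not None:
        parts.append(f"until {t.end}")
    sentences = [" ".join(parts) + "."]
    if t.predecessor:
        sentences.append(f"The predecessor was {t.predecessor}.")
    if t.successor:
        sentences.append(f"The successor was {t.successor}.")
    return " ".join(sentences)


if __name__ == "__main__":
    # Placeholder values, not real data.
    print(tell(Tenure("Governor B", "governor of Province X",
                      2010, 2015, "Governor A", "Governor C")))
    print(tell(Tenure("Governor D", "governor of Province Y")))
```

When only the holder and the position are known the story is a single sentence; it expands as dates, predecessors and successors are added to the data.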

This time I had to go for the Portuguese Wikipedia for a source. There is a list mixed with colonial governors and they need to fit a different mold. They are Portuguese and arguably they are not politicians but administrators. 

What I am eager to learn is how Wikilambda will be able to tell these stories. How it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.
Thanks,
      GerardM