Monday, September 14, 2020

A new tool implies changes for me.


A list like this is wonderful, and for such lists I have always either imported only the existing office holders or attempted to "do them all". Typically I did the first few, making a point to include the incumbent.

I added the template {{PositionHolderHistory|id=Q**}} to all the items for an office in my African politicians project. You find the template on the talk page and, like the Listeria lists, it shows past and present office holders.

I still prefer my method of including the "red links" in Wikidata, but it is a wiki and there is so much more to do. What I started doing with office holders from Togo is this: for those I do not link to predecessors and successors, I at least add the dates they were in office.

It looks much better in Listeria too. 

Thanks, GerardM

Thursday, September 10, 2020

Амама Мбабази is a politician who held multiple positions in the Ugandan government

Amama Mbabazi is a former Prime Minister of Uganda and held many other governmental positions. There are 19 Wikipedia articles about him; Mr Mbabazi is notable.

When you want to find a picture of him and you know his name in Russian, you can use Special:MediaSearch. You can also find him with اماما_مبابازى.

One of the positions Mr Mbabazi held is Justice Minister of Uganda. English Wikipedia has a category for these ministers; at this time two people are included, but not Mr Mbabazi. Ten ministers are missing from the category. Mind you, it is an English Wikipedia list that made my work at Wikidata possible!

On the talk page of the Wikidata item for Justice Minister of Uganda, I added the {{PositionHolderHistory}} template. A bot updates the information every day and adds comments on the quality of the information. This makes it easy to add positions of interest to a watchlist.
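How the bot arrives at its comments is not described here, so what follows is only a sketch of the kind of quality check such a report can make, assuming every office holder comes with a start date and an optional end date. The names `Term` and `check_succession` are hypothetical, and the holders and dates in the example are placeholders, not real Ugandan ministers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Term:
    """One period in office; end=None means still in office or end unknown."""
    holder: str
    start: Optional[date]
    end: Optional[date]


def check_succession(terms: List[Term]) -> List[str]:
    """Return remarks about missing start dates, gaps and overlaps between terms."""
    remarks = []
    dated = []
    for term in terms:
        if term.start is None:
            remarks.append(f"{term.holder}: start date missing")
        else:
            dated.append(term)
    # Compare each term with the term that started next.
    dated.sort(key=lambda t: t.start)
    for prev, nxt in zip(dated, dated[1:]):
        if prev.end is None:
            remarks.append(f"{prev.holder}: no end date, but {nxt.holder} started later")
        elif nxt.start > prev.end:
            remarks.append(f"gap between {prev.holder} and {nxt.holder}: {prev.end} to {nxt.start}")
        elif nxt.start < prev.end:
            remarks.append(f"overlap between {prev.holder} and {nxt.holder}")
    return remarks


# Hypothetical office holders, for illustration only.
terms = [
    Term("Holder A", date(2000, 1, 1), date(2004, 6, 30)),
    Term("Holder B", date(2005, 1, 1), date(2009, 12, 31)),
    Term("Holder C", date(2009, 6, 1), None),
    Term("Holder D", None, None),
]
for remark in check_succession(terms):
    print(remark)
```

This prints a missing start date for Holder D, a gap between Holder A and Holder B, and an overlap between Holder B and Holder C. Comparing only consecutive terms keeps the check simple; a long term that overlaps several later ones is reported only once.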

On my Africa project you find Listeria lists for African political positions. The project is duplicated to several Wikipedias and, once a Wikipedia is synchronised, it shows information such as the lists that include Mr Mbabazi.

One day all the data will be complete and up to date. In the meantime it is a "work in progress" and you are kindly invited to check the information out, find its shortcomings and make updates where necessary.



Saturday, August 29, 2020

Proposal for the liberal use of data from the Wikipedias and Wikidata

Every Wikimedia project has information available that could be shared with other Wikimedia projects. Data is incomplete in every project and the objective of this proposal is to indicate missing data so that it may be included.

It starts with a category. This category links to six English Wikipedia articles. Using a tool, that information is now available in Wikidata as well. As this information was further enriched, it became clear that one more article should be included in the category.

The category exists on many Wikipedias, and the definition of the category in Wikidata states what content the category contains. Reasonator shows this information with a built-in query. When you check the article with a list, it is obvious that many articles do not have a category entry. The latest entry in the list is known to Wikidata thanks to the Latin Wikipedia.

The proposal is simple. Have a messaging agent that indicates missing categories on articles. This will enable any Wikipedian to add them. For Wikidata we would import data based on the definition of categories. The process would be enabled per defined category.

  • Nothing happens on a Wikipedia without prior agreement
  • The mechanism used is by default one of signalling and not of updating 
  • It follows existing practice for importing data from Wikipedias into Wikidata
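The signalling part of the proposal can be sketched in a few lines, assuming two inputs are already available for one Wikipedia: the titles of articles about people that Wikidata lists as holders of a position, and the titles currently in the matching category. The function name and the message wording are hypothetical. In line with the proposal, it only produces messages and edits nothing.

```python
from typing import List, Set


def category_signals(holders_with_article: Set[str], category_members: Set[str]) -> List[str]:
    """Compare Wikidata's position holders with a Wikipedia category.

    Only messages are returned; no page is changed (signalling, not updating).
    """
    messages = []
    # Wikidata knows these people held the position, but their article lacks the category.
    for title in sorted(holders_with_article - category_members):
        messages.append(f"[[{title}]] may be missing from the category")
    # The category contains these articles, but Wikidata does not know them as holders.
    for title in sorted(category_members - holders_with_article):
        messages.append(f"[[{title}]] is in the category, but Wikidata lacks the position for this person")
    return messages


# Hypothetical titles, for illustration only.
holders = {"Person A", "Person B", "Person C"}
members = {"Person A", "Person D"}
for message in category_signals(holders, members):
    print(message)
```

The first kind of message is meant for Wikipedians who may add the category; the second kind points to data that could be imported into Wikidata based on the definition of the category.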

Sunday, August 23, 2020

Having a conversation about the usefulness of shared data

Once there is a stalemate, where positions are entrenched, there is only sniping and little progress. At the English Wikipedia they are adamant; they do not want automatic changes from Wikidata. As a result there is little or no progress in making effective use of the information held in all the Wikipedias and Wikidata. There is room for improvement, improvement that will benefit both English Wikipedia and Wikidata.

Let me explain with an example. The Gambia has foreign ministers, and great information about them can be found in this English article. There is also a category, but it is incomplete because not all the foreign ministers with an article are included.

When somebody enters the data for Gambian foreign ministers in Wikidata, the result is best shown using Reasonator. Reasonator shows it best because it can display the information in any and all languages. That is quite relevant because there may be lists in other languages, in German for instance. The German list has only one red link, the English list has five and the Reasonator list, once completed, will have none.

When you summarise the state of play for lists of positions like this, the presentation of these lists differs greatly while the content is by definition the same. When you want to spare both the cabbage and the goat, it takes extra moves. The Reasonator information for the category shows 24 entries and categories in four languages. It is easy to test whether all the linked articles have a category entry in each language, and also whether Wikidata knows these people for the position they held.
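Such a test starts with knowing which holders of a position have an article in a given language. A minimal sketch of a query for that, assuming the Wikidata Query Service at query.wikidata.org, where the `wd:`, `p:`, `ps:` and `schema:` prefixes are predefined; P39 is "position held". The function name is hypothetical and `Q12345` is a placeholder, not the item for the Gambian foreign minister position.

```python
import re


def holders_query(position_qid: str, lang: str) -> str:
    """Build a SPARQL query listing holders of a position, with their article in one Wikipedia if any."""
    if not re.fullmatch(r"Q[1-9][0-9]*", position_qid):
        raise ValueError(f"not a Wikidata item id: {position_qid}")
    if not re.fullmatch(r"[a-z][a-z-]*", lang):
        raise ValueError(f"unexpected language code: {lang}")
    # p:P39/ps:P39 follows every "position held" statement, not only the best-ranked ones.
    # OPTIONAL keeps holders without an article in this language: the red links.
    return f"""SELECT ?person ?article WHERE {{
  ?person p:P39/ps:P39 wd:{position_qid} .
  OPTIONAL {{
    ?article schema:about ?person ;
             schema:isPartOf <https://{lang}.wikipedia.org/> .
  }}
}}"""


# Q12345 is a placeholder; substitute the item for the position you are checking.
for lang in ("en", "de"):
    print(holders_query("Q12345", lang))
```

Running the query once per language gives, for each Wikipedia, the articles that should carry the category and the holders that are still red links there.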

We do not have to put Wikidata "in your face", as we would with automatically changing infoboxes. Having a system that signals where attention is needed is a first step towards getting used to shared information: information that comes from all Wikimedia projects, with Wikidata as its intermediary.

Sunday, August 09, 2020

Keeping it simple for "Abstract Wikipedia"

Abstract Wikipedia is confusing to me; it is said to be about "articles in a language independent way". Articles are complicated because the expression in any language has to be consistent with the grammar, the diction and the vocabulary of that language. Wikipedia articles have one additional complication: once you start reading, you may end up in a rabbit hole of wonderful stuff that grabs your attention.

Abstract Wikipedia covers all of Wikidata, and that is much more than what all Wikipedias combined cover. Currently there are two items without a Wikipedia link for every item with one. An obvious first objective is to have something to say about each item. It can be as little as "**Name** is a **human**". When we know the profession: "**Name** is a **chemist**". When an award was won: "**Name** is a **chemist**. The **Award** was received in **year**." Patterns like these are similar for every language.
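These patterns can be sketched for English, assuming the name, profession, award and year have already been read from Wikidata. The function `describe` is hypothetical and ignores most grammar; choosing "a" or "an" from the first letter is a simple heuristic that fails for words like "hour" or "university". Other languages need their own patterns, with gender and case where their grammar requires it.

```python
from typing import Optional


def describe(name: str, profession: Optional[str] = None,
             award: Optional[str] = None, year: Optional[int] = None) -> str:
    """Build a short English description from whatever facts are known."""
    cls = profession or "human"  # fall back to "human" when no profession is known
    # Heuristic article choice based on the first letter only.
    article = "an" if cls[0].lower() in "aeiou" else "a"
    text = f"{name} is {article} {cls}."
    # Add the award sentence only when both the award and the year are known.
    if award and year:
        text += f" The {award} was received in {year}."
    return text


print(describe("Name"))
print(describe("Name", "chemist"))
print(describe("Name", "chemist", "Award", 1999))
```

Each extra fact adds to the description, so it improves as Wikidata learns more about a person.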

This minimal approach is the basis for automated descriptions, which are vital when disambiguating. They are an improvement over manual descriptions because manual descriptions do not get updated when new information becomes available. Automated descriptions are not articles; they have to be descriptive and not describing.

When Wikipedia articles exist, they provide a rich source of information for generating new texts. Given that Abstract Wikipedia is based on Wikidata, a tool like "Concept Cloud" is useful because it shows all the links to other articles and how often they occur in an article (Concept Cloud is part of Reasonator). The challenge will be to model such relations in Wikidata OR to allow these relations to be registered in a new way as part of Abstract Wikipedia.
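How Concept Cloud computes its numbers is not described here; this is a minimal sketch of the idea, assuming the wikitext of an article is at hand. It counts the `[[...]]` links per target article. As a simplification it skips every link whose target contains a colon: that drops categories and files, but also article titles that happen to contain a colon. The names `WIKILINK` and `link_counts` are hypothetical.

```python
import re
from collections import Counter

# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_counts(wikitext: str) -> Counter:
    """Count how often each linked article occurs in a piece of wikitext."""
    counts = Counter()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        # Skip namespaced links such as Category: and File: (a simplification).
        if not target or ":" in target:
            continue
        # Wikipedia treats the first letter of a title as case-insensitive.
        target = target[0].upper() + target[1:]
        counts[target] += 1
    return counts


sample = (
    "[[Uganda]] and [[uganda|Ugandan]] politics; see [[Uganda#History|history]]. "
    "He was [[Prime Minister of Uganda]]. "
    "[[Category:Ugandan politicians]] [[File:Example.jpg|thumb]]"
)
print(link_counts(sample).most_common())
```

For the sample this prints `[('Uganda', 3), ('Prime Minister of Uganda', 1)]`: the piped link and the section link both count towards Uganda.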

Once sufficient information is available, an article can be generated. That is what LSJBOT and the Cebuano Wikipedia are famous for. It follows that once the same amount of data is available in Wikidata for a similar subject, an article can be generated in, for instance, Cebuano. When we recreate these templates, we can adapt them for any language.

The linguists who theorise Abstract Wikipedia to death can apply their magic and find out whether their pet theories hold water in the real world. In Abstract Wikipedia their function is to enable the provision of information in any language. Obviously competing theories may be implemented, and as a result the underlying technology may evolve.

Thanks, GerardM

Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners of a Tanzanian region; he is the commissioner for the Tabora Region, which has a population of 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner; only for the Arusha Region do we know "them all".

I have been adding information about these Regional Commissioners, and from a quality point of view this is a step in the right direction. Slowly but surely we are learning the structures and politicians of more African countries.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that lists exist of the data that is currently missing. These lists would help us provide the bare bones of what it takes to know about African countries.

So who are the data wizards who can show where our data is lacking? Where are the lists that enable people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article about Mr Mwanri can be illustrated?

Sunday, July 26, 2020

Data in Red - A holistic view of the bias towards the English language and Anglo-American subjects

First a definition: "When data is biased, we mean that the sample is not representative of the entire population." This approach successfully underpins the Women in Red project; currently 18.51% of the biographies in English Wikipedia are about women. When you compare the coverage of Anglo-American politicians with that of politicians from the whole of Africa, the bias in the data at Wikidata is already obvious; such a comparison attaches numbers to it.
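The definition can be turned into a number: divide a group's share in the sample by its share in the whole population. A value of 1.0 means representative; lower means under-represented. This is a minimal sketch; the function name is hypothetical and the population share of women is assumed to be roughly one half. The same calculation applies to African politicians once their counts in Wikidata and a reference count are known; no such numbers are given here.

```python
def representation_ratio(sample_count: int, sample_total: int, population_share: float) -> float:
    """Share of a group in the sample divided by its share in the population."""
    if sample_total <= 0:
        raise ValueError("sample_total must be positive")
    if not 0 < population_share <= 1:
        raise ValueError("population_share must be in (0, 1]")
    return (sample_count / sample_total) / population_share


# 18.51% of biographies, written as counts; women assumed to be about half the population.
ratio = representation_ratio(1851, 10000, 0.5)
print(round(ratio, 2))
```

This prints 0.37: at 18.51%, women are represented at a bit more than a third of what a representative sample would show.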

This is not a problem for Wikidata alone. Yes, we can have a project and add a lot of data to achieve a growth in percentage, as we did for Women in Red. That is worthwhile in its own right, but in this way we do not forge a closer relation with the "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. It is used in self-selecting Wikipedias through global infoboxes and lists, and interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own.

The functionality to make this happen implements an existing idea with additional twists:
  • Pictures of the subject are linked courtesy of Special:MediaSearch.
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used; it is to be replaced by improved descriptions provided by Abstract Wikipedia.
  • A Reasonator-like display is provided to inform on the data we have on an item.
  • Suggestions for inclusion in categories and lists are provided, based on the Wikidata definitions for categories and lists.
  • To help people find sources and alternative sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource.

In essence this is a toolset that you can opt into as an individual and/or that a project can adopt as its standard. Particularly for the smaller projects this will prove really valuable: it prevents false friends and it indicates heavily linked items that do not have an article. It stimulates the addition of labels, because labels help in finding illustrations.

This proposal is relatively low-tech, and it will bring our many communities together by making the information available to us widely accessible.