Saturday, August 29, 2020

Proposal for the liberal use of data from the Wikipedias and Wikidata

Every Wikimedia project has information available that could be shared with other Wikimedia projects. Data is incomplete in every project and the objective of this proposal is to indicate missing data so that it may be included.

It starts with a category. This category links to six English Wikipedia articles. Using a tool, that information is now available in Wikidata as well. As this information is further enriched, it turns out that one other article should also be included in the category.

The category exists on many Wikipedias, and its definition in Wikidata states what content the category contains. Reasonator shows this information using a built-in query. When you check the article with the list, it is obvious that many articles do not have the category. The latest entry in the list is known to Wikidata thanks to the Latin Wikipedia.

The proposal is simple: have a messaging agent that indicates missing categories on articles, so that any Wikipedian can add them. For Wikidata we would import data based on the definition of the categories. The process would be enabled category by category.

  • Nothing happens on a Wikipedia without prior agreement
  • By default, the mechanism is one of signalling, not of updating
  • It follows existing practice for importing data from Wikipedias into Wikidata
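A minimal sketch of such a signalling agent, assuming it already has two lists of article titles: the ones implied by the category's definition in Wikidata and the ones that actually carry the category on a given Wikipedia. The names `Signal`, `missing_category_signals` and the `enabled` set are hypothetical; fetching the data from Wikidata or a Wikipedia is left out. It only reports, it never edits, and it stays silent for any category a community has not agreed to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A message for Wikipedians: an article that appears to lack a category."""
    language: str
    article: str
    category: str


def missing_category_signals(language, category, expected, categorised, enabled):
    """Report articles that Wikidata says belong in `category` but that do not
    carry it on the `language` Wikipedia (hypothetical helper).

    expected:    article titles implied by the category's definition
    categorised: article titles that currently carry the category
    enabled:     (language, category) pairs a community has agreed to
    """
    # Nothing happens without prior agreement: unknown pairs produce no signals.
    if (language, category) not in enabled:
        return []
    # Signal, do not update: compute the difference and only report it.
    missing = sorted(set(expected) - set(categorised))
    return [Signal(language, title, category) for title in missing]


if __name__ == "__main__":
    # Hypothetical titles; real input would come from Wikidata and the Wikipedia.
    enabled = {("en", "Category:Example")}
    for s in missing_category_signals(
        "en", "Category:Example",
        expected=["Article A", "Article B", "Article C"],
        categorised=["Article A"],
        enabled=enabled,
    ):
        print(f"[{s.language}] {s.article} may be missing {s.category}")
```

Enabling per (language, category) pair mirrors the idea that the process is switched on category by category, and only where a Wikipedia has agreed to it.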

Sunday, August 23, 2020

Having a conversation about the usefulness of shared data

Once there is a stalemate and positions are entrenched, there is only sniping and little progress. At the English Wikipedia they are adamant: they do not want automatic changes from Wikidata. As a result there is little or no progress in making effective use of the information that is available in all the Wikipedias and in Wikidata. There is room for improvement, improvement that will benefit both the English Wikipedia and Wikidata.

Let me explain with an example. The Gambia has foreign ministers, and great information about them can be found in the English article. There is also a category, but it is incomplete because not all the foreign ministers with an article are included in it.

When somebody enters the data for Gambian foreign ministers in Wikidata, the result is best shown using Reasonator. Reasonator shows it best because it can display the list in any language. That is quite relevant because there may be lists in other languages, in German for instance. The German list has only one red link, the English list has five, and the Reasonator list, once completed, will have none.

When you summarise the state of play for lists of positions like this, the presentation of these lists differs greatly while the content is, by definition, the same. When you want to spare both the cabbage and the goat, it takes extra moves. The Reasonator information for the category shows 24 entries and categories in four languages. It is easy to test whether all the linked articles have a category entry in each language, and also whether Wikidata knows these people for the position they hold.
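The two tests can be sketched as follows, assuming the data has already been gathered: for each person, the articles about them per language (their sitelinks); for each language that has the category, its members; and the set of people Wikidata already records as holding the position. The function name `audit` and the item ids and titles are hypothetical placeholders, not real Wikidata data.

```python
def audit(people, category_members, position_holders):
    """Run both checks for one position (hypothetical helper).

    people:           item id -> {language code: article title}
    category_members: language code -> set of titles in that language's category
    position_holders: item ids Wikidata already records as holding the position

    Returns (missing_category, missing_position):
      missing_category: language code -> sorted titles lacking the category
      missing_position: sorted item ids without the position in Wikidata
    """
    missing_category = {}
    for links in people.values():
        for lang, title in links.items():
            # Only languages that actually have the category are checked.
            if lang in category_members and title not in category_members[lang]:
                missing_category.setdefault(lang, []).append(title)
    for titles in missing_category.values():
        titles.sort()
    missing_position = sorted(set(people) - set(position_holders))
    return missing_category, missing_position


people = {
    "item-1": {"en": "Person One", "de": "Person One"},
    "item-2": {"en": "Person Two"},
}
members = {"en": {"Person One"}, "de": {"Person One"}}
print(audit(people, members, {"item-1"}))
# ({'en': ['Person Two']}, ['item-2'])
```

Both results are lists of things that need attention; turning them into edits is deliberately left to people.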

We do not have to put Wikidata "in your face" as we would with automatically changing infoboxes. A system that indicates where attention is needed is a first step towards getting used to shared information: information that comes from all Wikimedia projects, with Wikidata as its intermediary.

Sunday, August 09, 2020

Keeping it simple for "Abstract Wikipedia"

Abstract Wikipedia is confusing to me; it is said to be about "articles in a language-independent way". Articles are complicated because their expression in any language has to be consistent with the grammar, the diction and the vocabulary of that language. Wikipedia articles have one additional complication: once you start reading, you may end up down a rabbit hole of wonderful stuff that grabs your attention.

Abstract Wikipedia covers all of Wikidata, and that is much more than all the Wikipedias combined cover. Currently there are two items for every item with a Wikipedia link. The first objective that seems obvious is to have something to say about each item. It can be as little as "**Name** is a **human**". When we know the profession: "**Name** is a **chemist**". When an award was won: "**Name** is a **chemist**. The **Award** was received in **year**." Patterns like these are similar for every language.
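The patterns above can be sketched as simple fill-in templates. This assumes the labels for the item, its class, occupation and award have already been resolved in the target language; the `TEMPLATES` table and `describe` function are hypothetical, only English is shown, and grammar such as "a"/"an", gender or case is not handled.

```python
# Hypothetical English patterns; other languages would need their own,
# and grammar (a/an, gender, case) is not handled here.
TEMPLATES = {
    "en": {
        "is_a": "{name} is a {cls}.",
        "award": "The {award} was received in {year}.",
    },
}


def describe(item, lang="en"):
    """Build a minimal description from whatever statements are known.

    item: dict of labels already resolved in `lang`: 'name', 'instance_of',
          and optionally 'occupation', 'award', 'award_year'.
    """
    t = TEMPLATES[lang]
    # Prefer the more specific fact: an occupation beats "human".
    cls = item.get("occupation") or item["instance_of"]
    sentences = [t["is_a"].format(name=item["name"], cls=cls)]
    if "award" in item and "award_year" in item:
        sentences.append(t["award"].format(award=item["award"], year=item["award_year"]))
    return " ".join(sentences)


print(describe({"name": "Name", "instance_of": "human"}))
# Name is a human.
print(describe({"name": "Name", "instance_of": "human", "occupation": "chemist",
                "award": "Award", "award_year": 2000}))
# Name is a chemist. The Award was received in 2000.
```

Because the text is regenerated from the statements every time, adding a new statement (an occupation, an award) changes the description without anyone editing it by hand.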

This minimal approach is the basis for automated descriptions, which are vital when disambiguating. It is an improvement over manual descriptions because manual descriptions do not get updated when new information becomes available. Automated descriptions are not articles; they have to be descriptive and not describing.

When Wikipedia articles exist, they provide a rich source of information for generating new texts. Given that Abstract Wikipedia is based on Wikidata, a tool like "Concept Cloud" is useful because it shows all the links to other articles and how often they occur in an article (Concept Cloud is part of Reasonator). The challenge will be to model such relations in Wikidata OR to allow these relations to be registered in a new way as part of Abstract Wikipedia.

Once sufficient information is available, an article can be generated. That is what LSJBOT and the Cebuano Wikipedia are famous for. It follows that once the same amount of data is available for a similar subject in Wikidata, an article can be generated in Cebuano, for instance. When we recreate these templates, we can adapt them for any language.

The linguists who theorise Abstract Wikipedia to death can apply their magic and find out whether their pet theories hold water in the real world. In Abstract Wikipedia their function is to enable the provision of information in any language. Obviously, competing theories may be implemented, and as a result the underlying technology may evolve.

Thanks, GerardM

Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners for a Tanzanian region. The Tabora Region has a population of 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner, and only for the Arusha Region do we know "them all".

I have been adding information about these Regional Commissioners, and from a quality point of view this is a step in the right direction. Slowly but surely, we are getting to know the structures and politicians of more African countries.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that there are lists of the data that is currently missing. These lists help us provide the bare bones of what it takes to know about African countries. 

So who are the data wizards who can show where our data is lacking? Where are the lists that enable the people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article about Mr Mwanri can be illustrated?