Monday, September 14, 2020

A new tool implies changes for me


A list like this is wonderful and it has always been a list where I either only import existing office holders or attempt to "do them all".  Typically I did the first few, making a point to include the incumbent. 

I added this template {{PositionHolderHistory|id=Q**}} to all the items for an office in my African politicians project. You find the template on the talk page and, like the Listeria lists, it shows past and present office holders.

I still prefer my method of including the "red links" in Wikidata but it is a wiki and there is so much more to do. What I started to do with office holders from Togo is this: for those I will not link to predecessors and successors, I will at least show the dates they were in office.

It looks much better in Listeria too. 

Thanks, GerardM

Thursday, September 10, 2020

Амама Мбабази is a politician who held multiple positions in the Ugandan government

Amama Mbabazi is a former Prime Minister of Uganda and held many other governmental positions. There are 19 Wikipedia articles for him; Mr Mbabazi is notable.

When you want to find a picture for him and you know his name in Russian, you can use Special:MediaSearch. You can also find him with اماما_مبابازى

One of the positions Mr Mbabazi held is Justice Minister of Uganda. English Wikipedia has a category for these Ministers, at this time two people are included but not Mr Mbabazi. There are ten Ministers missing in the category. Mind you, it is English Wikipedia that has a list that made my work at Wikidata possible!

On the talkpage for the Wikidata item for Justice Minister of Uganda, I added the {{PositionHolderHistory}} template. A bot updates the information every day and it adds comments on the quality of the information. This makes it easy to add positions of interest to a watchlist. 
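
The data behind such a list of office holders can be approximated with a query. What follows is a minimal sketch, not how the template or its bot actually works: it only builds a SPARQL query string for the Wikidata Query Service, using "position held" (P39) with the qualifiers "start time" (P580) and "end time" (P582). The function name `position_holders_query` is an assumption made for illustration, and the item id of the office has to be supplied by the caller.

```python
def position_holders_query(position_qid: str) -> str:
    """Build a SPARQL query listing everyone who held the given position.

    position_qid is the Wikidata item id of the office, supplied by the
    caller. This helper is hypothetical; it only assembles the query text
    and does not contact any service.
    """
    # Refuse anything that does not look like a Wikidata item id.
    if not (position_qid.startswith("Q") and position_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {position_qid!r}")
    return f"""
SELECT ?person ?personLabel ?start ?end WHERE {{
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:{position_qid} .
  OPTIONAL {{ ?statement pq:P580 ?start . }}
  OPTIONAL {{ ?statement pq:P582 ?end . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?start
""".strip()
```

The start and end dates are optional in the query, so office holders without dates still show up; those rows are the ones that need attention.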

On my Africa project you find Listeria lists for African political positions. It is duplicated to several Wikipedias and once a Wikipedia is synchronised it will show information like the lists that include Mr Mbabazi. 

One day all the data will be complete and up to date. In the meantime it is a "work in progress" and you are kindly invited to check the information out, find its shortcomings and make updates where necessary.



Saturday, August 29, 2020

Proposal for the liberal use of data from the Wikipedias and Wikidata

Every Wikimedia project has information available that could be shared with other Wikimedia projects. Data is incomplete in every project and the objective of this proposal is to indicate missing data so that it may be included.

It starts with a category. This category links to six English Wikipedia articles. Using a tool, that information is now available in Wikidata as well. As this information is further enriched, it is found that one article should be included in the category.

The category exists on many Wikipedias; in the definition of the category it is known what content the category contains in Wikidata. Reasonator shows the information with an inbuilt query. When you check the article with a list, it is obvious that many articles do not have a category entry. The latest entry in the list is known to Wikidata thanks to the Latin Wikipedia.

The proposal is simple. Have a messaging agent that indicates missing categories on articles. This will enable any Wikipedian to add them. For Wikidata we would import data based on the definition of categories. The process would be enabled per defined category.

  • Nothing happens on a Wikipedia without prior agreement
  • The mechanism used is by default one of signalling and not of updating 
  • It follows existing practice for importing data from Wikipedias into Wikidata
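
The signalling part of the proposal comes down to comparing two sets. A minimal sketch, assuming both sets are already available as article titles: what Wikidata expects in the category according to its definition, and what a Wikipedia actually has in the category. The function name `signal_gaps` and the keys of its result are made up for illustration. Nothing is updated; the function only reports.

```python
from __future__ import annotations


def signal_gaps(expected_by_wikidata: set[str], in_category: set[str]) -> dict[str, list[str]]:
    """Report differences between a category and its Wikidata definition.

    Hypothetical helper: both arguments are sets of article titles.
    It signals only; it changes nothing on either side.
    """
    return {
        # Articles Wikidata knows about that lack the category entry.
        "missing_category": sorted(expected_by_wikidata - in_category),
        # Articles in the category that Wikidata does not know for it yet.
        "missing_in_wikidata": sorted(in_category - expected_by_wikidata),
    }
```

The first list is what a Wikipedian could act on; the second is what an import into Wikidata would pick up once the process is enabled for that category.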

Sunday, August 23, 2020

Having a conversation about the usefulness of shared data

Once there is a stalemate, where positions are entrenched, there is only sniping and little progress. At the English Wikipedia they are adamant; they do not want automatic changes from Wikidata. As a result there is little or no progress making effective use of the information that is at all the Wikipedias and Wikidata. There is room for improvement, improvement that will benefit both English Wikipedia and Wikidata.

Let me explain with an example. In the Gambia they have foreign ministers. Great information can be found at this English article. There is also an incomplete category, incomplete because not all the foreign ministers with an article are included. 

When somebody enters the data for Gambian foreign ministers in Wikidata, the result is best shown using Reasonator. Reasonator shows it best because you can have it show in any and all languages. That is quite relevant because there may be lists in other languages, like in German for instance. The German list has only one red link, the English list has five and the Reasonator list, once completed, will have none.

When you summarise the state of play for lists of positions like this, the presentation of these lists differs greatly while the content is by definition the same. When you want to spare both the cabbage and the goat, it takes extra moves. The Reasonator information for the category shows 24 entries and categories in four languages. It is easy to test if all the articles linked have a category entry for each language and also if Wikidata knows these people for the position they hold.

We do not have to put Wikidata in "your face" like we would do with automatically changing infoboxes. Having a system that indicates that attention is needed is a first step for getting used to shared information. Information that comes from all Wikimedia projects and has Wikidata as its intermediary.

Sunday, August 09, 2020

Keeping it simple for "Abstract Wikipedia"

Abstract Wikipedia is confusing to me; it is said to be about "articles in a language independent way". Articles are complicated because the expression in any language has to be consistent with the grammar, the diction, the vocabulary for that language. Wikipedia articles have one additional complication; once you start reading you may end up in a rabbit hole of wonderful stuff that grabs your attention.

Abstract Wikipedia covers all of Wikidata and that is much more than what all Wikipedias combined cover. Currently there are two items for every item with a Wikipedia link. The first objective that seems obvious is to have something to say about each item. It can be as little as **Name** is a **human**. When we know his profession **Name** is a **chemist**. When an award was won, "**Name** is a **chemist**. The **Award** was received in **year**." Patterns like these are similar for every language.
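
The patterns above can be written down as a small template. This is a minimal sketch for English only, not how Abstract Wikipedia or any existing tool generates text; the function name `describe` and its parameters are assumptions for illustration, and the values would come from Wikidata statements.

```python
from typing import Optional


def describe(name: str, occupation: Optional[str] = None,
             award: Optional[str] = None, year: Optional[int] = None) -> str:
    """Fill the English patterns from whatever data is known.

    Hypothetical helper. It ignores grammar details such as "a" versus
    "an"; another language would need its own pattern.
    """
    # Fall back to "human" when no occupation is known.
    text = f"{name} is a {occupation or 'human'}."
    # Only mention the award when both the award and the year are known.
    if award is not None and year is not None:
        text += f" The {award} was received in {year}."
    return text
```

With only a name the result is "Name is a human."; as an occupation and an award become known, the same call produces the longer sentences.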

This minimal approach is the basis for automated descriptions, which are vital when disambiguating. Automated descriptions are an improvement over manual descriptions because manual descriptions do not get updated when new information becomes available. Automated descriptions are not articles; they have to be descriptive and not describing.

When Wikipedia articles exist, they provide a rich source of information when new texts are to be generated. Given that Abstract Wikipedia is based on Wikidata, a tool like "Concept Cloud" is useful because it shows all the links to other articles and how often they occur in an article (Concept Cloud is part of Reasonator). The challenge will be to model such relations in Wikidata OR allow for these relations to be registered in a new way as part of Abstract Wikipedia.

Once sufficient information is available, an article can be generated. That is what LSJBOT and the Cebuano Wikipedia are famous for. It follows that once the same amount of data is available for a similar subject in Wikidata, an article can be generated in for instance Cebuano. When we recreate these templates, we can update them for any language. 

The linguists who theorise Abstract Wikipedia to death, can apply their magic and find if their pet theories hold water in the real world. In Abstract Wikipedia their function is to enable the provision of information in any language. Obviously competing theories may be implemented and as a result the underlying technology may evolve.

Thanks, GerardM

Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners for a Tanzanian Region. The Tabora Region has a population of 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner and only for the Arusha Region we know "them all". 

I have been adding information about these Regional Commissioners and this is from a quality point of view a step in the right direction. Slowly but surely we know for more African countries structures and politicians.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that there are lists of the data that is currently missing. These lists help us provide the bare bones of what it takes to know about African countries. 

So who are the data wizards who show where our data is lacking? Where are the lists that enable the people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article for a Mr Mwanri is illustrated?

Sunday, July 26, 2020

Data in Red - A holistic view on the bias for the English language and for AngloAmerican subjects

First a definition; "When data is biased, we mean that the sample is not representative of the entire population". This approach successfully underpins the Women in Red project; currently a percentage of 18.51% women in English Wikipedia has been achieved. Compare the coverage of Anglo-American politicians with the politicians from the whole of Africa: the bias in the data at Wikidata is already obvious, and it will then have numbers attached to it.

This is not a problem for Wikidata alone and yes, we can have a project and include a lot of data to get to a growth percentage as we did for the Women in Red. Worthwhile in its own right but in this way we do not forge a closer relation with its "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. This is done in self selecting Wikipedias through global info boxes and lists. Interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own. 

The functionality to make this happen implements an existing idea with additional twists.
  • Pictures for the subject are linked to courtesy of Special:MediaSearch
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used and it is to be replaced with improved descriptions provided by Abstract Wikipedia
  • A Reasonator like display is provided to inform on the data we have on an item.
  • Suggestions for the inclusion in categories and lists are provided based on Wikidata definitions for categories and lists.
  • To help people find sources, alternate sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource
In essence this is a toolset that you can opt into as an individual and/or it is the standard for a project. Particularly for the smaller projects this will prove to be really valuable; it will prevent false friends, it indicates heavily linked items that do not have an article. It stimulates the addition of labels because it is beneficial in finding illustrations. 

This proposal is relatively low tech and it will bring our many communities together by providing widely the information that is available to us.

Thursday, July 23, 2020

What to love in English Wikipedia

This list of commissioners of the Arusha Region is great, it provides the basic information that enables me to include this information in Wikidata. It can be assumed that they are all from Tanzania, politicians and human as well. 

What I love in English Wikipedia are lists like this. It is more than likely that for every Tanzanian region there will be a similar list and as a consequence we can include all these fine politicians to Wikidata, list them in whatever Wikipedia.

As more politicians for Tanzania or any other African country are added, politicians will pop up who have held multiple offices. This will be explicit in Wikidata and in Wikipedia you could use Special:WhatLinksHere.

Technically there is not much stopping us from associating red links with Wikidata items. This is the same guy used in the "WhatLinksHere" and you find him in this list that is a work in progress as well. 

Think this through. With lists like this in any Wikipedia, these people are findable, linkable. It will be possible to state in text what a given commissioner did, and there will be no ambiguity because of the link.

So I love English Wikipedia for the rich resource of information it is. I love its editors who provide us with the information that enables the reuse of data. I will rejoice when it is recognised that we can do much more. When we accept that together, as an ecosystem, we are in a position where we actually share the sum of all knowledge that is available to us.

Tuesday, July 21, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 2)

Our aim is to share the sum of the knowledge available to us with everyone, everywhere, in every language. That is what we are to achieve.

As we establish what we, as a movement, are to do, it follows that we need to measure how well we do. When a community does not play an active part for a particular goal, that too will show in the numbers.

Commons does not need to work in English only. The "Special:MediaSearch" works in all the languages we support. With this search engine enabled on every Wikipedia, we will learn how well it gets adopted in  all our languages. We will know if new Wikidata labels are used in searches on Commons. We will know if more diversity is realised in the pictures used in Wikipedia. We will know how many pictures are downloaded and from what languages.

Only in the Portuguese Wikipedia do we find the governors of Mozambican provinces, and only in text. We can include them in Wikidata, make Listeria lists for them, but how do we disambiguate these politicians? What does it take to make the information for them usable for "abstract Wikipedia"? How do we assemble information about countries like Mozambique and how do we get it to the quality level that some expect? As important, how do we get people from Mozambique interested and involved?

Some Wikipedians opine that the Wikimedia Foundation does not need to raise funding for their project. Arguably this is correct, but we can raise funds for other projects, other languages elsewhere because we have more and other ambitions to realise. As we raise more money outside of the USA, more people will gain a sense of ownership. 

When we are to overcome our bias for English and our bias for Wikipedia, we need to market our other languages, our other projects. We need key performance indicators. For Wikisource, how many books were downloaded? For Commons, how many media files were downloaded and from what language?

Results need to be objective and measurable. As our research proves to have been about English Wikipedia we have a problem. We seriously need to consider to what extent it is applicable.

NB While the bias is real and the relationship with English Wikipedians is often antagonistic, it is important to recognise  English Wikipedia as the source for much of the information that ends up in other projects. When we collaborate more, our available data will reach more people in an informative way.

Saturday, July 18, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 1)

The bias for Wikipedia as a project is strong, the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this and consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article it is only negative. Dictionaries give more room an example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding and it explicitly states that it seeks a closer relation with its premier brand Wikipedia. 

This is a published bias. It follows that other projects do not receive the same attention, do not get the same priority. For me it is obvious that as a consequence the WMF could do better when it intends to "share in the sum of all available knowledge" let alone the knowledge that is available to it.

Arguably another more insidious bias is the bias for English, particularly the bias for the English Wikipedia. Given that the proof of the pudding is in the eating, we have a world wide public and the use for our information hardly grows. Research is done on English Wikipedia so in effect we arguably do not even know what we are talking about.

When we are to do better, it means that we need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states in the context of diversity in tech and media that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument. When the bonuses for executives of the WMF rely on the growth in all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported other projects; it does not market them, it does not support them, they exist because the MediaWiki software allows for the functionality.

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask for the WMF to accept the premise and consider suggestions for change. This proves to be an issue and makes our biases even more intractable.

Sunday, July 12, 2020

Telling the story of governors of Mozambique

As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces et al because a South African minister of health of a province was considered to be not notable enough.

With Wikilambda or, if you wish, "Abstract Wikipedia" being a thing, it is important to consider how the story is told. The bare bones of a story already show in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of "governor of their state", a start and end date and as applicable a predecessor and a successor. Obviously they are politicians and Mozambican.

This time I had to go for the Portuguese Wikipedia for a source. There is a list mixed with colonial governors and they need to fit a different mold. They are Portuguese and arguably they are not politicians but administrators. 

What I am eager to learn is how Wikilambda will be able to tell these stories. How it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.

Sunday, July 05, 2020

The quality of all the Nigerian governors at @Wikidata

There are lists for all the governors of all the current Nigerian states. They exist on many Wikipedias. The information was known to be incomplete and based on lists on the English Wikipedia, I added information on Wikidata and as a result these lists may update with better data.

Obviously, when you copy data across to another platform, errors will occur. Sometimes it is me, sometimes it is in the data. I have only indicated when a governor was in office and predecessors and successors. 

The data is provided in a way that makes it easy to query; no information on elections (many governors were not elected) but proper start and end dates. The dates are as provided on the Wikipedia lists, articles for a governor are often more precise. People from Nigeria often are known by different names, I did add labels where I needed them for my disambiguation. 
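
Because every governor has a start date, an end date and a successor, copying errors can be caught with a simple check. This is a minimal sketch under assumptions: the `Term` record and the function `check_succession` are hypothetical, dates are reduced to years, and each person is assumed to hold the office only once.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    holder: str
    start: int                      # year the term began
    end: Optional[int]              # year the term ended, None for the incumbent
    successor: Optional[str] = None


def check_succession(terms: list[Term]) -> list[str]:
    """Return human-readable problems found in a list of terms."""
    # Assumption: one term per person, so the name can serve as key.
    by_holder = {t.holder: t for t in terms}
    problems = []
    for t in terms:
        if t.end is not None and t.end < t.start:
            problems.append(f"{t.holder}: ends before it starts")
        if t.successor is None:
            continue
        nxt = by_holder.get(t.successor)
        if nxt is None:
            problems.append(f"{t.holder}: successor {t.successor} has no term recorded")
        elif t.end is not None and nxt.start < t.end:
            problems.append(f"{t.holder}: successor {t.successor} starts before this term ends")
    return problems
```

An empty result does not prove the data is right; it only means the dates and successors do not contradict each other.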

When you want to know how many of these fine gentlemen are still alive, it will take some effort to kill off those who are still walking around according to Wikidata. It is relevant to know if a governor was elected or not. To do that properly you want to include election data elsewhere; there is no one-to-one relation between a position, elected officials and them being in office.

There is plenty to improve on the data. When people do, Listeria lists will update. Maybe someone will consider updating the English Wikipedia lists.

Saturday, July 04, 2020

Abstract Wikipedia, telling a story from available data

For me Reasonator is the best tool for Wikidata. It shows the data for a Wikidata item in an informative way. In my approach I am "deficit focused"; I add information for subjects that are not well represented. Additional information such as dates and successors make the information for Nigerian state governors more complete and it shows in Reasonator and Listeria lists.

Abstract Wikipedia, the new Wikimedia project is possible because of all the data in Wikidata. People who know the structure of a language will build constructs that present information in natural language. This is awesome because it will help us share widely in the sum of all available knowledge.

The objective of the Wikipedia projects has always been to share in the sum of all available knowledge. As more languages support the constructs needed for "Abstract Wikipedia", what we have in Wikidata will mushroom and evolve. It is because the data gets a purpose and, the data will be made to fit this purpose. 

The best part, Wikipedians want to tell stories and it only takes one person to add a bit of information to make a difference in the constructs for every language. My expectation is that as constructs become available for the languages of Nigeria, it will no longer be me who adds information on Nigerian politicians. It will be people from Nigeria. For them it will be Abstract Wikipedia that will show the data in an informative way.

Friday, July 03, 2020

Black representation matters, the Congressional Black Caucus

A friend asked me to help bolster the notability of black scientists. I was told of a "black caucus" with chairs and a list would help. I googled and found a black caucus with chairs and we did not know them at Wikidata. They were the chairs of the Congressional Black Caucus. Maybe not the caucus intended but of such a prominence that I added them all.

These are only the leaders and obviously over time the membership of the Congressional Black Caucus changed with the different elections. Someone else may add the data. 

The information I used could be found on English Wikipedia and is part of the article about the Congressional Black Caucus. Typically, when a position is considered important enough, it has its own article. When it does, it has more relevance and more information is available about the relevance and the history of such a position.

When Black representation matters, you want substantial lists and articles both on Wikidata and Wikipedia.

Sunday, June 28, 2020

@Wikipedia and freedom of speech

When you disagree on Wikipedia with current practices, you have to use stilted language to prevent administrators taking offence and blocking your account. 

At this time many articles of black female scientists have been marked for deletion. It is an organised effort because there are lists subdividing these articles on criteria. For the record, for many of these fine scientists I added content on Wikidata, added all kinds of information including awards.

When I learned that the article for Ayana Jordan was marked for deletion, I added the following protest: "Keep I want to stress that those !@##$ who make these proposals should be ashamed. Thanks, GerardM (talk) 05:10, 27 June 2020 (UTC)". The response came quick: "@GerardM: unlike some others here, yours is not a new account. So you should need no reminding that personal attacks and assumptions of bad faith are forbidden here. —" I replied with: " I did not use any swear words, I did express my opinion of the people who are so detrimental to what Wikipedia should stand for. That is not bad faith that is not a personal attack that is expressing revulsion. Thanks, GerardM (talk) 06:01, 27 June 2020 (UTC)". The conversation was taken elsewhere, I was blocked for a day.

For a Wikipedia administrator, it should be no news that these people who are repressive of what is not their cup of tea are widely resented. Marking articles for deletion is a form of harassment. I do not care who proposed the deletion, I do not know the person who marked Ayana's article for deletion and I do not care to know him and his ilk. We have a situation where harassment is allowed and calling out such travesties is considered a personal attack and an assumption of bad faith.

So I have been blocked for a day. I am proud to stand up against such bullies. I consider the process of deletion as rigged. These !@##$ are free to do as they wish because "we should assume good faith". Hell no.

Saturday, June 27, 2020

Hey @Wikimedia let's move the needle

The Wikimedia projects are biased. They favour only one language, the English language. When you look at Wikipedia traffic English Wikipedia is something like 50% and it does not represent 50% of our intended public. 

The objective is to improve the usefulness of the other projects and thereby increase their traffic. That is, more articles and books are read, more pictures are seen and downloaded.

Let's pick one language, Yoruba, as an example. There are currently 32,624 pages in its Wikipedia. There are some 40 million people speaking the language. So what can we do for Yoruba editors and readers? How can we track what makes a difference, and what can the WMF do to achieve this?

* We can improve list support. 
Currently the best support for lists in a Wikipedia is "Listeria". It is supported by Magnus. Listeria lists have been shown to be more up to date than manual lists on English Wikipedia; for less resourced projects this will be even more true. When existing lists can be easily included in an article, it will expand available information hugely. Here an example of Listeria lists on the Yoruba Wikipedia. Content of these lists shows in Yoruba. Lists are better supported and adopted when it is WMF supported functionality.

* Choosing pictures for illustration
When people look for a picture, they have to go to Commons or they visit Wikipedia articles on the same subject and use these same pictures. When the Special:MediaSearch is available as a tool from every Wikipedia article, a much richer palette of pictures becomes available to choose from. (The search is for "Agbègbè Ìjọba Ìbílẹ̀ Mushin").

The cool thing is, when this tool is available when writing an article, it is easy to more pro-actively add labels to Wikidata. This will improve the performance for the Special:MediaSearch even more.

What would truly support Special:MediaSearch is disambiguation. It is unreasonable to expect that we get descriptions in all the 300+ languages we support. What Reasonator supports are automated descriptions. It makes it easy and obvious to choose the right item in any language.

For the Wikimedia Foundation to support other languages, for it to move the needle on any and all languages, we need to measure what is meaningful. The number of searches by Special:MediaSearch and what language was used. The number of pictures used in each Wikipedia. The effect lists have on the writing of new articles.

As we have not measured such numbers so far, measuring them is what we should do to move the needle. One needle is the total number of reads; quite another is the number of reads for each project. The same goes for the use of Wikidata and Commons.

Sunday, June 21, 2020

Marketing @Wikimedia but first some SWOT analysis

The Wikimedia Foundation has a 2030 strategy; it intends to increase its reach, increase its budget and rename projects into "Wikipedia something" in order to improve its visibility.

Wikipedia is one of the most visited websites on the Internet, its quality is good and is mostly edited by older white males in the first world. Typically when people mention Wikipedia they refer to the English version but it is only 50% of Wikimedia traffic. From a marketing point of view the English market is saturated, growth can be expected from Wikipedias in other languages and from other projects.

The Wikimedia Foundation is very much tied to the United States. Given the current regime and the possibility that it will prevail in November, this reliance is an existential threat. It is likely that the US government will want to intervene in Wikimedia content after 2020. I doubt it is possible, given the current hardware configuration, to move away from the US and still serve the rest of the world with a NPOV.

At this time the Wikimedia Foundation is centrally led, there are satellite organisations in many countries who are limited in what they can do; their budgets are centrally managed. Fundraising is mostly done from the USA and most of it is raised in the USA. That is problematic in its own right because many "Wikipedians" feel that too much money is raised, money not needed to support their project and people in other countries do not get to feel that it is "their" project because of "their" contributions. As a professional fundraiser, I am convinced contributions from the Netherlands could increase at least tenfold within a year.

The bias for English is huge and it is compounded by the bias for English Wikipedia. At a conference a Dutch professor stated that research not about or linked to the English Wikipedia is unlikely to get published. It follows that the data used for the 2030 strategy includes this same bias. The MediaWiki software is developed first and foremost for English Wikipedia and it is expected to work for other languages and for other projects. There used to be a development team specialised in language technology; it was dissolved.

There was a time when English Wikipedia did support the other projects. Because of an anti Wikidata stance by some this changed. There is no solution for false friends and lists are not as well maintained as they could be. When we link to the Wikidata item for an article and no longer to a title for that same article this will change. It is easy enough to build functionality that allows for both and by opt-in projects will understand the benefit and choose to adopt.

When marketing is the reason for changing the name of projects, it is important to consider the ramifications. The "Wikipedians" among us claim ownership of the Foundation, insist on actions in their image. They represent a staid community representing a saturated market. With a strategy in place it is possible to disregard them. This makes only sense when the WMF tackles its bias for English as a priority. This is what is needed to realise the 2030 strategy.

Sunday, June 14, 2020

@Wikipedia is old news, it could point to new sources

Wikipedia provides the best text on many subjects. It being static is both a blessing and a curse. It is a blessing when it is a topic that is very much in the public eye, it attracts many people willing to edit and come to a neutral point of view.

It is a curse when the topic is no longer popular. No longer is there an interest to maintain the information, new publications are not integrated in what used to be a neutral point of view.

In the references section of an article you find the underpinnings of what is stated in an article. It may be newspaper articles or science papers. Both newspapers and science have a hard time attracting attention and this endangers the availability of quality sources for future updates.

In Scholia information is continuously updated about the latest papers by authors and/or about subjects. As time goes by, papers become available dated later than the latest reference. When such papers are clearly marked, it is an invitation for the Wikipedia community to revisit a subject and learn if what was a neutral point of view survives as such vis-à-vis the latest information.

Every subject should have its own Scholia.

Friday, June 12, 2020

Professor Vassie Ware - an early recipient of an early career award

On the page of the "GlobalYoungAcademyTeam", you find many young academies. You also find "early career awards". These young academies and these awards are represented by "Listeria lists"; when something changes in Wikidata it is reflected in them.

The WICB Junior Award is an early career award for women conferred by the American Society for Cell Biology. An English Wikipedia article provided the initial content for this award and given that there are many people interested in what this award is about, I included all recipients of the award. Professor Ware is the earliest recipient I added by hand.

The standardised Listeria lists show the people who are included; they show their occupation, identifiers for ORCID, Google Scholar and VIAF, and the number of publications known for them. The approach is a wiki approach and it is therefore fine that we only have two publications for Professor Ware, that we do not have a freely licensed picture of Professor Ware yet and that there is no Wikipedia article either.

Once a list is reasonably complete, new information is added all the time. It follows that the Scholia page for the award and for a scientist like Prof Ware evolve. In the true wiki spirit, a structure is provided and anyone who cares to makes a difference. A difference for the understanding of science and for the people who make science what it is.

Thursday, June 04, 2020

@Wikimedia and languages - @WikiCommons search, the most relevant development since @Wikidata

The Wikimedia Foundation is important for the support of languages on the Internet. The localisation of its software is done in over 300 languages.

The milestones for multilingual support are:
These milestones have been very much technology driven. For me the one reason why Wikidata became the success it is, is because it was from the start linked to every subject covered by Wikipedia and the solution was so overwhelmingly superior that nobody could reasonably object.

To make a success of this latest milestone, institutional support is needed. It is up to the Wikimedia Foundation and its movement to reduce its bias for English and make room for improved language support.

My way of phrasing this as an essential objective: "All of Commons is available to every single person on the planet". As we adopt this as our objective, it is first and foremost about making Special:MediaSearch useful in any and all of our languages and making it available from any and all of our Wikipedias.

As we adopt this, it is essential that priority is given to multilingual search over special interests including GLAM, Open Data, SPARQL and what have you. The priority is to open up in multiple languages first. Special interests only gain relevance when it is made obvious how they help open up Commons in Swahili, Hindi, German or Vietnamese.

Special:MediaSearch is possible because of everything that went before. Its functionality is part of MediaWiki and is localised with it. The existing search engine is now linked to the labels for items in Wikidata and it was made public after Hay Kranen brought us his proof of concept. It became available warts and all, and while finding منصور اعجاز in Punjabi is huge, it is not great when you do not find cats because a user is called Kočka.

The challenge to us as an organisation, as a movement: are we willing to work on our existing bias, open up Commons in all the languages we are said to support and accept that our hobby horses will get attention not in the next but in a future iteration?

Thursday, May 28, 2020

@WikiCommons - Sarah T. Roberts versus Sarah T. Roberts

I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata there are two distinct Sarah T. Roberts. One is an epidemiologist, the other is into information & media studies.

At Commons it was a mess; the picture of one Sarah was used to illustrate the infobox of the other Sarah. It is not that interesting to tell you how I did what. Relevant is that I did. I did because you will find things when there is a label for whatever in "your" language.

Given that we do not research the use of Commons or Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it..

Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons among others for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and as so often it actually worked after a fashion. The first iteration was in true Wikimedia tradition English only. Dutch became the second language of the proof of concept; Hay Kranen, the developer, is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you are looking for available pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.
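The lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how Special:MediaSearch is implemented; the item identifiers, the labels and the file names are made-up placeholders.

```python
# Hypothetical data: labels per item and language, and files tagged with what they "depict".
LABELS = {
    "Q-cat": {"en": "cat", "de": "Katze", "ja": "猫"},
    "Q-dog": {"en": "dog", "de": "Hund"},  # no Japanese label yet
}
DEPICTS = {
    "Cat_on_a_sofa.jpg": ["Q-cat"],
    "Dog_and_cat.jpg": ["Q-cat", "Q-dog"],
}


def search(term, language):
    """Return the files that depict an item labelled `term` in `language`."""
    if not term:
        return []
    wanted = {
        item
        for item, labels in LABELS.items()
        if labels.get(language, "").casefold() == term.casefold()
    }
    return sorted(
        name for name, items in DEPICTS.items() if wanted.intersection(items)
    )


print(search("Katze", "de"))  # both files depict the cat item
print(search("犬", "ja"))      # nothing is found until someone adds the Japanese label
```

The second search comes back empty only because a label is missing. Adding that one label is all it takes to open up the same pictures in another language.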

What can you do to make it better? Add labels in your language for the things you want to find and find media files that depict what you are looking for. When nobody translated the software into your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to be polite. Commons is the repository of the media files that illustrate all the Wikipedias so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many many years. The beauty is that the way forward has been expressed in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as to take the "proof of concept" and rewrite it in the preferred language of the WMF, internationalise and localise it and keep it stand alone for now. The people who know about it will use it and they will be the first to point out what more they want to be done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game, and finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein; there was also a Jonathan H. Epstein. Because they share characteristics they could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein", so it is added as an alias.

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information is digested into Wikidata statements, information like the alma maters. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.
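What moving from an "author name string" to a linked author amounts to can be sketched as follows. This is a simplified illustration and not the actual tooling: the paper record and its field names are hypothetical, and a name match alone is never enough, which is why the authority identifiers and a human check matter.

```python
import re


def normalise(name):
    """Drop punctuation and case so 'Jonathan H. Epstein' equals 'Jonathan H Epstein'."""
    return tuple(re.sub(r"[^\w\s]", "", name).casefold().split())


def link_authors(papers, author_id, known_names):
    """Replace matching author name strings with the identifier of the author."""
    known = {normalise(name) for name in known_names}
    for paper in papers:
        remaining = []
        for name in paper["author_name_strings"]:
            if normalise(name) in known:
                paper["authors"].append(author_id)  # now a link, no longer just text
            else:
                remaining.append(name)
        paper["author_name_strings"] = remaining
    return papers


# A made-up paper record; only the identifier Q88406948 comes from the text above.
papers = [
    {
        "title": "Hypothetical paper",
        "author_name_strings": ["Jonathan H. Epstein", "A. N. Other"],
        "authors": [],
    }
]
link_authors(papers, "Q88406948", ["Jonathan H Epstein", "Jonathan Epstein"])
print(papers[0]["authors"])              # ['Q88406948']
print(papers[0]["author_name_strings"])  # ['A. N. Other']
```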

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.

Sunday, April 19, 2020

@Wikimedia interconnection, what it looks like for me

On twitter a reference was made to an article in the Sunday Times. The article is about the response of the UK government to the COVID-19 pandemic. It mentions many people and mentions their roles.

It is up to you to have your own opinion, but most if not all people are known in Wikidata, some have a Wikipedia article and all of them are in the spotlight. So when you get an edited sound bite, when you want to know if someone is "for real", it helps when you can turn to Wikimedia and find what there is to know.

This sound bite about "herd immunity" is too short to be properly understood. The argument made is that herd immunity is all that we have now that the genie is out of the bottle and, who can argue with that? Read the article as well. After some tinkering, the Scholia for Prof Edmunds shows some 235 papers, many co-authors and still, even more co-authors are missing. The subjects he covered are extensive; check out that Scholia. Prof Edmunds takes/took part in UK government deliberations; it is mentioned in that Sunday Times article. He is asked to explain epidemiology to the public.

Wikimedia interconnection for me is to enrich our existing knowledge in cases like this. Tweeting about it, blogging about it may lead to even more and better information like a Wikipedia article. What we as Wikimedians do does not happen in a vacuum; connecting to what happens and who the players are helps us and our readers understand who they are in these early days of the COVID-19 pandemic.

Monday, April 13, 2020

The CDC and its National Center for Immunization and Respiratory Diseases

Because of the COVID-19 pandemic, there is so much attention to every aspect of it: the epidemiology, virology, vaccination, co-morbidity. Mix it with a heady mix of economics, profiteering and graft and what are you to think of it all? What is fact and what is not?

When I read that there is an "Outbreak Management Team" in the Netherlands, an advisory body to the Dutch government, I had a look. I added all the known scientists to Wikidata, looked for "authority identifiers" and attributed some of the papers that are likely theirs to them. It generated a really nice Scholia for them and the team as well.

At first I wanted to do similar European organisations but it takes quite some effort to find them. So I took the easy route and went for the CDC. Its organisational chart contains a wealth of smaller orgs, among them the NCIRD, and it has its own organisational chart. I did the same routine: adding the obvious scientists to Wikidata, looking for the authority identifiers for them, attributing papers.

The best bit? While adding people one at a time, you see how the Scholia evolves. Authors are reordered based on their number of papers, you find the ones that are co-authors and colleagues. The latest papers are shown first. It is nice. However, this is management only; I cannot wait to see it evolve as staff finds its place in the Scholia as well.

Sunday, April 12, 2020

False friends and ListeriaBot - finding a way out of an impasse

ListeriaBot is a bot that maintains lists based on information in Wikidata. In this blogpost I will explain what a Listeria list is, what it is used for. I will point out its qualitative benefits and explain how Listeria can be instrumental to limit bias, stimulate collaboration and help us share in the sum of the knowledge available for us.

The heart of a Listeria list is a query. In this query it is defined what data is retrieved from Wikidata, it includes the order of presentation and shows this information in a language depending on the availability of labels.

Listeria lists are defined only once and every day a job run by the ListeriaBot updates all lists with the latest data from Wikidata. In this way available information is provided even when articles are still to be written. When there is an article to read, the label is shown upright; when there is not, it is shown in italics.
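How one row of such a list could be presented is sketched below. It is an illustration of the behaviour described above, not ListeriaBot's own code; the item records and their identifiers are made-up placeholders, and the output uses ordinary wikitext for links and italics.

```python
def pick_label(labels, languages):
    """Return the first available label, trying the languages in order."""
    for language in languages:
        if language in labels:
            return labels[language]
    return None


def render_row(item, wiki, languages):
    """Upright link when the wiki has an article, italic label when it has not."""
    label = pick_label(item["labels"], languages) or item["id"]
    title = item["sitelinks"].get(wiki)
    if title:
        return f"[[{title}|{label}]]"
    return f"''{label}''"


# Hypothetical items; the identifiers are placeholders, not real Wikidata identifiers.
with_article = {
    "id": "Q-example-1",
    "labels": {"en": "Amama Mbabazi", "ru": "Амама Мбабази"},
    "sitelinks": {"enwiki": "Amama Mbabazi"},
}
without_article = {
    "id": "Q-example-2",
    "labels": {"en": "Vassie Ware"},
    "sitelinks": {},
}

print(render_row(with_article, "enwiki", ["nl", "en"]))     # [[Amama Mbabazi|Amama Mbabazi]]
print(render_row(without_article, "enwiki", ["nl", "en"]))  # ''Vassie Ware''
```

The list of languages works as a fallback: when there is no Dutch label, the English one is shown, and when there is no label at all the identifier itself is shown.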

The biggest difference between a Wikipedia list and a Listeria list? No false friends. When you seek a specific "Rebecca Cunningham", it is really powerful to know that your Prof Cunningham will always be known as Q77527827 and is also authoritatively known by other identifiers. From a qualitative point of view such disambiguation is a big thing, particularly in lists, for red links and even for blue links. At this time a typical Wikipedia list has an error rate because of disambiguation issues of around 4%. I frequently blogged about this; the Listeria list I often referred to is for the George Polk award.
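Why a name is a false friend and an identifier is not can be shown with a tiny sketch. Only Q77527827 comes from the text above; the second person and its identifier are made up for the illustration.

```python
# Two people sharing one name; the second record is a made-up placeholder.
people = [
    {"id": "Q77527827", "name": "Rebecca Cunningham", "note": "the professor meant above"},
    {"id": "Q-placeholder", "name": "Rebecca Cunningham", "note": "hypothetical namesake"},
]

by_name = {}
by_id = {}
for person in people:
    by_name[person["name"]] = person  # the later entry silently replaces the earlier one
    by_id[person["id"]] = person      # identifiers never collide

print(len(by_name))  # 1
print(len(by_id))    # 2
print(by_name["Rebecca Cunningham"]["id"])  # Q-placeholder, not the professor we wanted
```

A list that links by title behaves like `by_name`, a Listeria list behaves like `by_id`.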

Maintenance is another reason to choose Listeria lists. This was documented by Magnus: a list was maintained up to a point in time as a Listeria list and then, for all the wrong reasons, human qualities were to prevail. Magnus compared the results after some time and the human-maintained list proved to be the poorly maintained list.

Categories are lists of a kind, for many categories it is defined what they contain. Consequently Wikidata is easily updated from Wikipedias and can serve as a source for updating categories as well.

Ok, the impasse. ListeriaBot is blocked because of a false friend issue. The objective is to find a resolution that will benefit us all. The false friend issue is that images can have the same name in both Wikimedia Commons and in English Wikipedia. The existing algorithm for showing pictures is that local pictures take precedence. When ListeriaBot is to do things differently, it can. Thanks to the wikidatification at Commons, we can indicate with a Wikidata identifier what a picture "depicts". Wikidatification of images can also be introduced for pictures at English Wikipedia and it then becomes easy to always show what Commons has, unless a preference is given to show a specific image for a particular project.
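The two ways of choosing between same-named pictures can be put side by side in a small sketch. It is a hypothetical illustration of the rule described above, not MediaWiki's or ListeriaBot's actual code; the function name, the flag and the file names are my own.

```python
def choose_image(filename, local_files, commons_files, prefer_local=False):
    """Say which repository supplies `filename`.

    prefer_local=True mirrors the existing rule that local pictures take
    precedence; the default shows what Commons has and only falls back to
    the local file when Commons has nothing by that name.
    """
    in_local = filename in local_files
    in_commons = filename in commons_files
    if prefer_local and in_local:
        return "local"
    if in_commons:
        return "commons"
    if in_local:
        return "local"
    return None


local_files = {"Portrait.jpg"}
commons_files = {"Portrait.jpg", "Landscape.jpg"}

print(choose_image("Portrait.jpg", local_files, commons_files, prefer_local=True))  # local
print(choose_image("Portrait.jpg", local_files, commons_files))                     # commons
```

A project that wants a specific local image keeps that choice by opting in to `prefer_local`; everyone else gets the picture from Commons.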

I have been told that I do not assume good faith. When I see the extent people care to go to resolve this issue I am only amused. The objective of what we do is to share in the sum of all knowledge and do this in a collaborative way.

English Wikipedia fails spectacularly by assuming that their perceived consensus is in the best interest of what we aim to achieve. There is no reflection on the quality brought by Listeria, there is no reflection on how its quality can substantially be improved. I fail to understand what they achieve except for feeling safe by insisting on dated practices and dated points of view.

I wish we could be one community that is known by a best of breed effort with one common goal; sharing the sum of all the knowledge that is available to us.

Friday, April 10, 2020

When crossing the street in the days of Corona, look left, right and left again

Many of us are at home, waiting to go out. We are all obsessed with the latest statistics and read what pundits have to say.  It is likely that you are cognizant of the statistics for your country, state or county.

I learned that Jonathan P. Tennant died in a traffic accident. When you care for statistics, you will wonder what are my chances of dying in a traffic accident at this time. Deduct it from your chances of dying of Corona and things look up.

Not so much for Protohedgehog, he met with an accident. It is sad, he was young, full of promise; just became a member of the Global Young Academy. If anything, it serves as a reminder for us to look left, right and left again to not become a bus factor.

Sunday, April 05, 2020

Edwin G. Abel aka Ed Abel

Professor E.G. Abel came on my radar because he is a recipient of the Daniel X. Freedman Award. He has a Wikipedia article as "Ed Abel" and the information of the award has him as "Edwin G. Abel".

I looked into the Freedman award because of a criticism on the Wikipedia article of Professor Montegia. The superior article of Prof Montegia is criticised because it is an orphan. It now has a Scholia template and that links the 105 scholarly papers known in Wikidata. Its timeline does include the Freedman award linking the Professors Abel and Montegia.

I doubt it is considered enough to remove the orphan template. I have added a redirect for the Freedman award to the issuing organisation. Maintaining a Wikipedia list is not one of my ambitions. It could be a Listeria list like this one.

Sunday, March 15, 2020

#SwineFlue management with #wolves

A lot is being said about viruses and pandemics; they do not only exist in humans but also in animals, particularly in kept animals. One knee-jerk reaction is that at an outbreak of a disease, animals in nature are blamed.

A good example is swine flu and African swine flu. It is a tradition to call for the culling, the extermination of wild boar and, traditionally, the result is an increase in boar being killed.

A real solution may be found in an ecological solution: wolves who predate on boar prefer a sickly animal over a healthy animal that is better able to fight back. There is documentation of wolves determining the extent of outbreaks of a swine flu. Areas with wolves do better.

As apex hunters, the effects of wolves on their ecology are profound. There are all kinds of arguments why people oppose the reintroduction of animals that are essential for a functional ecology; animals like wild boar, beaver and wolf are extinct in places. We argue that we need more trees to offset climate change but this will not work when those trees are not placed in a functioning ecology. In Scotland trees will not grow because they will be eaten by overabundant elk. Scotland has no functioning ecology; it lacks predators like wolves and lynx to keep the elk in check.

When we consider pandemics, viral diseases and our ecology, it is important to consider our own effects. We will do better when we enable ecological functionality and consider building with nature for more sustainable results.

Wednesday, March 04, 2020

@Wikipedia; the dread that is one identity that binds us all

On Twitter Janeen Uzzell praised a blogpost that is the Wikimedia Foundation All Hands: 2020 Sketchbook and indeed it informs about current thinking, most of it is great and still, I find it absolutely terrifying.

There are several great sketches in there. Katherine Maher gave an aspirational talk; I love it for Wikimedia to be seen as infrastructural, inclusive and even that what we do does not have to be in our projects. Important is that she mentions "support systems" because they provide the input for much of our processes.

Important is the page on security and risk. All the important concepts are mentioned, among them likelihood, relative impact and management preparedness but also "plan for and mitigate risks".

What truly makes me uneasy is when it is said that we aim to clarify who we are in the world in one brand, Wikipedia. The idea is that when we are all branded as Wikipedia, things are likely to become easier. When you check out the website there is no argument; Wikipedia is free knowledge. When you check out what it is to do:
  • project and improve our reputation
  • support our movement/growth
  • be opt-in
In the abstract Wikipedia IS wonderful; in reality the concept of what Wikipedia is, is largely determined by the English Wikipedia. It is fiercely independent, it is hardly inclusive and it has largely determined the maneuvering space the Wikimedia Foundation has. In order to "plan for and mitigate risks", I will mention several reasons why I am anxious because of this branding initiative.
  • In the Commons OTRS they use English Wikipedia notions to determine if pictures can stay or are to be removed. Commons provides a service to all Wikimedia projects
  • The query functionality for Commons is maintained by people from the Foundation. For more than half a year it puts a strain on the growth and usefulness of Wikidata. Tools have become glacially slow and often malfunction because an edit is not available when needed in further processing. It is not known what the position of the WMF director is in this
  • This is about marketing and we have never done much marketing for any of our projects. What we have done was reactive and has been all about the English Wikipedia. Now consider this:
    • Wikisource, we do not know what is available at what quality, it is all about editing and not about having people read the finished article, consequently we do not value Wikisource and fulfill its potential.
    • So far Commons has always been English only. With the support of the "depicts" functionality, there is room to enable and market  a multilingual search engine. In the spirit of "it is a Wiki", it serves as an open invite to add labels in any and all of our languages and open up what Commons has to offer. It is how to market free content the Wiki way.
    • In Wikidata we know many more concepts than what we know in any individual Wikipedia. We could use our data and inform as we have done for years in multilingual tools like Reasonator. This is an example in English, Russian, Chinese and Kannada. NB it takes additional labels to improve results and consequently this is the inclusive approach.
    • If Wikipedians were willing to reflect on their own performance, we could help them solve their false friends issues.
One sketch in the sketchbook is a presentation by Jess Wade. It says that even Academia is biased. As the Wikimedia community we do not need to be subservient to any bias and most certainly not the bias that Wikipedia has brought us.

Tuesday, March 03, 2020

"Building with Nature" .. a case for a beaver solution

The Markermeer is a lake with an ecological problem; the water is cloudy, plants and mussels do not grow. In order to alleviate that problem, the Marker Wadden was developed and in order to future proof the Houtribdijk the same "building with nature" concepts are used; the extensive water features will enable the growth of plants and the intended result is not only that the water will be clear again but also that the dyke will better withstand future storms.

With ecology part of the solution, it is relevant to appreciate ecology as part of a solution for open issues. There are two open issues: geese and willows. So far, geese are kept at bay at some areas with fences and young willows are being rooted out by volunteers.

When willows are allowed to grow, they will mature quickly and enable the next ecological succession. The wood and bark provide food and building material for beavers and this makes for an even more robust defense against storm damage. Some trees will mature anyway and this provides natural nesting places for white-tailed eagles. Given that the wels catfish is endemic in the Markermeer, it will find its place among the Marker Wadden and it may even predate on the overabundant geese.

So given that Natuurmonumenten, the organisation looking after the Marker Wadden is happy about beavers in its terrains, maybe it is the "building with nature" engineers who have to consider succession in their deliberations.

Thursday, February 27, 2020

Balancing arguments - Gender and the #Wikimedia projects

Some say, gender is important because there is a serious imbalance in the reporting on people in Wikipedia. There are many people who dedicate their time to bring some balance by writing Wikipedia articles. At the same time it is important to be cognizant of the fact that gender is not binary; the point it brings is that when you write an article you need a source to know what gender a person identifies with.

So far so good. At Wikidata other things are at play. It is vital to understand that Wikidata items are not so much about an individual, an item. When recipients of an award are included, like for "Member of the Hassan II Academy of Sciences and Technologies", there is often nothing more known than that these are Moroccans who received an award because a source says so. Determining a gender relies on googling for images of the person and when the name is decidedly male like Omar, Hakim, Mustapha the gender is implied.

Why include a gender? Because projects like Women in Red rely on prospects to write articles about. Because tools like Scholia do express what we know about all the recipients of an award. It tells us that there are currently two ladies known and 22 gentlemen. We know nothing of their work because the bias against Africa is staggering and because performance for inclusion at Wikidata is abysmal.

The argument why we should not include gender is often based on what people expect; "Wikidata contains large sets of data and consider that it makes no statistical difference one way or the other". The reality however is that when you consider the use of data in for instance Scholia, the subsets are small. One more fine lady makes a statistical difference.
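The arithmetic behind this is easy to check. The small set below uses the two ladies and 22 gentlemen mentioned earlier; the large set is a made-up number, only there for the comparison.

```python
def share_of_women(women, total):
    """Percentage of women in a set of people."""
    return 100 * women / total


# The award: two ladies and 22 gentlemen, then one more lady is added.
small_before = share_of_women(2, 24)
small_after = share_of_women(3, 25)

# A hypothetical large data set, before and after adding that same one lady.
large_before = share_of_women(200_000, 1_000_000)
large_after = share_of_women(200_001, 1_000_001)

print(f"{small_after - small_before:.2f}")  # 3.67 percentage points
print(f"{large_after - large_before:.5f}")  # 0.00008 percentage points
```

In the large set the one addition disappears in the noise; in the subset that Scholia actually shows, the share goes from about 8% to 12%.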

When people write about a person for a Wikipedia, they do get to know the person, they have multiple sources at hand. At Wikidata not so much. One purpose of adding people is to nibble away at our bias.

Requiring sources to indicate gender is what takes away the usefulness of the data and is counterproductive when we are talking bias. For me it is a Wikipedia argument, an article-based argument, and it is counterproductive to translate it to the set-based approach of Wikidata.

Saturday, February 15, 2020

Wikipedia consensus? - It is who you ask but what are the facts

An article in VICE starts as follows: 'Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing'. This article is problematic in so many ways, it starts with this premise because the Cebuano Wikipedia does not contain machine translation. It contains machine generated text and, to add insult to injury this same article states: 'the majority (generated articles) are surprisingly well constructed'.

An article like this can be sanity checked. Principles come first;
  • This is about a Wiki in contrast to the Nupedia approach. 
  • Wikipedia’s founding goal is to make knowledge freely available online in as many languages as possible.
  • There is a difference between opinions and facts
It is important how arguments are made. When "highly trusted users who specialize in combating vandalism" are introduced and comment that "many articles are created by bots", it does not follow that the quality is low nor that this is to be considered vandalism but the implication is made.

It is a fact that the Cebuano Wikipedia has 5,378,563 articles and also that there are some 16.5 million people who understand Cebuano. There is however no relation between these two facts. More relevant is that the wife of Sverker Johansson has Cebuano as her mother tongue and his two kids learn from their maternal cultural heritage also thanks to the work he does for the Cebuano Wikipedia. That is very much a classic Wiki approach.

In contrast the English Wikipedia has its bot policy preventing the use of bots for generating content. These notions should be local to the English Wikipedia and need not have relevance elsewhere. These highly trusted users can be expected to proselytise this point of view and thanks to this POV they take away a source of information without offering any credible alternative for the existing lack of information available to the rest of the world. At the same time the English Wikipedia is biased in the information it provides and does not provide the same quality of service for the domains selected for the Cebuano Wikipedia.

Sadly the Wikimedia Foundation itself makes no effective difference in support of the "other" languages it is said to support. An alternative to the LSJbot was introduced and it may be able to make a difference, but as it does not provide a public-facing service it is very much a paper tiger. Even worse are the Nupedia notions in the combination of two things: "Due to its heavy reliance on Wikidata entries, the quality of content produced is heavily influenced by the quality of the Wikidata available." and "It can discredit other Wikipedia entries related to automatic creation of content or even the Wikipedia quality." These notions are problematic for several reasons.
  • No information is preferred over little information when our service to an end user is considered
  • Quality of information is framed in the light of existing Wikipedia entries. Whose Wikipedia entries are we considering? They are however irrelevant as our aim is to inform our end users; they do not cover the same subject.
  • When the quality is considered of Wikidata .. Why, it is a wiki and its quality is improving particularly as so many eyes shine their light on it.
  • We can inform, in any and all languages, and we do not even have to call it Wikipedia, we do not even have to save it in a Wikipedia when we only cache the results from the automated text generation.
  • When we cache results of automated text generation, texts can be generated again when the data is expanded or changed.
So far the critique of the VICE article, but then again does English not have its own problems?
  • Its 1,143 administrators and 137,368 active users are struggling to keep up, when you compare it with the 6 administrators and 14 active users for the Cebuano Wikipedia it is understandable that, as they grow, the English have to rely more and more on bots and artificial intelligence.
  • Magnus has demonstrated that the maintenance of lists is better served not by editors but by using the data from Wikidata
  • The Wikipedia technology has a problem with false friends. Arguably some 4% of list entries are wrong because the wrong article is linked to. When links are solidified by using Wikidata identifiers instead, this problem disappears in the same way as the problems with interwiki links disappeared.
The biggest problem "Wikipedia consensus" has is that it was formulated in the past by a tiny in-crowd making up the "accepted" big words for the rest of us and, worse, they cannot be swayed from their POV by facts.