Saturday, December 26, 2020

Wikicite: "No #rewilding in Wales" but why or why not?

No rewilding with a background of bracken

There is rewilding practice, there is rewilding science and there is rewilding controversy. However you define rewilding, it has been practised for a long time, and rewilding practitioners want scientists to study the efficacy of their work.

There are many papers that are about rewilding or touch on rewilding, and as I am fascinated by rewilding, it is all too easy to concentrate on the papers and people that I like. So I am adding all the papers cited by "Abandoning or Reimagining a Cultural Heartland? Understanding and Responding to Rewilding Conflicts in Wales - the Case of the Cambrian Wildwood". The full text is available only as a PDF, and as a consequence I have to google every citation not yet in Wikidata. The result is a wealth of additional papers (those with a DOI), while books and other works are mentioned on the Wikidata item.

As more papers on conservation, rewilding and its politics find their way into Wikidata, not only does the Scholia for that paper evolve, all the papers it touches gain at least one "citing paper". The version of SourceMD that I use is the first iteration of a tool that, in a later iteration, looked up authors from ORCID; sadly that version is no longer available. As a result I use the Author Disambiguator to replace author name strings with references to author items, expanding the Scholia presentations for authors and papers even more.

When you find a scholarly paper interesting and you know which papers it cites, those citations should provide a mix of opinions, balanced to support the point the paper makes. In that respect it is comparable to an NPOV article in a Wikipedia; the difference is that one provides original research and the other reflects on the available research.

I work in a wiki way. When another paper draws my attention away, I leave the current one and eventually return to it. That is one reason why I work sequentially from top to bottom and add everything.

Thanks, GerardM

Wednesday, December 02, 2020

Wikicite from the ground up: attending a symposium on "rewilding"


Professor Dr Liesbeth Bakker's role is to establish the science behind rewilding practice. Professor Bakker was appointed Special Professor of Rewilding Ecology at Wageningen UR in April 2020. Because of the Corona pandemic, her inaugural lecture has for now been replaced by a one-time symposium on the subject. Anyone can attend; it is wildly popular and you can attend this one-day event as well.

Rewilding is practised on a large scale. It did not happen overnight, and there is a large body of scholarly work that will be the basis of what Professor Bakker will consider in her own research.

What I can do for the occasion is make a Scholia for the conference. I can add more citations to papers and I can attribute papers to scholars. The bottom line for citations is that they are an invitation to read more papers and to get a better understanding of what is considered. Providing a better understanding of rewilding is Professor Bakker's role. When the public wants to understand subjects like rewilding and ecology, Wikipedia is the first place people go. Its references can now link to Wikidata, but from there it is still a huge step to a Scholia for a paper and a better understanding of that paper.

When the subject matter of rewilding is to be covered more inclusively in Wikidata, it follows that the ecology of Wikidata has to play its role. It consists of items, properties and qualifiers, and of bots and users who all iterate on its data. Applying this ecology to a purpose is a challenge.

Because a paper like "Wild Steps in a semi-wild setting? Habitat selection and behavior of European bison reintroduced to an enclosure in an anthropogenic landscape" has no references in Wikidata, there is not much in its Scholia. Compare that to this paper, where all the cited papers are referenced. Wikidata provides a rabbit hole that can help bring context to subjects, authors and papers. What to do for rewilding is very much a collective challenge.

Thanks, GerardM

Thursday, November 26, 2020

Wikicite from the ground up: understanding wildfires

The world has to deal with climate change, and one of its effects is the severity of wildfires. One paper on the subject is "Can trophic rewilding reduce the impact of fire in a more flammable world?" It is only one paper; there are others. It caught my attention after I watched a YouTube video about goats and blogged about rewilding.

When I found the paper, it did not have any "cites work" statements. In the PDF of the paper, references to 91 other works can be found. Text in a PDF is problematic: you may scrape the titles and search for a match, but you may not find them in Wikidata even when they are there. This is because of missing spaces and special characters that differ from those in Wikidata. It has been a lot of work finding and linking many of the citations.
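Part of that matching problem is mechanical. A minimal sketch, assuming the only goal is to compare a title scraped from a PDF with a candidate title from Wikidata: both are reduced to a comparison key by decomposing PDF ligatures and accents, case-folding, and dropping everything that is not a letter or digit, so missing spaces and differing punctuation no longer block a match. The function names are hypothetical; this is not how any existing Wikidata tool does it.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Reduce a title to a comparison key (hypothetical helper)."""
    # NFKD splits PDF ligatures such as "ﬁ" into "fi" and separates accents from letters.
    text = unicodedata.normalize("NFKD", title)
    # Drop the combining accent marks left over after decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold, then remove spaces, punctuation and underscores (\W is Unicode-aware).
    return re.sub(r"[\W_]+", "", text.casefold())


def titles_match(scraped: str, candidate: str) -> bool:
    """True when two titles differ only in spacing, case, accents or punctuation."""
    return normalise_title(scraped) == normalise_title(candidate)


if __name__ == "__main__":
    # A PDF extraction with a ligature and a missing space, versus a clean label.
    pdf_text = "Can trophic rewilding reduce the impact of ﬁre in a moreﬂammable world?"
    label = "Can trophic rewilding reduce the impact of fire in a more flammable world?"
    print(titles_match(pdf_text, label))  # True
```

Throwing away all spacing and punctuation makes the key forgiving; a match found this way is a candidate to check, not proof that the two titles are the same work.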

The effect of these references on the Scholia for the paper is staggering. It demonstrates the power of open data; authors of the cited papers are shown, there is an accumulation of the papers they have in common. The associated subjects are shown and have their own weight. 

The paper lists 91 cited works; at this time only 54 of them have been linked. All of them have a title and a DOI. The Scholia presentation is the best we have for the paper, but as a consequence it is incomplete. Why not have a "cites work string"? Combined with attributes like "series ordinal", "DOI" and even "main subject", it would complete the missing information for the paper. Bots could pick up on this: check Wikidata for the DOI, add the paper when we do not have it, and replace the string with a proper "cites work" statement as part of checking and importing papers.
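The lookup such a bot would start with is simple. A minimal sketch, assuming the public Wikidata Query Service endpoint and the DOI property P356 (Wikidata's convention is to store DOIs in upper case); the function names and the idea of feeding it a "cites work string" are hypothetical, not an existing bot.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def doi_query(doi: str) -> str:
    """Build a SPARQL query that finds items whose DOI (P356) equals `doi`."""
    # Upper-case to follow the Wikidata convention; escape characters that would break the literal.
    literal = doi.strip().upper().replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:P356 "{literal}" . }}'


def find_items_by_doi(doi: str) -> list[str]:
    """Run the query (needs network access) and return the URIs of matching items."""
    params = urllib.parse.urlencode({"query": doi_query(doi), "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "cites-work-string-sketch/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]
```

An empty result would mark the DOI as a candidate for import; a single result is the item a "cites work" (P2860) statement could point to. More than one result suggests duplicate items that need merging.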

When people make the effort to understand a subject like "wildfires" and enrich important papers, their work, followed by the work done by bots, opens up scholarly papers even more to the people who care to learn from the papers themselves.

Thanks, GerardM

Saturday, November 21, 2020

Wikicite from the ground up: a call to action

Many of the world's ecological systems are under threat. All kinds of reasons may be given, and there is an interplay between the many causes that drive a downward spiral. Similar patterns can be found in many places, and when these patterns are broken, ecological restoration may be one of the consequences.

I was called stupid when I said that water management is essential and suggested that beavers in California could do a lot of good. Beavers have been extinct in California for over 100 years, and the many gullies in a watershed rush water straight to the sea. Consequently the water table is not replenished, resulting in an increased risk of fires. So I am stupid, but at least I have read some papers lately.

I follow the Mulloon Institute on Twitter. They have been restoring the watershed of Mulloon Creek; this Australian project has been running for more than 10 years and, apart from all its other benefits, it improved farmers' profits by 60%. To underpin their results, they perform scientific research with the aim of convincing incredulous farmers and government to consider alternatives. This project is recognised by the United Nations as a research project.

One of the papers they produced is about a possible reintroduction of the Green and Golden Bell Frog in the Mulloon catchment. I understand this paper to be about an indicator species. From a scientific point of view there are no issues; the paper is richly referenced, and when you want to read scientific papers about the Green and Golden Bell Frog, check out its Scholia.

The problem is that, like so many papers, it does not have a DOI, and the best insurance for it remaining available in the future is the "Wayback Machine". It was not archived there; now it is. When papers are to be known to the general public, adding a paper like this to Wikidata is the next step.

More can be done, for instance adding the references in the paper to the Wikidata item. Maybe there is an update about a reintroduction of the Green and Golden Bell Frog in Mulloon Creek; who knows? I did not find it.

Thanks, GerardM

Sunday, November 15, 2020

Wikicite from the ground up: references

When a point is made in Wikipedia, or a statement is made in Wikidata, best practice is to include a reference. The same is true for a scholarly paper; its references are typically found in a references section.

Wikicite is a project that brings many scholarly papers into Wikidata. As beautiful as it is, it is a top-down process. As an ordinary editor there is a lot that you can do to enrich the result.

The paper "Can trophic rewilding reduce the impact of fire in a more flammable world?" has a DOI, and the PDF includes a reference section. It takes a lot of effort to add the authors and the papers it cites to Wikidata. The visibility of the paper improves, and so does the visibility of the papers it cites. The Scholia shows that at this time, this paper is not used as a reference in Wikipedia.

There is now a template that retrieves the data for a reference from Wikidata. It will be great when it is widely adopted, because it provides an additional pathway from Wikipedia to the references used and to the information relating to each reference.

So what can we do to improve the quality of the data in Wikidata? First, the processes that import the bulk of new data are crucial; they are essential and need to be appreciated as such. The next part is enabling a community to improve the data. A recent paper explained what can be done with a top-down approach. All kinds of decisions were made for us, and the result feels like a one-off project.

When ORCID is considered to be our partner, it makes sense to invite people registered at ORCID to contribute to Wikidata. Their papers can be uploaded from ORCID into Wikidata, and their co-authors and references can be linked by these people. Because they do this while logged into ORCID, their personal involvement is known; that gives us assurance, and we can use it as a reference.

The quality of such a reference is better than that of our current references, which came with a link to an "author name string". Who knows whether the disambiguation was correct? When a paper is linked to at least one known ORCID person with public information, we have a link we can verify, and consequently it becomes a link we can trust. Once the link with a person with an ORCID identifier is established, we can ask him or her to acknowledge the changes that happen to his or her papers. Our quality is enhanced and a sense of community with ORCID is established.
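One small, mechanical part of trusting such a link is checking that an ORCID iD is well formed. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, as ORCID documents for its identifier structure. The sketch below only validates the format; it says nothing about whether the person actually wrote a paper, which is what the logged-in involvement is for. The function names are hypothetical.

```python
import re

# Four groups of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True when `orcid` has the 0000-0000-0000-000X shape and a correct check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` (the sample iD used in ORCID's documentation) returns `True`, while changing the last digit to 6 makes it `False`.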

Thanks, GerardM

Wikicite from the ground up: "Trophic rewilding"

In nature conservation, trophic rewilding and trophic cascades are important topics. When an animal like the howler monkey is no longer around, it no longer distributes the seeds of trees. The likely effect is that in time these trees are no longer part of the ecosystem. Reintroducing the howler monkey restores the relation; it is considered an example of trophic rewilding.

At Wikipedia there is no article about trophic rewilding. As someone famously said, references are the most important part of a Wikipedia article, so let's start with finding references.

There is a longstanding process of importing data about scholarly papers, all kinds of scholarly papers. Some of them have "trophic rewilding" in their title. Trophic rewilding was not yet known as a subject, so it was easy enough to search for "trophic rewilding" and add it as a subject. Slowly but surely the Scholia presentation evolves. More papers means more authors, and more authors known to have collaborated on multiple publications. More citations are found for these papers, and by inference they have a relation to the subject.
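The title search can be scripted. A minimal sketch, assuming the papers and their existing main subjects (P921) have already been fetched, and that the output is pasted into QuickStatements, whose V1 format is one tab-separated statement per line. The function name, titles and item IDs are hypothetical placeholders, not real Wikidata items.

```python
def main_subject_statements(papers, phrase, subject_qid):
    """Return QuickStatements V1 lines adding `subject_qid` as main subject (P921).

    `papers` is an iterable of (qid, title, existing_subject_qids) tuples.
    Only papers whose title contains `phrase` (case-insensitive) and that
    do not already carry the subject are included.
    """
    needle = phrase.casefold()
    lines = []
    for qid, title, subjects in papers:
        if needle in title.casefold() and subject_qid not in subjects:
            lines.append(f"{qid}\tP921\t{subject_qid}")
    return lines


if __name__ == "__main__":
    # Hypothetical items and titles, for illustration only.
    papers = [
        ("Q1001", "Trophic rewilding as a restoration strategy", set()),
        ("Q1002", "Fire regimes in Mediterranean shrublands", set()),
        ("Q1003", "Trophic Rewilding and large herbivores", {"Q9999"}),
    ]
    print("\n".join(main_subject_statements(papers, "trophic rewilding", "Q9999")))
```

Only Q1001 gets a statement here: Q1002 does not mention the phrase and Q1003 already has the subject. A phrase match is a blunt instrument, so the resulting lines are worth a quick look before they are run.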

The initial set of data is already good enough to get a grasp of the subject, but when you want more, you can look for missing data using Scholia, information like missing authors. The Author Disambiguator helps in finding papers for a missing author. With such iterations, the Scholia for trophic rewilding becomes more complete.

Another avenue to improve the coverage of a subject is adding "cites work" statements in Wikidata for a paper like this one. Not all cited works are known to Wikidata, but the effect can be impressive. NB the citations are often found in a PDF and not in the online text of the article.

Slowly but surely all the scholarly references needed for a new article become available, and you can use a template in the article to link to the (evolving) Scholia. The best bit is that you can add this template to an existing Wikipedia article as well, providing a scholarly rabbit hole for interested readers.

Thanks, GerardM

Sunday, November 01, 2020

Wikicite from the ground up - oyster reefs


I watched this video after searching for oysters and oyster reefs. They are a thing in the Netherlands; we don't have enough of them and should have them as a functioning ecosystem.

The video starts with Prof A. Randall Hughes moving into the water for an experiment. Prof Hughes has been in Wikidata since 2018. Triggered by the video, adding information and papers was for me the thing to do. One of her papers is about oyster reefs; linking the paper to the item for oyster reefs includes her in the Scholia for oyster reef.

Wikicite is about citations, and one of its ambitions is to link Wikipedia references. Many of the referenced articles include the subject of the article, "oyster reef", but only one of them can be found in Wikidata. When you check the authors, you will quite often find the name of Megan K. La Peyre, an associate Research Professor in the School of Renewable Natural Resources at Louisiana State University Agricultural Center. It is cumbersome to add papers by hand; I made a stab at one of them, only to find that I had to merge two items for Prof La Peyre because "there can be only one".

Given that the scholarly papers among these references all have a DOI, we should have a tool that collects all DOIs from the reference section of an article. It would then get the information from CrossRef using the DOI, include the publication in Wikidata AND, something on my wishlist, link it to the Wikipedia article where it is used as a reference.
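The first step of such a tool, collecting the DOIs, fits in a few lines. A minimal sketch, assuming the wikitext of the article is already at hand and using the pattern Crossref has suggested for modern DOIs; older DOIs with unusual characters will be missed. The function name is hypothetical, and fetching metadata from CrossRef and creating Wikidata items are left out.

```python
import re

# Pattern suggested by Crossref for modern DOIs, applied case-insensitively.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(wikitext: str) -> list[str]:
    """Return the distinct DOIs in `wikitext`, upper-cased, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(wikitext):
        # Trailing sentence punctuation is almost never part of a DOI.
        doi = match.group(0).rstrip(".,;").upper()
        if doi not in found:
            found.append(doi)
    return found
```

Upper-casing follows Wikidata's convention for the DOI property, so the output can be compared with existing items directly; it also merges the same DOI written in different cases.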

The objective of this tool is not so much expanding Wikidata but making it easy and obvious to find more information and publications on a topic through co-authors, subjects and the Wikipedia articles where the same paper is used as a reference. When references are considered by some to be the most important component of an article, it follows that it should be easy to expand from there into a whole different rabbit hole.

Thanks, GerardM

Saturday, October 31, 2020

Wikicite, but from the bottom up

Wikicite is one of the most active projects in Wikidata. Its purpose is to "develop open citations and linked bibliographic data to serve free knowledge". A lot of work has been done over the years; there is only one issue: what purpose does it serve?

One of the visible parts of Wikicite is the many Scholia presentations: for papers, authors, organisations, subjects and even combinations. There is a template that enables the inclusion of Scholia information in a Wikipedia article, like here.

One objective of Wikicite is to become the repository of all references of Wikipedia articles. This is where progress is possible, enabling people like me to combine the two and make it easier for Wikipedia editors to find even more sources. I spent a lot of time adding the subject "trophic cascade" to scholarly articles that include the phrase "trophic cascade" in their title. In addition, I attributed the papers of many a scholar. This is reflected in the Scholia for trophic cascade. Many of the papers in the references section of the English article are these same papers.

Referenced articles may be relevant to multiple subjects and may be part of the references of multiple articles. When Wikidata knows all the papers used as references in a Wikipedia article, we can make the information in a Scholia even more useful.

The information for existing papers, with authors and citations, can be enriched. For references we can add new papers, and the subject of the Wikipedia article can be marked as a "main subject" for the paper as well. We weave a mighty web in this way. Our quality will be improved by flagging retracted papers, and we can flag articles for an update when new information becomes available as well.

What we do does not have to be complete. That is not the way, that is not the Wiki way. When we start with what we have, we will find that it is already really useful.

Thanks, GerardM

Monday, September 14, 2020

A new tool implies changes for me

A list like this is wonderful. With lists like this I have always either imported only the existing office holders or attempted to "do them all". Typically I did the first few, making a point of including the incumbent.

I added this template {{PositionHolderHistory|id=Q**}} to all the items for an office in my African politicians project. You find the template on the talk page; like the Listeria lists, it shows past and present office holders.

I still prefer my method of including the "red links" in Wikidata, but it is a wiki and there is so much more to do. What I started doing with office holders from Togo is this: for those I do not link to predecessors and successors, I at least show the dates they were in office.

It looks much better in Listeria too. 

Thanks, GerardM

Thursday, September 10, 2020

Амама Мбабази is a politician who held multiple positions in the Ugandan government

Amama Mbabazi is a former Prime Minister of Uganda and held many other governmental positions. There are 19 Wikipedia articles about him; Mr Mbabazi is notable.

When you want to find a picture of him and you know his name in Russian, you can use Special:MediaSearch. You can also find him with اماما_مبابازى.

One of the positions Mr Mbabazi held is Justice Minister of Uganda. English Wikipedia has a category for these ministers; at this time two people are included, but not Mr Mbabazi. Ten ministers are missing from the category. Mind you, it is English Wikipedia that has a list that made my work at Wikidata possible!

On the talk page for the Wikidata item for Justice Minister of Uganda, I added the {{PositionHolderHistory}} template. A bot updates the information every day and adds comments on the quality of the information. This makes it easy to add positions of interest to a watchlist.

On my Africa project you find Listeria lists for African political positions. They are duplicated to several Wikipedias, and once a Wikipedia is synchronised it will show information like the lists that include Mr Mbabazi.

One day all the data will be complete and up to date. In the meantime it is a "work in progress" and you are kindly invited to check the information out, find its shortcomings and make updates where necessary.



Saturday, August 29, 2020

Proposal for the liberal use of data from the Wikipedias and Wikidata

Every Wikimedia project has information available that could be shared with other Wikimedia projects. Data is incomplete in every project and the objective of this proposal is to indicate missing data so that it may be included.

It starts with a category. This category links to six English Wikipedia articles. Using a tool, that information is now available in Wikidata as well. As this information was further enriched, it became clear that one more article should be included in the category.

The category exists on many Wikipedias, and the definition of the category in Wikidata states what content the category contains. Reasonator shows the information with a built-in query. When you check the article with a list, it is obvious that many articles do not have a category entry. The latest entry in the list is known to Wikidata thanks to the Latin Wikipedia.

The proposal is simple. Have a messaging agent that indicates missing categories on articles. This will enable any Wikipedian to add them. For Wikidata we would import data based on the definition of categories. The process would be enabled per defined category.

  • Nothing happens on a Wikipedia without prior agreement
  • The mechanism used is by default one of signalling and not of updating 
  • It follows existing practice for importing data from Wikipedias into Wikidata
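The signalling in this proposal comes down to comparing two sets per Wikipedia: the articles Wikidata says belong in the category (for instance the holders of a position) and the articles actually in the category. A minimal sketch, assuming both sets have already been fetched for each wiki; the function name and the example titles are hypothetical. It only produces signals and changes nothing, in line with the conditions above.

```python
def category_signals(expected_by_wiki, members_by_wiki):
    """Compare expected category members with actual ones, per Wikipedia.

    expected_by_wiki: wiki code -> article titles Wikidata says belong in the category
    members_by_wiki:  wiki code -> article titles currently in the category
    Returns wiki code -> {"missing_category": [...], "unknown_to_wikidata": [...]}.
    """
    signals = {}
    for wiki in sorted(set(expected_by_wiki) | set(members_by_wiki)):
        expected = set(expected_by_wiki.get(wiki, ()))
        members = set(members_by_wiki.get(wiki, ()))
        signals[wiki] = {
            # Articles a Wikipedian may want to add to the category.
            "missing_category": sorted(expected - members),
            # Category members for which Wikidata may need a statement.
            "unknown_to_wikidata": sorted(members - expected),
        }
    return signals
```

The first list is the message for Wikipedians; the second is the candidate import for Wikidata, following existing practice.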

Sunday, August 23, 2020

Having a conversation about the usefulness of shared data

Once there is a stalemate, where positions are entrenched, there is only sniping and little progress. At the English Wikipedia they are adamant; they do not want automatic changes from Wikidata. As a result there is little or no progress in making effective use of the information in all the Wikipedias and Wikidata. There is room for improvement, improvement that will benefit both English Wikipedia and Wikidata.

Let me explain with an example. In the Gambia they have foreign ministers. Great information can be found in this English article. There is also an incomplete category, incomplete because not all the foreign ministers with an article are included.

When somebody enters the data for Gambian foreign ministers in Wikidata, the result is best shown using Reasonator. Reasonator shows it best because you can have it show in any and all languages. That is quite relevant because there may be lists in other languages, German for instance. The German list has only one red link, the English list has five, and the Reasonator list, once completed, will have none.

When you summarise the state of play for lists of positions like this, the presentation of these lists differs greatly while the content is by definition the same. When you want to spare both the cabbage and the goat, it takes extra moves. The Reasonator information for the category shows 24 entries and categories in four languages. It is easy to test whether all the linked articles have a category entry in each language, and also whether Wikidata knows these people for the position they hold.

We do not have to put Wikidata in "your face" like we would do with automatically changing infoboxes. Having a system that indicates that attention is needed is a first step for getting used to shared information. Information that comes from all Wikimedia projects and has Wikidata as its intermediary.

Sunday, August 09, 2020

Keeping it simple for "Abstract Wikipedia"

Abstract Wikipedia is confusing to me; it is said to be about "articles in a language independent way". Articles are complicated because the expression in any language has to be consistent with the grammar, the diction, the vocabulary for that language. Wikipedia articles have one additional complication; once you start reading you may end up in a rabbit hole of wonderful stuff that grabs your attention.

Abstract Wikipedia covers all of Wikidata, and that is much more than what all Wikipedias combined cover. Currently, for every item with a Wikipedia link there are two items without one. The first objective that seems obvious is to have something to say about each item. It can be as little as **Name** is a **human**. When we know the profession: **Name** is a **chemist**. When an award was won: "**Name** is a **chemist**. The **Award** was received in **year**." Patterns like these are similar for every language.
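Patterns like these are easy to express as templates. A minimal sketch, in English only and assuming the name, class, occupation, award and year have already been taken from Wikidata; the function name is hypothetical. A plain string template cannot handle grammar (it will happily write "a engineer", and other languages need gender and case), which is exactly the part Abstract Wikipedia would have to add.

```python
def describe(name, instance_of, occupation=None, award=None, year=None):
    """Build a short English description from a few Wikidata-style facts."""
    # Prefer the occupation ("chemist") over the bare class ("human").
    sentences = [f"{name} is a {occupation or instance_of}."]
    # Only mention the award when both the award and the year are known.
    if award and year:
        sentences.append(f"The {award} was received in {year}.")
    return " ".join(sentences)
```

Because the text is generated from the data, it changes as soon as the data does; for example `describe("Name", "human", "chemist", "Award", 1900)` gives "Name is a chemist. The Award was received in 1900."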

This minimal approach is the basis for automated descriptions, which are vital when disambiguating. They are an improvement over manual descriptions, because manual descriptions do not get updated when new information becomes available. Automated descriptions are not articles; they have to be descriptive and not describing.

When Wikipedia articles exist, they provide a rich source of information when new texts are to be generated. Given that Abstract Wikipedia is based on Wikidata, a tool like "Concept Cloud" is useful because it shows all the links to other articles and how often they occur in an article (Concept Cloud is part of Reasonator). The challenge will be to model such relations in Wikidata OR to allow these relations to be registered in a new way as part of Abstract Wikipedia.
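The core idea, counting which articles an article links to and how often, can be approximated from wikitext. A rough sketch, assuming raw wikitext as input and English namespace names; it ignores links produced by templates and is not how Concept Cloud itself is implemented. The names are hypothetical.

```python
import re
from collections import Counter

# [[Target]], [[Target|label]] or [[Target#section|label]]
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Links that do not point to other articles (English namespace names only, an assumption).
SKIP_PREFIXES = ("category:", "file:", "image:")


def link_counts(wikitext: str) -> Counter:
    """Count links per target article in a piece of wikitext."""
    counts = Counter()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        if not target or target.casefold().startswith(SKIP_PREFIXES):
            continue
        # MediaWiki treats the first letter of a title as upper case by default.
        counts[target[0].upper() + target[1:]] += 1
    return counts
```

`link_counts(text).most_common(10)` then lists the most frequently linked articles, a crude version of the relations that would need a place in Wikidata or Abstract Wikipedia.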

Once sufficient information is available, an article can be generated. That is what LSJBOT and the Cebuano Wikipedia are famous for. It follows that once the same amount of data is available for a similar subject in Wikidata, an article can be generated in, for instance, Cebuano. When we recreate these templates, we can adapt them for any language.

The linguists who theorise Abstract Wikipedia to death can apply their magic and find out whether their pet theories hold water in the real world. In Abstract Wikipedia their function is to enable the provision of information in any language. Obviously competing theories may be implemented, and as a result the underlying technology may evolve.

Thanks, GerardM

Saturday, August 01, 2020

Commissioners for Tanzanian Regions

Aggrey Mwanri is one of the 31 commissioners of a Tanzanian region, the Tabora Region, which has a population of 2,291,623 inhabitants. For most of the 31 regions we know at least one commissioner, and only for the Arusha Region do we know "them all".

I have been adding information about these Regional Commissioners, and from a quality point of view this is a step in the right direction. Slowly but surely we know the structures and politicians of more African countries.

When you compare African countries with "Western" countries, such structures are comparable. This makes it possible to show the extent to which the data in Wikidata does not represent the African reality.

It is more than likely that there are lists of the data that is currently missing. These lists help us provide the bare bones of what it takes to know about African countries. 

So who are the data wizards who can show where our data is lacking? Where are the lists that enable the people who know tools like OpenRefine to fill in the gaps? Who has the pictures so that a Wikipedia article for Mr Mwanri can be illustrated?

Sunday, July 26, 2020

Data in Red - A holistic view on the bias for the English language and for Anglo-American subjects

First a definition: "When data is biased, we mean that the sample is not representative of the entire population". This approach successfully underpins the Women in Red project; currently 18.51% of the biographies in English Wikipedia are about women. Compare the coverage of Anglo-American politicians with that of politicians from the whole of Africa: the bias in the data at Wikidata is already obvious, and such a comparison attaches numbers to it.

This is not a problem for Wikidata alone and yes, we can have a project and include a lot of data to reach a growth percentage as we did for Women in Red. That is worthwhile in its own right, but in this way we do not forge a closer relation with the "premier brand Wikipedia". It would be mere stamp collecting.

The best argument for having data in Wikidata is that it is used. This is done in self-selecting Wikipedias through global infoboxes and lists. Interwiki links are used on every Wikipedia. Integrating the necessary functionality is a meta/technical affair and firmly for the Wikimedia Foundation to own.

The functionality to make this happen implements an existing idea with additional twists.
  • Pictures of the subject are linked, courtesy of Special:MediaSearch
  • Automated descriptions are provided in every language to aid disambiguation. At first the functionality by Magnus is used; it is to be replaced by improved descriptions provided by Abstract Wikipedia
  • A Reasonator-like display is provided to inform on the data we have on an item.
  • Suggestions for inclusion in categories and lists are provided, based on Wikidata definitions for categories and lists.
  • To help people find sources and alternative sources, Scholia is included when there are papers about the subject. Once existing citations are available, they are an additional resource.
In essence this is a toolset that you can opt into as an individual and/or that is the standard for a project. Particularly for the smaller projects this will prove to be really valuable; it will prevent false friends, and it indicates heavily linked items that do not have an article. It stimulates the addition of labels, because labels are beneficial in finding illustrations.

This proposal is relatively low tech, and it will bring our many communities together by widely providing the information that is available to us.

Thursday, July 23, 2020

What to love in English Wikipedia

This list of commissioners of the Arusha Region is great, it provides the basic information that enables me to include this information in Wikidata. It can be assumed that they are all from Tanzania, politicians and human as well. 

What I love in English Wikipedia are lists like this. It is more than likely that for every Tanzanian region there will be a similar list, and as a consequence we can include all these fine politicians in Wikidata and list them in whatever Wikipedia.

As more politicians for Tanzania or any other African country are added, politicians will pop up who have held multiple offices. This will be explicit in Wikidata and in Wikipedia you could use Special:WhatLinksHere.

Technically there is not much stopping us from associating red links with Wikidata items. This is the same person as in the "WhatLinksHere" example, and you find him in this list, which is a work in progress as well.

Think this through. With lists like this in any Wikipedia, these people are findable and linkable. It will be possible to state in text what a given commissioner did, and there will be no ambiguity because of the link.

So I love English Wikipedia for the rich resource of information it is. I love its editors who provide us with the information that enables the reuse of data. I will rejoice when it is recognised that we can do much more. When we accept that together, as an ecosystem, we are in a position where we actually share the sum of all knowledge that is available to us.

Tuesday, July 21, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 2)

Our aim is to share the sum of the knowledge available to us with everyone, everywhere, in every language. That is what we are to achieve.

As we establish what we, as a movement, are to do, it follows that we need to measure how well we do. When a community does not play an active part for a particular goal, that too will show in the numbers.

Commons does not need to work in English only. "Special:MediaSearch" works in all the languages we support. With this search engine enabled on every Wikipedia, we will learn how well it gets adopted in all our languages. We will know if new Wikidata labels are used in searches on Commons. We will know if more diversity is realised in the pictures used in Wikipedia. We will know how many pictures are downloaded and from what languages.

Only in the Portuguese Wikipedia do we find the governors of Mozambican provinces, and then only in running text. We can include them in Wikidata and make Listeria lists for them, but how do we disambiguate these politicians? What does it take to make the information about them usable for "Abstract Wikipedia"? How do we assemble information about countries like Mozambique, and how do we get it to the quality level that some expect? As important, how do we get people from Mozambique interested and involved?

Some Wikipedians opine that the Wikimedia Foundation does not need to raise funding for their project. Arguably this is correct, but we can raise funds for other projects, other languages elsewhere because we have more and other ambitions to realise. As we raise more money outside of the USA, more people will gain a sense of ownership. 

When we are to overcome our bias for English and our bias for Wikipedia, we need to market our other languages and our other projects. We need key performance indicators: for Wikisource, how many books were downloaded; for Commons, how many media files were downloaded and from what language.

Results need to be objective and measurable. As our research proves to have been about English Wikipedia, we have a problem. We seriously need to consider to what extent it is applicable to our other languages and projects.

NB: While the bias is real and the relationship with English Wikipedians is often antagonistic, it is important to recognise English Wikipedia as the source for much of the information that ends up in other projects. When we collaborate more, our available data will reach more people in an informative way.

Saturday, July 18, 2020

What to do to counter an institutional bias of the Wikimedia Foundation (part 1)

The bias for Wikipedia as a project is strong; the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this, consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article, it is only negative. Dictionaries give it more room, for example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding, and it explicitly states that it seeks a closer relation with its premier brand, Wikipedia.

This is a published bias. It follows that other projects do not receive the same attention and do not get the same priority. For me it is obvious that, as a consequence, the WMF could do better when it intends to "share in the sum of all available knowledge", let alone the knowledge that is available to it.

Arguably another, more insidious bias is the bias for English, particularly the bias for the English Wikipedia. The proof of the pudding is in the eating: we have a worldwide public, yet the use of our information hardly grows. Research is done on English Wikipedia, so in effect we arguably do not even know what we are talking about.

When we are to do better, we need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states, in the context of diversity in tech and media, that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument: when the bonuses for executives of the WMF rely on growth in all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported other projects; it does not market them, it does not support them, and they exist because the MediaWiki software allows for the functionality.

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask the WMF to accept the premise and consider suggestions for change. This proves to be an issue, and it makes our biases even more intractable.

Sunday, July 12, 2020

Telling the story of governors of Mozambique

As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces and the like because a provincial minister of health in South Africa was considered not notable enough.

With Wikilambda, or if you wish "Abstract Wikipedia", being a thing, it is important to consider how the story is told. The bare bones of a story already show in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of "governor of their state", a start and end date and, as applicable, a predecessor and a successor. Obviously they are politicians and Mozambican.
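To make "the bare bones of a story" a little more concrete, here is a minimal Python sketch that turns one "position held" statement (position, start and end date, predecessor, successor) into English sentences. It is not how Wikilambda works; the `PositionHeld` class, the `tell_story` function and the placeholder names are hypothetical, and only English is handled. A construct for Portuguese or any other language would need its own grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionHeld:
    """One 'position held' statement with its qualifiers (hypothetical shape)."""
    person: str
    position: str
    start: str                        # start date as text, e.g. a year
    end: Optional[str] = None         # end date; None while still in office
    predecessor: Optional[str] = None
    successor: Optional[str] = None


def tell_story(p: PositionHeld) -> str:
    """Render one statement as plain English sentences (English only)."""
    if p.end:
        story = f"{p.person} was {p.position} from {p.start} to {p.end}."
    else:
        story = f"{p.person} has been {p.position} since {p.start}."
    # Predecessor and successor are only mentioned when they are known.
    if p.predecessor:
        story += f" {p.person} succeeded {p.predecessor}."
    if p.successor:
        story += f" {p.person} was succeeded by {p.successor}."
    return story


if __name__ == "__main__":
    print(tell_story(PositionHeld("Governor B", "governor of Province X", "2005", "2010",
                                  predecessor="Governor A", successor="Governor C")))
```

Every extra statement (a date of birth, an alma mater) would add a sentence; that is how the story can grow as more becomes known.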

This time I had to go to the Portuguese Wikipedia for a source. Its list mixes in the colonial governors, and they need to fit a different mold: they are Portuguese, and arguably they are not politicians but administrators.

What I am eager to learn is how Wikilambda will be able to tell these stories. How it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.

Sunday, July 05, 2020

The quality of all the Nigerian governors at @Wikidata

There are lists for all the governors of all the current Nigerian states. They exist on many Wikipedias. The information was known to be incomplete; based on lists on the English Wikipedia, I added information to Wikidata, and as a result these lists may update with better data.

Obviously, when you copy data across to another platform, errors will occur. Sometimes it is me, sometimes it is in the data. I have only indicated when a governor was in office and who the predecessors and successors were.

The data is provided in a way that makes it easy to query: no information on elections (many governors were not elected) but proper start and end dates. The dates are as provided on the Wikipedia lists; articles for a governor are often more precise. People from Nigeria are often known by different names, so I added labels where I needed them for my disambiguation.

When you want to know how many of these fine gentlemen are still alive, it will take some effort to kill off those who are still walking around according to Wikidata. It is also relevant to know if a governor was elected or not. To do that properly you want to include election data elsewhere; there is no one-to-one relation between a position, elected officials and them being in office.
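Counting who is still alive amounts to counting items without a "date of death" (P570), and a missing date is not proof of life. The minimal Python sketch below works on hypothetical in-memory records rather than on Wikidata itself; the `Governor` class, the `possibly_alive` function and the 110-year cut-off are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Governor:
    name: str
    born: Optional[int] = None  # year of birth, if one is recorded
    died: Optional[int] = None  # year of death, if one is recorded


def possibly_alive(governors: List[Governor], current_year: int,
                   max_age: int = 110) -> List[Governor]:
    """Keep governors with no recorded death who could plausibly still be alive.

    Anyone older than `max_age` (an assumed cut-off) without a date of death
    is treated as incomplete data, not as a living person.
    """
    result = []
    for g in governors:
        if g.died is not None:
            continue  # a recorded death settles the question
        if g.born is not None and current_year - g.born > max_age:
            continue  # "still walking around" only because the death is missing
        result.append(g)
    return result
```

The governors skipped by the age check are exactly the ones that need someone to look up and add a date of death.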

There is plenty to improve on the data. When people do, Listeria lists will update. Maybe someone will consider updating the English Wikipedia lists.

Saturday, July 04, 2020

Abstract Wikipedia, telling a story from available data

For me Reasonator is the best tool for Wikidata. It shows the data for a Wikidata item in an informative way. In my approach I am "deficit focused"; I add information for subjects that are not well represented. Additional information such as dates and successors make the information for Nigerian state governors more complete and it shows in Reasonator and Listeria lists.

Abstract Wikipedia, the new Wikimedia project is possible because of all the data in Wikidata. People who know the structure of a language will build constructs that present information in natural language. This is awesome because it will help us share widely in the sum of all available knowledge.

The objective of the Wikipedia projects has always been to share in the sum of all available knowledge. As more languages support the constructs needed for "Abstract Wikipedia", what we have in Wikidata will mushroom and evolve. This is because the data gets a purpose, and the data will be made to fit this purpose.

The best part: Wikipedians want to tell stories, and it only takes one person to add a bit of information to make a difference in the constructs for every language. My expectation is that as constructs become available for the languages of Nigeria, it will no longer be me who adds information on Nigerian politicians. It will be people from Nigeria. For them it will be Abstract Wikipedia that shows the data in an informative way.

Friday, July 03, 2020

Black representation matters, the Congressional Black Caucus

A friend asked me to help bolster the notability of black scientists. I was told of a "black caucus" with chairs, and that a list would help. I googled and found a black caucus with chairs that we did not know at Wikidata. They were the chairs of the Congressional Black Caucus. Maybe not the caucus intended, but of such prominence that I added them all.

These are only the leaders, and obviously over time the membership of the Congressional Black Caucus changed with the different elections. Someone else may add the data.

The information I used could be found on English Wikipedia and is part of the article about the Congressional Black Caucus. Typically, when a position is considered important enough, it has its own article. When it does, it has more relevance and more information is available about the relevance and the history of such a position.

When Black representation matters, you want substantial lists and articles both on Wikidata and Wikipedia.

Sunday, June 28, 2020

@Wikipedia and freedom of speech

When you disagree on Wikipedia with current practices, you have to use stilted language to prevent administrators taking offence and blocking your account. 

At this time many articles about black female scientists have been marked for deletion. It is an organised effort, as there are lists subdividing these articles by criteria. For the record, for many of these fine scientists I added content on Wikidata, all kinds of information including awards.

When I learned that the article for Ayana Jordan was marked for deletion, I added the following protest: "Keep I want to stress that those !@##$ who make these proposals should be ashamed. Thanks, GerardM (talk) 05:10, 27 June 2020 (UTC)". The response came quickly: "@GerardM: unlike some others here, yours is not a new account. So you should need no reminding that personal attacks and assumptions of bad faith are forbidden here." I replied with: "I did not use any swear words, I did express my opinion of the people who are so detrimental to what Wikipedia should stand for. That is not bad faith that is not a personal attack that is expressing revulsion. Thanks, GerardM (talk) 06:01, 27 June 2020 (UTC)". The conversation was taken elsewhere, and I was blocked for a day.

For a Wikipedia administrator, it should be no news that these people who are repressive of what is not their cup of tea are widely resented. Marking articles for deletion is a form of harassment. I do not care who proposed the deletion; I do not know the person who marked Ayana's article for deletion, and I do not care to know him and his ilk. We have a situation where harassment is allowed, and calling out such travesties is considered a personal attack and an assumption of bad faith.

So I have been blocked for a day. I am proud to stand up against such bullies. I consider the process of deletion as rigged. These !@##$ are free to do as they wish because "we should assume good faith". Hell no.

Saturday, June 27, 2020

Hey @Wikimedia let's move the needle

The Wikimedia projects are biased. They favour only one language, the English language. When you look at Wikipedia traffic, English Wikipedia is something like 50% of it, while English readers do not represent 50% of our intended public.

The objective is to improve the usefulness of the other projects and thereby increase their traffic. That is, more articles and books are read, more pictures are seen and downloaded.

Let's pick one language, Yoruba, as an example. There are currently 32,624 pages in its Wikipedia. There are some 40 million people speaking the language. So what can we do for Yoruba editors and readers? How can we track what makes a difference, and what can the WMF do to achieve this?

* We can improve list support. 
Currently the best support for lists in a Wikipedia is "Listeria". It is supported by Magnus. Listeria lists have been shown to be more up to date than manual lists on English Wikipedia; for less resourced projects this will be even more true. When existing lists can be easily included in an article, it will expand the available information hugely. Here is an example of Listeria lists on the Yoruba Wikipedia; the content of these lists shows in Yoruba. Lists are better supported and adopted when they are WMF-supported functionality.

* Choosing pictures for illustration
When people look for a picture, they have to go to Commons, or they visit Wikipedia articles on the same subject and use those same pictures. When Special:MediaSearch is available as a tool from every Wikipedia article, a much richer palette of pictures becomes available to choose from. (The search is for "Agbègbè Ìjọba Ìbílẹ̀ Mushin".)

The cool thing is, when this tool is available while writing an article, it is easy to add labels to Wikidata more proactively. This will improve the performance of Special:MediaSearch even more.

What would truly support Special:MediaSearch is disambiguation. It is unreasonable to expect that we get descriptions in all the 300+ languages we support. What Reasonator supports are automated descriptions. It makes it easy and obvious to choose the right item in any language.
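An automated description can be as simple as combining the labels of a few statements. The Python sketch below is not Reasonator's code; the label table, the item keys ("mozambican", "politician") and the function names are hypothetical, the demonym is assumed to be available as a label, and English word order (adjective before noun) is assumed.

```python
from typing import Dict, Optional

# Hypothetical label table: item key -> {language code -> label}
Labels = Dict[str, Dict[str, str]]


def label(labels: Labels, key: str, lang: str, fallback: str = "en") -> str:
    """Label in `lang`, else in `fallback`, else the bare key."""
    by_lang = labels.get(key, {})
    return by_lang.get(lang) or by_lang.get(fallback) or key


def auto_description(labels: Labels, nationality: str, occupation: str,
                     born: Optional[int] = None, died: Optional[int] = None,
                     lang: str = "en") -> str:
    """Build a short description such as 'Mozambican politician (1950-2015)'.

    Other languages would need their own word order; this sketch only
    swaps in labels.
    """
    text = f"{label(labels, nationality, lang)} {label(labels, occupation, lang)}"
    if born is not None or died is not None:
        start = born if born is not None else "?"
        end = died if died is not None else ""
        text += f" ({start}-{end})"
    return text
```

With labels in a given language, the same statements yield a description in that language, which is why no one has to type descriptions for all 300+ languages by hand.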

For the Wikimedia Foundation to support other languages, for it to move the needle on any and all languages, we need to measure what is meaningful: the number of searches using Special:MediaSearch and what language was used, the number of pictures used in each Wikipedia, and the effect lists have on the writing of new articles.

If we have not measured such numbers so far, it is what we should do to move the needle. One needle is the total number of reads; quite another is the number of reads for each project. The same goes for the use of Wikidata and Commons.

Sunday, June 21, 2020

Marketing @Wikimedia but first some SWOT analysis

The Wikimedia Foundation has a 2030 strategy; it intends to increase its reach, increase its budget and rename projects into "Wikipedia something" in order to improve its visibility.

Wikipedia is one of the most visited websites on the Internet; its quality is good, and it is mostly edited by older white males in the first world. Typically, when people mention Wikipedia, they refer to the English version, but it is only 50% of Wikimedia traffic. From a marketing point of view the English market is saturated; growth can be expected from Wikipedias in other languages and from other projects.

The Wikimedia Foundation is very much tied to the United States. Given the current regime and the possibility that it will prevail in November, this reliance is an existential threat. It is likely that the US government will want to intervene in Wikimedia content after 2020. I doubt it is possible, given the current hardware configuration, to move away from the US and still serve the rest of the world with a NPOV.

At this time the Wikimedia Foundation is centrally led, there are satellite organisations in many countries who are limited in what they can do; their budgets are centrally managed. Fundraising is mostly done from the USA and most of it is raised in the USA. That is problematic in its own right because many "Wikipedians" feel that too much money is raised, money not needed to support their project and people in other countries do not get to feel that it is "their" project because of "their" contributions. As a professional fundraiser, I am convinced contributions from the Netherlands could increase at least tenfold within a year.

The bias for English is huge and it is compounded by the bias for English Wikipedia. At a conference a Dutch professor stated that research not about or linked to the English Wikipedia is unlikely to get published. It follows that the data used for the 2030 strategy includes this same bias. The MediaWiki software is developed first and foremost for English Wikipedia, and it is expected to work for other languages and for other projects. There used to be a development team specialised in language technology; it was dissolved.

There was a time when English Wikipedia did support the other projects. Because of an anti-Wikidata stance by some, this changed. There is no solution for false friends, and lists are not as well maintained as they could be. When we link to the Wikidata item for an article and no longer to a title for that same article, this will change. It is easy enough to build functionality that allows for both, and when it is opt-in, projects will understand the benefit and choose to adopt it.

When marketing is the reason for changing the name of projects, it is important to consider the ramifications. The "Wikipedians" among us claim ownership of the Foundation and insist on actions in their image. They represent a staid community in a saturated market. With a strategy in place it is possible to disregard them. This only makes sense when the WMF tackles its bias for English as a priority. This is what is needed to realise the 2030 strategy.

Sunday, June 14, 2020

@Wikipedia is old news, it could point to new sources

Wikipedia provides the best text on many subjects. It being static is both a blessing and a curse. It is a blessing when a topic is very much in the public eye: it attracts many people willing to edit and come to a neutral point of view.

It is a curse when the topic is no longer popular. No longer is there an interest to maintain the information, new publications are not integrated in what used to be a neutral point of view.

In the references section of an article you find the underpinnings of what is stated in an article. It may be newspaper articles or science papers. Both newspapers and science have a hard time attracting attention and this endangers the availability of quality sources for future updates.

In Scholia, information about the latest papers by authors and/or about subjects is continuously updated. As time goes by, papers become available that are dated later than the latest reference. When such papers are clearly marked, it is an invitation for the Wikipedia community to revisit a subject and learn if what was a neutral point of view survives as such vis-à-vis the latest information.

Every subject should have its own Scholia.

Friday, June 12, 2020

Professor Vassie Ware - an early recipient of an early career award

On the page of the "GlobalYoungAcademyTeam", you find many young academies. You also find "early career awards". These young academies, these awards are represented by "Listeria lists", when something changes in Wikidata it is reflected in them.

The WICB Junior Award is an early career award for women conferred by the American Society for Cell Biology. An English Wikipedia article provided the initial content for this award and given that there are many people interested in what this award is about, I included all recipients of the award. Professor Ware is the earliest recipient I added by hand.

The standardised Listeria lists show the people who are included, their occupation, identifiers for ORCiD, Google Scholar and VIAF, and the number of publications known for them. The approach is a wiki approach, and it is therefore fine that we only have two publications for Professor Ware, that we do not have a freely licensed picture of Professor Ware yet and that there is no Wikipedia article either.

Once a list is reasonably complete, new information is added all the time. It follows that the Scholia pages for the award and for a scientist like Prof Ware evolve. In the true wiki spirit, a structure is provided and anyone who cares to can make a difference: a difference for the understanding of science and for the people who make science what it is.

Thursday, June 04, 2020

@Wikimedia and languages - @WikiCommons search, the most relevant development since @Wikidata

The Wikimedia Foundation is important for the support of languages on the Internet. Its software is localised in over 300 languages.

The milestones for multilingual support are:
These milestones have been very much technology driven. For me, the reason Wikidata became the success it is, is that from the start it was linked to every subject covered by Wikipedia, and the solution was so overwhelmingly superior that nobody could reasonably object.

To make a success of this latest milestone, institutional support is needed. It is up to the Wikimedia Foundation and its movement to reduce their bias for English and make room for improved language support.

My way of phrasing this as an essential objective: "All of it is available to every single person on the planet". As we adopt this as our objective, it is first and foremost about making Special:MediaSearch useful in any and all of our languages and making it available from any and all of our Wikipedias.

As we adopt this, it is essential that priority is given to multilingual search over special interests including GLAM, Open Data, SPARQL and what have you. When we are to open up Commons in multiple languages, that comes first. Special interests only gain relevance when it is made obvious how they help open up Commons in Swahili, Hindi, German or Vietnamese.

Special:MediaSearch is possible because of everything that went before. Its functionality is part of MediaWiki and is localised like the rest of the software. The existing search engine is now linked to the labels for items in Wikidata, and it was made public after Hay Kranen brought us his proof of concept. It became available warts and all, and while finding منصور اعجاز in Punjabi is huge, it is not great when you do not find cats because a user is called Kočka.

The challenge to us as an organisation and as a movement: are we willing to work on our existing bias, open up Commons in all the languages we are said to support, and accept that our hobby horses will get attention not in the next iteration but in a later one?

Thursday, May 28, 2020

@WikiCommons - Sarah T. Roberts versus Sarah T. Roberts

I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata there are two distinct Sarah T. Roberts. One is an epidemiologist the other is into information & media studies.

At Commons it was a mess: the picture of one Sarah was used to illustrate an infobox of the other Sarah. It is not that interesting to tell you how I did what. Relevant is that I did. I did it because you will find things when there is a label for them in "your" language.

Given that we do not research the use of Commons, or Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it.

Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons, among others, for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi, and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan, students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ.

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and, as so often, it actually worked after a fashion. The first iteration was, in true Wikimedia tradition, English only. The proof of concept got Dutch as its second language; Hay Kranen, the developer, is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you look for pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.
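The mechanism can be sketched in a few lines of Python: Wikidata labels are indexed per language, a query is matched against that index, and files are returned when their "depicts" statements point at a matching item. This is a toy model, not the code behind Special:MediaSearch, which also relies on ordinary text search; the item keys, file names and functions are hypothetical, and only exact label matches are handled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_label_index(labels: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (language, lower-cased label) to the items carrying that label."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for item, by_lang in labels.items():
        for lang, text in by_lang.items():
            index[(lang, text.lower())].add(item)
    return index


def media_search(query: str, lang: str, index: Dict[Tuple[str, str], Set[str]],
                 depicts: Dict[str, Set[str]]) -> List[str]:
    """Files whose 'depicts' statements point at an item labelled `query` in `lang`."""
    items = index.get((lang, query.lower()), set())
    return sorted(f for f, depicted in depicts.items() if depicted & items)


if __name__ == "__main__":
    labels = {"item-cat": {"en": "cat", "nl": "kat", "ja": "猫"}, "item-dog": {"en": "dog"}}
    depicts = {"File:Cat on a wall.jpg": {"item-cat"}, "File:Dog.jpg": {"item-dog"}}
    idx = build_label_index(labels)
    print(media_search("kat", "nl", idx, depicts))  # found via the Dutch label
    print(media_search("dog", "nl", idx, depicts))  # nothing: no Dutch label for "dog"
```

The second search comes up empty only because a label is missing, which is why adding a label in your language makes pictures findable for everyone who speaks it.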

What can you do to make it better? Add labels in your language for the things you want to find, and find media files that depict what you are looking for. If nobody has translated the software into your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to put it politely. Commons is the repository of the media files that illustrate all the Wikipedias, so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many, many years. The beauty is that the way forward has been expressed in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as taking the "proof of concept" and rewriting it in the preferred language of the WMF, internationalising and localising it, and keeping it standalone for now. The people who know about it will use it, and they will be the first to point out what more they want done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game, and finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein; there was also a Jonathan H. Epstein. Because they shared characteristics, they could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein", so it is added as an alias.
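The merge decision above leans on authority identifiers. A minimal Python sketch of that check follows, under loud assumptions: items are reduced to hypothetical dictionaries of identifier values, the ORCID and VIAF values are placeholders rather than real identifiers, and links to shared papers, the other signal mentioned above, are not modelled. It flags a merge candidate; a person still makes the decision.

```python
from typing import Dict, Set

# Identifier property -> value, e.g. {"ORCID": "...", "VIAF": "..."} (placeholder values)
Ids = Dict[str, str]


def shared_identifiers(a: Ids, b: Ids) -> Ids:
    """Identifiers both items hold with the same value."""
    return {p: a[p] for p in a.keys() & b.keys() if a[p] == b[p]}


def conflicting_identifiers(a: Ids, b: Ids) -> Set[str]:
    """Identifiers both items hold but with different values."""
    return {p for p in a.keys() & b.keys() if a[p] != b[p]}


def merge_candidate(a: Ids, b: Ids) -> bool:
    """Heuristic: at least one identifier in common and none in conflict."""
    return bool(shared_identifiers(a, b)) and not conflicting_identifiers(a, b)
```

A conflicting identifier is treated as a veto because two different ORCID values almost always mean two different people, however similar the names.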

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information, like his alma maters, is digested into Wikidata statements. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.