Sunday, November 16, 2025

Today's laurels are tomorrow's compost

There is a congratulatory article claiming that "in the AI era, Wikipedia has never been more valuable". That may be so, but it is valuable now; will this remain the case? Probably one Wikipedia is mined for information to be used by an AI. The information in that Wikipedia evolves over time, and this evolving information may consequently end up in the results of the AI involved.

The question becomes: how do we remain relevant and up to date? Relevancy has multiple parts: how do we remain a challenge for our editor community, how do we remain the "go to" place for our public, and how do we remain a source for the bots feeding the AI?

My suggestion is predictable. Leverage the sum of all the knowledge we have in all our projects and maximise cooperation with any and all compatible organisations.

We can share all the awards and recipients of awards known on our projects. Our academic references should all be known to Wikidata, and we could and should update these in collaboration with ORCiD and CrossRef. We would have an up-to-date portfolio for the scientists we have Wikipedia articles about. For scientific articles we would know their citations and what cited them. Our editors would be enabled to improve the quality of our work.
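
To make this concrete: a minimal Python sketch against the public CrossRef REST API, which already exposes what a paper cites and how often it is cited. The DOI is only an example; any DOI will do.

```python
import requests

# Any DOI will do; this one is only an example.
doi = "10.1038/nature14539"

# The public CrossRef REST API returns the metadata for a single work.
work = requests.get(f"https://api.crossref.org/works/{doi}").json()["message"]

print("Title:", work["title"][0])
print("Cited by:", work.get("is-referenced-by-count", 0), "other works")

# The DOIs this paper itself cites, where CrossRef has resolved them.
cited = [ref["DOI"] for ref in work.get("reference", []) if "DOI" in ref]
print("Cites", len(cited), "resolvable DOIs")
```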

Yes, the AI engines would be better informed, but hey, our intention is to share the sum of our knowledge. They are welcome to it.

Thanks,

      GerardM

Wednesday, November 05, 2025

Missing award recipients in both Wikidata and the Wikipedias

Professor Fei-Fei Li is one of the recipients of the 2025 Queen Elizabeth Prize for Engineering. It says so on the English Wikipedia and it is confirmed on the website of the prize.

There are nine Wikipedias with an article for the award, and there is Wikidata. When the 2025 awardees are known on a Wikipedia, "2025" should appear in the text of the article; otherwise the article is likely out of date. The recipients should be known on Wikidata AND there should be an "award received" statement for the award with a date of 2025.

When you check Wikidata for this award using "Reasonator", you will find that Wikidata is in need of an update. It was by accident that I learned of this award. Updates are a hit-or-miss affair; this would improve when a bot produces a list of all the awards that are in need of updates. When a bot produces this list for every Wikipedia for all the known awards, it enables people to do this maintenance work.
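
A sketch of what such a bot could ask, assuming the Wikidata Query Service and the usual properties P166 ("award received") with a P585 ("point in time") qualifier. In practice a query like this would need to be sharded, for instance per award class, to stay within the query service limits.

```python
import requests

# Awards whose most recently recorded recipient predates 2025.
# Assumed: P166 = award received, qualified with P585 = point in time.
QUERY = """
SELECT ?award ?awardLabel (MAX(?date) AS ?latest) WHERE {
  ?person p:P166 ?stmt .
  ?stmt ps:P166 ?award ;
        pq:P585 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?award ?awardLabel
HAVING (YEAR(MAX(?date)) < 2025)
LIMIT 50
"""
rows = requests.get("https://query.wikidata.org/sparql",
                    params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "stale-award-sketch/0.1"}).json()
for b in rows["results"]["bindings"]:
    print(b["awardLabel"]["value"], "- last recorded:", b["latest"]["value"])
```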

Obviously 2025 is this year and it will have the most mutations. A similar job can be run for other years, but it is less likely to bring many additions; more likely these lists will shrink over time.

Thanks,

      GerardM

Saturday, November 01, 2025

English Wikipedia awards, a Wikidata user story

I noticed on Bluesky that Yahvinder Malhi received the Roman Magalev Prize. There is an English Wikipedia article for both Mr Malhi and for the prize. I looked it up: Mr Malhi is not known as a prize winner on the article for the prize; he is, however, on his personal article as a text reference.

So why not have a tool that produces, for every award on a Wikipedia, a list where Wikidata knows about the award AND an award winner, both have a Wikidata item, and the award winner is not on the award article? Easy, obvious, and it will improve the quality of articles about awards.

This can work two ways: why not also have a tool that produces a list where awards known at Wikidata are not linked on the winner's article?
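
A sketch of the first direction, with the simplifying assumption that "is on the award article" can be approximated by "is linked from the award article"; the award used here is only an example.

```python
import requests
from urllib.parse import unquote

WIKI_API = "https://en.wikipedia.org/w/api.php"
award_title = "Queen Elizabeth Prize for Engineering"  # example award

# 1. The Wikidata item behind the award article.
page = requests.get(WIKI_API, params={
    "action": "query", "titles": award_title, "prop": "pageprops",
    "ppprop": "wikibase_item", "format": "json"}).json()
award_qid = next(iter(page["query"]["pages"].values()))["pageprops"]["wikibase_item"]

# 2. Winners according to Wikidata (P166 = award received) that have
#    an English Wikipedia article.
query = f"""
SELECT ?article WHERE {{
  ?winner wdt:P166 wd:{award_qid} .
  ?article schema:about ?winner ;
           schema:isPartOf <https://en.wikipedia.org/> .
}}"""
rows = requests.get("https://query.wikidata.org/sparql",
                    params={"query": query, "format": "json"},
                    headers={"User-Agent": "award-check-sketch/0.1"}).json()
winners = {unquote(r["article"]["value"].rsplit("/", 1)[-1]).replace("_", " ")
           for r in rows["results"]["bindings"]}

# 3. Titles actually linked from the award article.
links, cont = set(), {}
while True:
    r = requests.get(WIKI_API, params={"action": "query", "titles": award_title,
                                       "prop": "links", "pllimit": "max",
                                       "format": "json", **cont}).json()
    for p in r["query"]["pages"].values():
        links |= {l["title"] for l in p.get("links", [])}
    if "continue" not in r:
        break
    cont = r["continue"]

print("Winners missing from the award article:", sorted(winners - links))
```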

Technically it is not that hard; it is just a few queries to be run on a regular basis. It is the user interface where it becomes tricky. How will a user know that something was fixed? How will we run it for all the Wikipedias? Will we be smart and recognise red links?

Another tool could indicate to Wikipedias with an article for an award when a change happened for that award, particularly new award winners for the current year. It could be a list that triggers editors to revisit their articles.

Thanks,

       GerardM

Sunday, October 26, 2025

Automated updates for Wikimedia projects

I revisited my Wikipedia user page. On it I have several subpages that are automatically updated whenever things change on Wikidata. One of them is about the "Prix Roger Nimier"; I had not looked at it for years. I updated Wikidata from the data on the French Wikipedia and, to make it interesting, I added the Listeria template to my French Wikipedia user page. It updated, and the English and French versions are nearly identical. The difference is in the description.

There are many personal project pages that are automatically updated from Wikidata. The point I wanted to make is that topics are not universally maintained. Having another look after a few years, I found that many pages have had regular updates; the quality, however, is not that great. From a Wikimedia perspective, it seems that we have not one audience but many. When we allow for automatic updates, we will be able to share the sum of all our knowledge with a much bigger audience.

Thanks,

       GerardM

Sunday, October 19, 2025

Providing Resources for a subject used in a Wikimedia project

So you want to write an article in a Wikimedia project. It may be new to your project, but it is likely part of the sum of all Wikimedia knowledge. Let's consider a workflow that acknowledges this reality.

An article typically starts with a title, and that title may be linked to an existing item in Wikidata. If so, the item, the concept, is linked to a workflow. All the references for all existing articles are gathered. All relations known at Wikidata are presented. Based on what kind of item it is, tools are identified that present information in the articles about the concept: categories and info boxes. References for content in the info boxes are included as well.
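
A minimal sketch of the first step of this workflow: given a working title, find the existing Wikidata item and see what is already known about the concept. The title is only an example.

```python
import requests

title = "Prix Roger Nimier"  # the working title of the new article; an example

# Is this concept already part of the sum of all Wikimedia knowledge?
found = requests.get("https://www.wikidata.org/w/api.php", params={
    "action": "wbsearchentities", "search": title, "language": "en",
    "type": "item", "format": "json"}).json()["search"]

if not found:
    print("No existing item; the article starts from scratch.")
else:
    qid = found[0]["id"]
    # All relations Wikidata already knows for the concept, by property.
    entity = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbgetentities", "ids": qid, "props": "claims",
        "format": "json"}).json()["entities"][qid]
    for prop, statements in entity["claims"].items():
        print(prop, "-", len(statements), "statement(s)")
```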

Another workflow is for existing articles. All references and relations expressed in the article show as green; unused references and relations show as orange. Missing categories and values in info boxes are presented, and the author may click to include them in the article. Values in info boxes may show black, red, or blue; it will be whatever the author chooses.

The workflow is enabled once the concept or the article is linked to Wikidata. Those Wikipedians who do not want to change simply do not make use of this workflow and are left unbothered. There will be harvesting processes based on the recent changes on all projects; a change will trigger processes that may look for vandalism, for new relations, and for suggestions for new labels.
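
Such harvesting processes could, for instance, hang off the Wikimedia EventStreams feed of recent changes; a minimal sketch, with the actual vandalism and label checks left as a comment.

```python
import json
import requests

# One server-sent-events feed of recent changes across all Wikimedia projects.
STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

with requests.get(STREAM, stream=True,
                  headers={"User-Agent": "harvest-sketch/0.1"}) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip SSE comments, event names and ids
        change = json.loads(line[len(b"data: "):])
        if change.get("type") == "edit":
            # A real harvester would queue this edit for vandalism checks,
            # new relations and label suggestions.
            print(change["wiki"], change["title"])
```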

The most important beneficiary will be our audience. This workflow makes the sum of all our knowledge actionable: to improve articles, populate articles, and reflect what we know in all our articles. Our editors have the choice to use this tool or not. Obviously their edits will be harvested and evaluated in a broader context: all of the Wikimedia projects. The smaller projects, where more new articles are created, will have an easy time adding info boxes and references. The bigger projects will find the relations that are not, or not sufficiently, expressed with references.

Providing subject resources will work only when it is supported at Foundation scale. It is not that volunteers cannot build a prototype; it is the need for scalability and sustained performance that Toolforge does not provide.

Thanks,

      GerardM

Saturday, October 18, 2025

Using AI for both Wikidata/Wikipedia quality assurance

When people consider the relation between Wikipedia and Wikidata, it is typically seen from the perspective of creating new information, either in a Wikipedia or in Wikidata. However, what can we do for the quality of both Wikipedia and Wikidata when we consider the existing data in all Wikipedias and compare it to the Wikidata information?

All Wikipedia articles on the same subject are linked to only one Wikidata item. Articles linked from a Wikipedia article are consequently known to Wikidata. When Wikidata knows about a relation between two such articles, then, depending on the relation, it could feature in info boxes and/or categories in the article. At Wikidata we know about categories and what they should contain. Info boxes are known to Wikipedias for what they contain; relations are likely to be known both to Wikidata and Wikipedia.
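
A sketch of the comparison, assuming nothing more than the public APIs: take one item, collect the items its statements point to, and compare them with the items linked from the corresponding English Wikipedia article. The item is only an example.

```python
import requests

qid = "Q937"  # example item: Albert Einstein

# Everything Wikidata relates to this subject: the item targets of its claims.
entity = requests.get("https://www.wikidata.org/w/api.php", params={
    "action": "wbgetentities", "ids": qid, "props": "claims|sitelinks",
    "format": "json"}).json()["entities"][qid]

related = set()
for statements in entity["claims"].values():
    for s in statements:
        value = s["mainsnak"].get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("entity-type") == "item":
            related.add(value["id"])

# Items linked from the English Wikipedia article on the same subject.
title = entity["sitelinks"]["enwiki"]["title"]
linked, cont = set(), {}
while True:
    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query", "titles": title, "generator": "links",
        "gpllimit": "max", "prop": "pageprops", "ppprop": "wikibase_item",
        "format": "json", **cont}).json()
    for page in r.get("query", {}).get("pages", {}).values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item:
            linked.add(item)
    if "continue" not in r:
        break
    cont = r["continue"]

print("Known to Wikidata but not linked in the article:", len(related - linked))
print("Linked in the article but not related in Wikidata:", len(linked - related))
```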

Issues identified in this way will substantially improve the integrity of the data in all our projects. We expect to find false friends and missing information in Wikidata and in all Wikipedias.

Using AI to identify issues ensures that quality is constantly part of the process: basic facts are correct, so that the information we provide to our audience will be as good as we have it.

Thanks,

       GerardM

Monday, October 13, 2025

Batch processes for Wikidata: importing from ORCiD and CrossRef - a more comprehensive trick

Every week a process runs that produces a list of all the papers for all the scientists known to Wikidata who have an ORCiD identifier. The papers are identified by a DOI, and typically all scientific papers at Wikidata have a DOI. ORCiD-Scraper uses this list so that interested users can upload the information of these papers to Wikidata using the "QuickStatements" tool: one paper at a time, for one author at a time.
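
The weekly list essentially comes down to one query; a sketch, assuming the usual properties P496 ("ORCID iD"), P50 ("author"), and P356 ("DOI"). Without the LIMIT, the full query would need to be run in batches.

```python
import requests

# Papers with a DOI by authors who have an ORCiD identifier on Wikidata.
# Assumed properties: P496 = ORCID iD, P50 = author, P356 = DOI.
QUERY = """
SELECT ?orcid ?doi WHERE {
  ?author wdt:P496 ?orcid .
  ?paper wdt:P50 ?author ;
         wdt:P356 ?doi .
}
LIMIT 100
"""
rows = requests.get("https://query.wikidata.org/sparql",
                    params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "orcid-list-sketch/0.1"}).json()
for b in rows["results"]["bindings"]:
    print(b["orcid"]["value"], "->", b["doi"]["value"])
```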

What if... what if all new papers of all authors known to ORCiD are added? The challenge will be not to introduce duplicate papers or duplicate authors. So let's agree on two things: we only introduce authors who have an ORCiD identifier, and for now we only introduce papers that have a DOI and at least one author with an ORCiD identifier.

The trick is to cycle through all authors known to Wikidata, for instance a thousand at a time. All new papers have their DOI entered in a relational database where a DOI can exist only once. These papers may include multiple authors; they enter the same relational database, where an author is unique. When all papers for the first thousand authors, and their associated authors, are in the database, we can first add all missing authors to Wikidata and record the Wikidata identifiers in the relational database. We then add the missing papers. It is likely that no duplicate papers are introduced, but there will be duplicate authors where Wikidata does not know about the ORCiD identifier.
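
A minimal sketch of the deduplication idea in SQLite; the table and column names are invented for illustration, and the uniqueness of DOIs and authors is enforced by the database itself.

```python
import sqlite3

con = sqlite3.connect("orcid_batch.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS papers (
    doi TEXT PRIMARY KEY,   -- a DOI can exist only once
    qid TEXT                -- filled in once the paper is on Wikidata
);
CREATE TABLE IF NOT EXISTS authors (
    orcid TEXT PRIMARY KEY, -- an author is unique by ORCiD identifier
    qid   TEXT              -- the Wikidata item, once known or created
);
""")

def stage(doi, author_orcids):
    """Stage one paper and its authors; duplicates are silently ignored."""
    con.execute("INSERT OR IGNORE INTO papers (doi) VALUES (?)", (doi,))
    con.executemany("INSERT OR IGNORE INTO authors (orcid) VALUES (?)",
                    [(o,) for o in author_orcids])
    con.commit()

# After a batch of, say, a thousand authors has been staged:
#   1. create items for authors WHERE qid IS NULL,
#   2. write the new QIDs back,
#   3. only then add the missing papers.
stage("10.1000/example-doi", ["0000-0002-1825-0097"])  # made-up DOI, example ORCiD
```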

We cycle through all our known authors and we can rerun each week.

We can do this, but why?

One reason is an English Wikipedia template named {{Scholia}}; it may be used for scholars and subjects. Unlike Wikipedia articles, the information it presents will always be up to date. There are more reasons, but this is a great starter.

Thanks,

      GerardM