Saturday, September 27, 2025

Moving forward with Amir's "Internal Links in #Wikipedia" presentation

At Wikimania 2019, my friend Amir presented "Internal Links in Wikipedia". It offers a wonderful exposé of what is problematic with the existing blue and red link functionality in all our Wikipedias. By the end of 2024, things had moved forward technically. This blog post sets out what a local Wikibase for wiki links will bring to both editors and readers, and why it does not need to be controversial, even though, by definition, changes made to Wikipedia are controversial. 

Functionally, every link, red or blue, should remain exactly as it is. Technically, every blue link refers to one article, and every article SHOULD have an item at Wikidata. Every link, blue or red, may be referred to from many places and SHOULD be about only one concept. For every destination there MAY be a link to an item at Wikidata. At present we have no way of knowing whether a link is about only one concept, or whether there is an item at Wikidata for that concept.

Many years ago, Wikidata solved a similar problem; it was an instant success because it replaced the interwiki functionality. The solution proposed today is similar, and it is only possible now that Wikidata can be "federated" with many Wikibase instances. 

All destinations for both red and blue links will be known in a local Wikibase federated with Wikidata. Any destination may be linked to a Wikidata item, but the name of the local article or destination remains unique. Thanks to this federation, disambiguation support can be offered when a new link is created, based on what is known both locally and globally, including the synonyms for each subject.
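To make the idea concrete, here is a minimal in-memory sketch of a local register of link destinations. Everything in it is hypothetical: Destination, LocalRegister and candidates are illustrative names, not part of Wikibase, and the federation with Wikidata is reduced to an optional item identifier on each destination.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Destination:
    """One link target in a hypothetical local Wikibase: a blue or a red link."""
    title: str                                      # local name, unique per wiki
    exists: bool                                    # True = blue link, False = red link
    wikidata_qid: Optional[str] = None              # optional link to a Wikidata item
    aliases: set[str] = field(default_factory=set)  # synonyms for the subject


class LocalRegister:
    """Toy stand-in for a local Wikibase that knows every link destination."""

    def __init__(self) -> None:
        self._by_title: dict[str, Destination] = {}

    def add(self, dest: Destination) -> None:
        # The local name stays unique, whatever the Wikidata link says.
        if dest.title in self._by_title:
            raise ValueError(f"duplicate destination: {dest.title}")
        self._by_title[dest.title] = dest

    def candidates(self, link_text: str) -> list[Destination]:
        """Destinations whose title or synonyms match the text of a new link."""
        needle = link_text.casefold()
        return [
            d for d in self._by_title.values()
            if needle == d.title.casefold()
            or needle in {a.casefold() for a in d.aliases}
        ]
```

When an editor types a link whose text matches more than one destination, the candidates list is what a disambiguation prompt could show; in a real federated setup the synonyms would also come from the linked Wikidata items.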

This change does not need to be controversial because, as with the interwiki links, people can opt out of the new functionality. Even when only a subset of the editor community becomes involved, the quality of all links will improve quickly. Once the interwiki links were fixed, Wikidata was ready to become a knowledge base. As the wiki links in the local Wikibases get into shape, the Wikidata knowledge base can be used to signal that articles belong in specific categories, or that red links could be added to list articles, such as an article about an award.
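As an illustration of the award example, the sketch below compares the recipients Wikidata knows about with the items an award article already links to, and proposes a blue or red link for each missing recipient. It assumes the local Wikibase can tell which Wikidata item every existing link points to; Recipient and suggest_links are hypothetical names.

```python
from typing import NamedTuple, Optional


class Recipient(NamedTuple):
    qid: str                    # Wikidata item of the award recipient
    label: str                  # name to use as link text
    local_title: Optional[str]  # local article title, or None if there is none


def suggest_links(recipients: list[Recipient], linked_qids: set[str]) -> list[str]:
    """Wiki links to propose for recipients the award article does not link to yet."""
    suggestions = []
    for r in recipients:
        if r.qid in linked_qids:
            continue  # already linked from the award article
        # An existing article gives a blue link; otherwise the label becomes a red link.
        target = r.local_title or r.label
        if target == r.label:
            suggestions.append(f"[[{target}]]")
        else:
            suggestions.append(f"[[{target}|{r.label}]]")
    return suggestions
```

The suggestions are only signals; whether a red link belongs in the article remains an editorial decision.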

Wikipedia editors will remain key, but tools like the Wikidata knowledge base can bring us up-to-date data and improve the connections between all our articles. Manually checking wiki links is a Sisyphean task; with tooling it becomes manageable and worthwhile.

Thanks,

      GerardM

Batch processes for Wikidata .. importing from ORCiD and Crossref

One of my favourite batch processes produces data for a tool called "Orcid-Scraper". I use it to add publications that are known to ORCiD but missing from Wikidata. I do it as a hobby; however, my time would be used more effectively if, instead of producing a database that enables me to add new data, the new data were added to Wikidata directly. 

This was done in the past by a different tool, and it was a drama because Wikidata is NOT a relational database: an item cannot be created with the certainty that it will be unique. There are, however, plenty of tricks available to ensure that new items are unique. 

The easiest trick is an option in the tool to create all the missing papers known for a given author: one author at a time, from Scholia. It uses the results of a batch process that runs once a week. Cheap, cheerful and highly effective.
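A minimal sketch of that trick, assuming each ORCiD work carries a DOI (the identifier Crossref registers) and that the weekly batch yields a snapshot of DOIs already on Wikidata. Work, normalise_doi and papers_to_create are hypothetical names; the snapshot can be up to a week old, so working one author at a time narrows the window for duplicates rather than closing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    doi: str    # DOI as reported on the author's ORCiD record
    title: str


def normalise_doi(doi: str) -> str:
    # DOIs are case-insensitive; strip common prefixes and compare in one form.
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.upper()


def papers_to_create(orcid_works: list[Work], known_dois: set[str]) -> list[Work]:
    """Works of ONE author that are on ORCiD but not yet on Wikidata.

    known_dois is a snapshot of DOIs already on Wikidata (hypothetically the
    output of the weekly batch). Each DOI is queued at most once per run.
    """
    seen = {normalise_doi(d) for d in known_dois}
    to_create = []
    for work in orcid_works:
        key = normalise_doi(work.doi)
        if key in seen:
            continue  # already on Wikidata, or already queued in this run
        seen.add(key)
        to_create.append(work)
    return to_create
```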

Then another batch process is needed. For every "author string" that includes an ORCiD identifier, the existing author is sought, and the author string is changed into an "author". The link to the ORCiD identifier is removed because it is implicitly part of the author. This process can run once a week.
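A minimal sketch of this step, modelling a paper's author name strings (P2093) with their series ordinal (P1545) and an attached ORCiD identifier, and a lookup from ORCiD iD (P496) to person items. How the ORCiD is attached to an author string on Wikidata is an assumption here, and AuthorString, Author and resolve_author_strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorString:
    name: str             # the "author name string" value
    ordinal: int          # position in the author list ("series ordinal")
    orcid: Optional[str]  # ORCiD identifier attached to the string, if any


@dataclass(frozen=True)
class Author:
    qid: str      # Wikidata item of the person ("author")
    ordinal: int  # position carried over from the author string


def resolve_author_strings(
    strings: list[AuthorString], qid_by_orcid: dict[str, str]
) -> tuple[list[Author], list[AuthorString]]:
    """Turn author strings whose ORCiD matches a person item into authors.

    Returns the new "author" statements and the strings that must stay.
    The ORCiD is not copied: it is already a property of the person item.
    """
    authors: list[Author] = []
    remaining: list[AuthorString] = []
    for s in strings:
        qid = qid_by_orcid.get(s.orcid) if s.orcid else None
        if qid is None:
            remaining.append(s)  # no ORCiD, or no person item with it yet
        else:
            authors.append(Author(qid=qid, ordinal=s.ordinal))
    return authors, remaining
```

Keeping the ordinal means the author order of the paper survives the conversion.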

A second batch process, also running once a week, looks for "author string"s with ORCiD identifiers that have no corresponding author. It generates a list of ORCiD identifiers with their associated "author string"s and, for each identifier, creates one new item uniquely identified by that ORCiD identifier. 
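A minimal sketch of the grouping this batch needs, assuming the weekly run provides the author strings that carry an ORCiD and the set of ORCiD identifiers that already have a person item. StringWithOrcid and plan_new_authors are hypothetical names, and the plan is a plain dictionary rather than any real Wikidata edit format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StringWithOrcid:
    paper_qid: str  # item of the scholarly article carrying the string
    name: str       # the "author name string" value
    orcid: str      # ORCiD identifier attached to that string


def plan_new_authors(
    strings: list[StringWithOrcid], known_orcids: set[str]
) -> dict[str, dict]:
    """Plan exactly one new person item per ORCiD that has no item yet."""
    grouped: dict[str, list[StringWithOrcid]] = defaultdict(list)
    for s in strings:
        if s.orcid not in known_orcids:  # skip people who already have an item
            grouped[s.orcid].append(s)

    plans: dict[str, dict] = {}
    for orcid, group in grouped.items():
        names = Counter(s.name for s in group)
        # The most frequent spelling becomes the label; ties go to the first seen.
        label, _ = names.most_common(1)[0]
        plans[orcid] = {
            "label": label,
            "aliases": sorted(set(names) - {label}),
            "orcid": orcid,  # the identifier that makes the new item unique
            "papers": sorted({s.paper_qid for s in group}),
        }
    return plans
```

Keying the plan by ORCiD identifier is what keeps a single run from creating two items for one person; once the items exist, the author-string batch can convert the strings on the listed papers.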

Obviously new authors make it useful to run the first batch process again.

These batches could also run for just the one author processed by Orcid-scraper, making this tool and Scholia more powerful and up to date.

Thanks,

       GerardM

Sunday, September 14, 2025

One line in a Wikipedia article; a prize is named after her

The ESHG refers to Wikipedia articles for its award recipients. The award named after Leena Peltonen-Palotie currently has six recipients; three have a Wikipedia article, and by the time I finish writing this blog post, all will have a Wikidata item. 

The award itself currently has no Wikipedia article, but it does have a Wikidata item, so associated information can be shown in Reasonator or in Scholia.

I added the 2024 recipient because awards without recipients are not really informative. I came across the inaugural 2013 recipient because of another award she received. All six recipients had a Wikidata item, and only one had no publications associated with him; a merge of two items solved that issue.

It is wonderful to see prestigious organisations refer to Wikipedia articles. I do notice that we are still at a stage where Wikipedia information is not valued enough to mine it, curate it and finally share it with our whole audience. We could share the knowledge that is available to us.

Thanks,

     GerardM