Thursday, November 14, 2024

Red pill and blue pill - Wikipedia, is it a binary choice?

As far as the English Wikipedia is concerned, there is neither a red nor a blue link for the 2024 awardees of the Brewster medal; its information ends in 2021. The German Wikipedia is up to date. Neither Wikipedia has an article on Renée A. Duckworth or Juan C. Reboreda; the German one has two red links for them.

When you maintain information like this, there are three options: you can mention an awardee as plain text, or as a link that, as luck will have it, turns out red or blue. This is complicated because a name may have homonyms. With a red link you will only discover a homonym issue once an article is created; with a blue link you may notice it immediately.

The Wikimedia Foundation solved a similar problem a long time ago for another type of link, the "interwiki link". The solution is Wikidata. It works because there is only one identifier for every topic, and every article needs a link to a Wikidata item to have a more global relevance.

Thanks to the ongoing development of Wikidata, there is Wikibase, the software it runs on. We should do a similar job for the red and blue links. It will do away with the false-friend problems in Wikipedia. It will improve quality for each Wikipedia and it will improve the quality of Wikidata. Any data-related updates that are not strictly local will remain at Wikidata, because that helps us share the sum of all knowledge.

When a new link is to be added in any of the 333+ Wikipedias, it starts with disambiguation. Is the subject already known in any of the other Wikipedias? If not, a new Wikidata item will be created, extending the options in any future disambiguation. If it is, the available information and references are there from the start, and consequently a Scholia, a Reasonator or any other generated view of the information may become available, depending on the policies of a Wikipedia.
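The disambiguation step described above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory stand-in for Wikidata: the classes `Item` and `ToyWikidata`, the function `resolve_link` and the `Q-new-1`-style placeholder ids are all hypothetical, and a real tool would query Wikidata itself rather than a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    qid: str
    labels: dict[str, str]  # language code -> label
    description: str = ""


@dataclass
class ToyWikidata:
    """In-memory stand-in for Wikidata; a real tool would query Wikidata itself."""
    items: dict[str, Item] = field(default_factory=dict)
    next_id: int = 1

    def candidates(self, label: str) -> list[Item]:
        # Every item carrying this label in any language is a candidate, homonyms included.
        wanted = label.casefold()
        return [item for item in self.items.values()
                if any(text.casefold() == wanted for text in item.labels.values())]

    def create(self, lang: str, label: str) -> Item:
        # Hypothetical placeholder id; real Wikidata assigns Q-numbers itself.
        qid = f"Q-new-{self.next_id}"
        self.next_id += 1
        item = Item(qid, {lang: label})
        self.items[qid] = item
        return item


def resolve_link(wd: ToyWikidata, lang: str, label: str) -> list[Item]:
    """Return the items an editor must choose between; create one if there are none."""
    found = wd.candidates(label)
    if not found:
        return [wd.create(lang, label)]
    return found
```

Matching on labels in every language is deliberate: a homonym known only in another Wikipedia still shows up as a candidate, which is exactly what an editor needs to see before choosing.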

Implementing such a Wikibase is not really problematic because all the blue links still refer, through the local Wikipedia article, to Wikidata. The red links are the trickier bit. They are opened up once they are linked to a Wikidata item.

With such a Wikibase in place, we can start doing the smart things. The Brewster medal, Q612041, could have a red or blue link to all the awardees. When it does not, the article is to be reported for maintenance.
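A maintenance check for an award like the Brewster medal could compare the recipients Wikidata knows with the links in the award article. The sketch below assumes both are already available as item ids; `Link`, `maintenance_report` and the `Q-r1`-style recipient ids are hypothetical, and only Q612041 comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    qid: str           # item the link points to in the (hypothetical) Red&Blue Wikibase
    has_article: bool  # True renders as a blue link, False as a red link


def maintenance_report(award_qid: str, recipients: set[str], links: list[Link]) -> dict:
    """Compare the recipients Wikidata knows with the links in the award article."""
    linked = {link.qid for link in links}
    missing = sorted(recipients - linked)
    return {
        "award": award_qid,
        "blue": sorted(link.qid for link in links if link.has_article),
        "red": sorted(link.qid for link in links if not link.has_article),
        "missing": missing,
        "needs_maintenance": bool(missing),
    }
```

An article only needs attention when a recipient has neither a red nor a blue link; both colours count as "known".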

Cool?

     GerardM

Tuesday, November 12, 2024

Fellows of the Royal Zoological Society of NSW and .. #ChatGPT

Wikipedia mentioned a fellow of the Royal Zoological Society of New South Wales in the text of an article. Unlike many other awards, the fellowship does not have its own article and there is no category for these fellows; it only has a paragraph in a larger article.

Wikidata did not know the award. 

The list of fellows on the RZS website is in a "last name, first name" format. There are too many fellows to convert the list by hand conveniently. As so many people are enamoured with ChatGPT, I gave it a spin. ChatGPT does NOT process websites for me, so I copy-pasted the list and asked it to swap the order of the surname and the first name.
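The reordering I asked ChatGPT to do is simple enough to sketch in Python. This assumes every entry is "Surname, Given names" with the surname before the first comma; the function names and the names in the test are made up for illustration.

```python
def swap_name(entry: str) -> str:
    """Turn 'Surname, Given names' into 'Given names Surname'."""
    surname, comma, given = entry.partition(",")
    if not comma:
        # No comma: assume the entry is already in display order.
        return entry.strip()
    return f"{given.strip()} {surname.strip()}"


def swap_names(lines: list[str]) -> list[str]:
    """Reorder every non-empty line of a pasted list."""
    return [swap_name(line) for line in lines if line.strip()]
```

Splitting on the first comma only means "Smith, John, Jr." becomes "John, Jr. Smith", which is exactly the kind of entry worth checking by hand.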

I asked it who had a Wikipedia article. It could not tell me, but it gave me a list of fellows who likely have one. For many of them I added the award in Wikidata, and for some fellows I created a new Wikidata item. For many of them I linked publications, and this resulted in a nice Scholia for the award.

It would be really cool if there were a Wikimedia AI that could answer questions like: "for the people in this list, change the order of the name and check if these Australian award winners have a Wikipedia article or a Wikidata item". Maybe start with a tool for editors and then open it up to the general public.

Given that Wikipedia is multilingual, what would be the effect if the data behind the answers came from all Wikipedias AND Wikidata? Given that Wikifunctions is language agnostic, why not have functions that are a front end to such a Wikimedia AI?

Thanks,

       GerardM

Saturday, November 09, 2024

The story of African award winning scientists using Wikifunctions

You can find the winners of the Alan Pifer Research award on the English Wikipedia. One of them, the 2011 recipient, is Mr Kelly Chibale. There are several ways to be informed about him: there is Wikidata, there are Scholia and Reasonator, which both derive from Wikidata, and then there is the Wikipedia article. All four provide information, but one of them is unstructured and exclusively in English. The good news is that parts of that article have a structure, making it easy for tools to analyse and convert to data.

A person can read an article, find an award that is not in Wikidata and add it, then add the awardees by hand or use "Awarder" to do it with less effort. It is good when this is done, but analytical tools could do a better job. There are many tools, like Listeria, that present information in a nice layout. The problem is that Listeria is not maintained by Wikimedia and it is not necessarily multilingual.

And then there is Wikifunctions. It is developed and maintained by the Wikimedia Foundation, and it could do all the things that Listeria does. A function that lists all the honours and awards for someone like Mr Chibale would be great, particularly when there is also a function that brings to light all the winners of any award. An article about an award can then be minimalist and still include what typically goes into an infobox.

With functions like these available, it PAYS to engage with Wikifunctions and provide the language-specific parts of a function. It is impossible to include all awards in any language, but with some imagination we can expose information once the necessary functions are available.

Thanks,

      GerardM

Tuesday, October 29, 2024

the virality of co-authors in urology

Happy birthday Wikidata, and many happy returns.

When you start enriching the data for a Dutch urologist, an academic who published quite a number of scientific papers, there are obviously many co-authors. Many of them are yet to be identified; for Jakko A. Nieuwenhuijzen, some 339 are still to be added at this moment.

The main consideration is what has the biggest impact. As a colleague of Mr Nieuwenhuijzen is known at Google Scholar, adding papers for him brought new publications to Mr Nieuwenhuijzen and many of his co-authors. Enriching the data for these co-authors makes the graph more complex.
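How adding papers for one author spills over to his co-authors can be sketched with a small co-author graph. This assumes each paper is a list of authors in which identified authors appear as Wikidata ids (Q followed by digits) and unidentified ones as plain author name strings; the function names and the sample data in the test are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

QID = re.compile(r"Q\d+")


def is_item(author: str) -> bool:
    # Identified authors are Wikidata item ids; anything else is an author name string.
    return QID.fullmatch(author) is not None


def coauthor_graph(papers: list[list[str]]) -> dict[str, set[str]]:
    """Link every pair of authors who share a paper."""
    graph: dict[str, set[str]] = defaultdict(set)
    for authors in papers:
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def unidentified_coauthors(graph: dict[str, set[str]], person: str) -> list[str]:
    """Co-authors of `person` that are still plain name strings."""
    return sorted(name for name in graph.get(person, set()) if not is_item(name))
```

Every paper attributed to one author adds edges for all its co-authors at once, which is why the graph grows so quickly as a topic is developed.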

At some point, more precision in the data for a single author is no longer worth the effort. When you then find another urologist with many papers not yet attributed, and many co-authors whose gender Wikidata does not know yet, the focus shifts and many more edits make their way into Wikidata.

Many of these co-authors are from the same institute, but people from elsewhere find their place in these graphs as well. Many are Dutch, but as urology has many international collaborations, this is reflected in the growing number of co-authors.

As a topic is developed in this way, it easily results in thousands of edits. As many subjects are researched in this way, the enriched data is there for the world to use. This data is only of value when there is a public. Sharing in the sum of all knowledge has always been what we stand for. Sharing freely and widely gives us both a public and a future.

Thanks,

      GerardM

Sunday, October 27, 2024

The fallibility of notability

When Wikidata is split into a "science" part and "all the rest", scientists who have a Wikipedia article will need to be part of "the rest" as well. This is necessary because all Wikipedia articles link to Wikidata through the "interwiki" mechanism.

It follows that "the rest" will hold an overabundance of US scientists, who are comparatively well covered by Wikipedia articles, and hardly any scientists from Africa or South America.

Some data about scientists, awards for instance, is likely to be considered part of "all the rest". Are scientists who received an award to be known in two data sets? Some scientists had a career as an athlete, another reason for duplication. It is hard enough to maintain the interwiki links and the existing duplication within Wikidata; it will become exponentially more difficult when another data set is added.

When the creation of Wikimedia Commons was considered, similar good reasons led to hesitation and prevented us from biting the bullet for quite some time. Commons started with the creation of a wiki and a MediaWiki patch that showed its pictures in a Wikipedia, and it then took a long time for most of the duplicate pictures to exist only in Commons. It was not technically perfect, but it was done perfectly in the wiki way.

I hope that we will bite the bullet this time as well. With a new unrestricted Wikibase, the old batch jobs can be dusted off to make up for the years of academic data we missed. I pray that Scholia will become functional soon after.

I will still be able to do my Wikidata thing: projects like African politicians, Muslim countries and their rulers (past and present), and awards that could do with an update, obviously including science awards. I will not be bored, but maybe I will be working... maybe not.

Thanks,

       GerardM

Saturday, October 26, 2024

Old soldiers never die, they march in the remembrance parades

As our movement matures, the people who were there at the beginning age. They get other priorities, they get sick, they are operated upon, and as a consequence they have a windfall of time to do more work at Wikidata.

I did a similar job for a dear fellow Wikimedian. It is now my turn; my surgeon is in this picture, and as I add missing co-authors this picture becomes more complex. It will also become more complex when existing co-authors are enriched with new and linked papers.

With Wikipedia there is the promise that even though the information will evolve, all the work people have put in will be there in future and enable people to read/study the subjects each editor cared for.

Wikidata's data as it is will be split into parts. This is for the best of reasons, but once its structure is broken, the tools that bring structure to the data will be broken as well, the same tools that enable the enrichment of the data. Much of my Wikimedia legacy will be lost because there will no longer be a public able to learn about scholarly works in a wiki way.

For a few years now, this sword of Damocles has hung over Wikidata. As a consequence, the potential of Wikidata is not being realised. The data could be so much richer when automated processes bring free knowledge together. References in Wikipedia could then point to later papers and so improve its quality.

As long as I can I will do my Wikidata thing; hope is eternal.

Thanks,

       GerardM


Thursday, March 07, 2024

A Red&Blue approach to Wikipedia references.

Elisabeth Bik is, according to her Wikipedia article, a "scientific integrity consultant". Her work is often to the detriment of the reputation of scientists and the work they do. Many of these scientists have a Wikipedia article, and retracted publications serve as references in Wikipedia articles.

Many more publications are retracted; most if not all are registered at Retraction Watch. It is reasonable to expect that many publications serving as references in a Wikipedia have been retracted. Arguments used to achieve a Neutral Point of View based on retracted publications are wrong by definition.

When all references of a Wikipedia are registered in a Red&Blue Wikibase, and when all books with an ISBN and scientific publications with a DOI are ALSO known at Wikidata, it becomes possible to offer a new service: one providing information about retractions of, and citations to, the publications used as references.

Such a service is to be interactive as well. Just consider: a Wikipedian wants to check the quality of a Wikipedia article. An update button first checks for retractions and for all citing publications. It then checks for missing data like citations and authors. At the same time, newly added references are processed in the same way.

In the background, a batch process checks all publications for updates at Wikidata, particularly for new retractions and for authors who claim a publication. In this way the information on any topic will be as good as we can make it.
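The retraction part of such an update button could look like the sketch below. It assumes that a set of retracted DOIs (for instance exported from Retraction Watch) and a mapping from each DOI to the DOIs citing it (for instance derived from "cites work" statements in Wikidata) are already at hand; the function names and the 10.1234 DOIs in the test are hypothetical. DOIs are compared in lower case because DOI names are case-insensitive.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes (DOI names are case-insensitive)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def check_references(
    reference_dois: list[str],
    retracted_dois: set[str],
    citing: dict[str, set[str]],
) -> list[dict]:
    """Flag retracted references and count the works citing each reference."""
    retracted = {normalize_doi(d) for d in retracted_dois}
    citing_norm = {normalize_doi(k): v for k, v in citing.items()}
    report = []
    for raw in reference_dois:
        doi = normalize_doi(raw)
        report.append({
            "doi": doi,
            "retracted": doi in retracted,
            "cited_by": len(citing_norm.get(doi, set())),
        })
    return report
```

The same function can run interactively for one article or as a batch job over all references; only the source of the retraction and citation data changes.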

  • scientific publications are retracted and these retractions impact our NPOV
  • publications may be used as a reference in multiple Wikipedias
  • keeping information on sources up to date protects our NPOV
  • making the latest references available to all our Wikipedians ensures an optimal result
So what is not to like? 
Thanks, 
        GerardM

Wednesday, March 06, 2024

Another Red&Blue application; the epidemiologist who wrote the book on "Smoking Kills"

Professor Richard Doll of Oxford is considered one of the best epidemiologists of the 20th century. There are 20 Wikipedias that consider him notable enough for an article, yet until now Wikidata had no scientific paper associated with him. That was easily solved by disambiguating "author strings" for Mr Doll.

Wikidata currently knows 54 publications by him, but none of his books are among them. At the Open Library, Mr Doll was known five times over, and his books were spread across these different Mr Dolls. All books have now been attributed to the Mr Doll with id OL1150080A. This identifier is now linked on Wikidata, and the available books can be read by an international public.

All publications known at Wikidata for Mr Doll are represented in his Scholia. Given that there is much more to explore, this representation will evolve over time. People may add books or publications and additional co-authors may be disambiguated (currently a potential of 159 authors). 

The English Wikipedia has a Scholia template and it is used on the Richard Doll article. Functionality like this makes all the effort worthwhile, bringing information to the next level of exposure. It works both ways. Suppose that all references of all articles in any Wikipedia are to be found in Wikidata. All of these references will be known in the Red&Blue Wikibase. All references with an identifier like a DOI or an ISBN can easily be integrated into Wikidata for re-use in other Wiki projects.

With some additional work, it is even possible to associate references with individual statements and have them known in Wikidata as well. Again, this promotes exposure of all the work we do and it promotes re-use in other Wiki projects.
  • Scholia is/could be available as a template on any and all Wikipedias
  • You can read books when available at OpenLibrary
  • Anyone can contribute to the tapestry of information for any scholar
  • References can easily be added in Red&Blue Wikibase
  • These references can be linked to Wikidata making for one stop shopping for updates
So what is not to like? 
Thanks, 
       GerardM

Monday, March 04, 2024

A Red&Blue Wikibase disambiguation on the English Wikipedia

Mark Edward Hay is an American marine ecologist. There is a Wikipedia article about him in two languages and there is an article in Wikispecies. Consequently there is an item in Wikidata.

In a template it says: "[[Lowell Thomas Award]] (2015)". The link is a redirect to [[Lowell Thomas]], the man the award is named after. This is accepted practice in Wikipedia and it is not a problem. The redirect is linked from 23 articles, mostly about people who received the same award.

With a Red&Blue Wikibase for the English Wikipedia, it will be possible to associate a relation with the award. This could fit in a template, and additional red links can be added based on the source.

When a Wikipedian adds new links, it is done by typing in the name of a potential article. Given that people who received an award are notable, new blue links are highly likely to occur. New red links are entered in a template, so there is an implied relation.

At Wikidata, an item for the Lowell Thomas award was recently added because of Mr Hay. It currently refers to only one recipient: Mr Hay. The 23 relations known at the en:red&blue are more than welcome at Wikidata. Red links are trickier, as Wikidata is a superset of the data of all the articles of all Wikipedias, and then some.

So when Wikidata already knows about a recipient, it can turn a red Wikibase link blue. When any Wikipedia adds the Lowell Thomas Award as a link, all the information can be populated from Wikidata, making it much easier to run sanity checks that indicate where data may be right or wrong.

  • Hidden data in redirect pages is given an additional use
  • Data available in multiple Wikipedias is actually shared making knowledge more complete
  • Data only available in one Wikipedia becomes more generally available
So what is not to like?
Thanks,
      GerardM

Thursday, February 29, 2024

A Red&Blue Wikibase for the red, blue and black wikilinks of each @Wikipedia

Wikipedia uses blue links to navigate between its articles. When there is no article, the link is called a "red link". This text-based functionality works reasonably well, but it has important limitations.

  • article names are constructed to make them unique
  • disambiguation pages need to be maintained
  • there are false positives linking to the wrong articles

If you know your Wikipedia history well, you know that one of the most effective innovations was to remove the interwiki links from the Wikipedias and replace them with links to Wikidata. Wikidata uses identifiers, so a change of an article name has no effect; this ensures that articles on the same subject remain properly linked.

The Wikidata project uses the Wikibase software, and this enables the "federation" of multiple databases. This means that data may exist in multiple databases while it all works together.

Suppose that you replace both the blue links and the red links in a Wikipedia with identifiers of a separate Wikibase. Almost all blue links will implicitly be linked to a Wikidata item and Wikidata already knows about the relations between blue links it has items for. Consequently a Wikipedia Red&Blue Wikibase will be richly populated from the start.

Every Wikipedia remains autonomous and we keep it that way. But we DO know more at Wikidata because it is a superset of all Wikipedias. So when a Wikipedia knows about an award, so does Wikidata. When Wikidata knows about more recipients, it can suggest including them as red links. It must be a suggestion because a Wikipedia may use another script or other naming conventions, and the name has to be correct before it becomes available as text in the Wikipedia proper.
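Suggesting red links from Wikidata while respecting each Wikipedia's own script and naming conventions could work along these lines. This is a sketch under stated assumptions: recipients and local links are sets of item ids, labels come from a per-item dict keyed by language code, and `suggest_red_links` with its output fields is hypothetical.

```python
def suggest_red_links(
    recipients: set[str],
    local_links: set[str],
    labels: dict[str, dict[str, str]],
    lang: str,
) -> list[dict]:
    """Suggest recipients known at Wikidata that this Wikipedia does not link yet."""
    suggestions = []
    for qid in sorted(recipients - local_links):
        # Use the label in the Wikipedia's own language when Wikidata has one.
        label = labels.get(qid, {}).get(lang)
        suggestions.append({"qid": qid, "label": label, "needs_label": label is None})
    return suggestions
```

A missing label in the Wikipedia's language is flagged rather than filled in from another language, because choosing the right form of a name belongs to the local community.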

When a label is correct for a Wikipedia, it is obvious that there is to be a link to the item AND that the label can be used for that language as well. With 200+ Wikipedias enriching Wikidata in this way, both the multilingual and the multicultural quality and quantity of Wikidata will skyrocket.

  • Wikipedias remain autonomous in their content
  • Wikidata will progress from a technically multilingual project to a functionally multilingual project
  • Disambiguation will be technically available for all accepted Red&Blue labels
  • Known relations with a reference will be available, with that reference, to every Wikipedia.

So what is not to like?

Thanks,  GerardM

Saturday, February 17, 2024

Be both Anthony G. and Αντώνης Γ. Καφάτος as a scientist and have an ORCiD identifier

Anthony G. Kafatos is a co-author on many papers that are part of the "Seven Countries Study". When you want to know about the many papers he was involved in, it helps when they are all linked. The papers known at Wikidata are linked to his item. When papers are still known as a string, an "author name string", they are hard to spot AND they may be spelled differently AND even be in a different script.

Anthony was also spelled Antony. Both work in the same department at the same university, making it safe to consider them the same person. Someone has to decide; this time it was me. That is not great, because what do I know? One alternative is that nothing gets decided, but it is much better when scientists themselves are involved.
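The judgement I made for Anthony and Antony can be expressed as a heuristic: group author name strings that share a surname, a first initial and an affiliation. This is a hypothetical sketch, not how Wikidata or any tool actually decides. It does not transliterate, so a Greek form such as Αντώνης Γ. Καφάτος will not be grouped with its Latin form, and the affiliations in the test are made up.

```python
from collections import defaultdict


def name_key(name: str, affiliation: str) -> tuple[str, str, str]:
    """Heuristic key: surname, first initial and affiliation (not a proof of identity)."""
    parts = name.replace(".", " ").split()
    surname = parts[-1].casefold()
    first_initial = parts[0][0].casefold()
    return surname, first_initial, affiliation.strip().casefold()


def group_author_strings(strings: list[tuple[str, str]]) -> dict[tuple[str, str, str], list[str]]:
    """Group (author name string, affiliation) pairs that may refer to the same person."""
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for name, affiliation in strings:
        groups[name_key(name, affiliation)].append(name)
    return dict(groups)
```

A group is only a candidate for merging; the decision still belongs with a person, ideally the scientist via ORCiD.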

Data is an ecosystem. Best is when any and all scientists have one ORCiD identifier and authorise the institutions they trust to update their profile with their latest and greatest work. This has profound implications. This data will now be available for many applications including Wikidata. It will become easier to understand what the neutral point of view on a subject is.

This is the Scholia for Mr Kafatos. At this time there are 18 links to papers on the "Seven Countries Study", four more than for Mr Ancel Keys, the architect of the study.

Thanks, GerardM

Friday, February 16, 2024

Food for thought; statistics and Wikidata - DON'T BE A KARELIAN

The lumberjacks in Karelia, Finland got all the physical activity you can expect for lumberjacks, they looked the part, and they died in droves before their fifties. This was as well known among health scientists as the fact that people in Japan had the fewest problems with heart failure. Epidemiologists started one of the most famous studies, the "Seven Countries Study", to learn about these phenomena. The Karelians ate a lot of meat and butter; this caused atherosclerosis, which was identified as the cause of all these early deaths.

The Finnish government wanted this to change. The lumberjacks loved their meat, but their wives loved their hubbies more and started them on a different diet. The government did a double blind research project, and the fine Karelian gentlemen started to outperform their fellow Finns. As a consequence, the Finnish government promoted healthy food to all Finns.

In Wikidata we have MANY scientific publications with "Seven Countries Study" in the name of the publications. With more than 100 such publications tagged, many authors, publications and subjects have become apparent. This can be seen in the Scholia for the Seven Countries Study. Statistically it is likely that when another 100 publications are added, the patterns found may slightly differ. Additional authors may be represented but the relative weight of existing authors is likely to remain the same. 

Ancel Keys is the architect of the Seven Countries Study. He authored both papers and books, with many publishers, and he collaborated with many of the most prominent scientists of his time. The results of all these published studies are profound, and not only for the Karelian lumberjacks. Not everybody is happy with the results. Influencers would have us believe that Mr Keys misrepresented the facts of the study. However, when you look at the co-author graph, Mr Keys is not really central to all the collaborations. It is also obvious that many different publishers were involved.

The meat of the matter is obvious. Don't be a Karelian of centuries past: be smart, be there for your nearest and dearest, and understand that a traditional Japanese diet or the Mediterranean diet gives you more mileage. The Seven Countries Study ran for over fifty years; it knows what people ate and the mortality that followed from their diet. You can ignore this at your own peril :)

Thanks, GerardM

Saturday, January 20, 2024

A #Netflix documentary, #Youtube reviews and a more #NPOV @Wikidata reaction

I really enjoyed watching "You are what you eat", a four-part Netflix documentary based on research into the differences found between a vegan and an omnivorous diet in identical twins. The results of this research can be found in a paper called "Cardiometabolic Effects of Omnivorous vs Vegan Diets in Identical Twins".

The documentary has several story lines: one is about the research itself, another informs about participants in the study, and finally we are informed about the industry that produces our food. The chosen participants are a vehicle for the story; there were chefs, athletes, cheese aficionados and people from other cultures (seen from a US-American perspective). What people eat is produced, so we are informed about the food industry. The picture painted is not pretty, but it is based on facts.

On YouTube there are several "reviews" and now some reviews as well. All of the "reviews" are really disappointing because they express expectations that are not realistic. The program is NOT about only the science and it is NOT giving equal weight to the production of fish or meat. The results of the research are favorable to a vegan diet and the documentary provides information on what is available when less or no meat is eaten. It is why we learn about the quality of vegan cheese and meat products. Great cheeses and a biltong that is not meat based are explored by participants of the study. 

I found the YouTube "reviews" disappointing because they came across as hatchet jobs. When they consider the documentary biased, that finds its basis in the bias of the reviewer and not necessarily in the results of the research. When it is said that these reviews were requested by "so many people", it feels as if people in the agro business showed their hand.

Wikipedia has the article on the documentary and it has an article on the principal author of the paper. They have an appropriate neutral point of view.

My Wikidata reaction is that I added the paper to Wikidata, I added many of its authors and many of the papers cited as references, and to be brutally honest, seen from within Wikidata it looks awful: it is one-dimensional, it is unusable. However, thanks to tools, the full impact of the available information becomes visible. Scholia is my preferred tool for science. This is the Scholia for the paper.
Thanks,
      GerardM

Saturday, January 06, 2024

A Scholia for "water fluoridation"

Some topics are poisonous. People have a set point of view; come hell or high water, they will not budge from their position. Even Wikipedia with its "neutral point of view" makes no dent in their preoccupation. So why argue?

Wikipedia is known for its references to sources, and Wikidata is great at connecting these sources together. Particularly scholarly papers with a "DOI" may link to authors, cited works and works citing a paper. When a paper is of particular interest, you can expand the information in all these ways.

So I did not get into an argument about "water fluoridation"; I added the papers mentioned to Wikidata. I linked some papers with "water fluoridation" in their title to the subject. I attributed papers to authors, including one by the Surgeon General of the United States.

Everything that was done on the subject is reflected in the Scholia for the subject. It suffices for me as my participation in an endless argument.

Thanks, GerardM