Saturday, June 22, 2019

Bulk uploads linked to @ORCID_Org and others, then what

Bulk uploads to @Wikidata happen all the time, for instance of the latest medical publications. They result in links to existing scholars and in new authors. The question "then what?" was raised on Twitter, and it carried the assumption of a quantitative reply.

When such data is imported into Wikidata, it does not fall into a vacuum. Many notable scientists are already known because they have a Wikipedia article and because they are linked to "authorities" like ORCiD, VIAF, Google Scholar and many others. The result is a "Scholia" for a scholar; it includes all the known papers, the co-authors and the dates of awards. This is one example of a scholar without a Wikipedia article.

Scholia is a very important tool as it enables more work on scholars. The display of co-authors, for instance, shows their gender: orange for women, blue for men and white when it is not known. Many people are involved in "Women in Red", writing new articles about women scientists. On the project page of Women in Red you will find lists that are the result of queries run on Wikidata. This is why adding gender info is so important. Notability may be inferred from the awards people received, and it gains relevance when it does not stand alone. This is why links to "authorities" help establish the necessary notability for a Wikipedia article. Objectively this is best presented in a Scholia like the example of Elizabeth Barrett-Connor.
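The kind of query behind such a list can be sketched as follows. This is a minimal sketch, not the query Women in Red actually uses: the function name is hypothetical, the award item has to be supplied by the caller, and the result is only the query text, to be pasted into the Wikidata Query Service. It relies on the Wikidata properties P31 (instance of), P21 (sex or gender) and P166 (award received).

```python
import re

FEMALE = "Q6581072"  # Wikidata item for "female"


def women_without_article_query(award_qid: str) -> str:
    """Build a SPARQL query listing women who received the given award
    and have no English Wikipedia article (hypothetical helper)."""
    if not re.fullmatch(r"Q[1-9]\d*", award_qid):
        raise ValueError(f"not a Wikidata item id: {award_qid!r}")
    return f"""SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 wd:{FEMALE} ;
          wdt:P166 wd:{award_qid} .
  FILTER NOT EXISTS {{
    ?article schema:about ?person ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


if __name__ == "__main__":
    # "Q12345" is only a placeholder; look up the item id of the award first.
    print(women_without_article_query("Q12345"))
```

Dropping the P21 line turns the same query into a list of all award winners without an article, which shows how much the list depends on gender being filled in.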

When attention is given to a scholar like Mrs Barrett-Connor, the "ungendered" co-authors are arguably relatively new to Wikidata and their items are typically incomplete. There is a tool for that: SourceMD adds missing papers and links to existing papers. It also adds links to known authors and adds missing authors. The effect is a network of information that is increasingly rich. Arguably this is a bulk upload in its own right, but its origin is a different one.

Presentations on awards, organisations, topics and much more are available from the Scholia tool. Such a presentation shows what we have and, given that Wikidata is a wiki, there is always more to know. Award winners may be enriched with authority information, and they may be linked to papers. Frequent publishers on a topic may have co-authors that could do with some TLC.

In answer to the original question: bulk uploads invite additional work; the data is enriched and becomes increasingly relevant.

Saturday, June 15, 2019

@Wikipedia - could give a clue to #deleted articles

Even deleted Wikipedia articles have "false friends". In a list of award winners, a Mr Markku Laakso used to have an article. This Mr Laakso was actually a conductor and not the diabetes researcher the list was there for. For whatever reason, the conductor was deemed not notable and his article was deleted.

When you are NOT a Wikipedia admin, there is no way to know what was deleted.

One solution is for all blue, red and black links to refer to Wikidata items. When an article is deleted, the Wikidata item is still there, making it easy to prevent cases of mistaken identity like that of the two Mr Laaksos.

You may find a more expanded proposal here.

Monday, June 10, 2019

@Wikipedia: #notability versus #relevance

I had a good discussion with, imho, a deletionist Wikipedia admin. For me the biggest takeaway was how notability gets in the way of relevance.

With statements like "There are only two options, one is that the same standards apply, and the other is the perpetuation of prejudice" and "I view our decisions of notability as primarily subjective--decisions based on individual values and understandings of what WP should be like", little or no room is given for contrary points of view.

The problem with notability is that it enables such a personal POV, while relevance is about what others want to read. There is no article for professor Bart O. Roep. Given two relevant diabetes-related awards he should be notable, and as he started a human study for a vaccine for type 1 diabetes, he should be extremely relevant.

A personal POV that ignores the science that is in the news has its dangers. It is easy enough for Wikimedians to learn about scientific credentials, since the papers are there to read, but what we write is not for us but for our public. Withholding articles opens our public up to fake facts and fake science. An article about Mr Roep is therefore relevant and timely, particularly because people die when they cannot afford their insulin. Articles about the best of what science has to offer on diabetes are of extreme relevance right now.

At Wikidata, there is no notability issue. Given the relevance of diabetes, all that is needed is to concentrate effort for a few days on a subject. New authors and papers are connected to what we already have, and genders are added to authors (to document the gender ratio). As a result, more objective facts are available for the subjective Wikipedia admins to consider, particularly when they accept tooling like Scholia to open up the available data.

Sunday, June 09, 2019

#Wikidata - Exposing #Diabetes #Research

People die of diabetes when they cannot afford their insulin. There is not much that I can do about it, but I can work in Wikidata on the scholars, the awards and the published papers that have to do with diabetes. The Wikidata tools that are important in this are Reasonator, Scholia and SourceMD; the ORCiD, Google Scholar and VIAF websites prove themselves to be essential as well.

One way to stay focused is by concentrating on awards. At this time it is the Minkowski Prize, which is conferred by the European Association for the Study of Diabetes. The list of award winners was already complete, so I concentrated on their papers and co-authors. The first thing to do is to check if there is an ORCiD identifier and if that ORCiD identifier is already known in Wikidata. I found that it often is, and merges of Wikidata items may follow. I then submit a SourceMD job to update that author and their co-authors.
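The ORCiD check described above can be sketched in two small helpers. This is a sketch under stated assumptions: both function names are hypothetical, the (item, ORCiD) pairs are assumed to come from a query result you already have, and nothing here talks to Wikidata itself. P496 is the Wikidata property for the ORCID iD.

```python
import re
from collections import defaultdict

# Four groups of four characters; the last one may be the check character X.
ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_lookup_query(orcid: str) -> str:
    """Return SPARQL text that finds every item carrying this ORCID iD."""
    orcid = orcid.strip().upper()
    if not ORCID_RE.fullmatch(orcid):
        raise ValueError(f"not an ORCID iD: {orcid!r}")
    return f'SELECT ?item WHERE {{ ?item wdt:P496 "{orcid}" }}'


def merge_candidates(pairs):
    """Group (item id, ORCID iD) pairs and keep the ORCID iDs that are
    attached to more than one item; those items are candidates for a merge."""
    items_by_orcid = defaultdict(set)
    for qid, orcid in pairs:
        items_by_orcid[orcid.strip().upper()].add(qid)
    return {
        orcid: sorted(qids, key=lambda q: int(q[1:]))  # numeric order of ids
        for orcid, qids in items_by_orcid.items()
        if len(qids) > 1
    }
```

When one identifier turns up on two items, a human still has to decide whether they really are the same person before merging.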

The next (manual) step is about gender ratios. Scholia includes a graphic representation of co-authors, and for all the "white" ones no gender has been entered. The process is as follows: when the gender is "obvious", it is just added. For an "Andrea", you look them up in Google and add what you think you see. When a name is given as "A. Winkowsky", you check ORCiD for a full name and repeat the process.
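The decision steps of this process can be written down as a small triage function. It is only an illustration: the two name tables are hypothetical stand-ins for human judgement, and every result still calls for a person to confirm it before anything is added to Wikidata.

```python
import re

# Hypothetical lookup tables; in practice this is a human judgement call.
OBVIOUS = {"thomas": "male", "elizabeth": "female"}
AMBIGUOUS = {"andrea"}

# A first name that is only initials, such as "A", "A." or "A.B."
INITIALS_RE = re.compile(r"(?:[A-Z]\.)+|[A-Z]")


def triage(name: str):
    """Return (action, gender) for an author name.

    action is "orcid"  -> only initials known, check ORCiD for a full name
              "lookup" -> ambiguous or unknown name, look the person up
              "add"    -> gender considered obvious, add it
    """
    parts = name.split()
    if not parts:
        raise ValueError("empty name")
    first = parts[0]
    if INITIALS_RE.fullmatch(first):
        return ("orcid", None)
    key = first.lower()
    if key in OBVIOUS and key not in AMBIGUOUS:
        return ("add", OBVIOUS[key])
    return ("lookup", None)
```

For example, `triage("A. Winkowsky")` asks for an ORCiD check, `triage("Andrea Rossi")` (a made-up name) asks for a manual lookup, and `triage("Thomas Yates")` proposes "male".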

Once the SourceMD job is done, chances are that you have to start the gender process again because of new co-authors. Thomas Yates is a good example of a new co-author, already with a sizable number of papers (95) to his name but not yet complete (417). Thomas is a "male".

What I achieve is an increasingly rich coverage of everything related to diabetes. The checks and balances ensure high quality. And as more data is included in Wikidata, people who query it will get better results.

What I personally do NOT do is add authors without an ORCiD identifier. It takes much more effort, and the chance of getting it wrong makes it unattractive as well. In addition, I care for science, but when people are not "Open" about their work I am quite happy for their colleagues to get the recognition they deserve.

Thursday, June 06, 2019

Perspectives on #references, #citations

Wikipedia articles, scientific papers and some books have them: citations. Depending on your outlook, citations serve a different purpose. They exist to prove a point or to enable further reading. These differing purposes are not without friction.

In science, it makes sense to cite the original research establishing a fact. This is important because when such a fact is retracted, the whole chain of citing papers may need to be reconsidered. In a Wikipedia article it is imho a bit different. For many people references are next-level reading material, and therefore a well-written text expanding on the article is to be preferred; it helps bring things together.

When you consider the points made in a book to be important, like the (many) points made in Superior, the book by Angela Saini, you can expand the Wikidata item for the book by including its citations. It is one way to underline a point because those who seek such information will find a lot of additional reading and confirmation for the points made.

Adding citations in Wikidata often means that the sources and their authors have to be introduced first. It takes some doing, but adding DOI, ORCiD, VIAF and/or Google Scholar data makes it easy to make future connections. If you care to add citations to this book with me, this is my project page.
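Once the cited works have items of their own, linking them to the book is mechanical. The sketch below prepares tab-separated lines in the style of the QuickStatements tool, using P2860 (cites work); both helper names are hypothetical, and the assumption that DOIs are stored in upper case follows the convention used for P356 (DOI) on Wikidata. It only produces text and changes nothing by itself.

```python
import re

QID_RE = re.compile(r"Q[1-9]\d*")

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Strip a resolver prefix and upper-case the DOI for matching."""
    value = doi.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.upper()


def cites_work_statements(book_qid: str, cited_qids):
    """Return one 'book <TAB> P2860 <TAB> cited work' line per cited item,
    keeping the given order and skipping repeats."""
    seen = set()
    lines = []
    for qid in [book_qid, *cited_qids]:
        if not QID_RE.fullmatch(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    for qid in cited_qids:
        if qid not in seen:
            seen.add(qid)
            lines.append(f"{book_qid}\tP2860\t{qid}")
    return lines
```

Normalising the DOI first makes it easier to find out whether a cited paper already has an item, which is the step that prevents duplicates.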