Words and what not: April 2019

Tuesday, April 23, 2019

Scopus is "off side"

At Wikidata we have all kinds of identifiers for all kinds of subjects. All of them aim to be unique identifiers, and the value of Wikidata is that it brings them together, allowing information from multiple sources about the same subject to be combined.

Scientists may have a Scopus identifier. In Wikidata, Scopus is very much a second-rate system, because learning which identifier goes with which person requires jumping through proprietary hoops. Scopus is the paywall; it has its own advertising budget, and consequently it does not need the efforts of me and volunteers like me to put the spotlight on the science it holds for ransom. When we come across Scopus identifiers we include them, but Scopus identifiers are second-class citizens.

At Wikipedia we have been blindsided by scientists who won awards or became instant sensations because of their accomplishments. To me this is largely the effect of our not knowing who they are and what their work is. Thanks to ORCiD, we know about more and more scientists and their work. When we do not know of them, when their work is hidden from the real world, I do not mind. When we do know about them and their work in Wikidata, it is different: that is when we could, and should, know their notability.
Thanks,
GerardM

Sunday, April 14, 2019

The Bandwidth of Katie Bouman

First things first: yes, many people were involved in everything it took to make the picture of a black hole. However, it is justified that Katie Bouman is the face of this scientific novelty, because she developed the algorithms needed to distill the image from the data. To give you an idea of the magnitude of the problem she solved: the data was physically shipped on hard drives from multiple observatories. For big science, the Internet often cannot cope.

There are eternal arguments about why people are notable in Wikipedia. For a lot of that knowledge, a static environment like Wikipedia is not appropriate, and it is this environment that causes many of those arguments. To come back to Katie, or rather to every scientist: their work is collaborative, and much of it is condensed into "scientific papers". One of the black hole papers is "First M87 Event Horizon Telescope Results. I. The Shadow of the Supermassive Black Hole". This paper has many authors, not only "Katherine L. Bouman". When a major event like the first picture of a black hole is added, it is understandable that a paper like this is at first attributed to a single author.

Wikimedia projects have to deal with the ramifications of science for many reasons. The most obvious one is that papers are used for citations. To do this properly, it is the science that should define what is written, not papers selected to support an opinion. The public is invited to read these papers, and the current Wikipedia narrative rests on single papers and single points of view. This makes some sense because the presentation is static. In Wikidata the set of papers on any given topic is continuously expanded; the same needs to be true for the papers by any given author. Technically, a Wikipedia could use Wikidata as the source for publications on a subject or by an author. The author could be Katie Bouman, and a proper presentation would make it obvious that the pictures of a black hole were a group effort, with Katie responsible for the algorithms.
Thanks,
GerardM

Tuesday, April 09, 2019

@Wikidata is no relational #database

When you consider the functionality of Wikidata, it is important to appreciate that it is not a relational database. As a consequence, there is no inherent way to enforce restrictions. Emulating relational restrictions fails because it is not possible to check in real time what it is that is to be restricted.

An example: a process creates a new item whenever there is no item with a given external identifier. The query indicates that no such item exists, and a new item is created. A few moments later, the existence of an item with the same external identifier is checked again using the query. Because of the time lag between the database and the query, what is known to be in the database and what actually is in the database differ; the query indicates there is no item, and a new but duplicate item is created.
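To make the race concrete, here is a minimal Python sketch of the check-then-create pattern against a query that lags behind the database. It is a toy model under stated assumptions, not Wikidata's actual architecture or API: `SimulatedWikidata`, `naive_create_if_missing`, the tick-based `lag` and the identifier `ext-0001` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedWikidata:
    """Hypothetical toy model: a store plus a query view that lags behind it.

    `lag` is counted in abstract ticks; a new item only becomes visible to
    query() once `lag` ticks have passed since it was written.
    """
    lag: int = 3
    now: int = 0
    items: dict = field(default_factory=dict)       # qid -> external id
    written_at: dict = field(default_factory=dict)  # qid -> tick of creation
    next_qid: int = 1

    def tick(self) -> None:
        self.now += 1

    def create_item(self, external_id: str) -> str:
        qid = f"Q{self.next_qid}"
        self.next_qid += 1
        self.items[qid] = external_id
        self.written_at[qid] = self.now
        return qid

    def query(self, external_id: str) -> list:
        # Only items old enough to have reached the query index are returned.
        return [qid for qid, ext in self.items.items()
                if ext == external_id and self.now - self.written_at[qid] >= self.lag]


def naive_create_if_missing(db: SimulatedWikidata, external_id: str) -> str:
    """Check-then-create: safe with relational constraints, racy with a lagging query."""
    found = db.query(external_id)
    if found:
        return found[0]
    return db.create_item(external_id)


db = SimulatedWikidata(lag=3)
first = naive_create_if_missing(db, "ext-0001")
db.tick()  # "a few moments later", but still inside the lag window
second = naive_create_if_missing(db, "ext-0001")
print(first, second)  # Q1 Q2: the second call created a duplicate
```

Nothing in the model is wrong on its own; the duplicate appears purely because the check reads a view that has not yet caught up with the write.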

The implications are important.

Wikidata is a wiki, and there the implications are quite different. In a wiki things need not be perfect, and the restrictions of a relational model are in essence only recommendations. In such a model, duplicate items as described above are not a real problem: batch jobs may merge these items when they occur often enough. Processes may also keep an array of the items they created earlier, thereby minimising the issue.
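Both mitigations can be sketched in a few lines of Python. This is again a hypothetical toy model, not a real Wikidata tool or API: `LaggingStore`, `CachingCreator` and `find_duplicates` are illustrative names, the lag is in abstract ticks, and the batch job simply reads the full item table on the assumption that the query has caught up by the time it runs.

```python
from collections import defaultdict


class LaggingStore:
    """Hypothetical toy model: writes reach query() only after `lag` ticks."""

    def __init__(self, lag: int = 3):
        self.lag = lag
        self.now = 0
        self.items = {}       # qid -> external id
        self.written_at = {}  # qid -> tick when written
        self._next = 1

    def tick(self, n: int = 1) -> None:
        self.now += n

    def create_item(self, external_id: str) -> str:
        qid = f"Q{self._next}"
        self._next += 1
        self.items[qid] = external_id
        self.written_at[qid] = self.now
        return qid

    def query(self, external_id: str) -> list:
        return [q for q, e in self.items.items()
                if e == external_id and self.now - self.written_at[q] >= self.lag]


class CachingCreator:
    """A process that remembers the items it created itself (the 'array')."""

    def __init__(self, store: LaggingStore):
        self.store = store
        self.created = {}  # external id -> qid this process created

    def create_if_missing(self, external_id: str) -> str:
        if external_id in self.created:        # own memory: no lag
            return self.created[external_id]
        found = self.store.query(external_id)  # may be stale
        if found:
            return found[0]
        qid = self.store.create_item(external_id)
        self.created[external_id] = qid
        return qid


def find_duplicates(store: LaggingStore) -> dict:
    """Batch job: group items by external id; groups of 2+ are merge candidates."""
    by_ext = defaultdict(list)
    for qid, ext in store.items.items():
        by_ext[ext].append(qid)
    return {ext: qids for ext, qids in by_ext.items() if len(qids) > 1}


store = LaggingStore(lag=3)
bot = CachingCreator(store)
a = bot.create_if_missing("ext-0001")
store.tick()
b = bot.create_if_missing("ext-0001")  # served from bot.created, no duplicate
print(a, b)  # Q1 Q1

# A second, independent process has no such memory and still races.
other = CachingCreator(store)
c = other.create_if_missing("ext-0001")
print(c)  # Q2: a duplicate, left for the batch job
print(find_duplicates(store))  # {'ext-0001': ['Q1', 'Q2']}
```

The local memory only protects a process against its own earlier writes; duplicates created by independent processes remain, which is why the merge job is still needed.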

What is important is that we accept Wikidata's limitations and do not blame people for what Wikidata is not. Functionality like SourceMD enables what Wikidata may become: a link to all knowledge, never mind whether that knowledge is in Wikipedia articles, in scholarly articles or in sources used to prove whatever point.
Thanks,
GerardM