Sunday, May 19, 2019

#Scholia: on the "requirement" of completeness

Scholia, the presentation of scholarly information on authors, papers, universities, awards and more, is at this time not included in the "Authority control" part of a Wikipedia article. The reason, as I understand it, is that Wikipedians "that matter" insist that its information be complete.

That is imho utter balderdash.

The first argument is the Wiki principle itself. Things do not need to be complete; in the Wiki world it is all about the work that is underway. The second is in the information that it provides: its information is arguably superior to what a Wikipedia article provides on the corpus of papers written by an author. The third is that with the prospect of all references of all Wikipedias ending up in Wikidata, value is added when a paper can be seen in relation to its authors and citations. It matters when it is known what citations a paper is said to support. It matters that we know the papers that are retracted. The fourth argument is in the maths of it all; typically scientific papers have multiple authors. It takes only one author with an ORCiD identifier to get their papers included. The other authors have not been open about their work; it is their own doing that they are not known in the most read corpus on the planet. They still exist, but as "author strings". When a kind soul wants to remove them from obscurity they can.

As to the "Katie Bouman"s among them? There are many fine people that are equally deserving, that have not been recognised yet for their relevance. Fine people that have a public ORCiD record. For them it is feasible to have their Scholia ready when they are recognised. For the others, well it is not a Pokemon game, it is a Wiki.
Thanks,
      GerardM

Sunday, May 12, 2019

@Wikidata Women in science - Lesley Wyborn

For Lesley Wyborn a Wikipedia article exists. She "built an international reputation for innovative leadership in geoinformatics and global e-research, particularly in the geoscience area" according to the motivation for the "Outstanding Contributions in Geoinformatics" award. Notability, no issue.

When the article was written in 2016, no attention was given to the "authority control" and consequently in 2018 an additional item was created with an ORCID identifier. In 2019 additional work was done and the two items were merged. A Google Scholar identifier and the award were added, potentially addressing the issues raised on the Wikipedia article.

Arguably both the Wikidata and the Wikipedia information could be more informative. However, given that both are Wikis that is quite acceptable. It is quite likely that many more papers are already on Wikidata and just need attribution. That is something for others to do; we are a community, remember.
Thanks,
     GerardM

Thursday, May 09, 2019

How and when I trust science, when would you trust science?

When something drops on my head, it is gravity that brings it down. When I travel to the USA, the shortest route is over Iceland; the world is round. I did not get polio, measles or whooping cough; my parents had me vaccinated. I worked in computing and most of the women were better than the men. That is my observation; I am happy working for women.

When I read articles in Wikipedia, I know that I can trust them up to a certain level because there are citations indicating that something is true or that a given opinion is held. Its neutral point of view means that equal weight is to be given to opinions, but not when they fly in the face of proven facts, the science about a subject. The best news: when scientific papers are retracted, we start to know about this and act upon it in Wikipedia. The nonsense, the preconceptions, the paid-for science is to be removed once it is retracted.

In the Netherlands a prominent scientist has been tasked to root out those medical practices that are proven not to work. His work will be hard; he will have to deal with vested interests, ingrained practices and a public that wants everything to be as expected. People will still be vaccinated, some medications will no longer be available, some treatments will not be there; they do not work even when you are desperate for them to work.

That is me, now you: when can you trust? Well, it is good to be wary; just consider the numbers. When a politician says he was effective because many drug dealers went to jail, ask yourself why they should be in jail and whether your community ended up safer. If not, not much was achieved. When scientific papers show that the number of addicts goes down when substance dependence is treated as a medical and not as a criminal issue, wonder what this meant for the communities these people come from. Seek out the numbers and you are no longer talking politics but considering the science of it.

A lot of so-called science defends points of view that do not fit facts on the ground. This can be tough to understand because the difference may be local versus global. Worldwide, temperatures go up. Our climate is no longer stable and yes, in the USA it has been cold lately, but not so in Europe, Africa, Asia. One thing to consider: is it truly science, peer reviewed and everything, or is it there to shore up a point of view? A telltale sign is when it is from a "research institute" / "policy institute" paid for by an interested party.
Thanks,
      GerardM

Tuesday, April 23, 2019

Scopus is "off side"

At Wikidata we have all kinds of identifiers for all kinds of subjects. All of them aim to provide unique identifiers and the value of Wikidata is that it brings them together, allowing us to combine the information of multiple sources about the same subject.

Scientists may have a Scopus identifier. In Wikidata Scopus is very much a second rate system because learning which identifier goes with which person requires jumping through proprietary hoops. Scopus is the pay wall; it has its own advertising budget and consequently it does not need the effort of me and volunteers like me to put the spotlight on the science it holds for ransom. When we come across Scopus identifiers we include them, but Scopus identifiers are second class citizens.

At Wikipedia we have been blindsided by scientists who gained awards and became instant sensations because of their accomplishments. For me this is largely the effect of us not knowing who they are or what their work is. Thanks to ORCiD, we increasingly know about more and more scientists and their work. When we don't know of them, when their work is hidden from the real world, I don't mind. When we know about them and their work in Wikidata it is different. That is when we could and should know their notability.
Thanks,
      GerardM

Sunday, April 14, 2019

The Bandwidth of Katie Bouman

First things first, yes, many people were involved in everything it took to make the picture of a black hole. However, the reason why it is justified that Katie Bouman is the face of this scientific novelty is because she developed the algorithms needed to distill the image from the data. To give you a clue about the magnitude of the problem she solved; the data was physically shipped on hard drives from multiple observatories. For big science, the Internet often cannot cope.

There are eternal arguments about why people are notable in Wikipedia. For a lot of that knowledge a static environment like Wikipedia is not appropriate and this environment is causing a lot of those arguments. To come back to Katie, or really to every scientist: their work is collaborative and much of it is condensed into "scientific papers". One of the black hole papers is "First M87 Event Horizon Telescope Results. I. The Shadow of the Supermassive Black Hole". There are many authors to this paper, not only "Katherine L. Bouman". When a major event like a first picture of a black hole is added, it is understandable that a paper like this is at first attributed to a single author.

Wikimedia projects have to deal with the ramifications of science for many reasons. The most obvious one is that papers are used for citations. To do this properly, it is science that defines what is written and not papers selected to support an opinion. The public is invited to read these papers and the current Wikipedia narrative is in the single papers, single points of view. This makes some sense because the presentation is static. In Wikidata the papers on any given topic are continuously expanded; the same needs to be true for papers by any given author. Technically a Wikipedia could use Wikidata as the source for publications on a subject or by an author. The author could be Katie Bouman and proper presentations make it obvious that the pictures of a black hole were a group effort with Katie responsible for the algorithms.
Thanks,
       GerardM

Tuesday, April 09, 2019

@Wikidata is no relational #database

When you consider the functionality of Wikidata, it is important to appreciate it is not a relational database. As a consequence there is no implicit way to enforce restrictions. Emulating relational restrictions fails because it is not possible to check in real time what it is that is to be restricted.

An example: in a process, new items are created when there is no item available with a given external identifier. A query indicates that no such item exists and a new item is created. A few moments later the existence of an item with the same external identifier is checked again using a query. Because of the time lag that exists, what is known to be in the database and what actually is in the database differ, so the query indicates there is no item and a new but duplicate item is created.

Implications are important.

Wikidata is a wiki. The implications are quite different. In a wiki things need not be perfect, and the restrictions of a relational model are in essence recommendations only. In such a model duplicate items as described above are not a real problem; batch jobs may merge these items when they occur often enough. A process may also keep an array of the items it created earlier and thereby minimise the issue.
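To make the time lag and the "array of earlier items" idea concrete, here is a minimal sketch in Python. It is a toy model, not how Wikidata or any real tool is implemented: `LaggedStore`, `import_naive`, `import_with_memory` and the identifiers `ext-A` and `ext-B` are all made up for illustration. The store only answers queries from an index that lags behind what has been written.

```python
class LaggedStore:
    """Toy store (hypothetical) whose query index lags behind its writes."""

    def __init__(self):
        self.items = []    # everything actually written: (item_id, external_id)
        self.indexed = 0   # how many items the query index has caught up with

    def query(self, external_id):
        # The query only sees the part of the data that has been indexed.
        visible = self.items[:self.indexed]
        return [item_id for item_id, ext in visible if ext == external_id]

    def create(self, external_id):
        item_id = f"Q{len(self.items) + 1}"
        self.items.append((item_id, external_id))
        return item_id

    def catch_up(self):
        # Simulates the index eventually reflecting all writes.
        self.indexed = len(self.items)


def import_naive(store, external_ids):
    """Create an item whenever the (lagging) query finds none."""
    created = []
    for ext in external_ids:
        if not store.query(ext):
            created.append(store.create(ext))
    return created


def import_with_memory(store, external_ids):
    """Same process, but it remembers what it created during this run."""
    seen = {}      # external_id -> item_id created earlier in this run
    created = []
    for ext in external_ids:
        if ext in seen or store.query(ext):
            continue
        seen[ext] = store.create(ext)
        created.append(seen[ext])
    return created


ids = ["ext-A", "ext-B", "ext-A"]   # the same identifier turns up twice

naive = LaggedStore()
naive_created = import_naive(naive, ids)

careful = LaggedStore()
careful_created = import_with_memory(careful, ids)

print(len(naive_created), "items created without memory")
print(len(careful_created), "items created with memory")
```

The memory only helps within a single run of a single process. Two processes running side by side can still create duplicates, which is where the batch merge comes in.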

What is important is that we do not blame people for what Wikidata is not, and that we accept its limitations. Functionality like SourceMD enables what Wikidata may become: a link to all knowledge. Never mind if it is knowledge in Wikipedia articles, scholarly articles or in sources used to prove whatever point.
Thanks,
      GerardM

Sunday, March 24, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective II

When we are to share in the "sum of all knowledge" we share what we know about subjects: articles, pictures, data. We may share what knowledge we have and what others have, and that is what it takes for us to share in the sum of all knowledge. The questions are why we should share all this, how to go about it and finally how it will benefit our public and help us share the sum of all knowledge.

At the moment we do not really know what people are looking for. One reason is that search engines like the ones by Google, Microsoft and DuckDuckGo recommend Wikipedia articles and as a consequence the search process is hidden from us. We do not know what people really are looking for. However, some people prefer the "Wikipedia search engine" in their browser. We can do better and present more interesting search results. From a statistical point of view, we do not need big numbers to gain significant results.

When we check what the "competition" does we find their results in many tabs; "the web" and "images" are the first two. The first is text based and offers whatever there is on the web. What we will bring is whatever we, and the organisations we partner with, have to offer. It will be centered on subjects and their associated factoids, presented in any language.

One template to consider is how Scholia presents. It differs. It depends on whether it is a publication, a university, a scholar, a paper. Large numbers make specific presentations feasible and thanks to Wikidata we know what kind of presentation fits a particular subject. A similar approach is possible for sports, politics. It takes experimentation and that is what makes it a Wiki approach.

Thanks to this subject based approach, language plays a different role. For finding the subjects it is vital that the differing labels they may have are available or become available. One important difference with the Google, Microsoft or DuckDuckGo approach is that as a Wiki, we can ask people to add labels and missing statements. This will make our subject based data better understood in the languages people support. Yes, we can ask people to have a Wikimedia profile and yes, we may ask people to support us where we think people looking for information have to overcome hurdles.
Thanks,
       GerardM