Wednesday, December 26, 2018

Professor @steve_hanke and reading what is #FAIR

Professor Hanke is on Twitter. He has five Wikipedia articles and his information on Wikidata is well developed. For a scholar of his eminence, you would expect a lot of known publications as well. However, never mind the 153 English Wikipedia references, never mind the links to 13 external authorities, finding his work is neither easy nor obvious.

The problem with the Wikipedia references is that they are a hodgepodge of links about him and links to his works. His VIAF registration may bring you some of his works, but it will not tell you where his books are cited. Mr Hanke does not have an ORCID identifier, and consequently it is not easy to include his data in Wikidata.

This is not about Mr Hanke; in certain fields of science people do not have an ORCID identifier or are not open about their publications. When you are interested in a specific subject or a specific scientist, it helps when the information is FAIR.

So what is missing? There is a database with all Wikipedia references, and it needs to be included in Wikidata as soon as possible. It may require a fair deal of social manoeuvring to include all Wikipedia references in Wikidata, but the benefits will be huge. Given that Wikipedia references are backed up by the Internet Archive, this backup will extend to these links in Wikidata as well. It makes them FINDABLE and ACCESSIBLE. At Wikidata, this data becomes INTEROPERABLE and REUSABLE (FAIR).

So my 2019 wish for the Wikimedia Foundation is to become FAIR in what it says and what it does.

Tuesday, December 25, 2018

Dear Katherine: Socialization Tactics in Wikipedia and Their Effects

The contract of Wikimedia employees states that they are not allowed to blow their own horn in any of the Wikimedia projects. According to a very senior Wikimedia official, this is why they cannot add information about scientific papers like Socialization Tactics in Wikipedia and Their Effects to Wikidata.

Dear Katherine, you will agree with me that this is a perverse effect of a well-intentioned clause in personnel contracts. So let me tell you more about its effects and how we can overcome this issue.

As you know, there is a thriving research community, and its showcases present research on Wikimedia projects. These presentations are recorded and may be found on YouTube. Typically these showcases are based on scientific papers. These papers should be recorded in Wikidata with all the details, as is done for any and all subjects. When a paper is properly covered, we know all its authors, the papers it cites and, in time, the papers that in turn cite it. When an author is well covered, we know every paper published, co-authors, subjects and citing authors. We know this because of Scholia. Scholia is what prevents Wikidata from being a stamp collection; Scholia is what makes a subject come alive, it is what brings data together, makes it digestible and gives it relevance.

Not so for subjects relating to Wikimedia, apparently for contractual reasons. There are several strategies to overcome this. But first let us decide what we are, what we do and why this matters.

Wikimedia is a publisher of scientific papers; it currently has three journals, and in order to raise the impact of the papers it publishes, they have to gain visibility. To do this we can associate with ORCID, and publish and certify all the details of its papers for their authors. One of the things we already do on a big scale is re-publish data from ORCID. ORCID has a program whereby it can sync its information with ours. It collaborates with Crossref, and so could we. When we do, we make Open Science much more visible.

Dear Katherine, what we have shown is that we can and do care about publications and citations. We care about science. The least we want is for our own research to be presented as well as we can. In order to achieve this, we have to consider the unintended impact of a provision in a labour contract and overcome this self-inflicted barricade.

Tuesday, December 18, 2018

#Wikidata and the papers of Professor Wiesje van der Flier

Professor van der Flier has an ORCID identifier. She works at the Neurology / Alzheimer Center of the VU University Medical Center.

Mrs van der Flier has Iris E. Sommer as a co-author. We know that they have at least one co-author in common: Edwin van Dellen. There may be more, and we will certainly know about those co-authors who are as open about their work. Professor Sommer was the initial interest because she is a member of "de Jonge Akademie".

We will know because they have an ORCID identifier. At Wikidata it serves two vital functions. First, it helps with disambiguation; a job was run for all people with the surname "Li". Second, given that ORCID allows people to share their information publicly, it allows us to import the publications of authors and identify their equally open co-authors.

The Scholia page for Professor van der Flier listed 31 people who were certainly new to me. They are being processed, and chances are that at the end of it Wikidata will know more co-authors and more papers of Mrs van der Flier, and her representation in Wikidata will be more complete.

Yes, it will only know the co-authors who are open about their work, but that is only FAIR.
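Because so much of this rests on ORCID identifiers, it helps to reject mistyped ones before matching or creating items. The sketch below checks the final character with the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The helper names `orcid_check_character` and `is_valid_orcid` are hypothetical, and `0000-0002-1825-0097` is the fictitious sample iD ORCID uses in its own documentation. A passing checksum only shows that an identifier is well formed, not that it belongs to anyone.

```python
import re

# An ORCID iD is four groups of four characters: 15 digits plus a check character (0-9 or X).
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True when the string looks like an ORCID iD and its check character matches."""
    candidate = orcid.strip().upper()
    if not ORCID_PATTERN.fullmatch(candidate):
        return False
    digits = candidate.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    for sample in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(sample, is_valid_orcid(sample))
```

The first sample passes and the second, with its last digit changed, fails.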

Sunday, December 09, 2018

#Science; I can read

What Wikipedia articles offer is based on their sources. Those sources can be anything, and when we want to know the veracity of what we read, the sources have to be available. Not only that, we rely on those sources to be consistent and we rely on those sources to be readable.

When sources are on the web, the Internet Archive will have iterations of a source available in its Wayback Machine. It ensures that sources remain available, and thereby much of the integrity of Wikipedia is maintained.

For scientific sources we are unlucky. Reading a scientific paper can set you back $45, and it only allows you to read that paper for a day. In effect, such papers cannot be read; we "have to" trust them, and there are plenty of papers that are extremely problematic as well as expensive to read.

Many papers are increasingly FAIR: Findable, Accessible, Interoperable and Reusable. The best first-line partners we have are again the Internet Archive and ORCiD. Organisations like the Biodiversity Heritage Library store scientific papers at the IA, thereby making them available for as long as the IA exists. ORCiD is where living scientists identify themselves and, if they so choose, the publications they (co-)authored. It makes them and/or their papers findable. The papers typically include a DOI, making them accessible. After that it is anyone's guess whether you can actually read them.

Scientists who are open about their work may find that they and their work have found their way into Wikidata. For Karsten Suhre this was done; his scientific work is represented in his Scholia, and many of his co-authors have been automatically added from ORCiD and have been processed as well. His co-authors who are not as open are largely missing, but that is only FAIR; I do not volunteer to promote them.

What Wikidata has is not representative of all of science, but it increasingly represents the science that is open access: the science that I can read, that you can read, that is there for all of us to read. The science that deserves to be used as sources in Wikipedia. We can read.

Thursday, November 15, 2018

Bringing more #science to @Wikidata

Slowly but surely more scientific papers and their authors find their way into Wikidata. Particularly when scientists have staked their claims in ORCiD, adding is easy and obvious.

It is easy because in ORCiD every author, paper, organisation et al. has its own unique identifier. So when you add a paper, all authors who claimed authorship are already linked.

Earlier today, I added papers and co-authors for Jaume Piera. As a consequence Laura Recasens was added, and as you can see in the illustration of her co-authors, several new authors popped up as well.

To do this I use a combination of tools. Reasonator is my preferred tool to display data; for scientists it tells me whether he or she is known to be an author. When that is the case, Scholia presents the scholarly author information. Of particular relevance to me is the co-author presentation. For co-authors shown in white, no gender is given in Wikidata, and when the name is an initial and a surname, I will look up the ORCiD information to find a full name. Typically that is how people are known in ORCiD.

I use the SourceMD tool for two purposes: "creating and amending papers for authors" and to "add metadata from ORCiD authors to Wikidata". The work is processed as a batch job; I run one job for up to 15 authors at a time, and it takes forever to run.

Other people run other jobs; a particular hat tip to Daniel Mietchen, who makes sure that recent publications find their way into Wikidata and finds many other reasons to improve on what we have. All this would not be possible without the many tools by Magnus, and for Scholia I thank Finn Årup Nielsen; thanks to this evolving presentation, science as a process comes alive.

There is more to do; the Wikipedia citations are in a separate database, and much of its data may be found in Wikidata. Who will merge them? Publications do cite other publications; it is a field I am not really interested in. Citations are being added, so there must be a tool.

When you are interested in a particular scientist or a particular paper, just use the tools, and slowly but surely we will all make Wikidata a great tool to represent scientific fact.
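The co-authors Scholia shows in white are the ones whose items lack a "sex or gender" (P21) statement. A minimal sketch, assuming that authorship is recorded with "author" (P50) and that the public Wikidata Query Service endpoint is used, could list them for one author item. The function names are hypothetical, the query is a reconstruction rather than the one Scholia itself runs, and authors recorded only as name strings are not found this way.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Co-authors share a work (P50 = author) with the given item but have no P21 (sex or gender).
COAUTHOR_QUERY_TEMPLATE = """
SELECT DISTINCT ?coauthor ?coauthorLabel WHERE {
  ?work wdt:P50 wd:AUTHOR_QID, ?coauthor .
  FILTER(?coauthor != wd:AUTHOR_QID)
  FILTER NOT EXISTS { ?coauthor wdt:P21 ?gender . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def coauthors_without_gender_query(author_qid: str) -> str:
    """Build a SPARQL query for co-authors of author_qid that lack a gender statement."""
    if not re.fullmatch(r"Q[1-9]\d*", author_qid):
        raise ValueError(f"not a Wikidata item id: {author_qid!r}")
    return COAUTHOR_QUERY_TEMPLATE.replace("AUTHOR_QID", author_qid)


def run_query(query: str) -> list:
    """Send the query to the Wikidata Query Service and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # WDQS asks clients to identify themselves with a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "coauthor-gender-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(coauthors_without_gender_query(qid))` with an author's item id needs network access; each binding holds a co-author's URI and English label, which is the list to work through when adding genders or looking up full names in ORCiD.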
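The following is not SourceMD's interface. It is only a sketch of how a worklist of ORCID iDs might be split into jobs of at most 15 authors, the size mentioned above; the function name `batches` and the placeholder identifiers are hypothetical.

```python
from typing import Iterator, List


def batches(identifiers: List[str], size: int = 15) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` identifiers, keeping their order."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


if __name__ == "__main__":
    # Hypothetical placeholders standing in for ORCID iDs of co-authors to process.
    worklist = [f"author-{n:02d}" for n in range(32)]
    for job_number, job in enumerate(batches(worklist), start=1):
        print(f"job {job_number}: {len(job)} authors")
```

With 32 authors this gives two full jobs of 15 and a last job of 2.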

Saturday, November 10, 2018

More #impact for your #science is in being a #source at @wikipedia

A study of how students research a new subject found that they read the Wikipedia article first. Then they move to its sources, and from there it takes off.

To have an impact, you as a scientist want to be there first, getting attention for your work. Here are a few tips:
  • Make sure that you and your work are known. First make your work known at ORCiD. From there it gets into Wikidata.
  • PS: check out the Scholia presentation of you and your scientific work.
  • Make sure that your work can be read. Wikipedia actively seeks free-to-read versions using the OAbot.
  • Do not think that the current practices of your field will benefit new scientists in the future. Many fields are not well represented at ORCiD.
For your information: there is a database with the sources used in Wikipedia. The only thing lacking is that this database still needs to be integrated into Wikidata for it to have a real impact.

Saturday, October 27, 2018

#Library #Science - Prof Dr Frank Huysmans

Mr Huysmans works at the Universiteit van Amsterdam. He teaches "Library sciences" and, as is usual for a scientist, he has a fair share of publications to his name.

The problem is that this field of science is not well represented in Wikidata. There were no publications to his name. Importing them from ORCiD proved problematic; only four of the 22 known there were added. Working from what was known, it was possible to add co-authors and enrich those, seek out their co-authors and enrich them as well. The result is the current 40 publications to Mr Huysmans's name.

Mr Huysmans has both a Twitter and an ORCiD account. Everybody in Wikidata who does will have his or her profile updated thanks to a job run by Daniel Mietchen. These are the people who publicly promote their science, and in this way they gain some additional credibility.

NB: when you have an ORCiD and a Twitter account, tweet #IcanHazWikidata and you will get your Qid.

When you care about your science, do maintain your ORCiD profile, because it will make your papers, your co-authors and the organisation you work for more visible in Wikidata. Your #Scholia profile will get better and better, and the chances of being quoted in Wikipedia improve.

Monday, October 22, 2018

#Science - Ladies, you work together

Yesterday I singled out Paola Giardina because she was a co-author of someone who had SO many co-authors that I could not manage the information in there. Yesterday Paola had a large number of co-authors shown in white (no gender info). Today there are even more.

One thing is pretty obvious in what I see: women are more likely to work with women than with men. When you want to analyse this, it is important to know the data this is based on. At this time 31% of the people with an ORCiD identifier are female. In terms of probability, it is likely that some 31% of the people who have not yet been associated with a gender will be female as well.

In many universities the percentage of women studying is more than 50%. All of them get involved in research. All students are involved in the production of papers and all of them are entitled to their ORCiD and to their Wikidata identifier.

So when we want to express the notability of women in modern science, all we have to do is ask any and all scientists to make their publication details part of the open record. Slowly but surely, it will become obvious who produces the best science, where it is produced and who collaborates with whom.

Saturday, October 20, 2018

#Accepting science; the solution is in the reading not the publishing

The most important thing religion has over science? Its texts can be read. Sources like the Bible and the Quran can be read for free. You can get *your* copy from many true believers. A copy is in your library. With science, the papers that can prove to you that goldfish should be classified as endangered are behind a paywall. It is only your common sense that might say: "Hey, wait a minute..."

When Wikipedia insists on its sources, they are only functional when these sources can actually be read. This is why the Internet Archive plays such a vital role in maintaining the validity of stated facts.

Some scientists think that "the public" cannot read scientific papers. They forget that even for scientists, a paper that cannot be read is a paper that does not exist in their contemplations. The public does read scientific papers. The Cochrane crowd, for instance, reads papers and checks particular premises for validity. We know that scientific research on coronary disease was biased towards males, and as a consequence women still die. A bias like that is what they look for; it is why they reject many papers, because they are basically *not* valid.

There is a lot to discuss about what scientific publishing should be and how it should be funded. The baseline is that when a publication is not available for anyone to read, the facts do not matter. Why believe vaccines are safe when the publications that prove it are behind a paywall?

Friday, October 19, 2018

#Wikidata - the missing #Elsevier papers

It started with a tweet: "There is also a professor Elsevier". A search found that Professor Cornelis J. Elsevier works at the "Universiteit van Amsterdam". He did not exist in Wikidata and there was only one paper to be found for him.

Adding this one paper was done with the "Resolve Authors" tool. The Scholia tool for Mr Elsevier showed a few co-authors and in addition to this several "missing co-authors" could be found.

In order to show more papers for Mr Elsevier, more papers needed to be imported into Wikidata. This can be done for authors with an ORCiD identifier, particularly the ones with no known gender. So far they did not get much TLC. Just running the "SourceMD tool" for them will add additional papers and associate other authors to these papers as well.

This is an iterative process, and I focused, for no particular reason, on Mrs Barbara Milani. Processing her co-authors meant that more co-authors came out of the woodwork. At this time, 13 new authors with an ORCiD identifier popped up. Once they are processed, more papers will be known to Wikidata and, given their relation to Mrs Milani, there is a reasonable chance that these papers link to Mr Elsevier as well.

At this time Mr Elsevier is known to have 7 publications.

Sunday, October 14, 2018

#Wikidata - the #heart of women differs from the heart of men

The assumption that the heart of women and the heart of men are the same proved to be lethal. The "Hartstichting" is a Dutch charity that raises funds to combat heart disease. One of its studies is done by professor Hester den Ruijter of the Utrecht Medical Centre. Her study aims to map those differences and it is part of an effort to provide equal quality medical support for heart matters for both genders.

As a scientist, Mrs den Ruijter was involved in the production of many scholarly papers with many co-authors, and this is best presented by Scholia. Yesterday Mrs den Ruijter was only known to Wikidata through her papers. Today she has her own item, the papers have been associated with her and so have many of her co-authors. Many other authors associated with the research into how the heart and its diseases differ between the genders and by ethnic background have their own item as well.

It is vital to recognise these differences, survival relies on it.

Sunday, October 07, 2018

#WikiCite - Thank you #Orcid ! - #IcanHazWikidata

The question "I have an ORCiD profile, how do I get it into Wikidata?" was asked on Twitter. Using Magnus's tool, the public information was imported, and as a result it can be shown in Scholia.

Paolo Cignoni made a request using the #IcanHazWikidata hashtag; his papers were imported and they show nicely in Scholia. They include several of his co-authors; for the ones in white we have no indication of their gender in Wikidata. That is easy to fix.

There are probably a lot of co-authors missing. One way of finding the missing co-authors is by adding "/missing" to the Scholia link. You can check for an ORCiD identifier and add a found identifier. You then identify the papers already known to Wikidata and attribute them to the co-author or to a citing author. I added John W Goodby to make the picture more complete. It is easy, and mostly obvious what to do.

What makes all this possible? Open data and a bit of effort. As you can see in the later picture, just running Magnus's tool for a few co-authors changes the outlook considerably.

Are you a scholar and do you want to see your initial Scholia information? Just add your ORCiD ID in a tweet with the #IcanHazWikidata hashtag.
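Adding "/missing" is simple enough to script when you work through many co-authors. A minimal sketch: `add_missing_aspect` appends the suffix to an existing Scholia link, and `scholia_author_url` builds an author link from a Wikidata item id. Both names are hypothetical, and the base address `https://scholia.toolforge.org` is an assumption; the address Scholia is served from has changed over time.

```python
import re

# Assumption: the current Scholia address; older links may use a different host.
SCHOLIA_BASE = "https://scholia.toolforge.org"
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def scholia_author_url(author_qid: str) -> str:
    """Build a Scholia author link from a Wikidata item id such as 'Q123'."""
    if not QID_PATTERN.fullmatch(author_qid):
        raise ValueError(f"not a Wikidata item id: {author_qid!r}")
    return f"{SCHOLIA_BASE}/author/{author_qid}"


def add_missing_aspect(scholia_url: str) -> str:
    """Append '/missing' to a Scholia link, leaving links that already end with it unchanged."""
    url = scholia_url.rstrip("/")
    return url if url.endswith("/missing") else url + "/missing"


if __name__ == "__main__":
    print(add_missing_aspect(scholia_author_url("Q42")))
```

Working from an existing link, as the post describes, avoids depending on the assumed base address at all.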
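On the receiving end, someone (or a bot) has to pick the ORCID iD out of such a tweet. The sketch below is hypothetical and is not how #IcanHazWikidata requests are actually handled; it only checks for the hashtag and collects iDs written either bare or as orcid.org links. The names `HASHTAG`, `ORCID_IN_TEXT` and `orcids_from_request` are illustrative.

```python
import re
from typing import List

HASHTAG = "#icanhazwikidata"
# Bare iD or orcid.org link; the look-arounds stop matches inside longer digit runs.
ORCID_IN_TEXT = re.compile(r"(?<![\d-])(\d{4}-\d{4}-\d{4}-\d{3}[\dX])(?![\dX-])", re.IGNORECASE)


def orcids_from_request(tweet: str) -> List[str]:
    """Return the distinct ORCID iDs in a tweet that carries the request hashtag."""
    if HASHTAG not in tweet.lower():
        return []
    found: List[str] = []
    for match in ORCID_IN_TEXT.findall(tweet):
        orcid = match.upper()  # a lowercase check character 'x' is normalised to 'X'
        if orcid not in found:
            found.append(orcid)
    return found


if __name__ == "__main__":
    print(orcids_from_request("Please add me https://orcid.org/0000-0002-1825-0097 #IcanHazWikidata"))
```

A real handler would still validate each iD and look it up in ORCiD before importing anything.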

Wednesday, October 03, 2018

#Wikimedia - Relevance of #science - Kate Ricke

A lot of soul searching happened to determine why Wikipedia only noticed Donna Strickland once she received the Nobel Prize. What is more astounding is that Wikidata failed to include her: there was no Scholia information for her and her research. What we have at this time is likely to be a subset of the "Strickland papers".

We do not know who will be seen as a scientist of similar relevance, but we do know that a lot of rubbish is floating around. It is called fake science or fake news, and to counter it, big organisations like Google and Facebook rely on the information in Wikipedia.

So Mrs Kate Ricke is another scientist who has not had Wikipedia attention so far. Mrs Ricke tweeted about her paper Country-level social cost of carbon. It, and the papers produced by her and her co-authors, are quite potent.

When you learn about a paper like this, you can add it and its authors to Wikidata. When ORCiD has information about other papers, you can import these papers as well, building on the web of science about one of the most important subjects of our time. In addition, co-authors of these other papers can be included, as well as the authors citing these papers.

When relevance is given to the science of a subject like climate science, it becomes possible to contrast it with what some politicians want us to believe.

Sunday, September 30, 2018

#Wikimedia supported scientific papers supported by #Scholia

There are four scientific journals published by the Wikimedia Foundation; they are:
According to an interview, they offer articles under a free license at no publication cost. With a platform for publications in place, the next thing is to gain notability for the journals and for their authors.

One way to assess the value of these papers is by checking what Scholia has to say about them:
Part of the Scholia information is the links to the authors: not only who has been most prolific as an author, but also who has been cited the most. There is one caveat: the author needs a Wikidata item, and the more complete the information, the more both the journals and the authors gain in notability.

PS Ladies, the ratio of men to women is not really what I would expect for a Wiki journal.

[1] The WikiJournal of Humanities is under development; at this time its first publication is still in the future.

#WikiCite: Edith Abbott; THE REAL JAIL PROBLEM

According to #Wikipedia, Edith Abbott was a pioneer in the profession of social work with an educational background in economics. She published books and scholarly articles, and "The Real Jail Problem" was singled out in the article for special mention.

There is a link in the article, and I am happy to report that thanks to the Internet Archive, it and some 500,000 more scholarly articles may be found there. They are part of the early JSTOR papers and they are now freely available. It is, however, not the publication by Mrs Abbott; it is by a Robert H. Gault.

Having scientific papers available is wonderful. It is important, and it is unlikely they will get lost. However, there is a difference between not being lost and being findable. Mr Gault is notable; there was no mention of him in Wikidata, although one of his publications was there. Mr Gault is findable when you look him up in VIAF.

However, where is the article you want to read, a pamphlet according to Mr Gault? A book on the same subject can be found at Open Library. The subject of the book is as potent as it was in its day. The arguments are not dissimilar.

Arguably, when you want to cite your sources, they first have to be knowable, then findable and finally readable. Wikidata is making more and more of a difference. When we collaborate with the Internet Archive, we can make those JSTOR papers findable as well. Slowly but surely all these sources become findable. Their authors become notable, and we will free ourselves from the curse of only reading the latest research.

Tuesday, September 25, 2018

#Wikidata: Adding credible info to #Twitter - Margaret Stanley

Twitter, Facebook and organisations like them have a credibility problem. Too many Joes, Dicks and dirty Harries poison the well that is the information they provide.

Now meet Margaret Stanley; she is a scientist and she tweets. Her Twitter name is included in Wikidata, and it is highly likely that, even though her tweets are her own and do not represent the opinion of the institution she works for, what she tweets is credible and well considered.

Margaret is not the only scientist who tweets; there are many of them. More and more of them are included in Wikidata, including their publications and their Twitter handles. One of these scientists actively encourages female scientists to speak out and seek a platform to encourage women to find their place in science. They tweet, they write Wikipedia articles and they are very much relevant scientists; much of their relevance shines through in their tweets.

Dear Twitter, when scientists have a profile in Wikidata, they personally make a statement. It is theirs. They are as human as everyone else but they are not, as a group, foolish enough to tweet balderdash, nonsense or other stuff you should frown upon.

Tuesday, September 18, 2018

#Wikicite: #Research? Eat your own dogfood!

Another piece of Wiki research has been published, and you would not know it when you look for it in the Wikiverse. The people who are big in Wiki research run a project called WikiCite. It has multiple aspects; publications and their authors are entered in Wikidata. Blogs are written about how wonderful the various aspects of what is on offer are, about the growing quality and quantity of data and about how well an author may be presented in Scholia.

This is all well and good, and indeed there is plenty to cheer about, like the documentation about the Zika fever that is included, but when a subject that is key to the Wikimedia researchers is not as well represented, it will never be good enough.

There is a reason why eating your own dogfood is so relevant; it shows you where your model fails you. All kinds of things, like conference speakers, are included, and many more female scientists are represented as a result of the indomitable Jess Wade, but when new research by Wikimedians, professional Wikimedians, is not included, the effort is not sincere; the people who go to conferences do not learn from daily practice, and that makes WikiCite stale and mostly academic.

Monday, September 03, 2018

#Wikidata - Presenting Mr Tanvir Hussain

Mr Hussain, a professor at Nottingham University, mentioned on Twitter what he would put on his business card. Many of his publications were on Wikidata and, after some additional information from ORCiD, Mr Hussain looks really good when viewed with Scholia.

He asked me if we could have the Scholia information with a QR code like we do in Reasonator for people with a Wikipedia article.

Having a QR link to information like Scholia would look really good on a business card. It would also stimulate Mr Hussain's colleagues to get their own work well represented.

Tuesday, August 28, 2018

#WikiCite - A #Cochrane Pocketbook #Pregnancy and #Childbirth

In Wikidata many, many scientific publications from everywhere have found an entry. There are literally millions of them; in essence it is an increasingly comprehensive stamp collection in need of some structure. Without structure there is no use for it.

There are many publications about pregnancy and childbirth. When you check out the links, you will find that publications about pregnancy are currently very much dominated by the Zika virus, and for childbirth Mr G. Justus Hofmeyr is mentioned as one of the authors important for the subject.

Mr Hofmeyr is mentioned only because the Cochrane Pocketbook "Pregnancy and Childbirth" was included by hand. This book is quite significant because it represents the best evidence for doctors and other medical professionals who provide maternity care.

As more structure is given to publications, it becomes more useful; publications will gain authors all known individually to Wikidata. The citations of publications will be mapped and publications of Cochrane will bring out what publications truly add to the sum of all knowledge.

Monday, August 20, 2018

#Wikidata - Never heard of "R Andrew Moore"?

When quality in scientific papers, papers that are to be used as sources in Wikipedia, is important, it is relevant to include the papers published by the Cochrane Database of Systematic Reviews.

For one author, R Andrew Moore, I invoked the QuickStatementsBot, and it added Mr Moore along with some 131 links to publications. It was a subset of what could be done, but hey, I already felt I was living dangerously.

When you then have Scholia look at Mr Moore for missing information, there is a lot. But there is also so much information. In the diagram you see his co-author graph, and you will also find that Mr Moore published 123 times for Cochrane.

There are many publications, and arguably we may want them all when we are to share the sum of all knowledge. Not all publications are equal, and Cochrane is special because it reviews what is published elsewhere; its bottom line is not commercial, and its motto is "Trusted evidence. Informed decisions. Better health." It is what aligns best with what we aim to do.

As I learn how to add authors who published for Cochrane, I will do maybe one a day. When other people take an interest, slowly but surely metadata on research gains relevance in Wikidata, and with a little help from our Wikipedia friends we will provide better information.

Saturday, August 18, 2018

#Wikidata - Speakers at the #Cochrane Colloquium 2018

All the speakers announced for the Cochrane Colloquium 2018 have a presence on #Wikidata. Cochrane is a vital organisation; it debunks much fake science by researching published papers. It asks for volunteers to help with this effort, and its mission is to "promote evidence-informed health decision making by producing high-quality, relevant, accessible evidence".

Cochrane and the Wikimedia community collaborate; there is a Wikipedian in Residence. There have been editathons where articles were updated based on evidence (not sources), and consequently there is ample room for collaboration at the Wikidata level as well.

All speakers have been added to Wikidata, and Mrs Sue Ziebland, a professor at Oxford University, was new. This was not the case for many of her papers; they identified one author as "Sue Ziebland". Daniel Mietchen was gracious enough to link the papers to the person, and thanks to the Scholia tool Mrs Ziebland gets a lot of depth.

So what could a Wikidata / Cochrane collaboration look like? With a positive spin, all the positive authors according to Cochrane get a Wikidata item and, as Daniel did for Mrs Ziebland, the publications are linked to the person. When Cochrane provides links to its database, we can even see why a specific paper is so relevant. This will help Wikipedia editors, because evidence-informed health decisions can be made and used to settle disputes in articles.

Sunday, August 12, 2018

#Knowledge - three types of knowledge and why "academic" is only one and overrated

There are three types of knowledge; they are academic, professional and knowledge from experience. The scheme to the right was published by Jaap van der Stel. He works in the field of psychiatry and is known for his work on addiction in combination with the use of peers in the recovery from addiction.

In the Wikimedia world, we insist on the primacy of academic knowledge, and up to a point it serves us well. Operationally it means that many of the studies are done outside of the WMF; they may point out whatever, but they hardly ever make an operational difference. When the internal WMF researchers study a subject, they are typically directed to study particular phenomena, and this may point to operational issues: issues that are either addressed by the WMF itself or adopted by the community.

When scientists make a compilation of all the sources in all the Wikipedias, it is academic work when the result is static. It may indicate which sources are used multiple times, but it does not help any editor weed out sources that are biased or false. Magnus started work on a tool that knows about all the sources in two Wikipedias and Wikispecies. It is updated in real time, and that gives it valid operational credentials.

I know from experience that there are issues with source information as we have it in Wikidata. We cannot invalidate sources by reference. We are only strong in the biomedical field and adding new information is not at all user friendly.

Now this user experience does not get much priority, for valid operational reasons, but the effect is that Wikidata is only useful for the geeks. Its lack of usability prevents its data from being used on Wikipedias in the "other" languages. That is where there is little or no academic or operational interest.

Saturday, August 11, 2018

#GenderGap - The Ginetta Sagan Award (and others)

The Ginetta Sagan award is conferred by Amnesty International USA. It is an annual award. When I looked, the last recipient according to Wikidata received it in 2014, while English Wikipedia has the award as part of the article on Ginetta Sagan, with information including 2017. When you read the texts, you will find how notable these people are and, by inference, the people without an article.

Arguably, there is a lack of balance between the number of men and the number of women with an article in any Wikipedia. This is known as the "gender gap", and the "Women in Red" project works to great effect to improve that balance. There is no lack of fine, notable ladies who have no article.

I am really happy to present two queries. The first query shows women who won an award with no article at all (2502 results). The second shows women who won an award with no article in the English language (29083 results).
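The two queries themselves are not reproduced here. As a minimal sketch of why their result counts differ so much, assume each person record carries the Wikidata values of P21 (sex or gender, with Q6581072 for female), P166 (award received) and the set of Wikipedias that have an article. The record type, the function names and the toy QIDs below are hypothetical, and this is filtering logic in Python, not the author's actual SPARQL.

```python
from __future__ import annotations
from dataclasses import dataclass, field

FEMALE = "Q6581072"  # Wikidata item for "female" (value of P21)
MALE = "Q6581097"    # Wikidata item for "male"

@dataclass
class Person:
    qid: str
    gender: str                                         # P21 value
    awards: list[str] = field(default_factory=list)     # P166 values
    wikipedias: set[str] = field(default_factory=set)   # wikis with an article, e.g. {"enwiki"}

def awarded_women(people):
    """Women with at least one 'award received' statement."""
    return [p for p in people if p.gender == FEMALE and p.awards]

def no_article_anywhere(people):
    """First query: no article in any Wikipedia."""
    return [p.qid for p in awarded_women(people) if not p.wikipedias]

def no_article_in(people, wiki="enwiki"):
    """Second query: no article in one Wikipedia; articles elsewhere are allowed."""
    return [p.qid for p in awarded_women(people) if wiki not in p.wikipedias]

# Toy data with hypothetical QIDs.
people = [
    Person("Q_A", FEMALE, ["Q_award1"]),                      # no article at all
    Person("Q_B", FEMALE, ["Q_award1"], {"dewiki"}),          # German article only
    Person("Q_C", FEMALE, ["Q_award2"], {"enwiki", "dewiki"}),
    Person("Q_D", FEMALE),                                    # no award, excluded
    Person("Q_E", MALE, ["Q_award1"]),                        # not a woman, excluded
]
print(no_article_anywhere(people))  # ['Q_A']
print(no_article_in(people))        # ['Q_A', 'Q_B']
```

Every woman without any article also lacks an English one, so the second result set always contains the first; that is consistent with 29083 results versus 2502.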

Let these women be an inspiration to you.

Monday, August 06, 2018

#MADinAmerica - cause and effect

MAD in America is an organisation about mental health, particularly in America. Their take is that there is a lot that can be improved. What interests me most is that they highlight research showing how the science behind many mental health practices fails to withstand scrutiny.

One publication they recently highlighted is about brain abnormalities in people with schizophrenia. Current wisdom has it that "cortical thickness and surface area abnormalities in schizophrenia" are indicative of schizophrenia. This paper compares people with schizophrenia who were medicated with people who were not. The research shows that these differences are due to the medication.

Adding a paper like this to Wikidata is easy. Making it stand out for its results is not. The paper probably refers to the previous research that it debunks, but how do you model that? When papers like this are to be used as sources, how do you ensure that they are even considered?

NB: the first author is employed by the University of California, Irvine.

Sunday, August 05, 2018

#Citations - "Verlorene Siegen" and #Wikipedia

The publication The Battle for Wikipedia: The New Age of ‘Lost Victories’? writes about debunked knowledge that is nevertheless used as a source in Wikipedia. Lost Victories is a book by Erich von Manstein, a German military officer. He served as a witness at the Nuremberg trials, and there is strong evidence that he perjured himself. He was later convicted of war crimes and sentenced to eighteen years in prison.

This publication is not only of academic interest. In this day and age, when fake facts and fake science are pervasive, it is a reminder that Wikipedia is a battleground where debunked sources are used to support a non-neutral point of view.

One of the objectives of the Wikimedia Foundation is to combat fake facts and to make citations operational as a tool for doing so. The main thrust will be adding sources to Wikidata. "Verlorene Siegen" was obviously present, but even though there is a large body of work debunking this book, nothing referred to either Mr von Manstein or his book in a critical way.

It was easy enough to add a few individual sources, but it takes time. For the analysis of sources used in Wikipedia there are dumps containing all the citations of all Wikipedias, and now Magnus has started on a tool that initially includes real-time sources for the German and English Wikipedias and Wikispecies. Of these publications 36% are linked to Wikidata; this provides a great start, but it will take more. We need to know which papers debunked what knowledge. We need to know which papers Retraction Watch is critical of, and what the relevance of a paper is according to the Cochrane Database of Systematic Reviews, because that is how their facts are operationalised. We need to know because that is one way to debunk fake facts.

Saturday, August 04, 2018

#Wikidata - User versus bot updates and #Scholia

These are the aggregated subjects associated with all the papers by the winners of the Fields Medal. Given that there are some 60 winners of the most prestigious award in the field of mathematics, this is not a representative reflection. That is not a problem; that is an opportunity.

I added one paper, "Singularities of linear systems and boundedness of Fano varieties". Given the title, I added "Fano variety" and "Linear system" as subjects. This made no difference in the Scholia tool, and after some five minutes I asked what was happening. I was told that it takes quite some time before the data in the Toolserver gets updated.

Typically, information about papers is added by bots. Not so much for mathematics, but still. Mr Birkar, for instance, has only two papers in Wikidata at this time, and for the other paper no subjects are given. When you add data by hand, instant gratification or instant visibility is important because it is a potent motivator.

The best reflection of work done in Wikidata is not given by Wikidata itself but by tools like Scholia or Reasonator, or by query. Where query does give instant gratification, much of its potency comes from exactly that.

Tools have one important benefit over query: they provide a standard layout for the information. Queries are potent, and many people contributing content to Wikidata use them in tools like PetScan. But in reality, the typical difference between one query and the next is only in the qualifiers.

At this time the best user experience is given by tools. They often suffer from a time lag; this is of little relevance to bots, but for humans it is different.

Monday, July 30, 2018

#Wikidata - #Skills and tools needed to add #awards

Adding awards to Wikidata is one way to signal notable people who could do with some tender loving care, maybe even an article. Typically it starts with someone who received an award. This time, three fine ladies: Kay Davies, Alice Rogers and Sarah Cleaveland.

Mrs Davies received the "Croonian Medal and Lecture", an award conferred by the Royal Society. The award was already known as the "Croonian Lecture" and the Wikipedia article contains a long list of recipients. The website included a few recipients and consequently the award was positively identified.

Thanks to Reasonator, I easily navigated from Mrs Davies to the award and noticed how few recipients were known. Using text from the article in combination with the Awarder tool, I added 250+ recipients within ten minutes. Other awards, like the Harveian Oration, are still missing from Wikidata.

Mrs Rogers received the "Kavli Education Medal", which has only four recipients so far. The name Kavli in combination with awards proved a bit ambiguous; finding the correct medal was the one challenge. One recipient was missing, a Mrs Margaret Brown, and it was easy enough to add her as well. The mother of Mrs Rogers was said to be a very accomplished mathematician of Bletchley Park fame, but details are lacking.

Sarah Cleaveland received many awards; two are of interest because they were not linked. She was the first woman to win the Trevor Blackburn Award, in 2008. The other, the Leeuwenhoek Medal and Lecture in 2018, got my attention. It did not have many recipients and, again, the Awarder tool made it easy to add many missing ones. There are several red links; I did not add them this time.

To add the Trevor Blackburn Award, I first have to find it. It is mentioned in several Wikipedia articles, but both the award and Mr Blackburn are missing. Google helps me find the website of the award. With Mrs Cleaveland there are 13 recipients. The first thing to do is add the award, then link the organisation that confers it and the web address of the award.

When you then start looking for recipients, Reasonator immediately provides an updated view. There is no need to query for its recipients; they are obvious. Just to show that you can, I added another fine lady: Mrs Karen Reed.

Sunday, July 29, 2018

#AfricaGap - #Nigeria, politics as a family business

A Wikipedia friend asked on Facebook for people to neutralise an all too promotional article on the Nigerian Senator Ademola Adeleke. I have updated his Wikidata profile based on the information in the article.

Mr Adeleke's father, Raji Ayoola Adeleke, and his brother Isiaka both preceded him as senators.

When you google Mr Adeleke, the first thing you find is his Wikipedia article; next, you cannot miss that the University of Jacksonville denied that Mr Adeleke finished his education. Consequently it is a stretch to call him Dr Adeleke.

When Wikipedia and, by inference, Wikidata register the education of people, such easy scams will be displayed more prominently and become better known. When an article makes plain that Mr Adeleke was born with a silver spoon in his mouth, it becomes easy to question his ability to represent his constituents adequately.

Saturday, July 28, 2018

#Wikipedia, where is all that research?

Pine, a well-known Wikipedian, drew attention to the recording of the 2018 "State of Wikimedia Research". Benjamin Mako Hill mentioned that a humongous number of publications about Wikipedia were published in the last year alone.

That is great.

I checked the numbers using the Scholia tool and was a bit disappointed. The total number was "only" 337 for every year. Benjamin uses different tools; he mentioned his use of Google Scholar, and indeed it shows so much more.

I was really pleased with Daniel Mietchen helping out on the subject of "probiotics", and I asked him if he could run his bot for the subjects "Wikipedia" and "Wikidata". But whatever he decides to do, running a bot that adds keywords to research is not scalable when you consider the overwhelming amount of research known to Wikidata. It is not a matter of running it once; keywords also have to be added to any and all new research entered into Wikidata.
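How Daniel Mietchen's bot actually works is not described here. As a minimal sketch of why such a bot has to keep running, assume a naive approach: match a term in paper titles and propose a P921 (main subject) statement, remembering which papers earlier runs already handled. The function name, the toy titles and the placeholder topic "Q_PROBIOTIC" are hypothetical.

```python
import re

def propose_main_subjects(papers, term, topic_qid, processed):
    """Propose (paper, topic) pairs for unprocessed papers whose title mentions `term`.

    papers:    dict of paper QID -> title (hypothetical input)
    processed: set of paper QIDs handled by earlier runs; updated in place
    """
    # Whole-word, case-insensitive match; an optional plural "s" is allowed.
    pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
    proposals = []
    for qid, title in papers.items():
        if qid in processed:
            continue  # already handled; each run only looks at new papers
        processed.add(qid)
        if pattern.search(title):
            # Would become a P921 (main subject) statement on the paper.
            proposals.append((qid, topic_qid))
    return proposals

papers = {"Q_P1": "Probiotics and the immune system",
          "Q_P2": "Gut microbiota in mania"}
processed = set()
first_run = propose_main_subjects(papers, "probiotic", "Q_PROBIOTIC", processed)

# New research keeps arriving, so the bot has to run again.
papers["Q_P3"] = "A trial of a probiotic in acute mania"
second_run = propose_main_subjects(papers, "probiotic", "Q_PROBIOTIC", processed)
print(first_run)   # [('Q_P1', 'Q_PROBIOTIC')]
print(second_run)  # [('Q_P3', 'Q_PROBIOTIC')]
```

The second run proposes only the new paper, yet it still has to happen, for every term, every time research is added; title matching also misses papers whose title does not use the term.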

Given that we work in a wiki way, this is totally acceptable. We do what we can and what takes our fancy, and slowly but surely new approaches and tools improve the quality and quantity of the data that we have. If Scholia were a commercial enterprise, it would be different; the exposure and use of its data would be a primary concern.

Friday, July 27, 2018

#Wikidata - I do not use query and here is why

When I edit Wikidata, I never use queries and here is why. I do not need them. For instance, I added an award to a person because it was obvious it was missing. I had no need for a query because everything that I wanted to know about the award was visible.

When you use query, you have to use a tool, define a query, run it, maybe tune it and then analyse the results. In my beloved Reasonator, all the queries that I need are built in. This is the same award and the same person in the standard user interface of Wikidata. It is not informative; I only use it to edit.

A person wanting to teach Wikidata asked how to structure a program. The first thing proposed was to teach people to query. I agree that query is important and has its use cases, but it should not be the first introduction to Wikidata: it makes things too complicated at the start and, even worse, it is not necessary.

Thursday, July 26, 2018

#Wikidata - Pushback on probiotics with citations

Recently I added a paper to Wikidata. The paper listed its subjects; mania, the immune system and probiotics are among them. I dutifully added some of these subjects to the item and was surprised that a topic as controversial as probiotics was related to so few papers. When you check out #probiotics on Twitter, you will realise that a healthy dose of facts is much needed to counter the inundation of commercial offerings you will find.

I mentioned on Twitter my surprise that there was so little to find on Wikidata about this subject, and Daniel Mietchen picked this up and had a bot run that added "probiotic" as a topic to papers on Wikidata. The result is wonderful.

It is almost too good. We now run the risk of not seeing the forest for the trees. When you are looking for sources to cite, you want to narrow down to sources that were checked by Cochrane, and you may want to find or dismiss the papers mentioned by Retraction Watch.

The best part: this is an embarrassment of riches. With bots running and updating the topics of papers, our collection of papers gains relevance, and linked authors give a clue as to who might be notable enough to get a Wikipedia article. As we gain more and more data, with better links to indicators of the quality of papers, we gain ground in the battle against false facts.

Sunday, July 22, 2018

#AfricaGap - Sean Jacobs at #Wikimania

To be blunt, what Mr Jacobs is talking about is one or more steps removed from the Wikimedia reality. His story is important and indicates that a specific type of source exists and is available for study. Mr Jacobs speaks about the importance of Twitter for the Zulu language.

Mr Jacobs is an academic, and the reality of the Zulu Wikipedia is that only a few days ago we celebrated its article number 1000. What the Zulu Wikipedia needs is high school students writing in Zulu: writing about what is important to them, to their curriculum and to their world.

Just consider what one high school could do. Now consider what ten high schools could do. Compare that with one academic or what all the Zulu students currently in university could do.

Yes, history as written so far reports in a biased way. When the Zulu language is to gain a foothold in the Wikimedia world, we need many people involved in writing the most basic information first. Once there is a basis, the sources Mr Jacobs mentions become relevant to a Zulu Wikipedia.

Saturday, July 21, 2018

#AfricaGap - A #Wikidata based watch list about an African reality II

When you follow many Listeria lists and care about how the subject develops, it is wonderful to see so much activity related to Africa. As more people care to work on African politicians or "administrative territorial entities", the Listeria lists that also exist on Wikipedias in African languages will be updated as well.

When the Listeria lists become part of the main body of a Wikipedia, the politicians and entities will be found. When the infoboxes presented at the Celtic Knot conference follow, slowly but surely quality content in quantity about Africa will no longer be a mirage.

Wednesday, July 18, 2018

#AfricaGap - Guinea; standing on the shoulders of giants

This map came to Commons courtesy of the UN. It was downloaded in 2007 by Jeroen. The language on the map is French, while Wikidata has much of its data in English. The names in French are mostly the same, but that is for someone else to consider.

Many of the articles on "administrative territorial entities" are written by a small group of people. I want to single out Shevon Silva, whose user page shows how much work went into adding stubs for so many African territories. The important thing about data is this: once it is there, you can change it in any way necessary.

When data gets entered into Wikidata, certain Wikipedia habits are not possible; a "human settlement" is not an "administrative territorial entity". Such conflations need to be undone in Wikidata. Obviously the human settlement is located directly in only one administrative territorial entity, and in the others only by inference. Attributes like "inception date" and links to other human settlements that are part of a sub-prefecture are for someone else to add or get right. Another consideration is historic administrative territorial entities, particularly those of historic countries.

At this time it is important to celebrate what we have and to morph it into a format that can be used on any and all of our projects. Once it is available in all the Wikipedias, it will generate more and more links, and this will put Africa on the map.

Saturday, July 14, 2018

#AfricaGap - Where Wikipedias collide

The German and the English Wikipedias collide on the "administrative territorial entities" of the Gambia. I was told to remove entries that I made to Wikidata because they were "Falschinformationen" (false information). The German article is much better written, but the English article indicates that the German information is likely to be outdated.

It may seem obvious to solve a discrepancy like this by insisting on "your" solution. The point that I have been making quite often is that such differences are commonplace and require proper sourcing. The obvious source will not be found on a university website; it will be found in government information from the Gambia.

Making information about Africa available in Wikidata makes the errors, the inconsistencies and the lack of data in the Wikipedias more visible. This is not solved by considering your "own" data to be best; it is solved by proving that the information is up to date. According to the English Wikipedia, the Upper River Division no longer exists; it has largely been replaced by the Basse Local Government Area.

My question: what does it take for the Wikipedias to take their inconsistencies seriously?

#AfricaGap - Support for "minority" languages

Support for "minority" languages was the subject of the Celtic Knot conference. I have watched some of the presentations and find that, from a Wikidata point of view, there is a lot more to supporting minority languages than just adding missing labels. A vital strength of any Wikipedia lies in the relations between its articles, which make it possible to find subjects of interest.

"Minority languages" is a misnomer; what we mean is that these Wikipedias are small. They lack articles, structure is missing and subjects of interest are not found. Subjects have the same relations in any language, and consequently lists expressing these relations can be shared using Wikidata in any language, including "minority" languages. Missing labels need not be an issue; this is illustrated nicely in this list of subdivisions of Egypt, where the labels for most of them are only available in the Arabic script. It is a nice invitation for people to add labels in the Latin and any other script.
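As a minimal sketch of how a shared list can cope with missing labels, assume a language fallback rule: show the label in the reader's language, else one from a fallback list, else any label at all, and flag the row as missing a label so someone can add it. This is an illustrative rule, not the actual fallback logic of Listeria or Wikidata; the function name and the toy QIDs are hypothetical.

```python
def display_label(qid, labels, lang, fallbacks=("en",)):
    """Choose a label for `qid`, preferring `lang`, then each fallback, then any label.

    labels: dict of language code -> label text
    Returns (text, missing); `missing` is True when no label exists in `lang`,
    which marks the item as an invitation to add one.
    """
    if lang in labels:
        return labels[lang], False
    for code in fallbacks:
        if code in labels:
            return labels[code], True
    if labels:
        # A label in any script is better than none: use the first one stored.
        return next(iter(labels.values())), True
    return qid, True  # no label at all: fall back to the bare identifier

# Toy rows of a shared list, read by someone using the Welsh Wikipedia ("cy").
rows = {
    "Q_X": {"ar": "محافظة الجيزة"},       # Arabic label only
    "Q_Y": {"en": "Cairo Governorate"},  # English label only
    "Q_Z": {},                           # no labels yet
}
for qid, labels in rows.items():
    text, missing = display_label(qid, labels, "cy")
    print(qid, text, "(label missing)" if missing else "")
```

Every row still appears in the list, with its relations intact; only the label text varies with what has been entered so far.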

The Welsh Wikipedia makes use of "Listeria" lists in its main space and, as a consequence, all items in these lists can be found. They are available in a context, associated information may be available and they link to articles in other languages. The Welsh Wikipedia also implemented the "Article Placeholder", and in this way it provides even more information on specific subjects.

When you consider Africa and information about Africa, there is no Wikipedia that provides adequate information. The data is incomplete, unstructured and often out of date. It is easy enough to improve on the quality of the data in Wikidata and when the information is updated in many Listeria lists on many Wikipedias, the impact is great.

The lack of coverage of subjects about Africa is huge. Less than 1% of the humans in Wikidata are from Africa, and we do not have up-to-date information about "administrative territorial entities" like provinces and districts. In my AfricaGap project only a limited range of subjects gets some attention at this time. Obviously there is more that could be done. African cinema is one subject that is of interest to a group of Wikimedians. When they write their articles, the information will eventually make its way to Wikidata, and data about movies, actors and directors may then be shown in Listeria lists in all the African language Wikipedias. This may generate interest in our projects from an African public.

There is only one purpose for Wikidata and Wikipedia: to find a public, a use case, for the data, the articles and the information we provide. The one challenge we face is in both the quantity and the quality of our articles and data.

Thursday, July 12, 2018

#AfricaGap - Considerations on the "Article Placeholder"

Having listened to a YouTube presentation on the Article Placeholder, I am seriously disappointed. A few statements in it show a lack of understanding of the functionality of Reasonator. It is dismissed for all the wrong reasons, and as a result a lot of opportunities are missed.

What is missed is that Reasonator, as it is, provides a superior representation in any language. It is a tool that helps with missing labels from within the tool. Missing descriptions in Reasonator do not need to be a problem; there is automated functionality that has shown its merits in many languages. Compare the two representations of Wikidata data, and Reasonator's structured representation will be seen to be richer, with the inclusion of maps, images and data linked to the subject in question.

What is particularly galling is that Reasonator is dismissed because "it is an external tool". Before work on the Article Placeholder started, it would have been easy enough to adopt the functionality provided by this external tool, and then it would not have been an external tool. Calling it external is obviously an argument made AFTER the fact.

Where Reasonator provides texts, it does so based on little scripts. This is seen as problematic because it is seen as a drain on the community. Templates, on the other hand, may be part of the Article Placeholder, and they have the same problem.

For me the bottom line is not so much the Article Placeholder but the lack of usability of Wikidata. It is only because of Reasonator that it is easy and obvious to work on the subjects I work on. I have not spent hours learning how to query; Reasonator instantly provides me with the results in any context, like the missing "Districts of Djibouti".

Monday, July 09, 2018

#AfricaGap - the Subprefectures of the Central African Republic

Even the best query is impotent when the data is not there. There were no known subprefectures of the Central African Republic when I started looking for them.

Best practice has it that any "human settlement" is located in the lowest administrative territorial entity available. It follows that the city of Baoro is in the Baoro subprefecture, which in turn is in the Nana-Mambéré Prefecture. This is nominally both a Wikipedia best practice and a Wikidata best practice.
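In Wikidata this relation is the property P131 (located in the administrative territorial entity). As a minimal sketch of the best practice, assume each item stores only its lowest P131 link; every higher entity then follows by inference, by walking the chain upwards. The dictionary below stands in for real statements; its keys are readable labels used in place of QIDs, and the function name is hypothetical.

```python
def containing_entities(item, located_in):
    """Follow P131-style links upwards from `item` and return every containing entity.

    located_in maps an item to the one entity it is directly located in;
    only that lowest link is stored, all higher levels are inferred here.
    """
    chain = []
    seen = {item}
    current = located_in.get(item)
    while current is not None:
        if current in seen:
            # A loop means the data is wrong somewhere; stop instead of running forever.
            raise ValueError(f"cycle in located-in chain at {current!r}")
        chain.append(current)
        seen.add(current)
        current = located_in.get(current)
    return chain

# Only the direct links are stated; labels stand in for QIDs.
located_in = {
    "Baoro (city)": "Baoro (subprefecture)",
    "Baoro (subprefecture)": "Nana-Mambéré (prefecture)",
}
print(containing_entities("Baoro (city)", located_in))
# ['Baoro (subprefecture)', 'Nana-Mambéré (prefecture)']
```

Storing only the lowest link means that a correction to one subprefecture automatically propagates to every settlement inside it.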

When a Wikipedia article assigns a "human settlement" category to a subprefecture, we get it wrong in Wikidata. When we change this in Wikidata, it is still problematic when many articles consider the town and the administrative entity to be the same thing. Then again, this is Africa, and who notices?

When there are multiple items by the same name and one is about the city and the other is not, it is just a matter of making one a subprefecture. For the Central African Republic, this is rather straightforward and it just takes a lot of work to get some structure in the data. At the same time there are many articles in the wrong basket. That problem is for another day.

Fixing the data for the CAR is doable. It would take someone with infinite time on his hands to fix the administrative entities for Angola. Most of the data is wrong, and entities with the same name and type often exist multiple times. The queries will show this to anyone brave enough to work on it.
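The queries referred to are not reproduced here. As a minimal sketch of the check involved, assume rows of (item, label, type) taken from such a query; grouping them by a normalised label together with the type surfaces the entities that exist multiple times. The function name, the row format and the toy data are hypothetical.

```python
from collections import defaultdict

def duplicate_groups(items):
    """Group items that share both a label and a type.

    items: iterable of (qid, label, type) tuples, e.g. rows returned by a query.
    Returns {(normalised label, type): [qid, ...]} for every pair occurring more than once.
    """
    groups = defaultdict(list)
    for qid, label, kind in items:
        # Normalise case and surrounding whitespace so near-identical labels group together.
        groups[(label.strip().casefold(), kind)].append(qid)
    return {key: qids for key, qids in groups.items() if len(qids) > 1}

# Toy rows with hypothetical QIDs and names.
items = [
    ("Q1", "Name A", "municipality"),
    ("Q2", "name a ", "municipality"),     # same name and type: a likely duplicate
    ("Q3", "Name A", "human settlement"),  # same name, different type: the town itself
    ("Q4", "Name B", "municipality"),
]
print(duplicate_groups(items))  # {('name a', 'municipality'): ['Q1', 'Q2']}
```

A town and an administrative entity that share a name are deliberately not flagged; that is the legitimate case described above, where one item is the city and the other the subprefecture or municipality.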