Sunday, February 17, 2019

@WikiResearch - Nihil de nobis, sine nobis (Nothing about us without us)

There is this wonderful notion that Research is going to tell us what to do in light of the strategic Wikimedia 2030 plans. Wonderful. There is going to be a taxonomy of the knowledge we are missing.

Let me be clear. We do need research, and the data it is based on has to be available to us. There is no point in a future taxonomy of missing knowledge when we have been asking for years: "what articles are people looking for that they cannot find?" If there is to be a taxonomy, what else should it be based on?

When we want to fill the gaps in what Wikipedia covers, we can stimulate the writing of new articles by showing how much traffic they get in their first month. We can stimulate our readers to learn more by showing what Wikidata has to offer, including its links to articles in other languages. It may even result in new stubs, even full articles, in "their" language. The technology for this has been available for years now.

WikiResearch is full of arguments on the importance of citations and of Wikidata as the platform for all Wikipedia sources, so why are the WikiResearch papers not in Wikidata from the start? Why is it that WikiResearchers consider that Wikidata is not about them, when it is about every other subject Wikidata covers? What makes their work less findable (less FAIR) than what is known to have been published as open content by the NIH?

The point I want to make is that no matter how well intended the aims of WikiResearch are, it loses the interest, involvement and commitment of people like me, the very people it needs to achieve the results it aims for.

Yes, do research, but we should not wait for its results; we already know how to stimulate people to write new articles.
Thanks,
      GerardM

Sunday, February 10, 2019

#Wikidata - A quick and dirty "HowTo" to improve exposure of a subject in Wikidata

When you want to expose a particular subject, any subject, in Wikidata, this is the quick and dirty way to expose much of what there is to know. There are a few caveats. The first is that the aim is not to be complete; the second is that it is biased towards scientists who are open about their work at ORCID.

You start with a paper or a scientist. A paper has a DOI, a scientist an ORCID identifier, and either may already be in Wikidata. First comes the discovery of the available literature and the authors involved. The SourceMD tool is key: given a SPARQL query or a list of QIDs, one per line, you run a process that either updates known publications by adding their missing authors, or adds missing publications together with their missing authors.

When you treat this as an iterative process, more authors and publications become known. When you run the same process for the (new) co-authors, more publications and authors relevant to your subject become known.
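The iterative process above is, in effect, a snowball (breadth-first) expansion over the co-author graph. The Python sketch below illustrates only that idea; it is not SourceMD's code. The data, the author and DOI identifiers and the functions `authors_of` and `expand` are hypothetical, and in real use the publications per author would come from ORCID or DOI metadata through SourceMD rather than from a dictionary.

```python
from collections import deque

# Hypothetical stand-in for what SourceMD would fetch from ORCID / DOI metadata:
# author identifier -> set of papers (by DOI) that author is known to have written.
PAPERS_BY_AUTHOR = {
    "author:A": {"doi:1", "doi:2"},
    "author:B": {"doi:2", "doi:3"},
    "author:C": {"doi:3"},
    "author:D": {"doi:4"},  # shares no paper with A, B or C, so is never reached
}


def authors_of(paper, papers_by_author):
    """Return every author known to have written the given paper."""
    return {a for a, papers in papers_by_author.items() if paper in papers}


def expand(seed_author, papers_by_author, max_rounds=None):
    """Snowball from one author: author -> papers -> co-authors -> their papers ...

    max_rounds limits how many hops away from the seed an author may be and
    still have their publications looked up (None means run until nothing new).
    """
    known_authors = {seed_author}
    known_papers = set()
    frontier = deque([(seed_author, 0)])
    while frontier:
        author, depth = frontier.popleft()
        if max_rounds is not None and depth >= max_rounds:
            continue  # this author is known, but we stop looking up their papers
        for paper in papers_by_author.get(author, set()):
            if paper in known_papers:
                continue
            known_papers.add(paper)
            for coauthor in authors_of(paper, papers_by_author):
                if coauthor not in known_authors:
                    known_authors.add(coauthor)
                    frontier.append((coauthor, depth + 1))
    return known_authors, known_papers
```

Calling `expand("author:A", PAPERS_BY_AUTHOR)` reaches authors A, B and C and papers doi:1 to doi:3, while author D, who shares no paper with them, stays out of view. Passing `max_rounds=1` only looks up the seed's own papers, which mirrors stopping the iteration once the subject is exposed well enough for your purpose.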

To review your progress, you use Scholia. It has multiple modes that help you gain an understanding of authors, papers, subjects, publications and institutions. You will see the details evolve. NB mind the lag before your edits show up in the Wikidata Query Service that Scholia relies on; it is not instant gratification.

A few observations. Your aim may be to be "complete", but publications are added all the time, and the same is true for scientists. People increasingly turn to ORCID for a persistent identifier for their work. The real science is in assigning a subject to a paper. Arguably the subject may be derived from the title of the article, but as an approach that is a bit coarse. I leave that to you, as your involvement makes you a subject "specialist".
Thanks,
       GerardM

Tuesday, February 05, 2019

#Wikidata - Naomi Ellemers and the relevance of #Awards

In a 2016 blog post, I wrote about the relevance of awards. At the time Professor Ellemers had received an award, and it was the vehicle for making that point in the story.

Today, in an article in a national newspaper, Mrs Ellemers makes a strong point that the perception of awards is really problematic. What they do is reinforce a bias that American science is superior. It leads European students to perceive the USA as the place "where it is all happening", a perception that Mrs Ellemers argues is incorrect.

NB Mrs Ellemers is the recipient of the 2018 Career Contribution Award of the Society for Personality and Social Psychology.

Wikidata reinforces this bias for American science by including a rating for "science awards". This rating values awards by comparing them with one another. It is produced by an American organisation, and the whole notion behind it is suspect because its assumptions are not necessarily, or not at all, beneficial to the practice of science.

How to counter such a bias? As far as I am concerned, there is no value in making a distinction between awards and "science awards", and biased information like this should be removed. Just consider: when European science is considered less than American science, how would African science be rated?
Thanks,
     GerardM

Sunday, February 03, 2019

Dr Matshidiso Moeti, an exception to my rules

When I add scientists to Wikidata, I really want something to link to: an external source like ORCID, Google Scholar or VIAF. When I link publications, it is the data at ORCID I link to; I don't do manual linking.

From the sources I have read, Dr Moeti is the kind of person who deserves a Wikipedia article. Her work, the people she works with and the cases she works on not only deserve recognition; it is imho vitally important that they get it and that you learn about them. This is why I made exceptions to my rules.

This is her Scholia, this is her Reasonator and please, take an interest.
Thanks,
      GerardM

The case for #Wikimedia Foundation as an #ORCID member organisation

The Wikimedia Foundation is a research organisation, no two ways about it. It has its own researchers who not only perform research on the Wikimedia projects and communities but also coordinate research on them, and it produces its own publications. As such it qualifies to become an ORCID member organisation.

The benefits are:
  • Authenticating ORCID iDs of individuals using the ORCID API to ensure that researchers are correctly identified in your systems
  • Displaying iDs to signal to researchers that your systems support the use of ORCID
  • Connecting information about affiliations and contributions to ORCID records, creating trusted assertions and enabling researchers to easily provide validated information to systems and profiles they use
  • Collecting information from ORCID records to fill in forms, save researchers time, and support research reporting
  • Synchronizing between research information systems to improve reporting speed and accuracy and reduce data entry burden for researchers and administrators alike

At this time the quality of information about Wikimedia research is hardly satisfactory. As is standard, a new paper is announced and, as can be expected, the paper is not in Wikidata. Its three authors are not in ORCID, as is usual for people who work in computing, so there is no easy way to learn about their publications.

What will this achieve? The Wikimedia Foundation itself will push information about its research to ORCID and, consequently, at Wikidata we can easily add the latest and greatest. It is also an important step in making its research discoverable. It is one thing to publish Open Content; when it is then hard to find, it is still not FAIR and the research does not have the hoped-for impact. It also removes an issue that some researchers say they face: they cannot write about themselves on Wikimedia projects.

Another important plus: by indicating the importance of having scholarly papers known in ORCID, we help reluctant scientists understand that yes, they have a career in open source and open systems, but their work needs to be findable for it to be truly open.
Thanks,
       GerardM

Sunday, January 27, 2019

@Wikidata #quality - one example: Leonardo Quisumbing

Quality happens on many levels. Judge Leonardo Quisumbing passed away, and a lot of well-meant effort went into his Wikidata item. The data was inconsistent with our current practice, so people in the Wikidata chat were asked to help fix it.

Judge Quisumbing held many positions; one of them was "Secretary of Labor and Employment". This is a cabinet position, and it follows that Mr Quisumbing was also a "politician". It is one thing to add this position and occupation to a person; from a quality point of view it is best to also include a "start date", a "replaces", an "end date" and a "replaced by". The problem: the predecessor and successor do not exist in Wikidata.
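To make this item-level check concrete, here is a minimal Python sketch that lists which of the four recommended qualifiers are missing from a "position held" statement. It assumes a simplified dictionary loosely shaped like Wikidata's JSON (`claims`, `mainsnak`, `qualifiers`). The property IDs P39 (position held), P580 (start time), P582 (end time), P1365 (replaces) and P1366 (replaced by) are Wikidata's, but `Q_LABOR_SECRETARY`, `Q_OTHER_POSITION` and the function `missing_qualifiers` are hypothetical placeholders.

```python
# Qualifiers recommended above for a "position held" (P39) statement.
RECOMMENDED_QUALIFIERS = {
    "P580": "start time",
    "P582": "end time",
    "P1365": "replaces",
    "P1366": "replaced by",
}


def missing_qualifiers(item_claims, position_qid):
    """For each P39 statement naming position_qid, list the recommended qualifiers it lacks.

    item_claims is a simplified, Wikidata-like dict: {"P39": [statement, ...]}.
    """
    result = []
    for statement in item_claims.get("P39", []):
        value = statement["mainsnak"].get("datavalue", {}).get("value", {})
        if value.get("id") != position_qid:
            continue  # a different position, not the one we are checking
        present = set(statement.get("qualifiers", {}))
        missing = sorted(
            label for pid, label in RECOMMENDED_QUALIFIERS.items() if pid not in present
        )
        result.append(missing)
    return result


# Hypothetical item: dates are known, predecessor and successor are not linked.
claims = {
    "P39": [
        {
            "mainsnak": {"datavalue": {"value": {"id": "Q_LABOR_SECRETARY"}}},
            "qualifiers": {"P580": ["<start date>"], "P582": ["<end date>"]},
        },
        {"mainsnak": {"datavalue": {"value": {"id": "Q_OTHER_POSITION"}}}},
    ]
}
```

Run against this statement, `missing_qualifiers(claims, "Q_LABOR_SECRETARY")` reports `[["replaced by", "replaces"]]`, which is exactly the gap described above: the predecessor and successor items have to exist before they can be linked.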

Many secretaries of Labor have a Wikipedia article, and these articles are included in a category. Using the "PetScan" tool it is easy to import all the people in it. Typically the quality of the info is good; however, there is always the "six percent" error rate. Indeed, one person was erroneously indicated as a "secretary of labor". The problem is that people who only care about quality at the item level are really hostile to such imported errors. They are best ignored for their ignorance/arrogance.

A next level of quality is to complete the list with all the missing secretaries. This can be done, warts and all, from the Wikipedia article. It results in a Reasonator page that includes all the red and black links of the article. Many new items are created in the process, and automated descriptions are vital for finding as many matches as possible.

Judge Quisumbing became an "Associate Justice of the Supreme Court of the Philippines" and became the senior associate justice in 2007. Adding associate justices from a category was obvious; adding senior associate justices is a task similar to that for the secretaries of labor. However, the senior associate justice is the first among many, and consequently it requires a judgment call on how to express this.

Given that Wikidata is a wiki, you do the best you can at the level that interests you. The Wikidata item for Judge Quisumbing still needs improvement, but that is for someone else.

Thanks,
       GerardM

Sunday, January 20, 2019

@Wikidata - #Quality in a #Wiki environment

What quality is, and what quality is in a data environment, has been studied often enough. Lots of words are spent on it, but one notion is always left out: what is data quality in a wiki environment? How does that translate to Wikidata?

First of all, Wikidata serves many purposes. Its initial purpose was to replace the in-article "interwiki" links. These were notoriously difficult to maintain and often wrong. A single Wikidata item replaced the links for a subject in all Wikipedias, and this brought stability and a high level of confidence in the result. Over time the quality of the "interwiki" links went down: fewer people are involved in adding and curating these links, and when new items are generated for new articles, this is seen as a quality issue because they have no statements and are often not linked. There have been protests against these new additions.

A second purpose is the use of Wikidata statements in Wikipedia templates. Assessing data quality becomes complicated, as there are micro, meso and macro levels of quality at play. The micro level: is sufficient data available for one template in one Wikipedia article? The meso level: is sufficient data available for one template in all Wikipedia articles on the same topic? The macro level: is the same data available for all interested Wikipedias, and do we have the required labels in those languages?

Quality considerations are driven by this approach. On the micro level you want all awards for a scientist to be linked on their item. On the meso level you want all recipients of an award to be linked to its item. On the macro level you want all awards to have labels in the language of each Wikipedia, and all local considerations to be met.
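The macro-level consideration can be expressed as a simple coverage measure per language. This Python sketch assumes award labels are already available as a dictionary from item ID to `{language code: label}`; the function `label_coverage`, the item IDs and the labels are hypothetical placeholders, and fetching real labels from Wikidata is left out.

```python
def label_coverage(awards, language):
    """Return (fraction of awards labelled in `language`, sorted IDs of awards lacking a label).

    awards: dict mapping a (hypothetical) item ID to {language code: label}.
    An empty label string counts as missing.
    """
    if not awards:
        return 1.0, []  # nothing to label, so nothing is missing
    missing = sorted(qid for qid, labels in awards.items() if not labels.get(language))
    covered = len(awards) - len(missing)
    return covered / len(awards), missing


# Hypothetical awards with labels in only some languages.
AWARDS = {
    "Q_AWARD_1": {"en": "Award one", "nl": "Prijs een"},
    "Q_AWARD_2": {"en": "Award two"},
    "Q_AWARD_3": {"en": "Award three", "sw": "Tuzo tatu"},
}
```

Here `label_coverage(AWARDS, "en")` gives full coverage, while `label_coverage(AWARDS, "nl")` covers one award in three and names the two that still need a Dutch label; repeating the check per Wikipedia language shows where the macro level falls short.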

In a wiki environment, standard quality considerations are not helpful; they are judgemental. People contribute to Wikidata, and all have their own purposes. A wiki is a work in progress, and when quality assessments are performed, the question should focus on the extent to which a specific function is supported. What people need support for also changes: as long as there was no article for Professor Angela Byars-Winston, it was fine to know about her for only one publication. Now that Jess Wade has picked her for an article, it may be relevant that she is the first, and so far only, person known to Wikidata to be a "champion of change", and that more papers are identified for her.

Wikidata includes many references to scientific papers and authors. However, so far this serves no purpose. Allegedly there is a process underway that imports papers used as citations in the Wikipedias, but it is not clear which papers are used in which Wikipedia article. So far it is a big stamp collection, albeit one of rapidly growing quality: a collection that highlights authors who are open about their work and who share its details at ORCID. In effect, this data set indicates that the relevance of a scientist improves by being open.

Wikidata invites people to add and curate the data that is of interest to them. The esoteric data in particular, data about subjects like African geography or Islamic history, needs a lot of tender loving care. This is where Wikidata and the large Wikipedias are weak. As long as Wikidata is largely defined by the large Wikipedias, it will reflect their biases, and these biases will be hard to assess and curate.
Thanks,
      GerardM

Tuesday, January 01, 2019

The #decline of #Wikipedia (as we know it)

We are regularly told of misgivings about Wikipedia: it cannot stay as it is, it is in decline; it is all doom and gloom. NB the use of the phrase "doom and gloom" increased in the 1950s.

So Wikipedia will not remain as we know it? GOOD, it forces us to think about how we can improve what we have. When things are to change, what will have a healthy impact? How will we get something that serves us better in "sharing the sum of all knowledge"? How will we get more people to use what we have to offer, and how will we entice more people to contribute to the data collection that is included in all the Wikimedia Foundation projects?

First thing: our projects need to be less US-American. In a POV dispute I was involved in, the outcome was "obviously" decided in favour of considering only the USA point of view; I let it slide but moved on to greener pastures. The money we raise is for "keeping the servers going", an objective a bit too limited for my taste, but it raises the cash. Money is mainly raised in the USA, but in order to be truly global, it is better to raise it more equally: in every country, at least the amount it costs to serve that country. Gapminder is where you may be reminded that money is everywhere. As to the servers, why have all our crucial eggs in one USA basket? Given its current politics, a doom and gloom scenario is indeed possible. Having the servers more dispersed will bring our data closer to our audience and our editors, benefiting them with better performance; that is the easy win. A more complicated solution is to implement the Vrije Universiteit research on a peer-to-peer MediaWiki.

When our projects are to be less US-American, it is important for spending to be more global too.

When today's Wikipedia practices are no longer considered to be set in stone, we can finally implement features that enable, ensure and enhance its future. First, we should be less self-centred; after all, there is only one sum of all knowledge and we define only a part of it. Magnus showed how to maintain lists in an efficient way, and Amir recently added a "task" to Phabricator to implement proper disambiguation of "red links". We are increasingly aware not only of the references in all Wikipedias but also of the publications of scientists, which enables their work to be found. Complement this with the scientific papers we publish and we improve the public relevance of scientists by making them findable, by pointing to their science.

With a changed approach at Wikipedia, we may be bold and change the outlook on what Wikipedia is there for as well. Why not make Wikipedia the gateway to information held elsewhere? Why not show a Scholia page for every scientist we know? Why not offer the books at Open Library, or inform on the availability of books at the local library? Why not partner with other organisations with whom we share an objective? But most importantly, let us be aware that an African professor teaches in Africa, and let us allow for and enable the context of our partners and volunteers.

For me there is no reason for doom and gloom, as there are so many opportunities to become even more effective. With a whole new year in front of us, let us do well.
Thanks,
        GerardM