Saturday, March 16, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective I

Sharing the sum of all knowledge is what we have always aimed for in our movement. In Commons we have realised a project that illustrates all Wikimedia projects and in Wikidata we have realised a project that links all Wikimedia projects and more.

When we tell the world about the most popular articles in Wikipedia, it is important to realise that we do not report what the most popular subjects are. We could, but so far we don't. The traffic for a subject is the sum of the traffic of all Wikipedia articles on that subject, across languages. Providing this data is feasible; it is a "big data" question.

We already accumulate traffic data for the articles on all Wikipedias, and we can link those articles to their Wikidata items. What follows is simple arithmetic. Powerful, because it will show that English Wikipedia accounts for less than fifty percent of all traffic. That will help make the existing bias towards English Wikipedia and its subjects visible, particularly because it becomes possible to answer a question like: "What are the most popular subjects that do not have an article in English?" and compare those to popular diversity articles.
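The "simple arithmetic" can be sketched in a few lines. The rows below are invented sample figures, not real pageview data; in reality the input would come from the Wikimedia pageview dumps joined to Wikidata sitelinks.

```python
# Aggregate per-article pageviews by Wikidata item across language editions,
# then find the most popular subjects with no English article.
# Sample rows (wikidata_item, language, pageviews) -- hypothetical figures.
from collections import defaultdict

rows = [
    ("Q42", "en", 9000), ("Q42", "de", 1200), ("Q42", "nl", 300),
    ("Q5", "en", 4000), ("Q5", "fr", 2500),
    ("Q64", "de", 12000), ("Q64", "pl", 900),  # no English article here
]

def traffic_per_subject(rows):
    """Sum traffic per Wikidata item and remember which languages have it."""
    totals = defaultdict(int)
    per_lang = defaultdict(dict)
    for item, lang, views in rows:
        totals[item] += views
        per_lang[item][lang] = views
    return totals, per_lang

def popular_without_english(rows):
    """Subjects without an English article, ranked by total traffic."""
    totals, per_lang = traffic_per_subject(rows)
    missing = [(item, views) for item, views in totals.items()
               if "en" not in per_lang[item]]
    return sorted(missing, key=lambda x: -x[1])

totals, per_lang = traffic_per_subject(rows)
english_share = sum(v for _, l, v in rows if l == "en") / sum(v for _, _, v in rows)
```

With real data, `english_share` is exactly the figure the post argues is below fifty percent, and `popular_without_english` answers the question posed above.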

In Wikidata we know about the subjects of all Wikipedias, but it too is very much a project based on English. That is a pity when Wikidata is to be the tool that helps us find what subjects people are looking for that are missing in a Wikipedia. For some Wikipedias there is an extension to the search functionality that helps find information. It uses Wikidata and it supports automated descriptions.

Now consider that this tool is available on every Wikipedia. We would share more information. With some tinkering, we would know what is missing where. There are other opportunities: we could ask logged-in users to help by adding labels in their language to improve Wikidata. When Wikidata does not include the missing information, we could ask them to add a Wikidata item with additional statements and a description to improve our search results.

This data approach is based on the result of a process, the negative results of our own search, and on the active cooperation of our users. At the same time, we can accumulate negative search results where there has been no interaction, link them to Wikidata labels and gain an understanding of the relevance of these missing articles. This fits in nicely with the marketing approach to "what it is that people want to read in a Wikipedia".
Thanks,
      GerardM

Saturday, March 09, 2019

A #marketing approach to "what it is that people want to read in a @Wikipedia"

All the time, people want to read articles in a Wikipedia that are not there. For some Wikipedias that is obvious because there is so little; based on what people read in other Wikipedias, recommendations have been made suggesting what would generate new readers. This has been the approach so far; a quite reasonable approach.

This approach does not consider cultural differences, nor what is topical in a given "market". To find an answer to the question "what do people want to read?", there are several strategies. One is what researchers do: they ask panels, write papers, and once that is done there is a position to act upon. There are drawbacks:
  • you can only research so many Wikipedias
  • all the other Wikipedias get no attention
  • the composition of the panels is problematic, particularly when they are self-selecting
  • there are no results while the research is being done
The objective of a marketing approach is centered around two questions: 
  • what is it that people are looking for now (and cannot find) 
  • what can be done to fulfill that demand now
The data needed for this approach: negative search results. People search for subjects all the time and there are all kinds of reasons why they do not find what they are looking for. Spelling, disambiguation and nothing to find are all perfectly fine reasons for a no-show.

The "nothing to find" scenario is obvious; when a subject is sought often, we want an article. Exposing a list of missing articles is one motivator for people to write. Once they have written, we have the data on how often an article is read. When the most popular new articles of the last month are shown, it is vindication for authors to have written popular articles. It is easy, obvious, and it should be part of the data the Wikimedia Foundation already collects. In this way the data is put to use. It is also quite FAIR to make this data available.
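Ranking the "nothing to find" queries is the whole trick. A minimal sketch, assuming a hypothetical log of (query, result count) pairs; the real signal would be Wikimedia's own search logs:

```python
# Rank zero-result search terms by how often they were tried, so the most
# sought-after missing articles surface first. Log entries are invented.
from collections import Counter

search_log = [
    ("ada lovelace", 12), ("naomi ellemers", 0), ("naomi ellemers", 0),
    ("matshidiso moeti", 0), ("matshidiso moeti", 0), ("matshidiso moeti", 0),
    ("grace hopper", 5),
]

def missing_article_demand(log):
    """Count how often each zero-result query was tried, most wanted first."""
    demand = Counter(q for q, hits in log if hits == 0)
    return demand.most_common()
```

The output of `missing_article_demand` is exactly the "list of missing articles" the post wants to expose to prospective authors.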

For the "disambiguation" issue, Wikidata may come to the rescue. It knows what is there and it is easy enough to add items with the same name for disambiguation purposes. Combine this with automated descriptions and all that is required is a user interface to guide people to what they are looking for. When there is "only" a Wikidata item, it follows that its results feature in the "no article" category.

The "spelling" issue is just a variation on a theme. Wikidata allows for multiple labels, and the search results may make use of them as well. Common spelling errors are also a big part of the problem; with a bit of ingenuity they are not much of a problem either.
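One "bit of ingenuity" is fuzzy matching a misspelled query against Wikidata labels and aliases. A sketch using only the standard library; the item numbers and the alias list here are illustrative, not a real API call:

```python
# Rescue misspelled queries by fuzzy-matching them against a (stand-in)
# index of Wikidata labels and aliases.
import difflib

labels_and_aliases = {
    "Q7259": ["Ada Lovelace", "Augusta Ada King"],
    "Q60025": ["Naomi Ellemers"],
}

def suggest(query, vocab=labels_and_aliases, cutoff=0.8):
    """Return (item, matched name) pairs close to the query, best first."""
    index = {name.lower(): item for item, names in vocab.items() for name in names}
    hits = difflib.get_close_matches(query.lower(), index.keys(), n=3, cutoff=cutoff)
    return [(index[h], h) for h in hits]
```

Because aliases feed the same index as labels, every extra label added to Wikidata directly improves what a misspelled search can still find.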

Marketing this marketing approach should not be hard. It just requires people to accept what is staring them in the face. It is easy to implement, it works for all 280+ languages and it is likely to give a boost not only to all the other Wikipedias but also to Wikidata.
Thanks,
        GerardM

Sunday, February 17, 2019

@WikiResearch - Nihil de nobis, sine nobis

There is this wonderful notion that Research is going to tell us what to do in light of the strategic Wikimedia 2030 plans. Wonderful. There is going to be a taxonomy of the information we are missing.

Let me be clear: we do need research, and the data it is based on is to be available to us. There is no point in a future taxonomy of missing knowledge when we have been asking for decades: "what articles are people looking for that they cannot find?" If there is to be a taxonomy, what else should it be based on?

When we are to fill the gaps in what Wikipedia covers, we can stimulate more new articles by indicating what traffic they get in the first month. We can stimulate our readers to learn more by showing what Wikidata has to offer and by showing its links to texts in other languages. It may even result in new stubs, even articles, in "their" language. This technology has been available for years now.

The WikiResearch literature is full of arguments on the importance of citations and of Wikidata as the platform for all Wikipedia sources; why then are the WikiResearch papers not in Wikidata from the start? What is it that makes WikiResearchers consider that Wikidata is not about them, just as it is about any other subject Wikidata covers? What is it that makes their work less findable (FAIR) than what is known to have been published as open content by the NIH?

The point I want to make is that no matter how well intended what WikiResearch aims to achieve is, they lose the interest, involvement and commitment of people like me, the people they need to get the results they aim for.

Yes, do research, but we should not wait for its results; we know how to stimulate people to write new articles.
Thanks,
      GerardM

Sunday, February 10, 2019

#Wikidata - A quick and dirty "HowTo" to improve exposure of a subject in Wikidata

When you want to expose a particular subject, any subject, in Wikidata, this is the quick and dirty way to expose much of what there is to know. There are a few caveats. The first is that the aim is not to be complete; the second is that it is biased towards scientists who are open about their work at ORCiD.

You start with a paper or a scientist. They have a DOI or ORCiD identifier and they may already be in Wikidata. First there is the discovery process of the available literature and the authors involved. The SourceMD tool is key; with a SPARQL query or with a QID per line, you run a process that updates publications by adding missing authors, or adds missing publications and missing authors to known publications.

When you treat this as an iterative process, more authors and publications become known. When you run the same process for (new) co-authors, still more publications and authors relevant to your subject become known.

To review your progress, you use Scholia. It has multiple modes that help you gain an understanding of authors, papers, subjects, publications, institutions. You will see the details evolve. NB: mind the lag Wikidata takes to update its database; it is not instant gratification.

A few observations: your aim may be to be "complete", but publications are added all the time and the same is true for scientists. People increasingly turn to ORCiD for a persistent identifier for their work. The real science is in designating a subject to a paper. Arguably the subject may be in the name of the article, but as an approach that is a bit coarse. I leave that to you, as your involvement makes you a subject "specialist".
Thanks,
       GerardM

Tuesday, February 05, 2019

#Wikidata - Naomi Ellemers and the relevance of #Awards

In a 2016 blogpost, I mentioned the relevance of awards. At the time Professor Ellemers received an award and it was the vehicle to make that point in the story.

Today, in an article in a national newspaper, Mrs Ellemers makes a strong point that the perception of awards is really problematic. What they do is reinforce a bias that American science is superior. It leads to a perception among European students that it is the USA "where it is all happening", a perception that Mrs Ellemers argues is incorrect.

NB Mrs Ellemers is the recipient of the 2018 Career Contribution Award of the Society for Personality and Social Psychology.

Wikidata reinforces this bias for American science by including a rating for "science awards". This rating values awards by comparing them. The rating is done by an American organisation and the whole notion behind it is suspect, because the assumptions are not necessarily, or not at all, beneficial for the practice of science.

How to counter such a bias? As far as I am concerned there is no value in making a distinction between awards and "science awards", and biased information like this should be removed. Just consider: when European science is considered less than American science... how would African science be rated?
Thanks,
     GerardM

Sunday, February 03, 2019

Dr Matshidiso Moeti, an exception to my rules

When I add scientists to Wikidata, I really want something to link to, an external source like ORCID, Google Scholar or VIAF. When I link publications, it is the data at ORCID I link to; I don't do manual linking.

From the sources I have read, Dr Moeti is the kind of person who deserves a Wikipedia article. Her work, the people she works with and the cases she works on not only deserve recognition; it is imho vitally important that they get it, and that you learn about them. This is why I made exceptions to my rule.

This is her Scholia, this is her Reasonator and please, take an interest.
Thanks,
      GerardM

The case for #Wikimedia Foundation as an #ORCID member organisation

The Wikimedia Foundation is a research organisation, no two ways about it; it has its own researchers who not only perform research on the Wikimedia projects and communities but also coordinate research on them, and it produces its own publications. As such it qualifies to become an ORCID member organisation.

The benefits are:
  • Authenticating ORCID iDs of individuals using the ORCID API to ensure that researchers are correctly identified in your systems
  • Displaying iDs to signal to researchers that your systems support the use of ORCID
  • Connecting information about affiliations and contributions to ORCID records, creating trusted assertions and enabling researchers to easily provide validated information to systems and profiles they use
  • Collecting information from ORCID records to fill in forms, save researchers time, and support research reporting
  • Synchronizing between research information systems to improve reporting speed and accuracy and reduce data entry burden for researchers and administrators alike
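The first benefit, authenticating iDs, has a cheap local component: an ORCID iD ends in an ISO 7064 MOD 11-2 check character, so malformed iDs can be rejected before any API call. A sketch of that checksum; a real integration would go through the ORCID API itself:

```python
# Validate the ISO 7064 MOD 11-2 check character that ends every ORCID iD.
def orcid_checksum(base_digits):
    """Compute the check character for the first 15 digits of an ORCID iD."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """True when the iD has 16 characters and a correct check character."""
    chars = orcid.replace("-", "")
    if len(chars) != 16:
        return False
    return orcid_checksum(chars[:15]) == chars[15]
```

For example, `0000-0002-1825-0097` (the sample iD used in ORCID's own documentation) validates, while a one-digit typo does not.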
At this time the quality of information about Wikimedia research is hardly satisfactory. The standard pattern: announcements are made about a new paper and, as can be expected, the paper is not in Wikidata. The three authors are not in ORCID, as is usual for people who work in the field of computing, so there is no easy way to learn about their publications.

What will this achieve? It will be the Wikimedia Foundation itself that pushes information about its research to ORCID, and consequently at Wikidata we can easily update with the latest and greatest. It is also an important step towards its documentation becoming discoverable. It is one thing to publish open content; when it is then hard to find, it is still not FAIR and the research does not have the hoped-for impact. It also removes an issue that some researchers say they face: they cannot publish about themselves on Wikimedia projects.

Another important plus: by indicating the importance of having scholarly papers known in ORCID, we help reluctant scientists understand that yes, they have a career in open source and open systems, but making their work findable is very much needed to be truly open.
Thanks,
       GerardM

Sunday, January 27, 2019

@Wikidata #quality - one example: Leonardo Quisumbing

Quality happens on many levels. Judge Leonardo Quisumbing passed away and a lot of well-meant effort went into his Wikidata item. The data is inconsistent with our current practice, so in the Wikidata chat people were asked to help fix the data.

Judge Quisumbing held many positions; one of them was "Secretary of Labor and Employment". This is a cabinet position and it follows that Mr Quisumbing was also a "politician". It is one thing to attach this position and occupation to a person; from a quality point of view it is best to also include a "start date", an "end date", a "replaces" and a "replaced by". The problem: the predecessor and successor do not exist in Wikidata.
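A fully qualified "position held" statement can be expressed as one QuickStatements line (P39 position held, with qualifiers P580 start time, P582 end time, P1365 replaces, P1366 replaced by). A sketch that builds such a line; all the Q-numbers and the years are placeholders, not Quisumbing's real data:

```python
# Build a QuickStatements line for "position held" with the qualifiers the
# post names. Dates use the +ISO/9 form (year precision).
def position_statement(person, position, start_year, end_year, replaces, replaced_by):
    """Return one pipe-separated QuickStatements command line."""
    return "|".join([
        person, "P39", position,
        "P580", f"+{start_year}-01-01T00:00:00Z/9",
        "P582", f"+{end_year}-01-01T00:00:00Z/9",
        "P1365", replaces,
        "P1366", replaced_by,
    ])

line = position_statement("Q1001", "Q2002", "1998", "2001", "Q3003", "Q4004")
```

The "replaces"/"replaced by" slots are exactly where the missing predecessor and successor items bite: the line cannot be completed until they exist.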

Many a Secretary of Labor does have a Wikipedia article, and they are included in a category. Using the "Petscan" tool it is easy to import all those mentioned. Typically the quality of the info is good; however, there is always the "six percent" error rate. Indeed, one person was erroneously indicated as a "secretary of labor". The problem is that people who only care about quality on the item level are really hostile to such imported issues. They are best ignored for their ignorance/arrogance.

A next level of quality is to complete the list with all missing secretaries. This can be done, warts and all, from the Wikipedia article. It results in a Reasonator page that includes all the red and black links of the article. Many new items are created in the process, and having automated descriptions is vital in finding as many matches as possible.

Judge Quisumbing became an "Associate Justice of the Supreme Court of the Philippines" and became the senior associate justice in 2007. Adding associate justices from a category was obvious; adding senior associate justices is a task similar to the secretaries of labor. However, a senior is the first among many, and consequently it requires a judgment call on how to express this.

Given that Wikidata is a wiki, you do the best you can to the level that has your interest. There is still a need to improve the Wikidata item for judge Quisumbing but that is for someone else.

Thanks,
       GerardM