Sunday, January 27, 2019

@Wikidata #quality - one example: Leonardo Quisumbing

Quality happens on many levels. Judge Leonardo Quisumbing passed away and a lot of well-meant effort went into his Wikidata item. The data is inconsistent with our current practice, so in the Wikidata chat people were asked to help fix it.

Judge Quisumbing held many positions; one of them was "Secretary of Labor and Employment". This is a cabinet position and it follows that Mr Quisumbing was also a "politician". It is one thing to add this position and occupation to a person; from a quality point of view it is best to also include a "start date", a "replaces", an "end date" and a "replaced by". The problem: the predecessor and successor do not exist in Wikidata.
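
As a rough illustration of that check, here is a minimal Python sketch that reports which of the four qualifiers are missing from a "position held" statement. The property IDs (P580 start time, P582 end time, P1365 replaces, P1366 replaced by) are the Wikidata ones; the statement structure and the function name `missing_qualifiers` are simplified assumptions made for this example, not the real Wikidata JSON format, and the values are placeholders rather than facts.

```python
# Qualifiers recommended above for a "position held" statement.
RECOMMENDED_QUALIFIERS = {
    "P580": "start time",
    "P1365": "replaces",
    "P582": "end time",
    "P1366": "replaced by",
}


def missing_qualifiers(statement):
    """Return the labels of recommended qualifiers absent from a statement.

    `statement` is a simplified, hypothetical dict, not the real Wikidata JSON:
    {"position": <label>, "qualifiers": {<property id>: <value>}}
    """
    present = statement.get("qualifiers", {})
    return [
        label
        for pid, label in RECOMMENDED_QUALIFIERS.items()
        if pid not in present
    ]


# Illustrative data only: the values are placeholders, not facts.
example = {
    "position": "Secretary of Labor and Employment",
    "qualifiers": {"P580": "<start date>", "P582": "<end date>"},
}

print(missing_qualifiers(example))
```

In this toy case the dates are present but the predecessor and successor are not, which is exactly the situation described above when those people have no item yet.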

Many Secretaries of Labor do have a Wikipedia article and they are included in a category. Using the "PetScan" tool it is easy to import all those mentioned. Typically the quality of the info is good; however, there is always the "six percent" error rate. Indeed one person was erroneously indicated as a "secretary of labor". The problem is that people who only care about quality on the item level are really hostile to such imported issues. They are best ignored for their ignorance/arrogance.

A next level of quality is to complete the list with all missing secretaries. This can be done, warts and all, from the Wikipedia article. It results in a Reasonator page that includes all the red and black links of the article. Many new items are created in the process and automated descriptions are vital in finding as many matches as possible.

Judge Quisumbing became an "Associate Justice of the Supreme Court of the Philippines" and became the senior associate justice in 2007. Adding associate justices from a category was obvious; adding senior associate justices is a task similar to adding secretaries of labor. However, a senior is the first among the many and consequently it requires a judgment call on how to express this.

Given that Wikidata is a wiki, you do the best you can to the level that has your interest. There is still a need to improve the Wikidata item for Judge Quisumbing, but that is for someone else.

Thanks,
       GerardM

Sunday, January 20, 2019

@Wikidata - #Quality in a #Wiki environment

What quality is, and what quality is in a data environment, has been studied often enough. Lots of words are spent on it but one notion is always left out: what is data quality in a Wiki environment, and how does that translate to Wikidata?

First of all, Wikidata serves many purposes. The initial purpose of Wikidata was to replace the in-article "interwiki" links. They were notoriously difficult to maintain and often wrong. A single Wikidata item replaced the links for a subject in all Wikipedias and this brought stability and a high level of confidence in the result. Over time the quality of the "interwiki" links went down; fewer people are involved in adding and curating these links, and it is seen as a quality issue when new items are generated for new articles, because they do not have statements and are often not linked. There have been protests against these new additions.

A second purpose is the use of Wikidata statements in Wikipedia templates. Assessing data quality becomes complicated as there are micro, meso and macro levels of quality at play. The micro level: is sufficient data available for one template in one Wikipedia article? The meso level: is sufficient data available for one template in Wikipedia articles on the same topic? The macro level: is the same data available for all interested Wikipedias and do we have the required labels in those languages?

Quality considerations are driven by this approach. On a micro level you want all awards for a scientist to be linked on an item. On the meso level you want all recipients of an award to be linked to their item. On the macro level you want all awards to have labels in the language of a Wikipedia and all local considerations to have been met.
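
To make the three levels concrete, here is a small Python sketch that expresses each of them as a coverage ratio. Everything in it is an assumption made for illustration: the helper names (`share`, `label_coverage`), the toy data and the idea of reducing each level to a single number. It is not an established Wikidata metric.

```python
def share(found, expected):
    """Fraction of the expected values that were actually found."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(set(found) & expected) / len(expected)


def label_coverage(labels_by_award, languages):
    """Per language, the fraction of awards that have a label in it."""
    return {
        lang: share(
            [award for award, langs in labels_by_award.items() if lang in langs],
            labels_by_award,
        )
        for lang in languages
    }


# Toy data with hypothetical names; none of this is real Wikidata content.
# Micro: awards linked on one scientist's item vs. awards known for them.
micro = share({"Award A"}, {"Award A", "Award B"})

# Meso: recipients whose item links to the award vs. all known recipients.
meso = share(
    {"Person 1", "Person 2", "Person 3"},
    {"Person 1", "Person 2", "Person 3", "Person 4"},
)

# Macro: which awards have a label in the language of each Wikipedia.
macro = label_coverage(
    {"Award A": {"en", "nl"}, "Award B": {"en"}},
    ["en", "nl"],
)

print(micro, meso, macro)
```

The point of keeping the three numbers separate is that an item can look complete on one level while the award, or the language, it belongs to is still poorly covered.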

Standard quality considerations in a Wiki environment are not helpful; they are judgemental. People contribute to Wikidata and all have their own purposes. A Wiki is a work in progress and when quality assessments are to be performed, the question should focus on the extent to which a specific function is supported. What people seek in support also changes; as long as there was no article for professor Angela Byars-Winston it was fine to know about her for only one publication. Now that Jess Wade picked her for an article, it may be relevant that she is the first and so far only person known to Wikidata who was a "champion of change" and that more papers are identified for her.

Wikidata includes many references to scientific papers and authors. However, so far it serves no purpose. Allegedly there is a process underway that imports papers used as citations in the Wikipedias but it is not clear what papers are used in what Wikipedia article. So far it is a big stamp collection, a collection with a rapidly growing quality. A collection that highlights authors who are open about their work and who share the details of their work at ORCID. In effect, this data set indicates that the relevance of a scientist improves by being open.

Wikidata invites people to add and curate the data that is of interest to them. Particularly the esoteric data, data about subjects like African geography or Islamic history, needs a lot of tender loving care. It is where Wikidata and the large Wikipedias are weak. For as long as Wikidata is largely defined by the large Wikipedias it will reflect the same biases and these biases will be hard to assess and curate.
Thanks,
      GerardM

Tuesday, January 01, 2019

The #decline of #Wikipedia (as we know it)

Regularly, we are told about misgivings about Wikipedia. It cannot stay as it is, it is in decline; it is all doom and gloom. NB the use of the phrase "doom and gloom" increased in the 1950s.

So Wikipedia will not remain as we know it? GOOD, it forces us to think about how we can improve what we have. When things are to change, what will have a healthy impact? How will we get something that serves us better in "sharing the sum of all knowledge"? How will we get more people to use what we have to offer, and how will we entice more people to contribute to the data collection that is included in all the Wikimedia Foundation projects?

First thing: our projects need to be less US-American. For me, a POV situation I was in was "obviously" decided in favour of only considering the USA point of view; I let it slide but went to pastures green. The money we raise is for "keeping the servers going". An objective a bit too limited for my taste, but it raises the cash. Money is mainly raised in the USA, but in order to be truly global it is better to raise it more equally in every country, at least for the amount it costs to serve that country. Gapminder is where you may be reminded that money is everywhere. As to the servers, why have all crucial eggs in one USA basket? Given its current politics, there is indeed a potential doom and gloom scenario possible. Having them more dispersed will bring our data closer to our audience, and to our editors as well, benefiting them with better performance; that is the easy win. A more complicated solution is in the implementation of the Vrije Universiteit research on a peer-to-peer MediaWiki.

When our projects are to be less US-American, it is important for spending to be more global too.

When today's Wikipedia practices are no longer considered to be set in stone, we can finally implement features that enable, ensure and enhance its future. First, we should be less self-centred; after all there is only one sum of all knowledge and we define only a part of it. Magnus showed how to maintain lists in an efficient way and Amir recently added a "task" to Phabricator to implement proper disambiguation of "red links". We are increasingly aware, not only of the references of all Wikipedias but also of publications by scientists that enable their work to be found. Complement this with the scientific papers we publish and we improve the public relevance of scientists by making them findable, by pointing to their science.

With a changed approach at Wikipedia, we may be bold and change the outlook on what Wikipedia is there for as well. Why not make Wikipedia the gateway to information held elsewhere? Why not show a Scholia page for every scientist we know, why not offer the books at OpenLibrary or inform on the availability of books at the local library? Why not partner with other organisations with which we share an objective? But most importantly, let us be aware that an African professor teaches in Africa and that we allow for and enable the context of our partners and volunteers.

For me there is no reason for doom and gloom as there are so many opportunities to become even more effective. With a whole new year in front of us, let us do well.
Thanks,
        GerardM