Thursday, February 27, 2020

Balancing arguments - Gender and the #Wikimedia projects

Some say gender is important because there is a serious imbalance in how Wikipedia reports on people. Many people dedicate their time to restoring some balance by writing Wikipedia articles. At the same time, it is important to be cognizant of the fact that gender is not binary; the consequence is that when you write an article, you need a source to know which gender a person identifies with.

So far so good. At Wikidata, other things are at play. It is vital to understand that Wikidata is not so much about an individual item as about sets of items. When the recipients of an award are included, as for "Member of the Hassan II Academy of Sciences and Technologies", there is often nothing more known than that these Moroccans received the award because a source says so. Determining a gender then relies on googling for images of the person, and when a name is decidedly male, like Omar, Hakim or Mustapha, the gender is implied.

Why include a gender? Because projects like Women in Red rely on finding prospects to write articles about. Because tools like Scholia express what we know about all the recipients of an award. For this academy, it tells us that there are currently two ladies and 22 gentlemen known. We know nothing of their work because the bias against Africa is staggering and because the performance of Wikidata, when it comes to including their work, is abysmal.

The argument for not including gender is often based on what people expect: "Wikidata contains large sets of data and consider that it makes no statistical difference one way or the other". The reality, however, is that when you consider how data is used in, for instance, Scholia, the subsets are small. One more fine lady makes a statistical difference.
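To make this concrete, here is a small Python calculation using the Scholia counts for the academy mentioned above (two ladies, 22 gentlemen). The large set is hypothetical, with the same proportion, and is only there for contrast.

```python
def female_share(women: int, men: int) -> float:
    """Share of women among the people whose gender is known, in percent."""
    return 100 * women / (women + men)

# Counts from the Scholia example: 2 women, 22 men.
before = female_share(2, 22)
after = female_share(3, 22)  # one more woman added

# Hypothetical large set with the same starting proportion (illustration only).
big_before = female_share(2_000, 22_000)
big_after = female_share(2_001, 22_000)

print(f"small set: {before:.1f}% -> {after:.1f}%")
print(f"large set: {big_before:.3f}% -> {big_after:.3f}%")
```

In the small set, one more woman moves the share from about 8.3% to 12.0%; in the hypothetical large set the same addition barely registers.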

When people write about a person for a Wikipedia, they get to know that person; they have multiple sources at hand. At Wikidata, not so much. One purpose of adding people there is to nibble away at our bias.

Requiring sources to indicate gender takes away the usefulness of the data and is counterproductive when we are talking about bias. For me it is a Wikipedia argument, an article-based argument, and it is counterproductive to translate it to the set-based approach of Wikidata.

Saturday, February 15, 2020

Wikipedia consensus? - It is who you ask but what are the facts

An article in VICE starts as follows: 'Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing'. This article is problematic in many ways, starting with this premise, because the Cebuano Wikipedia does not contain machine translation. It contains machine-generated text and, to add insult to injury, the same article states: 'the majority (generated articles) are surprisingly well constructed'.

An article like this can be sanity-checked. Principles come first:
  • This is about a wiki, in contrast to the Nupedia approach.
  • Wikipedia’s founding goal is to make knowledge freely available online in as many languages as possible.
  • There is a difference between opinions and facts.
How arguments are made matters. When "highly trusted users who specialize in combating vandalism" are introduced and comment that "many articles are created by bots", it does not follow that the quality is low, nor that this is to be considered vandalism, yet the implication is made.

It is a fact that the Cebuano Wikipedia has 5,378,563 articles, and also that some 16.5 million people understand Cebuano. There is, however, no relation between these two facts. More relevant is that the wife of Sverker Johansson has Cebuano as her mother tongue, and his two kids learn about their maternal cultural heritage, thanks in part to the work he does for the Cebuano Wikipedia. That is very much a classic wiki approach.

In contrast, the English Wikipedia has a bot policy that prevents the use of bots for generating content. Such notions should be local to the English Wikipedia and need not have relevance elsewhere. These highly trusted users can be expected to proselytise this point of view, and because of this POV they take away a source of information without offering any credible alternative to the existing lack of information available to the rest of the world. At the same time, the English Wikipedia is biased in the information it provides and does not offer the same quality of service for the domains selected for the Cebuano Wikipedia.

Sadly, it is said, the Wikimedia Foundation itself makes no effective difference in support of the "other" languages. An alternative to LSJbot was introduced, and it may be able to make a difference, but as it does not provide a public-facing service, it is very much a paper tiger. Even worse are the Nupedia notions in the combination of two statements: "Due to its heavy reliance on Wikidata entries, the quality of content produced is heavily influenced by the quality of the Wikidata available." and "It can discredit other Wikipedia entries related to automatic creation of content or even the Wikipedia quality." These notions are problematic for several reasons:
  • They prefer no information over little information, while it is our service to the end user that should be considered.
  • Quality of information is framed in the light of existing Wikipedia entries. Whose Wikipedia entries are we considering? They are irrelevant anyway, as our aim is to inform our end users, and they do not cover the same subjects.
  • As for the quality of Wikidata: it is a wiki, and its quality is improving, particularly as so many eyes shine their light on it.
  • We can inform in any and all languages, and we do not even have to call it Wikipedia; we do not even have to save it in a Wikipedia when we only cache the results of automated text generation.
  • When we cache the results of automated text generation, texts can be regenerated whenever the data is expanded or changed.
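A minimal Python sketch of such a cache, assuming each item carries a revision number that changes whenever its data changes. The `generate_text` function, the `Q_EXAMPLE` identifier and the descriptions are hypothetical stand-ins, not any existing Wikimedia service; the point is only that nothing is stored in a Wikipedia and the text follows the data.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass(frozen=True)
class Item:
    qid: str        # hypothetical identifier, for illustration only
    revision: int   # assumed to change whenever the data changes
    label: str
    description: str

def generate_text(item: Item) -> str:
    """Hypothetical stand-in for an automated text generator."""
    return f"{item.label} is {item.description}."

@dataclass
class TextCache:
    # qid -> (revision the text was generated from, generated text)
    entries: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    generated: int = 0  # how often the generator actually ran

    def get(self, item: Item) -> str:
        cached = self.entries.get(item.qid)
        if cached is not None and cached[0] == item.revision:
            return cached[1]            # data unchanged: reuse the cached text
        text = generate_text(item)      # new or changed data: generate again
        self.entries[item.qid] = (item.revision, text)
        self.generated += 1
        return text

cache = TextCache()
v1 = Item("Q_EXAMPLE", 1, "Salimata Wade", "a scientist")
v2 = Item("Q_EXAMPLE", 2, "Salimata Wade",
          "a member of the National Academy of Sciences and Techniques of Senegal")
first = cache.get(v1)
again = cache.get(v1)     # served from the cache
updated = cache.get(v2)   # revision changed, so the text is regenerated
print(first)
print(updated)
```

Keying on the revision rather than on a timestamp means a text is only regenerated when the underlying data actually changed.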
So much for the critique of the VICE article; but then again, does the English Wikipedia not have its own problems?
  • Its 1,143 administrators and 137,368 active users are struggling to keep up. Compared with the 6 administrators and 14 active users of the Cebuano Wikipedia, it is understandable that, as it grows, the English Wikipedia has to rely more and more on bots and artificial intelligence.
  • Magnus has demonstrated that lists are better maintained by using the data from Wikidata than by editors.
  • Wikipedia technology has a problem with false friends. Arguably some 4% of list entries are wrong because they link to the wrong article. When links are anchored to Wikidata identifiers instead, this problem disappears, in the same way the problems with interwiki links disappeared.
The biggest problem with "Wikipedia consensus" is that it was formulated in the past by a tiny in-crowd that made up the "accepted" big words for the rest of us, and worse, they cannot be swayed from their POV by facts.

Sunday, February 09, 2020

Dear @krmaher @Wikipedia is not the #flagship to win our war

The virtues of Wikipedia have been expressed in millions of words, at many conferences and in many interviews, by you, Jimmy and countless others. Nothing wrong with that. Wikipedia has been extremely useful, it has a dedicated following, and it is going exactly nowhere new. What it can be expected to bring is more of the same old, same old.

Wikipedia pundits use their own idiom, have their own values and easily dismiss what does not conform to their notions: notions based not on actual facts but on opinions.

Our aim was to share in the sum of all knowledge; what we have is a domineering English Wikipedia expecting everything to be shaped in its image. The result is many malfunctioning sister projects that do not get attention "because what is good for the goose is good for the gander". It is not. I can find a picture of the Vasa, a former flagship, but not in Commons (it uses the same technology as Wikipedia). There are many books in Wikisource, but we do not know which ones are complete and we do not market these books to a public. When Wikidata was created, its first achievement was taking interwiki links away from Wikipedia, providing a functional platform and removing millions of edits from all Wikipedias. Even so, functionality that improves on what Wikipedia has is dismissed, while the facts show how Wikipedia underperforms.

The question is whether Wikimedia as an organisation is beholden to Wikipedia. If its aspirations go beyond that, it has an obligation to the other projects. It has to find a public for the finished content of Wikisource. It has to find a public for the biggest open content resource of images, making it actually easy and obvious to find pictures of, for instance, the Vasa. Finally, there is Wikidata, which is crippled by its own success and hampered by what seems to me to be a lack of organisational attention.

Dear Katherine, I am happy when a technician sets out his plans to mitigate a disaster. He does this within the restrictions he is under. It is, however, for you, in your capacity as director of the Wikimedia Foundation, to express what relevance is given to Wikidata. We have a war chest, and we are challenged to take up a new role in the war for factual and balanced information. With only the English Wikipedia we have already lost the rest of the world, and with the English Wikipedia we also have a very biased world view. Never mind, nothing new here.

My question to you: are you aware that Wikidata has no room for growth? Is that acceptable to the Foundation? How are we going to share the sum of the knowledge that is available to us when our flagship is about to sink while sailing out of the harbor?

Saturday, February 08, 2020

The performance of Wikidata - Denis Karuhize Byarugaba

Professor Denis Karuhize Byarugaba is one of the Fellows of the Uganda National Academy of Sciences. At Wikidata we know about papers that he wrote; we know this because of the author strings that point to him.

One of the Scholia tools allows these papers to be disambiguated by linking them to his Wikidata item. It is important that we do this, because in this way we build up the presence of African scientists on Wikidata.
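As far as I understand it, such disambiguation amounts to replacing an "author name string" (P2093) statement on a paper with an "author" (P50) statement that points to the person's item, keeping the series ordinal (P1545) so the author order survives. The sketch below shows only that idea on plain Python objects; it is not Scholia's code and does not talk to Wikidata. The paper title, the name strings and the `Q_BYARUGABA` identifier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AuthorClaim:
    ordinal: int                 # series ordinal qualifier (P1545)
    name_string: Optional[str]   # author name string (P2093), while unresolved
    author_qid: Optional[str]    # author (P50), once linked to an item

@dataclass
class Paper:
    title: str
    authors: List[AuthorClaim] = field(default_factory=list)

def resolve_author(paper: Paper, name: str, qid: str) -> int:
    """Turn unresolved claims whose name string equals `name` into links to `qid`.

    Returns the number of claims converted. The ordinal is kept so the
    author order of the paper is preserved. Exact string matching is a
    simplification: in practice a person decides which strings match.
    """
    converted = 0
    for claim in paper.authors:
        if claim.author_qid is None and claim.name_string == name:
            claim.author_qid = qid
            claim.name_string = None
            converted += 1
    return converted

paper = Paper("Hypothetical paper title", [
    AuthorClaim(1, "D K Byarugaba", None),
    AuthorClaim(2, "A N Other", None),
])
n = resolve_author(paper, "D K Byarugaba", "Q_BYARUGABA")  # hypothetical QID
print(n, paper.authors)
```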

That is the theory. In practice, it is increasingly cumbersome even to try to add papers, because Wikidata more often than not responds with "Too many requests". When this happens occasionally, it is fine, but when only 10 percent of the requests are honoured, the tool is effectively dead.
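A back-of-the-envelope calculation shows why. Assuming each request is honoured independently with the same probability p, and a refused request is simply tried again, the number of attempts per honoured request follows a geometric distribution with mean 1/p. The batch of 100 edits is only an example.

```python
def expected_requests(edits: int, success_rate: float) -> float:
    """Expected number of requests needed to get `edits` requests honoured.

    Assumes every attempt independently succeeds with probability
    `success_rate`, so the mean number of attempts per success is
    1 / success_rate (geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return edits / success_rate

for rate in (1.0, 0.5, 0.1):
    print(f"{rate:.0%} honoured: ~{expected_requests(100, rate):.0f} requests for 100 edits")
```

At 10 percent, a batch that should take 100 requests takes about 1,000.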

Wikidata is the most promising tool of the Wikimedia Foundation, and as far as I know there is no path forward. Obviously this affects people in what they do, and it affects projects that are not progressing as fast as they could or should. Even when performance improves, it is easily missed because of the pent-up demand for much more power: power to query and power to edit the data. We are not sharing in the sum of all knowledge when it is this hard to make it available.

Saturday, February 01, 2020

Prof Salimata WADE - some thoughts

This picture of professor Wade suggests that she received multiple awards: her dress is particular to members of the National Academy of Sciences and Techniques of Senegal, and multiple medals show. I added her to Wikidata; the data is sparse, but it is better than what is there for most members of this academy of sciences.

In Wikidata we standardise names by putting the surname at the end, written with only an initial capital. The result is Salimata Wade, not Salimata WADE as you may find on many African websites.
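This convention can be sketched in Python. The rule that a token written entirely in capitals is the surname is an assumption, based on how names like "Salimata WADE" appear; real names will have exceptions, so treat this as a helper for review rather than an automatic fix. "NDIAYE-DIOP Awa" is a made-up example.

```python
def is_capitalised_surname(token: str) -> bool:
    # Assumption: a token with two or more letters, all in capitals, is a surname.
    # Initials such as "S." contain only one letter and are left alone.
    letters = [c for c in token if c.isalpha()]
    return len(letters) > 1 and token.isupper()

def standardise_name(raw: str) -> str:
    """Put surnames at the end, written with only an initial capital."""
    tokens = raw.split()
    given = [t for t in tokens if not is_capitalised_surname(t)]
    surnames = [t for t in tokens if is_capitalised_surname(t)]
    # str.title() capitalises each part of a hyphenated name: NDIAYE-DIOP -> Ndiaye-Diop
    return " ".join(given + [t.title() for t in surnames])

for raw in ("Salimata WADE", "WADE Salimata", "Denis Karuhize Byarugaba", "NDIAYE-DIOP Awa"):
    print(f"{raw!r} -> {standardise_name(raw)!r}")
```

Names already written in the Wikidata style pass through unchanged.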

When you google professor Wade, it is easy to realise that she is quite notable, even when you don't get much from French. There is work by professor Wade to be found in Wikidata, but attributing her work takes too much effort. She has neither an ORCID iD nor a Google Scholar ID. It is only thanks to the texts found by googling that you feel safe using QuickStatements for what you find. It is super slow going, but it is what you do when you want to show what is possible.

Adding her papers should effect a change in two places on my African Science scaffolds, Wikipedia administrators permitting: the Listeria bot seems to be blocked for whatever reason. Then again, other pages using the same bot are not.

The ratio of males to females is 64 to 8. For Wikipedia articles I expect quite a different ratio. I do not know how to effectively make up a ratio for US or UK scientists and compare that with African scientists. One reason is that I typically do not add a nationality, and I know the flaws in attributing a nationality to US scientists. Whatever the approach, Africa will prove to be underrepresented in both Wikipedia and Wikidata. Without the scaffolding, the preliminary data, there is no data-driven approach to this. No data means no clue.

Anyway, for countries like Senegal it makes sense to add the scaffolding to the French Wikipedia.