Thursday, May 28, 2020

@WikiCommons - Sarah T. Roberts versus Sarah T. Roberts

I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata there are two distinct Sarah T. Roberts. One is an epidemiologist, the other is into information & media studies.

At Commons it was a mess: the picture of one Sarah was used to illustrate the info box of the other Sarah. It is not that interesting to tell you how I did what. Relevant is that I did. I did because you will find things when there is a label for whatever in "your" language.

Given that we do not research the use of Commons or Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it.
Thanks,
      GerardM

Tuesday, May 26, 2020

@WikiCommons - Meanwhile in a school in India, Japan, Russia

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons, among others, for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time in Japan students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...
Thanks,
       GerardM

Monday, May 25, 2020

@WikiCommons - meanwhile in a different universe

And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept" and, as so often, it actually worked after a fashion. The first iteration was, in true Wikimedia tradition, English only. The proof of concept got its second language in Dutch; Hay Kranen, the developer, is Dutch. Now there are nine languages and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you are looking for available pictures in your language, you will find them as long as Wikidata has a label in your language. This is for instance a result in Japanese and this is the result in German.
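A minimal sketch of the idea, not the code of the proof of concept: a search term in some language is matched against Wikidata labels, and the matching items are then used to find the files that "depict" them. The item identifiers, file names and both dictionaries below are invented stand-ins for Wikidata and Commons; only the labels themselves are taken from this blog.

```python
# Hypothetical stand-in for Wikidata: item id -> labels per language code.
LABELS = {
    "Q_EXAMPLE_1": {"en": "Mokgweetsi Masisi", "ja": "モクウィツィ・マシシ"},
    "Q_EXAMPLE_2": {"en": "Vasa"},
}

# Hypothetical stand-in for Commons: file name -> items the file "depicts".
DEPICTS = {
    "File:Example_portrait.jpg": ["Q_EXAMPLE_1"],
    "File:Example_ship.jpg": ["Q_EXAMPLE_2"],
    "File:Example_meeting.jpg": ["Q_EXAMPLE_1"],
}


def find_items(term, lang):
    """Return the ids of all items whose label in `lang` equals the term."""
    term = term.casefold()
    return [
        qid
        for qid, labels in LABELS.items()
        if labels.get(lang, "").casefold() == term
    ]


def find_files(term, lang):
    """Return the files that depict any item labelled `term` in `lang`."""
    items = set(find_items(term, lang))
    return sorted(
        name for name, depicted in DEPICTS.items() if items.intersection(depicted)
    )


print(find_files("モクウィツィ・マシシ", "ja"))
print(find_files("Mokgweetsi Masisi", "nl"))  # no Dutch label, so nothing is found
```

The second call shows why labels matter: the pictures are there, but without a label in the language of the searcher they stay hidden.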

What can you do to make it better? Add labels in your language for the things you want to find, and find media files that depict what you are looking for. When nobody has translated the software into your language, you can even do that.

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to put it politely. Commons is the repository of the media files that illustrate all the Wikipedias, so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, that it understands and supports a need that has been expressed for many, many years. The beauty is that the way forward has been expressed in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily it is not necessary for it all to be done in one go. The first step can be as little as to take the "proof of concept" and rewrite it in the preferred language of the WMF, internationalise and localise it and keep it stand-alone for now. The people who know about it will use it and they will be the first to point out what more they want to be done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public, have a purpose that is more than what we do for ourselves.
Thanks,
      GerardM

Sunday, May 03, 2020

These scientists saw the coronavirus coming. Now they're trying to stop the next pandemic before it starts.

When you read an article with the same title as this blog post, it is one among many clamoring for attention. There is so much that can be qualified as not worth your time. In this blogpost I describe my way of adding value for articles that I think are worthwhile.

What I do is look for people in the article. In this article it is a Jonathan Epstein. The first thing is to look for Jonathan in Wikidata. Disambiguation is the name of the game, and finding candidates who might be Jonathan is the first step. Jonathan proved to be Jonathan H Epstein; there was also a Jonathan H. Epstein. Because they share characteristics they could be merged. Vital in this are authority identifiers and links to papers that make it reasonable to assume that they are the same person. It is helpful when Jonathan is part of the disambiguation list when people look for "Jonathan Epstein", so it is added as an alias.

The next step is to enrich the data about Jonathan. Authorities may identify where he works, and from the website of Columbia University additional information is digested into Wikidata statements, information like the alma maters. In Wikidata many authors are only known as "author name strings", meaning they are only known as text. With available tooling, papers are linked to Q88406948, the identifier for our Jonathan.

After these steps, there is a reasonable impression of the relevance of Jonathan as a scholar and this supports the likelihood that the article that cites him can be trusted. Do this for others presented as authorities in an article and by repeating the process you provide a way for Wikidata to become a source that helps identify fake news.
Thanks,
      GerardM

Sunday, April 19, 2020

@Wikimedia interconnection, what it looks like for me

On twitter a reference was made to an article in the Sunday Times. The article is about the response of the UK government to the COVID-19 pandemic. It mentions many people and mentions their roles.

It is up to you to have your own opinion, but most if not all people are known in Wikidata, some have a Wikipedia article and all of them are in the spotlight. So when you get an edited sound bite, when you want to know if someone is "for real", it helps when you can turn to Wikimedia and find what there is to know.

This sound bite about "herd immunity" is too short to be properly understood. The argument made is that herd immunity is all that we have now that the genie is out of the bottle, and who can argue with that? Read the article as well. After some tinkering, the Scholia for Prof Edmunds shows some 235 papers and many co-authors, and still even more co-authors are missing. The subjects he covered are extensive; check out that Scholia. Prof Edmunds takes/took part in UK government deliberations; it is mentioned in that Sunday Times article. He is asked to explain epidemiology to the public.

Wikimedia interconnection for me is to enrich our existing knowledge in cases like this. Tweeting about it, blogging about it may lead to even more and better information, like a Wikipedia article. What we as Wikimedians do does not happen in a vacuum; connecting to what happens and who the players are helps us and our readers understand who they are in these early days of the COVID-19 pandemic.
Thanks,
      GerardM

Monday, April 13, 2020

The CDC and its National Center for Immunization and Respiratory Diseases

Because of the COVID-19 pandemic, there is so much attention to every aspect of it: the epidemiology, virology, vaccination, co-morbidity. Mix it with a heady mix of economics, profiteering and graft, and what are you to think of it all? What is fact and what is not?

When I read that there is an "Outbreak Management Team" in the Netherlands, an advisory body to the Dutch government, I had a look. I added all the known scientists to Wikidata, looked for "authority identifiers" and attributed some of the papers that are likely theirs to them. It generated a really nice Scholia for them and the team as well.

At first I wanted to do similar European organisations but it takes quite some effort to find them. So I took the easy route and went for the CDC. Its organisational chart contains a wealth of smaller orgs, among them the NCIRD, and it has its own organisational chart. I did the same routine: adding the obvious scientists to Wikidata, looking for their authority identifiers and attributing papers.

The best bit? While adding people one at a time, you see how the Scholia evolves. Authors are reordered based on their number of papers, and you find the ones that are co-authors and colleagues. The latest papers are shown first. It is nice. However, this is management only; I cannot wait to see it evolve as staff find their place in the Scholia as well.
Thanks,
     GerardM

Sunday, April 12, 2020

False friends and ListeriaBot - finding a way out of an impasse

ListeriaBot is a bot that maintains lists based on information in Wikidata. In this blogpost I will explain what a Listeria list is, what it is used for. I will point out its qualitative benefits and explain how Listeria can be instrumental to limit bias, stimulate collaboration and help us share in the sum of the knowledge available for us.

The heart of a Listeria list is a query. In this query it is defined what data is retrieved from Wikidata, it includes the order of presentation and shows this information in a language depending on the availability of labels.
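The three ingredients named above (which data, in what order, in which language) can be sketched in a few lines. This is only an illustration with invented data and hypothetical names (`ITEMS`, `pick_label`, `build_list`); a real Listeria list is driven by a query against Wikidata, not by a Python list.

```python
# Invented rows standing in for the result of a query against Wikidata.
ITEMS = [
    {"id": "Q_EXAMPLE_A", "labels": {"en": "Example Person A", "nl": "Voorbeeldpersoon A"}, "year": 2019},
    {"id": "Q_EXAMPLE_B", "labels": {"en": "Example Person B"}, "year": 2017},
    {"id": "Q_EXAMPLE_C", "labels": {}, "year": 2018},
]


def pick_label(item, languages):
    """Return the first available label, falling back to the item id."""
    for lang in languages:
        if lang in item["labels"]:
            return item["labels"][lang]
    return item["id"]


def build_list(items, languages, sort_key="year"):
    """Order the rows and show each one in the best available language."""
    rows = sorted(items, key=lambda item: item[sort_key])
    return [(pick_label(item, languages), item[sort_key]) for item in rows]


for label, year in build_list(ITEMS, ["nl", "en"]):
    print(year, label)
```

For a Dutch reader the list shows the Dutch label where it exists, the English one where it does not, and the bare identifier when there is no label at all, which is an open invitation to add one.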

Listeria lists are defined only once and every day a job run by the ListeriaBot updates all lists with the latest data from Wikidata. In this way available information is provided even when articles are still to be written. When there is an article to read, the label is shown upright; when there is not, it is shown in italics.

The biggest difference between a Wikipedia list and a Listeria list? No false friends. When you seek a specific "Rebecca Cunningham", it is really powerful to know that your Prof Cunningham will always be known as Q77527827 and is also authoritatively known by other identifiers. From a qualitative point of view such disambiguation is a big thing, particularly in lists, for red links and even for blue links. At this time a typical Wikipedia list has an error rate of around 4% because of disambiguation issues. I frequently blogged about this; the Listeria list I often referred to is the one for the George Polk award.

Maintenance is another reason to choose Listeria lists. This was documented by Magnus: a list was maintained up to a point in time as a Listeria list and, for all the wrong reasons, human qualities were to prevail. Magnus compared the results after some time and the human maintained list proved to be the poorly maintained list.
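A small sketch of what a false friend is. The second entry and its identifier `Q_HYPOTHETICAL` are invented for the illustration; only Q77527827 comes from the text above.

```python
# Two people who share a name; the second one is an invented namesake.
PEOPLE = {
    "Q77527827": {"name": "Rebecca Cunningham", "note": "the professor meant in the text"},
    "Q_HYPOTHETICAL": {"name": "Rebecca Cunningham", "note": "invented namesake"},
}


def by_name(name):
    """Linking by name: every namesake is a candidate, so a link can go wrong."""
    return sorted(qid for qid, person in PEOPLE.items() if person["name"] == name)


def by_id(qid):
    """Linking by identifier: there is exactly one possible target."""
    return PEOPLE[qid]


print(by_name("Rebecca Cunningham"))   # two candidates, one of them a false friend
print(by_id("Q77527827")["note"])      # no ambiguity
```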

Categories are lists of a kind, for many categories it is defined what they contain. Consequently Wikidata is easily updated from Wikipedias and can serve as a source for updating categories as well.

Ok, the impasse. ListeriaBot is blocked because of a false friend issue. The objective is to find a resolution that will benefit us all. The false friend issue is that images can have the same name in both Wikimedia Commons and in English Wikipedia. The existing algorithm for showing pictures is that local pictures take precedence. When ListeriaBot is to do things differently, it can. Thanks to the wikidatification at Commons, we can indicate with a Wikidata identifier what a picture "depicts". Wikidatification of images can also be introduced for pictures at English Wikipedia and it then becomes easy to always show what Commons has, unless a preference is given to show a specific image for a particular project.

I have been told that I do not assume good faith. When I see the lengths people care to go to in resolving this issue, I am only amused. The objective of what we do is to share in the sum of all knowledge and to do this in a collaborative way.

English Wikipedia fails spectacularly by assuming that their perceived consensus is in the best interest of what we aim to achieve. There is no reflection on the quality brought by Listeria, there is no reflection on how its quality can substantially be improved. I fail to understand what they achieve except for feeling safe by insisting on dated practices and dated points of view.

I wish we could be one community that is known by a best of breed effort with one common goal; sharing the sum of all the knowledge that is available to us.
Thanks,
        GerardM

Friday, April 10, 2020

When crossing the street in the days of Corona, look left, right and left again

Many of us are at home, waiting to go out. We are all obsessed with the latest statistics and read what pundits have to say.  It is likely that you are cognizant of the statistics for your country, state or county.

I learned that Jonathan P. Tennant died in a traffic accident. When you care for statistics, you will wonder what are my chances of dying in a traffic accident at this time. Deduct it from your chances of dying of Corona and things look up.

Not so much for Protohedgehog, he met with an accident. It is sad, he was young, full of promise; just became a member of the Global Young Academy. If anything, it serves as a reminder for us to look left, right and left again to not become a bus factor.
Thanks,
     GerardM

Sunday, April 05, 2020

Edwin G. Abel aka Ed Abel

Professor E.G. Abel came on my radar because he is a recipient of the Daniel X. Freedman Award. He has a Wikipedia article as "Ed Abel" and the information of the award has him as "Edwin G. Abel".

I looked into the Freedman award because of a criticism on the Wikipedia article of Professor Montegia. The superior article of Prof Montegia is criticised because it is an orphan. It now has a Scholia template and that links the 105 scholarly papers known in Wikidata. Its timeline does include the Freedman award, linking the Professors Abel and Montegia.

I doubt it is considered enough to remove the orphan template. I have added a redirect for the Freedman award to the issuing organisation. Maintaining a Wikipedia list is not one of my ambitions. It could be a Listeria list like this one.
Thanks,
      GerardM

Sunday, March 15, 2020

#SwineFlue management with #wolves

A lot is being said about viruses and pandemics; they do not only exist in humans but also in animals, particularly in kept animals. One knee jerk reaction is that at an outbreak of a disease animals in nature are blamed.

A good example is swine flu and African swine flu. It is a tradition to call for the culling, the extermination of wild boar and, traditionally, the result is an increase in boar being killed.

A real solution may be found in an ecological solution: wolves that predate on boar prefer a sickly animal over a healthy animal that is better able to fight back. There is documentation of wolves determining the extent of outbreaks of a swine flu. Areas with wolves do better.

As an apex predator, the wolf has profound effects on its ecology. There are all kinds of arguments why people oppose the reintroduction of animals that are essential for a functional ecology; animals like wild boar, beaver and wolf are extinct in places. We argue that we need more trees to offset climate change but this will not work when those trees are not placed in a functioning ecology. In Scotland trees will not grow because they will be eaten by overabundant elk. Scotland has no functioning ecology; it lacks predators like wolves and lynx to keep the elk in check.

When we consider pandemics, viral diseases, our ecology it is important to consider our own effects. We will do better when we enable ecological functionality and consider building with nature for more sustainable results.
Thanks,
        GerardM

Wednesday, March 04, 2020

@Wikipedia; the dread that is one identity that binds us all

On Twitter Janeen Uzzell praised a blogpost that is the Wikimedia Foundation All Hands: 2020 Sketchbook and indeed it informs about current thinking, most of it is great and still, I find it absolutely terrifying.

There are several great sketches in there. Katherine Maher gave an aspirational talk. I love it for Wikimedia to be seen as infrastructural and inclusive, and even that what we do does not have to be in our projects. Important is that she mentions "support systems" because they provide the input for much of our processes.

Important is the page on security and risk. All the important concepts are mentioned, among them likelihood, relative impact and management preparedness, but also "plan for and mitigate risks".

What truly makes me uneasy is when it is said that we aim to clarify who we are in the world in one brand, Wikipedia. The idea is that when we are all branded as Wikipedia, things are likely to become easier. When you check out the website brandingwikipedia.org there is no argument; Wikipedia is free knowledge. When you check out what it is to do:
  • project and improve our reputation
  • support our movement/growth
  • be opt-in
In the abstract Wikipedia IS wonderful; in reality the concept of what Wikipedia is, is largely determined by the English Wikipedia. It is fiercely independent, it is hardly inclusive and it has largely determined the maneuvering space the Wikimedia Foundation has. In order to "plan for and mitigate risks", I will mention several reasons why I am anxious because of this branding initiative.
  • In the Commons OTRS they use English Wikipedia notions to determine if pictures can stay or are to be removed. Commons provides a service to all Wikimedia projects
  • The query functionality for Commons is maintained by people from the Foundation. For more than half a year it puts a strain on the growth and usefulness of Wikidata. Tools have become glacially slow and often malfunction because an edit is not available when needed in further processing. It is not known what the position of the WMF director is in this
  • This is about marketing and we have never done much marketing for any of our projects. What we have done was reactive and has been all about the English Wikipedia. Now consider this:
    • Wikisource, we do not know what is available at what quality, it is all about editing and not about having people read the finished article, consequently we do not value Wikisource and fulfill its potential.
    • So far Commons has always been English only. With the support of the "depicts" functionality, there is room to enable and market a multilingual search engine. In the spirit of "it is a Wiki", it serves as an open invite to add labels in any and all of our languages and open up what Commons has to offer. It is how to market free content the Wiki way.
    • In Wikidata we know many more concepts than what we know in any individual Wikipedia. We could use our data and inform as we have done for years in multilingual tools like Reasonator. This is an example in English, Russian, Chinese and Kannada. NB it takes additional labels to improve results and consequently this is the inclusive approach.
    • If Wikipedians were willing to reflect on their own performance, we could help them solve their false friends issues.
One sketch in the sketchbook is a presentation by Jess Wade. It says that even Academia is biased. As the Wikimedia community we do not need to be subservient to any bias and most certainly not the bias that Wikipedia has brought us.

Tuesday, March 03, 2020

"Building with Nature" .. a case for a beaver solution

The Markermeer is a lake with an ecological problem; the water is cloudy, plants and mussels do not grow. In order to alleviate that problem, the Marker Wadden was developed and in order to future proof the Houtribdijk the same "building with nature" concepts are used; the extensive water features will enable the growth of plants and the intended result is not only that the water will be clear again but also that the dyke will better withstand future storms.

With ecology part of the solution, it is relevant to appreciate ecology as part of a solution for open issues. There are two open issues: geese and willows. So far, geese are kept at bay in some areas with fences and young willows are being rooted out by volunteers.

When willows are allowed to grow, they will mature quickly and enable the next ecological succession. The wood and bark provide food and building material for beavers and this makes for an even more robust defense against storm damage. Some trees will mature anyway and this provides natural nesting places for white tailed eagles. Given that the wels catfish is endemic in the Markermeer, it will find its place among the Marker Wadden and it may even predate on the overabundant geese.

So given that Natuurmonumenten, the organisation looking after the Marker Wadden is happy about beavers in its terrains, maybe it is the "building with nature" engineers who have to consider succession in their deliberations.
Thanks,
      GerardM

Thursday, February 27, 2020

Balancing arguments - Gender and the #Wikimedia projects

Some say, gender is important because there is a serious imbalance in the reporting on people in Wikipedia. There are many people who dedicate their time to bring some balance by writing Wikipedia articles. At the same time it is important to be cognizant of the fact that gender is not binary; the point it brings is that when you write an article you need a source to know what gender a person identifies with.

So far so good. At Wikidata other things are at play. It is vital to understand that Wikidata items are not so much about an individual, an item. When recipients of an award are included, like for "Member of the Hassan II Academy of Sciences and Technologies", there is often nothing more than Moroccans that received an award because a source says so. Determining a gender relies on googling for images of the person, and when the name is decidedly male like Omar, Hakim, Mustapha the gender is implied.

Why include a gender? Because projects like Women in Red rely on prospects to write articles about. Because tools like Scholia do express what we know about all the recipients of an award. It tells us that there are currently two ladies known and 22 gentlemen. We know nothing of their work because the bias against Africa is staggering and because performance for inclusion at Wikidata is abysmal.

The arguments why we should not include gender are often based on what people expect: "Wikidata contains large sets of data and consider that it makes no statistical difference one way or the other". The reality however is that when you consider the use of data in for instance Scholia, the subsets are small. One more fine lady makes a statistical difference.
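The arithmetic behind that last sentence, using the two ladies and 22 gentlemen mentioned above. The large set is an invented number of the same proportion, added only for contrast.

```python
def share(women, men):
    """Fraction of women in a set of people."""
    return women / (women + men)


# The small subset from the text: 2 women and 22 men, then one woman more.
small_before = share(2, 22)
small_after = share(3, 22)
print(f"small set: {small_before:.1%} -> {small_after:.1%}")

# An invented large set with the same starting proportion, for contrast.
large_before = share(20_000, 220_000)
large_after = share(20_001, 220_000)
print(f"large set: {large_before:.4%} -> {large_after:.4%}")
```

In the small set one addition moves the share by more than three percentage points; in the large set the change only shows up in the third decimal.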

When people write about a person for a Wikipedia, they do get to know the person, they have multiple sources at hand. At Wikidata not so much. One purpose of adding people is to nibble away at our bias.

Requiring sources to indicate gender is what takes away the usefulness of the data and is counter productive when we are talking bias. For me it is a Wikipedia argument, an article based argument and it is counter productive to translate it to the set based approach of Wikidata.
Thanks,
       GerardM

Saturday, February 15, 2020

Wikipedia consensus? - It is who you ask but what are the facts

An article in VICE starts as follows: 'Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing'. This article is problematic in so many ways, it starts with this premise because the Cebuano Wikipedia does not contain machine translation. It contains machine generated text and, to add insult to injury this same article states: 'the majority (generated articles) are surprisingly well constructed'.

An article like this can be sanity checked. Principles come first:
  • This is about a Wiki in contrast to the Nupedia approach.
  • Wikipedia’s founding goal is to make knowledge freely available online in as many languages as possible.
  • There is a difference between opinions and facts
It is important how arguments are made. When "highly trusted users who specialize in combating vandalism" are introduced and comment that "many articles are created by bots", it does not follow that the quality is low nor that this is to be considered vandalism but the implication is made.

It is a fact that the Cebuano Wikipedia has 5,378,563 articles and also that there are some 16.5 million people who understand Cebuano. There is however no relation between these two facts. More relevant is that the wife of Sverker Johansson has Cebuano as her mother tongue and his two kids learn from their maternal cultural heritage also thanks to the work he does for the Cebuano Wikipedia. That is very much a classic Wiki approach.

In contrast the English Wikipedia has its bot policy preventing the use of bots for generating content. These notions should be local to the English Wikipedia and need not have relevance elsewhere. These highly trusted users can be expected to proselytise this point of view and thanks to this POV they take away a source of information without offering any credible alternative for the existing lack of information available to the rest of the world. At the same time the English Wikipedia is biased in the information it provides and does not provide the same quality of service for the domains selected for the Cebuano Wikipedia.

Sadly, it is said that the Wikimedia Foundation itself makes no effective difference in support of the "other" languages. An alternative to the LSJbot was introduced and it may be able to make a difference, but as it does not provide a public facing service it is very much a paper tiger. Even worse are the Nupedia notions in the combination of two things: "Due to its heavy reliance on Wikidata entries, the quality of content produced is heavily influenced by the quality of the Wikidata available." and "It can discredit other Wikipedia entries related to automatic creation of content or even the Wikipedia quality." These notions are problematic for several reasons.
  • No information is preferred over little information when our service to an end user is considered
  • Quality of information is framed in the light of existing Wikipedia entries. Whose Wikipedia entries are we considering? They are however irrelevant as our aim is to inform our end users; they do not cover the same subject.
  • When the quality is considered of Wikidata .. Why, it is a wiki and its quality is improving particularly as so many eyes shine their light on it.
  • We can inform, in any and all languages, and we do not even have to call it Wikipedia, we do not even have to save it in a Wikipedia when we only cache the results from the automated text generation.
  • When we cache results of automated text generation, texts can be generated again when the data is expanded or changed.
So far the critique of the VICE article, but then again does English not have its own problems?
  • Its 1,143 administrators and 137,368 active users are struggling to keep up, when you compare it with the 6 administrators and 14 active users for the Cebuano Wikipedia it is understandable that, as they grow, the English have to rely more and more on bots and artificial intelligence.
  • Magnus has demonstrated that the maintenance of lists is better served not by editors but by using the data from Wikidata
  • The Wikipedia technology has a problem with false friends. Arguably some 4% of list entries are wrong because the wrong article is linked to. When links are solidified by using Wikidata identifiers instead, this problem disappears in the same way as the problems with interwiki links disappeared.
The biggest problem "Wikipedia consensus" has is that it was formulated in the past by a tiny in-crowd making up the "accepted" big words for the rest of us and worse they can not be swayed from their POV by facts.
Thanks,
      GerardM

Sunday, February 09, 2020

Dear @krmaher @Wikipedia is not the #flagship to win our war

The virtues of Wikipedia have been expressed in millions of words, at many conferences and in many interviews by you, Jimmy and countless others. Nothing wrong with that. Wikipedia has been extremely useful, it has a dedicated following and it is going exactly nowhere new. What it is expected to bring is more of the same old, same old.

Wikipedia pundits use their own idiom, have their own values and easily dismiss what does not comply with their notions. Notions not based in actual facts but in opinions.

Our aim was to share in the sum of all knowledge, what we have is a domineering English Wikipedia expecting everything to be shaped in its image. The result is many malfunctioning sister projects that do not get attention "because what is good for the goose is good for the gander". It is not. I can find a picture of the Vasa, a former flagship but not in Commons (it uses the same technology as Wikipedia). There are many books in Wikisource but we do not know what is completed and we do not market these books to a public. When Wikidata was created its first achievement was taking inter wiki links away from Wikipedia providing a functional platform and removing millions of edits from all Wikipedias. So far functionality that does improve on what Wikipedia has is dismissed while facts show how Wikipedia under performs.

The question is if Wikimedia as an organisation is beholden to Wikipedia. If its aspirations are more than only that, it has an obligation to the other projects. It is to find a public for the finished content of Wikisource. It has to find a public for the biggest open content resource of images, making it actually easy and obvious to find pictures of for instance the Vasa. Finally there is Wikidata, which is crippled by its own success and hampered by what seems to me to be a lack of organisational attention.

Dear Katherine, I am happy when a technician expresses his plans to mitigate a disaster. He does this within the restrictions he is under. It is however for you, in your capacity as director of the Wikimedia Foundation to express what relevance is given to Wikidata. We have a war chest, we are challenged to take up a new role in the war for factual and balanced information. With only English Wikipedia we have already lost the rest of the world and with English Wikipedia we also have a very biased world view. Never mind, nothing new here.

My question to you, are you aware that Wikidata has no room for growth? Is that acceptable to the Foundation? How are we going to share the sum of the knowledge that is available to us when our flagship is about to sink while sailing out of the harbor?
Thanks,
      GerardM

Saturday, February 08, 2020

The performance of Wikidata - Denis Karuhize Byarugaba

Professor Denis Karuhize Byarugaba is one of the Fellows of the Uganda National Academy of Sciences. At Wikidata we know about papers that he wrote, we know this because of the author strings that point to him.

One of the Scholia tools allows for the disambiguation of these papers by linking to the Wikidata item. It is important that we do because in this way we build on the existence of African scientists on Wikidata.

That is the theory; the practice is that it is increasingly cumbersome to even try to add papers because Wikidata more often than not responds with "Too many requests". When this happens occasionally it is fine, but when only 10 percent of the requests are honoured, the tool is effectively dead.

Wikidata is the most promising tool of the Wikimedia Foundation and there is, as far as I know, no path forward. Obviously it affects people in what they do and it affects the projects that are not progressing as fast as they could or should. Even when there is a notion of improved performance it is easily missed because of the pent up demand for much more power. Power to query and power to edit the data. We are not sharing in the sum of all knowledge when it is this hard to make it available.
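For what it is worth, the usual way for a client to cope with "Too many requests" (HTTP status 429) is to wait and retry with growing delays. The sketch below assumes a hypothetical `request` callable that raises `TooManyRequests`; it is not how any Scholia tool is implemented, and it only makes a client more patient, it does not add the capacity that is missing.

```python
import random
import time


class TooManyRequests(Exception):
    """Raised by the hypothetical request when the server answers with 429."""


def with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request`, waiting longer after every refusal before retrying."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Exponential delay plus some jitter to avoid retrying in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


# Usage with a fake request that is refused twice before it succeeds.
refusals = iter([True, True, False])
delays = []


def fake_request():
    if next(refusals):
        raise TooManyRequests()
    return "saved"


result = with_backoff(fake_request, sleep=delays.append)
print(result, len(delays))
```

When nine out of ten requests are refused, even a patient client spends most of its time waiting, which is the point made above.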
Thanks,
      GerardM

Saturday, February 01, 2020

Prof Salimata WADE - some thoughts

This picture of Professor Wade implies that she received multiple awards: her dress is particular to members of the National Academy of Sciences and Techniques of Senegal, and multiple medals show. I added her to Wikidata but the data is sparse; even so, it is better than what is there for most members of this academy of sciences.

In Wikidata we standardise names by having surnames at the end, with only an initial capital. The result is Salimata Wade, not Salimata WADE as you may find on many African websites.

When you google for Professor Wade, it is easy to realise that she is quite notable. It is easy even when you don't get much from French. There is work of Professor Wade to find in Wikidata but attributing her work takes too much effort. She has neither an ORCiD iD nor a Google Scholar ID. It is only because of googled texts that you feel safe to use QuickStatements for what you find. It is super slow going but it is what you do when you expose what is possible.

Adding her papers should effect a change in two places on my African Science scaffolds, Wikipedia administrators permitting; the Listeria bot seems to be blocked for whatever reason. Then again, other pages using the same bot are not.

When you consider the ratio of males / females it is 64 / 8. When you consider the ratio of Wikipedia articles I expect a quite different ratio. I do not know how to effectively make up a ratio for US or UK scientists and compare that with African scientists. One reason is that I typically do not add nationality and I know the flaws in attributing a nationality to US scientists. Whatever the approach, Africa will prove to be underrepresented both in Wikipedia and Wikidata. Without the scaffolding, the preliminary data, there is no data approach to this. No data means no clue.

Anyway, for countries like Senegal it makes sense to add the scaffolding to the French Wikipedia.
Thanks,
        GerardM


Sunday, January 12, 2020

Science and Africa - what collaboration exists and how do we know?

As I am adding large numbers of African scientists to Wikidata, I find that I have moved into a green field. A green field as far as Wikipedia and Wikidata are concerned.

To learn about how the information about African science evolves in Wikidata, I created Listeria lists that inform about universities by country, fellows/members of academies of science and members of African young science organisations.

What I produce is a scaffolding: basic information that enables. The information that I use from the Royal Society of South Africa for its fellows includes dates, other awards, employers and even dates of death. Slowly but surely more information is being added for these people and consequently you will also find, for instance for Rhodes University, more employees and additional papers (currently only 1385 papers for its 84 scholars are known).

A scholar like Tebello Nyokong, a Rhodes scholar, has 637 papers to her name. She is a world class scientist and has four Wikipedia articles to her name. All kinds of questions may be queried for her co-authors: the gender distribution, the organisations they represent, the nationality of the co-authors.
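As an illustration of such a question, here is a minimal sketch of how the gender distribution of co-authors could be asked of the Wikidata query service. It relies on the Wikidata properties P50 (author) and P21 (sex or gender); the function names are hypothetical, and the item identifier is left as a parameter because it has to be looked up in Wikidata. Co-authors known only as author name strings, without an item of their own, are not covered.

```python
from collections import Counter

# SPARQL for the Wikidata query service: everyone who shares a paper
# (P50, author) with the given scientist, plus their gender (P21) if known.
QUERY_TEMPLATE = """
SELECT DISTINCT ?coauthor ?genderLabel WHERE {{
  ?paper wdt:P50 wd:{qid} ;
         wdt:P50 ?coauthor .
  FILTER(?coauthor != wd:{qid})
  OPTIONAL {{ ?coauthor wdt:P21 ?gender . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def coauthor_query(qid):
    """Return the query text for the scientist with Wikidata item qid."""
    return QUERY_TEMPLATE.format(qid=qid)


def gender_distribution(bindings):
    """Tally gender labels from the 'bindings' list of a SPARQL JSON result.

    Co-authors without a gender statement are counted as 'unknown'.
    A co-author with more than one gender statement is counted once per statement.
    """
    counts = Counter()
    for row in bindings:
        label = row.get("genderLabel", {}).get("value", "unknown")
        counts[label] += 1
    return dict(counts)
```

The same pattern would work for the other questions: swapping P21 for the property for employer or nationality gives the organisations or countries of the co-authors instead.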

Obviously, African science is not well represented at this time. This is a reflection of how people perceive and value African science. In essence it reflects a bias of regular Wikimedia editors. The regular Wikimedia editors are in the West; they have no reason to consider African science, but this is a bias. It is highly likely that it will be hard to get Wikipedia articles accepted for African scientists because of a lack of sources and probably a lack of this perceived Western relevance.

Adding one scientist at a time does not make much of a difference. When scientists are added as part of a SourceMD process, any and all scientists who have a public ORCiD profile are likely to get included in Wikidata. This is why so many African scientists are already known. When a notable scientist is then recognised as a recipient of an award, we may already know about the papers they authored.

The SourceMD process is no longer available. It coincides with a lack of resources at Wikidata so any and all resources used for science papers are now available to something else. Understandable, but the result is that I am no longer motivated to seek ORCiD identifiers and consequently, the process is increasingly broken.
Thanks,
      GerardM

Thursday, January 02, 2020

Scaffolding in Wikidata - the Christiaan Hendrik Persoon Medal

Professor Brenda D. Wingfield is another scientist who won an award Wikidata knew nothing about. This time the Christiaan Hendrik Persoon Medal. The Wikidata basics for an award are its name, the conferring organisation and a website with details. A bonus is when there is a link to the person or organisation the award is named after.

The Persoon Medal is a South African award; its conferring organisation is new to Wikidata as well. In an article about Prof B.D. Wingfield, all the other recipients are named as well; there are only six in a time span of 53 years so it was not hard to add them all.

The objective is to make connections. For both the award and Prof Wingfield, connections show best in a Scholia. One of the more frequent co-authors is a Michael J. Wingfield. He is a co-recipient of the Persoon Medal, a fellow member of the Academy of Science of South Africa, a duplicate of Mike Wingfield and, yes, the spouse of Prof B.D. Wingfield as well. He is at this time Q73879566 in the Scholia, waiting for papers to be attributed to the earlier item.

Another frequent co-author, Bernard Slippers, is known from a different context. He is both a member of the South African Young Academy of Science and the Global Young Academy. Given some personal connections it was easy to ask if by chance Prof Wingfield is his doctoral advisor (he is the primary, they both are).

The point of scaffolding is that it provides the structures that enable finding the data, preferably in a context. Given that most data is static, the static representation that is Listeria is really powerful. When you group them like I did for African science or Young Academies, you get the satisfaction of understanding what work is done/has been done on a subject. The icing on the cake is when you enable collaboration. I am grateful to Robert Lepenies for picking up the lead and informing other young academies of what Wikidata may mean for them. I am grateful to Daniel Mietchen for improving the queries I use; they now show the number of publications known for each scientist and a link to the tool that enables attribution to that scientist.

My role is a simple one. I add data. Data that connects and gives relevance, but most importantly data that may be picked up in queries and lists by others. The scaffolds are made by others, relevance only happens when others pick it up. My point: there has to be something to pick up.
Thanks and happy new year,
          GerardM