Sunday, August 12, 2018

#Knowledge - three types of knowledge and why "academic" is only one and overrated

There are three types of knowledge; they are academic, professional and knowledge from experience. The scheme to the right was published by Jaap van der Stel. He works in the field of psychiatry and is known for his work on addiction in combination with the use of peers in the recovery from addiction.

In the Wikimedia world, we insist on the primacy of academic knowledge and up to a point it serves us well. Operationally it means that many of the studies are done outside of the WMF; they may point out whatever, but they hardly ever make an operational difference. When the internal WMF researchers study a subject, they are typically directed to study particular phenomena, and their work may point to operational issues: issues that are either addressed by the WMF itself or adopted by the community.

When scientists make a compilation of all the sources in all the Wikipedias, it is academic work when the result is static. It may indicate what sources are used multiple times but it does not help any editor weed out sources that are biased or false. Magnus started work on a tool that knows about all the sources in two Wikipedias and Wikispecies. It is updated in real time and that gives it valid operational credentials.

I know from experience that there are issues with source information as we have it in Wikidata. We cannot invalidate sources by reference. We are only strong in the biomedical field and adding new information is not at all user friendly.

Now this user experience does not get much of a priority for valid operational reasons but the effect is that Wikidata is only useful for the geeks. Its lack of usability prevents its data from being used on Wikipedias in the "other" languages. It is where there is little or no academic or operational interest.

Saturday, August 11, 2018

#GenderGap - The Ginetta Sagan Award (and others)

The Ginetta Sagan award is conferred by Amnesty International USA. It is an annual award; according to Wikidata, when I looked at it, the last recipient received it in 2014. English Wikipedia has the award as part of the article on Ginetta Sagan and has information up to and including 2017 (when you read the texts, you will find how notable these people are and, by inference, the people without an article).

Arguably, there is a lack of balance between the number of men and the number of women having an article in any Wikipedia. This is known as the "gender gap" and the "women in red" project works to great effect to improve that balance. There is no lack of fine notable ladies who have no article.

I am really happy to present two queries. The first query shows women who won an award with no article at all (2502 results). The second shows women who won an award with no article in the English language (29083 results).
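The two queries themselves are not reproduced here. As a rough sketch only, this is how such queries could be built for the Wikidata Query Service. The function name `women_with_award_query` is hypothetical, and it is an assumption that the original queries were structured this way. The identifiers used are P31/Q5 (instance of human), P21/Q6581072 (sex or gender: female) and P166 (award received).

```python
# Hypothetical reconstruction: the original queries are not shown in the post.
# P31 = instance of, Q5 = human, P21 = sex or gender, Q6581072 = female,
# P166 = award received.

def women_with_award_query(wiki_url=None):
    """Build a SPARQL query for women who received an award but lack an article.

    wiki_url=None  -> no article on any Wikipedia
    wiki_url given -> no article on that one Wikipedia
    """
    if wiki_url is None:
        # Exclude anyone with a sitelink to any site in the "wikipedia" group.
        site_lines = [
            "    ?article schema:about ?woman ;",
            "             schema:isPartOf ?site .",
            '    ?site wikibase:wikiGroup "wikipedia" .',
        ]
    else:
        # Exclude only those with a sitelink to the given Wikipedia.
        site_lines = [
            "    ?article schema:about ?woman ;",
            "             schema:isPartOf <" + wiki_url + "> .",
        ]
    lines = [
        "SELECT DISTINCT ?woman ?womanLabel WHERE {",
        "  ?woman wdt:P31 wd:Q5 ;",
        "         wdt:P21 wd:Q6581072 ;",
        "         wdt:P166 ?award .",
        "  FILTER NOT EXISTS {",
        *site_lines,
        "  }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
    ]
    return "\n".join(lines)


NO_ARTICLE_ANYWHERE = women_with_award_query()
NO_ENGLISH_ARTICLE = women_with_award_query("https://en.wikipedia.org/")
```

Either string can be pasted into the query service. The only difference between the two is the sitelink filter, and counts will differ from the numbers above as Wikidata changes.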

Let these women be an inspiration to you.

Monday, August 06, 2018

#MADinAmerica - cause and effect

MAD in America is an organisation about mental health, particularly in America. Their take is that there is a lot that can be improved. The part that I am mostly interested in is that they highlight the science that tells you how the science behind many mental health practices fails scrutiny.

One publication they recently highlighted is about brain abnormalities in people with schizophrenia. Current wisdom has it that "cortical thickness and surface area abnormalities in schizophrenia" are indicative of schizophrenia. This paper compares people with schizophrenia who were medicated and people who were not medicated. The research shows that these differences are due to the medication.

Adding a paper like this in Wikidata is easy. Making it stand out for its results is not. The paper probably indicates previous research that it debunks, but how do you model that? When papers like this are to be used as sources, how do you ensure that they are even considered?

NB the first author is employed by the University of California, Irvine.

Sunday, August 05, 2018

#Citations - "Verlorene Siege" and #Wikipedia

The publication The Battle for Wikipedia: The New Age of ‘Lost Victories’? writes about knowledge that has been debunked but is still used as a source in Wikipedia. Lost Victories is a book by Erich von Manstein, a German military officer convicted at the Nuremberg trials. He served as a witness and there is strong evidence that he perjured himself. He was sentenced to eighteen years in prison.

This publication is not only of academic interest. In this day and age where fake facts and science are pervasive, it is a reminder that Wikipedia is a battle ground where debunked sources are used to prove a non-neutral point of view.

One of the objectives of the Wikimedia Foundation is to combat fake facts and to make citations operational as a tool. The main thrust will be adding sources to Wikidata. "Verlorene Siege" obviously was present but even though there is a large body of work debunking this book, there was nothing that referred either to Mr von Manstein or to his book in a critical way.

It was easy enough to add a few individual sources but it takes time. For analysis of sources used in Wikipedia there are dumps containing all the citations of all Wikipedias, and now Magnus has started on a tool that initially includes real time sources for the German and English Wikipedias and Wikispecies. Of these publications 36% are linked to Wikidata and this provides a great start but it will take more. We need to know what papers debunked what knowledge. We need to know what papers Retraction Watch is critical of, or what the relevance of a paper is according to the Cochrane Database of Systematic Reviews, because that is how their facts are operationalised. We need to know because that is one way to debunk fake facts.

Saturday, August 04, 2018

#Wikidata - User versus bot updates and #Scholia

These are the aggregated subjects that are associated with all the papers for the winners of the Fields Medal. Given that there are some 60 award winners for the most prestigious award in the field of mathematics, this is not a representative reflection. That is not a problem, that is an opportunity.

I added one paper, "Singularities of linear systems and boundedness of Fano varieties". Given the title, I added "Fano variety" and "Linear system" as subjects. This made no difference in the Scholia tool and after some five minutes I asked what was happening. I was told that it takes a long interval before the data on the Toolserver gets updated.

Typically, information about papers is added by bot. Not so much for mathematics but still. Mr Birkar for instance has only two papers in Wikidata at this time and for the other paper no subjects are given. When you add data by hand, instant gratification or instant visibility is important as it is a potent motivator.

The best reflection of work done in Wikidata is not given by Wikidata itself. It is given either by tools like Scholia or Reasonator or by query. Query does give instant gratification, and it owes much of its potency to that.

Tools have one important benefit over query: they provide a standard layout for the information. Queries are potent and many people contributing content to Wikidata use them in tools like Petscan. But in reality, the typical difference between one query and the next is only in the qualifiers.

At this time the best user experience is given by tools. They often suffer from a time lag, and this is of little relevance to bots. For humans, though, it is different.

Monday, July 30, 2018

#Wikidata - #Skills and tools needed to add #awards

Adding awards to Wikidata is one way to signal notable people who could do with some tender loving care, maybe even an article. Typically it starts with someone who was awarded. This time, three fine ladies: Kay Davies, Alice Rogers and Sarah Cleaveland.

Mrs Davies received the "Croonian Medal and Lecture", an award conferred by the Royal Society. The award was already known as the "Croonian Lecture" and the Wikipedia article contains a long list of recipients. The website included a few recipients and consequently the award was positively identified.

Thanks to Reasonator, I easily navigated from Mrs Davies to the award. I noticed how few recipients were known and, using text from the article in combination with the Awarder tool, I added 250+ recipients within ten minutes. Other awards like the Harveian Oration are still missing in Wikidata.

Mrs Rogers received the "Kavli Education Medal", which has only four recipients so far. The name Kavli in combination with awards proved a bit ambiguous; finding the correct medal was the one challenge. One recipient was missing, a Mrs Margaret Brown; it was easy enough to add her as well. The mother of Mrs Rogers was said to be a very accomplished mathematician of Bletchley Park fame but details are lacking.

Sarah Cleaveland received many awards; two are of interest because they are not linked. She was the first woman to win the Trevor Blackburn Award, in 2008. The other, the Leeuwenhoek Medal and Lecture in 2018, got my attention. It did not have many recipients and again, the Awarder tool made it easy to add many missing recipients. There are several red links; I did not add them this time.

When I am to add the Trevor Blackburn Award, I first have to find it. It is mentioned on several Wikipedia articles but the award and Mr Blackburn are missing. Google helps me find the website of the award. With Mrs Cleaveland there are 13 recipients. The first thing to do is add the award, link the organisation that conferred it and the web address for the award.

When you then start looking for recipients, Reasonator immediately provides an updated view. No need to query for its recipients; they are obvious. Just to show that you can, I added another fine lady: Mrs Karen Reed.

Sunday, July 29, 2018

#AfricaGap - #Nigeria, politics as a family business

A Wikipedia friend asked on Facebook for people to neutralise an article on the Nigerian senator Ademola Adeleke that reads all too much like an advertisement. I have updated his Wikidata profile based on the information in the article.

The father of Mr Adeleke, Raji Ayoola Adeleke, and his brother Isiaka both preceded him as a senator.

When you google Mr Adeleke, the first thing you find is his Wikipedia article; after that you cannot miss that the University of Jacksonville denied that Mr Adeleke finished his education. Consequently it is a stretch to call him Dr Adeleke.

When Wikipedia and by inference Wikidata register the education of humans, it follows that such easy scams will be more prominently displayed and known. When an article makes plain that Mr Adeleke was born with a silver spoon, it will make it easy to question his ability to represent people adequately.

Saturday, July 28, 2018

#Wikipedia, where is all that research?

Pine, a well known Wikipedian, asked for attention to the registration for the 2018 "State of Wikimedia Research". Benjamin Mako Hill mentioned that a humongous number of publications about Wikipedia were published in the last year alone.

That is great.

I checked the numbers using the Scholia tool and was a bit disappointed. The total number was "only" 337 across all years. Benjamin uses different tools; he mentioned his use of Google Scholar and indeed it shows so much more.

I was really pleased with Daniel Mietchen helping out on the subject of "probiotics" and I asked him if he could run his bot for the subjects of "Wikipedia" and "Wikidata". But never mind what he decides to do, running a bot adding key words to research is not scalable when you consider the overwhelming amount of research known to Wikidata. It is not only running it one time, it is also adding key words for any and all new research entered into Wikidata.

Given that we work in a Wiki way, this is totally acceptable. We do what we can, what takes our fancy, and slowly but surely new approaches and tools improve on the quality and quantity of the data that we have. If Scholia were a commercial enterprise it would be different; the exposure and use of data would be a primary concern.

Friday, July 27, 2018

#Wikidata - I do not use query and here is why

When I edit Wikidata, I never use queries and here is why. I do not need them. For instance, I added an award to a person because it was obvious it was missing. I had no need for a query because everything that I wanted to know about the award was visible.

When you use query, you have to use a tool, define a query, run it, maybe tune it and then analyse the results. Using my beloved Reasonator, all the queries that I need are included. This is the same award and the same person but in the standard user interface of Wikidata. It is not informative, I only use it to edit.

A person wanting to teach Wikidata asked, "How do I structure a program?" The first thing proposed was to teach them to query. I agree that query is important and has its use cases, but it should not be the first introduction to Wikidata because it makes it too complicated at the start and, even worse, it is not necessary.

Thursday, July 26, 2018

#Wikidata - Pushback on probiotics with citations

Recently I added a paper to Wikidata. The paper indicated its subjects; mania, the immune system and probiotics are among them. I dutifully added some of these subjects to the article and was surprised that a topic as controversial as probiotics did not relate to many, many papers. When you check out #probiotics on Twitter, you will realise that a healthy mix of fact is much needed to counter the inundation of commercial offerings you will find.

I mentioned on Twitter my surprise that there was so little to find on Wikidata about this subject. Daniel Mietchen picked this up and had a bot run adding probiotic as a topic on Wikidata. The result is wonderful.

It is almost too good. We now run the risk of not seeing the forest for the trees. When you are looking for sources to cite, you want to narrow down on sources that were checked by Cochrane, and you may want to find or dismiss the papers mentioned by Retraction Watch.

The best part: this is an embarrassment of riches. With bots running and updating topics mentioned on papers, we gain relevance for our collection of papers, and authors are linked, giving a clue as to who might be notable enough to get a Wikipedia article. As we gain more and more data with better links to indicators of the quality of papers, we gain terrain in the battle against false facts.

Sunday, July 22, 2018

#AfricaGap - Sean Jacobs at #Wikimania

To be blunt, what Mr Jacobs is talking about is one or more steps removed from the Wikimedia reality. His story is important and indicates that a specific type of source exists and is available for study. Mr Jacobs informs on the importance of Twitter for the Zulu language.

Mr Jacobs is an academic and the reality of Zulu Wikipedia is that only a few days ago we celebrated article number 1000 for the Zulu Wikipedia. What the Zulu Wikipedia needs is high school students writing in Zulu. Writing about what is important to them, what is important to their curriculum and to their world.

Just consider what one high school could do. Now consider what ten high schools could do. Compare that with one academic or what all the Zulu students currently in university could do.

Yes, history has been written so far and it does report in a biased way. When the Zulu language is to gain a foothold in the Wikimedia world, we need many people involved in first writing the most basic information. Once there is a basis, the sources Mr Jacobs mentions become relevant in a Zulu Wikipedia.

Saturday, July 21, 2018

#AfricaGap - A #Wikidata based watch list about an Africa reality II

When there are many Listeria lists that you follow, when you care about the development of the subject, it is wonderful to see so much activity related to Africa. As more people care to work on African politicians or "administrative territorial entities", the Listeria lists that also exist on Wikipedias in African languages will be updated as well.

When the Listeria lists become part of the main body of a Wikipedia, the politicians and entities will be found. When the info boxes as presented at the Celtic knot conference follow, slowly but surely quality content in quantity about Africa will no longer be a mirage.

Wednesday, July 18, 2018

#AfricaGap - Guinea; standing on the shoulders of giants

This map comes courtesy of the UN to Commons. It was downloaded in 2007 by Jeroen; the language on the map is French and Wikidata has much of its data in English. The names in French are mostly the same but that is for someone else to consider.

Many of the articles on "administrative territorial entities" are written by a small group of people. I want to single out Shevon Silva; the user page expresses the amount of work that went into adding stubs for so many African territories. The important thing about data is: once it is there you can change it in any way necessary.

When data gets entered into Wikidata, certain Wikipedia things are not possible; a "human settlement" is not an "administrative territorial entity". Such conflations need to be undone in Wikidata. Obviously the human settlement is located only in that administrative territorial entity and in others only by inference. Attributes like "inception date" and links to other human settlements that are part of a sub-prefecture are for someone else to add or get right. Another consideration is historic administrative territorial entities, particularly those of historic countries.

At this time it is important to celebrate what we have and to morph it into a format that can be used on any and all of our projects. Once it is available in all the Wikipedias, it will generate more and more links and this will put Africa on the map.

Saturday, July 14, 2018

#AfricaGap - Where Wikipedias collide

The German and the English Wikipedia collide on the "administrative territorial entities" of the Gambia. I was told to remove entries that I made to Wikidata because they were "Falschinformationen". The German article is much better written but the English article indicates that the German information is likely to be outdated.

A discrepancy like this is obviously not best solved by insisting on "your" solution. The point that I have been making quite often is that such differences are commonplace and require proper sourcing. The obvious source will not be found on a university website; it will be found in governmental information of the Gambia.

Making information about Africa available in Wikidata makes the errors, the inconsistencies and the lack of data in the Wikipedias more visible. This is not solved by considering your "own" data to be best; it is solved by proving that information is up to date. According to the English Wikipedia, the Upper River Division is no longer; it is largely replaced by the Basse Local Government Area.

My question: what does it take for the Wikipedias to take their inconsistencies seriously?

#AfricaGap - Support for "minority" languages

Support for "minority" languages was the subject of the Celtic knot conference. I have watched some of the presentations and find that there is a lot more to supporting minority languages from a Wikidata point of view than just adding missing labels. A vital strength of any Wikipedia is found in its relations between articles and that subjects of interest may be found.

"Minority languages" is a misnomer; what we mean is that the Wikipedias are small. They have a lack of articles, structure is missing and subjects of interest are not found. Subjects have the same relations in any language and consequently lists expressing these relations can be shared using Wikidata in any language including "minority" languages. Missing labels need not be an issue; this is expressed nicely in this list of subdivisions of Egypt, where the labels for most of them are only available in the Arabic script. A nice invite for people to add labels in the Latin and any other script.

The Welsh Wikipedia makes use of "Listeria" lists in its main space and as a consequence, all items in these lists can be found. They are available in a context, associated information may be available and they link to articles in other languages. The Welsh Wikipedia did implement the "Article Placeholder" and in this way they provide even more information for the specific subjects.

When you consider Africa and information about Africa, there is no Wikipedia that provides adequate information. The data is incomplete, unstructured and often out of date. It is easy enough to improve on the quality of the data in Wikidata and when the information is updated in many Listeria lists on many Wikipedias, the impact is great.

The lack of coverage of subjects about Africa is huge. Less than 1% of the humans known to Wikidata are from Africa, and we do not have up to date information about "administrative territorial entities" like provinces and districts. In my AfricaGap project only a limited range of subjects get some attention at this time. Obviously there is more that could be done. African cinema is one subject that is of interest to a group of Wikimedians. When they write their articles it will eventually translate to Wikidata and information about movies, actors and directors may be shown in Listeria lists in all the African language Wikipedias. This may generate interest from an African public for our projects.

There is only one purpose for Wikidata and Wikipedia, and it is to find a public, a use case for the data, the articles, the information we provide. The one challenge we face is in both the quantity and quality of our articles and data.

Thursday, July 12, 2018

#AfricaGap - Considerations on the "Article Placeholder"

Having listened to a YouTube presentation on Article Placeholder, I am seriously disappointed. There are a few statements in there that show a lack of understanding of the functionality of the Reasonator. It is dismissed for all the wrong reasons and as a result there are a lot of missed opportunities.

What is missed is that Reasonator, as it is, provides superior representation in any language. It is a tool that helps with missing labels from within the tool. Missing descriptions in Reasonator do not need to be a problem; there is automated functionality that has shown its merits in many languages. Do compare the representations of Wikidata data, and the structured representation will be seen to be richer, with the inclusion of maps, images and data linked to the subject in question.

What is particularly galling is that Reasonator is dismissed because "it is an external tool". Before work on the Article Placeholder started, it would have been easy enough to adopt functionality as provided by this external tool and it would not have been an external tool, an obvious argument AFTER the fact.

Where Reasonator provides texts, it is done based on little scripts. This is seen as problematic as it is seen as a drain on the community. Templates on the other hand may be a part of the Article Placeholder and they have the same problem.

For me the bottom line is not so much about the Article Placeholder but the lack of usability of Wikidata. It is only because of Reasonator that it is easy and obvious to work on the subjects I work on. I have not spent hours learning how to query; Reasonator provides me instantly with the results in any context, like the missing "Districts of Djibouti".

Monday, July 09, 2018

#AfricaGap - the Subprefectures of the Central African Republic

Even the best query is impotent when the data is not there. There were no known subprefectures of the Central African Republic when I started looking for them.

Best practice has it that any "human settlement" is located in the lowest administrative territorial entity available. It follows that the city of Baoro is in the Baoro subprefecture and it, in turn, is in the Nana-Mambéré Prefecture. This is nominally a Wikipedia best practice and a Wikidata best practice.
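To make the "lowest entity first, the rest by inference" idea concrete, here is a minimal sketch. It assumes a small in-memory mapping that stands in for Wikidata's "located in the administrative territorial entity" (P131) statements. The `LOCATED_IN` table and the `containing_entities` helper are illustrative names, not part of any tool.

```python
# Hypothetical stand-in for P131 ("located in the administrative
# territorial entity") statements; each place points only at the
# lowest entity that contains it.
LOCATED_IN = {
    "Baoro": "Baoro subprefecture",
    "Baoro subprefecture": "Nana-Mambéré Prefecture",
    "Nana-Mambéré Prefecture": "Central African Republic",
}


def containing_entities(place, located_in=LOCATED_IN):
    """Return every entity that contains `place`, lowest level first.

    Only the first entry is stated directly; the others follow by
    walking up the chain. A `seen` set guards against circular data.
    """
    chain = []
    seen = {place}
    current = located_in.get(place)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = located_in.get(current)
    return chain


print(containing_entities("Baoro"))
```

Because the city is stated only against the subprefecture, fixing the subprefecture's own location corrects the inferred prefecture and country for every settlement inside it.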

When a Wikipedia article indicates a "human settlement" category for a subprefecture, we get it wrong in Wikidata. When we change this in Wikidata, it is still problematic when many articles consider the town and the administrative entity to be the same thing. Then again, this is Africa and who notices?

When there are multiple items by the same name and one is about the city and the other is not, it is just a matter of making one a subprefecture. For the Central African Republic, this is rather straightforward and it just takes a lot of work to get some structure in the data. At the same time there are many articles in the wrong basket. That problem is for another day.

Fixing the data for the CAR is doable. It takes someone with infinite time on his hands to fix the administrative entities for Angola. Most of the data is wrong and entities by the same name and type often exist multiple times. The queries will show anyone brave enough to work on it.

Sunday, July 08, 2018

#AfricaGap - A #Wikidata based watch list about an Africa reality

Wikimania 2018 will be in Cape Town and a lot of words will be used to express the importance of adequate coverage of everything Africa. Words do not express the extent to which Africa is lacking in coverage. My estimate is that less than 1% of all humans known to Wikidata (ie all humans in all Wikipedias) are African. We cannot properly say where someone was born or died because we do not know all the places of Africa, we do not know its administrative divisions and we do not know its politicians. We have not properly structured the former countries and colonies of Africa.

We do not really know about Africa.

When one guy from the Netherlands can make a noticeable difference, it is obvious what two, three or one hundred people who care about Africa can do. In the Listeria lists on several of my user pages, you find what are in effect watch lists about Africa. Every day I notice what changed about several aspects of Africa, and regularly I add lists to it and become more aware how limited our coverage of Africa is.

It is 13 days to Wikimania and you can make a difference by making a difference on the subjects I follow. You can add information, you can add even more Listeria lists. The biggest difference will be by relating all the loose ends and curating and refactoring what is wrong.

What is the point of a Wikimania in Africa when our coverage is at this level? Obviously, a call to make up for what we have not done so far.

Saturday, June 30, 2018

#AfricaGap - #Wikidata localisation is about location, location, location

Beitbridge is a town, a ward and a district in Zimbabwe. Particularly for Wikidata they are distinct; the town is, together with other human settlements, part of the ward, and the ward is, with other wards, part of the district.

In Wikidata it is best practice to associate buildings, monuments, bridges whatever with the lowest local authority.

It is obvious that when you cannot find the associated item for an authority, or associated structures on maps, such associations will not happen. Human settlements in Lesotho for instance will not be found because at this time they only exist as "black links", eg here for the community council of Likhutloaneng in Lesotho.

In order to find any and all of the African local authorities in a language, there has to be a label in that language. For us, in any and all of the Wikimedia projects, we rely on our own labels, titles, whatever. When we want to show them on a map, the best maps available will be more and more the OpenStreetMap maps. Thanks to a very important project just finished by Wikimedia developers, we can show localised labels. Our labels. To do this properly, our data and the OSM data need to be linked on the object level. This provides us with map functionality in our 280+ languages and makes it obvious that the location for localisation is at Wikidata, not OSM.

Thursday, June 21, 2018

#AfricaGap - The #notability of Chemba, the district and the eponymous ward

Chemba is a ward in the eponymous district of Tanzania. Chemba had 16047 inhabitants at the 2012 census. Lately a lot of additional information has been added from the existing articles (in Swahili or English) to Wikidata.

There are plenty of practical reasons why Chemba is notable. In 2010 I blogged about the "Geograph" project in the UK. Britain was divided by a raster in order to have representative pictures for the whole of the country. Obviously we could do the same for any and all countries in Africa. They do have digital cameras in Africa, maybe not everyone but that is not the point.

Africa is notable and we want to close the gap in coverage of Africa. So we want to know about all the wards of Tanzania, not just Chemba. We need coordinates, maps and photos. There is census data for 2012 and all this, including maps and photos, can be shared in any language once the data is available in Wikidata.

The point of all this: make obvious what we do know and what we know is missing. In the end, the devil is in the details but Africa is a continent full of bright people who can make the difference.

Saturday, June 16, 2018

#AfricaGap - OkayAfrica's 100 Women

When you visit the OkayAfrica website, you are kindly invited to learn about the 100 women celebrated in 2018. Having a list of 100 fine ladies is nice but what makes them really inspirational are their stories. All of them have been added as references for your pleasure in Wikidata.

We do not have the same pleasure for the fine women celebrated in 2017. Most likely those stories existed once upon a time, but as the Internet Archive only knows the 2018 version, the 2017 stories are lost at this time. We only know about them through a secondary source.

Kelly Foster added most of the 2017 ladies and we are now adding the remaining ladies to Wikidata. We would be very much obliged when the back stories as they were published in the past are added as a reference. Fine journalism deserves a place as a reference.

Thursday, June 14, 2018

#AfricaGap - A is for apple

When you watch this talk, you learn that teaching the alphabet with "A is for apple" is problematic in Africa. People do not eat apples, it is an exotic fruit, and it does not relate to the world of African children.

In my #AfricaGap project I aim to enrich information relevant to Africans. I started with African politicians and added a map of Africa with labels in the local language.

Thanks to Kelly Foster I added the 100 African women celebrated by OkayAfrica. This addresses the gender gap to some extent and adds a healthy dose of women to the mix. Kelly is adding the 2017 women and I am not done yet with the 2018 women.

Obviously there are more politicians and, obviously, the information about politicians is not complete. However, when people do add information about any of them it will update on Listeria lists on the English, the Zulu, Yoruba and Swahili Wikipedia. There must be other African awards as well. Additional lists will happen when they do.

What the TED talk taught me is that African food is different and there is a point in highlighting these differences. There are categories specific for the national cuisine on the English Wikipedia. So when I am done with the OkayAfrica women, African cuisine is next.

PS I am happy when people suggest other subjects particularly relevant to Africa. Collaborating on exposing them using Listeria lists and maybe info boxes is what I can achieve.

Sunday, June 10, 2018

#AfricaGap - Recent changes in Zulu and Yoruba #Wikipedia

Listeria lists are now updated on a daily basis on the Yoruba and Zulu Wikipedia. That is really fortunate because one person active on the Zulu Wikipedia is making changes in Wikidata and it now results in local changes. A next step could be info boxes to be used for the African politicians known in the existing lists.

When African Wikipedias adopt lists like these, other subjects may become popular. Anything popular could do, soccer for instance. With information maintained in Wikidata, all Wikipedias could benefit. At some stage the perception will arise that lists maintained as text in any Wikipedia are behind the curve. That moment will arrive for any Wikipedia and when it does, information becomes more timely and complete.

Friday, June 08, 2018

#Wikipedia - where anyone can edit?

It used to be that anyone could edit Wikipedia. In theory this is still the case. HOWEVER, with university students increasingly refining their skills on Wikipedia and with scholars featuring as "Wikipedians Fellow" for particular subjects, the quality bar has been raised, making the environment increasingly hostile for non-scholars.

These scholars are active mainly on the English Wikipedia but the effects are felt everywhere, as it is perceived as the standard to aspire to. This is hugely problematic. Compare for instance these articles about Cyril Ramaphosa; the English and the Zulu article. There is little purpose in comparing them. There are hardly any editors on the Zulu Wikipedia and almost every subject is missing; so what to do, what to write? When you then apply the scholarly standards of the English Wikipedia, it is akin to insisting on Nupedia standards and, as an aside, how different are the Nupedia standards when they are compared with current English Wikipedia standards?

What a Wikipedia like the Zulu Wikipedia needs is scaffolding for the information it wants to supply. Mr Ramaphosa is the current president of South Africa and a list like this would serve much better than the existing red link. With an English Wikipedia background you would not consider this because almost all basic information already exists.

In Wikimedia publications on scholarly editing almost always English Wikipedia is the platform. Arguably the kind of articles written by scholars would not even fit. They would exist in a vacuum. Arguably the kind of articles currently written by scholars have little context in a Zulu Wikipedia because the subjects have little bearing on what is relevant in an African context.

Given the state of a Wiki like the Zulu Wikipedia, we do not need scholars. We need high school students who write many basic articles. We need many high schools with all their students writing articles. If anything, for the majority of Wikipedias we do not need scholars.

It would serve the Wikimedia Foundation well to consider less scholarly options.

Tuesday, June 05, 2018

#AfricaGap - Lucien Xavier Michel-Andrianarahinjaka a politician from Madagascar

Lucien Xavier Michel-Andrianarahinjaka is a politician from Madagascar with two Wikidata items that have now been merged. People at the Zulu Wikipedia expressed an interest in information on all African politicians and as a consequence they want to have all information available in Zulu.

Reasonator is a tool that makes this easy. As you can see in the screen shot, there is a lot to do; the label for "male" for instance is not even known yet. When someone who knows Zulu sees the information in this way, it is extremely easy to add a label in Zulu.

When you consider names of people, typically the spelling of a name is the same when seen in the same script. When this is true for Zulu or Xhosa or .. it is easy enough to have a bot add the missing labels in Zulu.
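What such a bot would do can be sketched in a few lines. This is only an illustration: the item layout, the item identifiers and the preference order of source languages are assumptions made for the example, and a real bot would read from and write to Wikidata instead of a dictionary.

```python
# Sketch: propose Zulu ("zu") labels for people by copying a label that is
# already written in the Latin script. The data layout is a simplified,
# hypothetical stand-in for Wikidata labels, not a real API response.

# Assumed preference order of languages that use the same script as Zulu.
LATIN_SOURCE_LANGUAGES = ("en", "fr", "pt", "nl")


def propose_labels(items, target="zu", sources=LATIN_SOURCE_LANGUAGES):
    """Return {item_id: label} for items that lack a label in `target`."""
    proposals = {}
    for item_id, labels in items.items():
        if target in labels:
            continue  # never overwrite a label that already exists
        for lang in sources:
            if lang in labels:
                proposals[item_id] = labels[lang]
                break  # the first available same-script label is used
    return proposals


# Hypothetical items: one without a Zulu label, one that already has it,
# and one that only has a label in another script and is therefore skipped.
people = {
    "item-1": {"en": "Cyril Ramaphosa"},
    "item-2": {
        "en": "Lucien Xavier Michel-Andrianarahinjaka",
        "zu": "Lucien Xavier Michel-Andrianarahinjaka",
    },
    "item-3": {"ru": "Сирил Рамафоса"},
}

print(propose_labels(people))
```

Only names of people should be handled this way; labels for concepts like "male" need someone who knows Zulu.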

When labels are available, it becomes practical to have info boxes particularly when the info in these info-boxes is updated from Wikidata. In this way it becomes possible to provide information in Zulu, information that will be hard to find in Zulu elsewhere on the Internet.

Monday, June 04, 2018

#AfricaGap - Requested lists from the Zulu #Wikipedia

After the introduction of Listeria lists for African politicians on the Zulu Wikipedia, the question was raised how complete lists could be made available for the localisation of names and the positions held by these politicians.

A good question. The answer is problematic. These Listeria lists are the best we have available. Typically they are incomplete and it takes effort to make them complete. I completed the list of Foreign Ministers of Madagascar based on information from the English Wikipedia. To achieve this I added eight people to Wikidata and I merged one.

The cool thing is that similar Listeria lists will be updated daily with new and updated information. The same list could be used on 280+ Wikipedias and as you can see, there are many, many positions held by national politicians in Africa. For all of them you want more specific information and obviously, that is even more work that could benefit 280+ Wikipedias.

Friday, June 01, 2018

#AfricaGap - maps in Yoruba, Swahili, Hausa and Zulu

As no percent of Wikipedia is about Africans, it is great to learn how "map internationalisation" works on African language Wikipedias. The original example has Africa in Russian and with two tweaks, this same map shows really well in Yoruba, Swahili, Hausa or Zulu.

My objective is to bring focus to African content. I am still adding African politicians and this new functionality is too good to miss. It works really well, the documentation is there and it becomes a matter of localising the information for it to provide the same information properly in the languages of Africa.

While adding such information it becomes painfully obvious how much is still lacking. It would be cool to have information in long lists like this in columns. It takes more expertise than I have to make this happen. I have not figured out how to get Listeria to update these lists either.. At some stage I will or probably sooner, someone else will help me out.

Monday, May 28, 2018

#AfricaGap - Doing a bit more when time permits

In a previous blog post I indicated that I would not include ministers. Well, I changed my mind because of feedback that I received. The list of African politicians on the English Wikipedia can be found here. There is now a list on the Yoruba Wikipedia that I update regularly with new entries. I am also considering adding information on the Swahili Wikipedia.

The list is far from complete and there are several challenges. The first one is that what I copy across is in English. When someone volunteers, it is easy to change fixed text. The second one is that the Listeria bot has to include new Wikipedias and update there as well.

Listeria is run by Magnus and the functionality he provides is important. It would be better when Listeria lists would be standard functionality. When it starts with the functionality as provided by Listeria it would already be good because it would acknowledge the need for this functionality. Improvements are welcome but are secondary.

When I have time I add additional lists, add new "positions" and I do monitor changes like the people involved, their nationality and the dates they started and ended their position. I may tweet occasionally using #AfricaGap.

At this time no percent of humans known to Wikidata is from Africa..

Sunday, May 27, 2018

#Wikidata - No #copyright on common knowledge

The approach on copyright for text and data is imho utterly different. For your text you seek a reputable source, you cite it. All in all a lot of work.

A proper approach to data is that you seek confirmation on what you already know and it is encouraging when there are many sources that agree on what is common to all of them. When you add new data, typically most of what you care for you will find through links from existing shared data probably in multiple sources. This is not done by hand, too much work, it is done by bot and consequently there is not even some "sweat on the brow".

Arguably, common data exists as common knowledge. It is not proprietary to any one of these sources and consequently claiming copyright let alone a license is at least problematic.

When data is specific to one data source, it is inherently problematic. It may be wrong, particularly when it differs from what other sources state. It follows that there is a need for care before this data is used. You then get into a manual process of reconciling and curating the data; you may even decide to diverge from what all the others say. Both the confirmation and the creation of new data are actually research. It is not the using of data from the other source. In my mind this means that there is no burden of copyright applicable.

When data is considered from Wikipedias for Wikidata, the same considerations apply. When you think about it, it is quite bizarre; you take expressions in words and convert them into a qualifier that represents said words, words that can be in any language. Words that may not even be what you see on your screen. The processing of texts may be automated and it is easy to understand that from the input of all the Wikipedias alone a superset of data is created that is more than any one article. The notion that copyright can be legitimately claimed is problematic at best.

When you take all this on board, and the fact that individual facts cannot be copyrighted, it is obvious to me that the choice for a CC-0 licence for Wikidata is fortunate. A license implies copyright but it is given away with this licence. The claim of the copyright is at best a defensive strategy.

Thursday, May 24, 2018

#BRAVEedit - Lessons learned at the Amsterdam editathon

The Amsterdam editathon for female civil rights activists was a success. This Listeria list shows it well. All the articles not in italics now have an article in Dutch. It may be that some Dutch articles were written in Brussels. The request to those who joined was to write or translate articles in English or Dutch. This is the same Listeria list for English.

Editathons happened in twenty countries concurrently. Organising such an event is a mammoth task; it is easy for things to go wrong, to be unclear. With 20 countries, many languages are involved and producing sufficient information for all of them is not easy. The question is how to do this optimally.

Information is to be available for all the languages a particular person or organisation is targeted for. Care needs to be taken that people are uniquely identified; this to prevent the creation of duplicate articles (this happened for Fartuun Abdisalaan Adan aka Fartuun Adan). Now you can make lists in a spreadsheet, you can write texts with the sources but the challenge is to maintain this for twenty languages!

Enter a Wikidata property: "on focus list of Wikimedia project". As you research a person that is to be on the focus list for the "Amnesty International Editathon", you either append it to an existing item or a new item. You can add the location as a "qualifier", this has been done for Amsterdam enabling a list specific for this editathon.
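A list like this boils down to a query on that property and its qualifier. The sketch below only builds the query text. It assumes that the property is P5008 and that the location was recorded with P276 ("location") as the qualifier; the two item identifiers are placeholders, not the real items for the editathon or for Amsterdam.

```python
# Sketch: build a SPARQL query for everything on a focus list, optionally
# narrowed down by a location qualifier. Property numbers are assumptions:
# P5008 = "on focus list of Wikimedia project", P276 = "location".

def focus_list_query(project_qid, location_qid=None, language="nl"):
    """Return SPARQL text selecting items on the focus list of a project."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        "  ?item p:P5008 ?statement .",
        f"  ?statement ps:P5008 wd:{project_qid} .",
    ]
    if location_qid is not None:
        # The qualifier restricts the list to one editathon location.
        lines.append(f"  ?statement pq:P276 wd:{location_qid} .")
    lines.append(
        "  SERVICE wikibase:label "
        f'{{ bd:serviceParam wikibase:language "{language},en". }}'
    )
    lines.append("}")
    return "\n".join(lines)


# Placeholder identifiers; replace them with the real Wikidata items.
print(focus_list_query("Q_EDITATHON", "Q_AMSTERDAM"))
```

The same query text, with another language code, serves a list on any other Wikipedia.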

Articles that are to be written will be in italics on a Wikipedia and this helps prevent duplicate articles. Sources can be added as qualifiers as well and with a small change they show in a list. Images and all kinds of other information can be added and this shows really well in Reasonator. The family of Fartuun Adan for instance was already known and she did win the International Women of Courage Award in 2013.

So the editathons happened and, for all the people and organisations the "on focus list" property can still be added. This will make it easier to analyse the impact of all the work; queries can be made to learn how many articles exist and, it makes it easy to learn how many of these articles are actually read.

One of the things these lists also show is that for many people we do not have an illustration. I think that when Amnesty makes images available for all of them, the articles become more attractive and are likely to gain more attention.

For me the Amsterdam editathon was wonderful. I enjoyed seeing people grapple with the idiosyncrasies of Wikipedia editing. I had it confirmed again that I am not a Wikipedian. I do not really edit, but being there had me experiment and think about organising multilingual projects and find confirmation for my understanding of how this could be done efficiently.

PS I do think that the people at the English chapter and the people at Amnesty did a great job.

Monday, May 14, 2018

#AfricaGap - #Wikidata; its quality as Wikidata matures

Currently there are 45 countries that I monitor for their national politicians. When I add a specific national "position", I do several things; I add existing politicians that are known in a particular category and I include a definition of what that category contains.

I give hardly any attention to details; my objective here is simple: I want to see how this (underdeveloped) data evolves. There is a huge gap in what we know about Africa and, as it is, we hardly inform about Africa; we need Africans to help us get the most basic facts straight for ourselves.

As Wikidata matures, we gain subsets of data that are of varying quality. The most mature living data are our interwiki links. It is live data and it serves a purpose. Changes require attention to detail because they have an immediate effect on the discoverability of information. When data comes alive, when it serves a purpose, it has people who will invest their time to get the data right. They will give attention to detail because that serves their purpose.

For arcane subjects like the Ottoman Empire, even Africa, there are few people who find a purpose in the data. Arguably there is so little data that almost everything added is a 100% gain in quality (a person exists, he is a member of parliament of ***, I do not understand African names so it could be male or female I do not know). Sometimes there are whole lists of people like these people from the Bosnian Eyalet, it is easy enough to complete such a list. But will it serve a purpose? How to give it a purpose?

There is no uniform quality to Wikidata. There are whole areas where we are 100% off the mark as we do not have the data nor the ability to link to data elsewhere. There are other areas, like biomedical literature, where our quality is such that it is actually useful. As this becomes known thanks to its evangelists, more attention is given by a wider public and more attention to detail is given in the process.

Arguably the quality of subsets of our data depends on its usefulness. When it is useful, people will come and give the attention to detail as it serves their purpose.

Saturday, May 12, 2018

#Wikidata - #Copyright and linked data

There are many points of view when it comes to copyright and data. In the Wikipedia world the discussion is different because each text has its own copyright. Data is different because you cannot own, i.e. copyright, a separate fact.

When data is open or opened up, it follows that much of the data that exist in multiple sources is identical. When the data is the same, it has two benefits. The first is quality. When multiple sources agree on something, it is more likely to be correct. The second is copyright; whose copyright?

Every now and again, the license used by Wikidata is questioned. Typically by Wikipedians who think they know their stuff. They will be the first to tell you the importance of sources and, indeed many factoids in Wikidata do not have a source. When a factoid is sourced, a statement like John Doe died on Friday, 13th, that factoid only links to the source and hardly to the place where it came to the attention of the person or the bot adding it to Wikidata.

When I add the fact that someone is a member of the Somalian parliament, when a list is used like this one, that information is sourced, there is no added value except for a name being on a list. It has been in the news that in the last year parliamentarians have been murdered, there is no article for them and consequently even in Wikipedia it is only a name on a list, no added value, no arguable reason for copyright.

Value is in the links, it is in knowing the same data to be true in many sources. Claiming copyright, particularly in data, is predatory. It prevents people from bringing facts together. Only when facts are brought together informed knowledge exists. Only in linked data, sourced data, there is a handle on fake facts and fake news.

Thursday, May 10, 2018

#Wikimedia - What I am willing to do for the #AfricaGap

Africa hardly gets attention in Wikimedia projects. When the one project that brings together, Wikidata, does not know the people who are or used to be president of an African country, this is obvious. There is no reasonable argument to counter this.

What I can do is "watch the gap". To do this I have a growing list of African National politicians. The list is not complete, I am still adding countries. I do not add ministers and I have not included "first wives", this to reach out to people who care about that other gap, a gap that is no longer as wide.

When people add data about politicians, it will update Listeria lists. There are many of them and they will show up on my watch list. It means that I can tweet about changes as they occur.

To be perfectly honest; I expect it to be like in a railway station; typically you wait for the trains and are watching a chasm and not a divide.

Friday, May 04, 2018

#Wikimedia - Introducing the #AfricaGap

Minding the gaps is important in all our projects. The #GenderGap program is an excellent project that shows the important and impressive results possible when we make a deliberate effort.

One area where we are weak is in our coverage of everything Africa. One area where we are particularly weak is in providing support for our readers and editors in Africa.

There are many things that can be done to improve upon the current situation and I am grateful to the people who have worked so hard to get us where we are.

To mind a gap, it starts with awareness. My "Africa" page provides some insight into the politicians of African countries. Obviously most politicians are missing and, as my page links to Listeria lists, every time a new African politician becomes known in Wikidata, it will show up on my watch list.

I intend to include all African countries and their national politicians. I will remain committed to bringing more information about Turkey and its history; this project will show, through the daily Listeria updates, the extent of our African efforts. It would be cool when 1% of the humans we know is from Africa.

Tuesday, April 17, 2018

#Wikimedia - please mind the Africa data gap

A friend attended a Wikimedia conference in Africa. He asked me for the number of people known to be from Mozambique. A question like this is really relevant. I asked for a query and I am happy there is a result; however, only 319 people known from Mozambique in Wikidata (that is all Wikipedias together) is a really low number. It is not an exception; countries like Rwanda or Niger, Malawi or Gabon do not fare better.

When you consider that there are more people known to be from Andorra (339) it is obvious that there is a real issue with how we cover "the rest of the world".
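For those who want to repeat the count, a query along these lines will do. It is a sketch and not the query I was given: it assumes that "from a country" means a human (P31 = Q5) with that country as country of citizenship (P27), and the identifier used for Mozambique in the example (Q1029) should be verified before relying on the result.

```python
# Sketch: build a SPARQL query that counts the humans Wikidata knows for one
# country. Assumption: "from a country" is read as country of citizenship.

def count_people_query(country_qid):
    """Return SPARQL text counting humans with the given citizenship."""
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:P27 wd:{country_qid} .\n"
        "}"
    )


# Q1029 is assumed to be the item for Mozambique; check it on Wikidata.
print(count_people_query("Q1029"))
```

Pasting the printed text into the Wikidata Query Service gives the current number, and changing the identifier gives the number for any other country.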

Sunday, April 15, 2018

#WeMissTurkey - six times #Listeria for best results

Reading about the history of the Ottoman Empire is a different experience on every Wikipedia. Typically most of the "humans" involved do not have an article and the spelling of the names differs. In English one title is pasha, in Catalan it is paixà and in Turkish and Bosnian it is paša. In all these languages it is part of the name of many dignitaries.

There are often multiple items for the same person thanks to these differences in spelling. Disambiguating them means bringing the information together and comparing the articles. The English, Catalan, Turkish, Bosnian, Greek and Arabic Wikipedias have their reasons to have an interest in the Ottoman Empire. For this reason I copied the Listeria list from my English user page to the ca.wp, tr.wp, bs.wp, el.wp and ar.wp.
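Finding candidates for such duplicates can be partly automated. The sketch below is a simplification under clear assumptions: it only knows the three spellings of the title mentioned above, it strips diacritics, and the item identifiers are made up. Real names differ in more ways than this, so the result is a list of candidates to compare by hand, not a list of items to merge.

```python
# Sketch: group labels that only differ in how "pasha" is spelled.
import unicodedata
from collections import defaultdict

# Spellings of the title after diacritics are removed (paixà -> paixa,
# paša -> pasa); an assumed, deliberately tiny mapping.
TITLE_VARIANTS = {"paixa": "pasha", "pasa": "pasha"}


def normalise(name):
    """Lowercase a name, strip diacritics and unify the title spelling."""
    words = []
    for word in name.lower().split():
        decomposed = unicodedata.normalize("NFKD", word)
        plain = "".join(c for c in decomposed if not unicodedata.combining(c))
        words.append(TITLE_VARIANTS.get(plain, plain))
    return " ".join(words)


def candidate_duplicates(labels):
    """Return {normalised name: [item ids]} for names seen more than once."""
    groups = defaultdict(list)
    for item_id, label in labels.items():
        groups[normalise(label)].append(item_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# Hypothetical items and labels, as they might come from three Wikipedias.
labels = {
    "item-a": "Lutfi Pasha",
    "item-b": "Lutfi paša",
    "item-c": "Lutfi Paixà",
    "item-d": "Sokollu Mehmed Pasha",
}

print(candidate_duplicates(labels))
```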

When labels in a particular language are changed or when data is added, the Listeria list will update on a daily basis. So as we work on Ottoman information, the best effect will now show six times on my user page.

Wednesday, April 11, 2018

#WeMissGerardM - Got banned from #Wikidata

A friend of mine got into problems on #Wikidata.

The story:
She had proposed a property that was talked into something else that would not work for her or me. There was no obvious consensus, particularly because of a lack of agreement on how the property would function. Someone in power stated consensus and created the property and my friend proposed its deletion.

What followed was awful because it shows how we interact. When someone says:  "This is a disruptive and bad-faith nomination" in my book this is aggressive and a personal attack. When someone else follows up with a request for a ban because of something that happens elsewhere it becomes a mob howling for blood. The arguments used were personal, had no relation to what happened at Wikidata and I objected strongly.

What else:
I objected to the language used, to the fact that you do not attack someone this personally. The language used is in my opinion not critical but overly aggressive, even brutal. I object to how we behave, the language used, the personal attacks. It is not the first time that I objected to the way we treat each other. Given that a friend was victimised, this time I did not back down. Now I am banned from editing Wikidata for a week even though the admin who banned me agrees that I did not do anything that is a "banning offence". What I did was not let others "get away with murder" and not agree that the common good gets damaged by me in this way.

What next:
I do not know. I will miss working on the Ottoman Empire, I will miss working on awards, I will miss working on the BHL. But I will miss my friend most and I am sad that we Wikidatans treat each other in such an adversarial way.

Tuesday, April 10, 2018

#WeMissTurkey - #Bosnia and its Ottoman governors

We miss Turkey as readers and contributors of our project. In the larger picture, when we write about the history of the world, we have to pay attention to the Ottoman Empire. Its history ended only 96 years ago and its influence is underestimated.

Take Bosnia; many people from the eyalet of Bosnia have been really influential and many people were beylerbey of the Bosnia eyalet. Adding all these people can be done from a list on the Bosnian Wikipedia. The list is in Bosnian, paša is pasha in translation, and that makes it a lot of work. In Catalan, the same word is paixà and Catalan is a language that covers the history of Africa rather well. Many people known in Bosnian are known in Catalan and not in English.

The dignitaries of the Ottoman Empire often stayed for a short period before they moved on to another place. Getting the curriculum vitae for the top will help understand history. In true Wiki fashion we have a start and it will improve when we collaborate.

Thursday, March 29, 2018

Dear #Vogelnieuws - more birds for the #Oostvaardersplassen

In an editorial the Dutch birding organisation is pleading for a more bird-friendly Oostvaardersplassen. They specifically want room for the Eurasian Spoonbill and Eurasian Bittern. The reason why they are in decline is carp, mature carp. They eat anything, including small fish, and they can live to be some eighty years old. There is no real predation on the carp.

There is only one predator that will have a significant impact when it finds its way into the Oostvaardersplassen; it is the Wels Catfish. A current proposal is to drain the Oostvaardersplassen regularly; consequently most water animals will die and repeats will be "needed". When the Wels Catfish is introduced after a drainage, it will eat the maturing carp, ensure a more healthy age distribution in carp and make room for birds and other fish as well.

In this way we come to rely on ecological processes and realise a more complete (age) distribution of species. In the editorial they call for ecological connections so that animals can migrate for optimal results. Introducing such connections for fish as well would be of huge benefit for the birds that prey on smaller fish. One of the fish that would find its way into the Oostvaardersplassen is the Three-Spined Stickleback. It serves birds like the Eurasian Spoonbill really well.

One other thing to consider; the Oostvaardersplassen was developed to support the goose, a bird that was rare at the start of the Oostvaardersplassen. When we make the ecosystem more complete, we will end up with a different environment, an environment that will be more diverse with more species finding their niche.

Wednesday, March 28, 2018

#WeMissTurkey - Beylerbeys of the Bosnian Eyalet .. in Bosnian

The Ottoman Empire was huge. It existed for 623 years and its armies threatened Vienna at one time. When you want to understand the existence of countries and the politics of the modern time, it helps to have a sense of the past.

The Wikimedia coverage of the Ottoman Empire is patchy. We do not have all information readily available about its geography and administrations over time.

For me, adding information is easiest when the source is the English Wikipedia but other Wikipedias are often more complete. The Ottoman Empire was for a long time divided into Eyalets and they were governed by Beylerbeys. The list of Beylerbeys for the Bosnian Eyalet is linked to many articles on the Bosnian Wikipedia. These are articles about people for whom Wikidata did not include even the most basic information. Adding missing information was easy but labels differ from English; they show in red in Reasonator.

When all red links are linked to Wikidata, it would be easy and obvious what English labels to add. It requires just one thing: acknowledgement that lists in different language Wikipedias provide the same information.

Sunday, March 25, 2018

#WeMissTurkey - Lutfi Pasha, a Grand Vizier of the Ottoman Empire

In October 2013 I wrote about Lutfi Pasha. I wondered if he, as an author of several books, would have a VIAF registration (he does) and if the information from Wikidata would end up at the VIAF registration (it does). It shows in the "personal information" that the German National Library and ISNI still call him Turkish where Wikidata knows him to be from the Ottoman Empire. At this time VIAF links to sixteen Wikipedias for more information. :)

As a young boy Lutfi Pasha was taken from his parents and under the Devshirme system brought to the palace where he was converted to Islam and given a thorough education. Lutfi Pasha had a distinguished career; he even married the sister of the later Sultan Suleiman the Magnificent. His downfall? He beat his wife and was banished.

Saturday, March 24, 2018

#WeMissTurkey - The Shihab and Ma'an families

The Shihab family succeeded the Ma'an family because of a marriage. That and because the male line of succession came to an end. There is no complete name for Mrs Shihab-Ma'an but it is what is needed to link two families and to link the succession of power.

When you consider history, it is often told through the conflicts and the succession of office holders (Fakhr-al-Din II and three of his sons were executed). This is not the only thing that shaped history; relations through marriage prevented many conflicts and allowed for cultural development in times of peace.

Mrs Shihab-Ma'an was married to Haydar Shihab and her son Mulhim became the next Emir. My big question: does anyone have a name for her? She must be notable by linking two families.

Tuesday, March 20, 2018

#WeMissTurkey - the geography and organisation of the #Ottoman Empire

This map shows the development of the Ottoman Empire over time. Its accuracy may be disputed but it is among the best Wikimedia has to offer at this time.

This animated gif is really good at what it does. With all the basic parts available, it becomes possible to expand on these maps. The Ottoman Empire was divided into "eyalets" and these were divided into "sanjaks". The size and the composition of these eyalets changed over time. An animation of these changes helps understand developments in, for instance, the Balkans.

At this time sanjaks are being added to Wikidata and this proves to be not that straightforward. Most of them do not have an article in any language. The spelling of the same sanjak differs in places and for some eyalets a modern interpretation is sought in order to provide some "legitimacy" for later developments; in one instance even the mentioning of the constituent sanjaks is deliberately missing.

The governance of the Ottoman Empire was obviously along the lines of these eyalets and sanjaks. For the eyalets there were "beylerbeys" and for the sanjaks "sanjak-beys". These offices were largely non-hereditary and for quite some time they were largely held by people originating from the Balkans.

When you consider the administrative organisation of the Ottoman Empire, there is a list of all the Sultans and their Grand Viziers. For the successions of other important functionaries there is still a lot that can be done.

When you are willing to help; please. Adding labels in other languages, particularly Turkish, will make a real difference. Adding missing humans in Wikidata and linking them into a succession of functionaries will help a lot. It enables the provision of lists and they may be used in any language. When you are able to hack maps.. That would be really important; it is how all this information may come together.

Thursday, March 15, 2018

#Wikipedia - throwing the baby out with the bath water

Dear Asaf; there are no pet peeves. There is only my wish for us to be the best we can.

When YouTube is to use Wikipedia to give a background to its offerings, there will be a lot where Wikipedia falls short. We do not offer information on May Ying Welsh for instance. We do not know about the Pardes Humanitarian Prize and, do we report on the current Dalit protests in Maharashtra?

It is not a peeve when I notice how many errors can be found in Wikipedia, particularly in lists, and people do not concentrate on the differences between what Wikipedia knows and what is known elsewhere. This is particularly sad because time invested in curating these differences is well spent and it is imho the most effective defence against fake news and fake facts.

When my question is "will YouTube use more than just English", you know as well as I do that English Wikipedia is less than 50% of what our audience reads. When the other half does not deserve consideration, it is more than a peeve. It is in these other languages where the danger of fake news is even worse.

Basic facts on any NPOV article are the same in any language. When they differ, they are where you can expect misinformation. With curated basic information available, it is possible to use natural language technology to provide at least some basic information. You have expressed that this is not something for the Wikimedia Foundation to be interested in (Cebuano remember?).

Asaf; you may hold the keys to what I post on the Wikimedia mailing list and you may privately consider me problematic. However, it is your excess in public ridicule and lack of arguments that is a disservice to what we aim to achieve; it is why we face off. In this you represent an attitude that will not see us provide the best we can offer in a changing landscape where we now have an opportunity to become relevant in debunking the worst of what YouTube has to offer.