Sunday, October 22, 2017

#Katherine - When the Cebuano issue is no longer about #Wikipedia

Dear Katherine, I loved your presentation at the Berkman Klein Center for Internet and Society. It has much to think about and <grin> it is great that you answer the question you want to answer </grin>.

You address questions like "will we let external organisations use our data for their own purposes". My suggestion to you, to us all, is: why not use our own data for our own purposes?

The Cebuano Wikipedia is seen as problematic on many levels. It is one of the biggest Wikipedias in number of articles and one of the smallest in the size of its community. Like any Wikipedia, its articles are harvested for use in Wikidata and that brings us to several problems but more importantly in the light of your presentation, opportunities.

Problem: the data on which the Cebuano articles are based is problematic
Opportunity: import the data in Wikidata first and first do some curation there.

Problem: the data is licensed under a CC-by-sa license and Wikidata is CC-0
Opportunity: collaborate with the copyright holder and ask their permission to include the data in Wikidata

Problem: when text is generated by a bot, the text when saved in an article is fixed
Opportunity: do not save it as an article but generate the text and maybe cache the text

Problem: other organisations use our data to generate information
Opportunity: we generate information in all the 300 languages where Wikipedia does not have an article

Problem: we have information that has no article in any language
Opportunity: we generate the text and maybe cache the text

Problem: Wikimedia officials indicated that issues like the Cebuano Wikipedia are not relevant
No opportunity; opportunities for all our projects are missed

Katherine, we already generate texts using bots, we already cache our data, we do it for English, we do it for Swedish, Cebuano. Why leave it for the companies of our world to generate text where there is already so much? We can do better, do the same and do it for all our languages as well.
Thanks,
      GerardM

Saturday, October 21, 2017

#Wikidata - just an award winner: Mr Shuming Nie


Mr Shuming Nie is the 2007 winner of the Heinrich Emanuel Merck Prize. As such he was notable for inclusion in Wikidata.

A Wikipedia stub article was created. The article makes it plain that Mr Nie was a serial awardee and when you google Mr Nie, you find for instance the picture you see above. Mr Nie is one of many award winners that are "waiting for the recognition" of a Wikipedia article. By having these award winners in Wikidata, it becomes easier to find people, like someone you care for, who are waiting for an article.
Thanks,
     GerardM

Sunday, October 15, 2017

#Wikidata - motivation; thank you #Magnus

I added Baratunde A. Cola to Wikidata because he won the Alan T. Waterman Award. This month a Wikipedia article was written and I wanted to add some data to the item.

I did not because functionality that is key to me was broken. A new property was added and all the work that I had done on categories no longer showed in Reasonator. There was no willingness to consider the consequential loss of functionality and the result was a dip in my motivation.

Wikidata is important to me and I asked Magnus if he would help out and change Reasonator. He did.

Now I have added information to Mr Cola based on his categories. It matters that a category like this one reflects all the people known to have played in the Vanderbilt Commodores football team.

The issue is that at Wikidata, we have lost sight of these collaborative aspects. Everybody does his own thing and we hardly consider why. It is why user stories are so important; they tell you why something is done and what the benefit is.  In the end without a benefit there is no reason to do it.
Thanks,
      GerardM

Thursday, October 12, 2017

#Wikisource - the proof of the pudding

A user story for Wikisource could be: As Wikisourcerers we transcribe and format books so that our public may read these books electronically.

The proof of the pudding is therefore in the people who actually read the finished books.  To future proof the effort of the Wikisourcerers, it is vital to know all the books that are ready for reading. It is vital to know this for books in any and all languages supported.

There are two issues:
  • The status of the books is not sufficiently maintained in all the Wikisources
  • There is no tool that advertises finished books
To come to a solution, existing information could be maintained in Wikidata for all Wikisources in a similar way as done for badges. With the information in Wikidata, a query can be formulated that shows the books in whatever language, by whatever author.
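What such a query could look like is sketched below in Python. It is only a sketch under assumptions: it assumes that a finished book is marked with a badge on the Wikisource sitelink, the function name is made up, and the badge item passed in is a placeholder that has to be looked up for the status you care about.

```python
def finished_books_query(wikisource_url: str, badge_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for items whose page on one Wikisource carries a badge.

    wikisource_url: base URL of the Wikisource, e.g. "https://nl.wikisource.org/"
    badge_qid: Wikidata item of the badge (placeholder, look up the real one)
    """
    return f"""
SELECT ?item ?itemLabel ?page WHERE {{
  ?page schema:about ?item ;
        schema:isPartOf <{wikisource_url}> ;
        wikibase:badge wd:{badge_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
LIMIT {limit}
""".strip()


# "Q0" is a stand-in, not a real badge item.
print(finished_books_query("https://nl.wikisource.org/", "Q0"))
```

The same function works for any Wikisource by changing the URL; filtering by author would need an extra line in the query.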

Currently there are Wikisources that do not register this information at all. This does not prevent us from taking the necessary steps towards a queryable solution. After all, adding missing badges at a later date only adds to the size of the pudding, not to the proof of the pudding.
Thanks,
     GerardM

Tuesday, October 10, 2017

#Wikipedia discovers #OpenLibrary

On Facebook, Dumisani Ndubane posted his discovery of Open Library:
I just discovered that The Internet Archive has a book loan system, which gives me access up to 5 books for 14 days. So I have a library on my laptop!!! This is awesomest!!!
And it is. Anybody can borrow books from the Open Library (it is part of the Internet Archive).

Dumisani found out by accident; he googled for an ebook called "Heart of Darkness" by Joseph Conrad. What Dumisani did not know at the time is that the Open Library includes books in many languages. His next challenge: find the books in Xitsonga, and tell his fellow Wikipedians about it.
Thanks,
      GerardM

Wednesday, October 04, 2017

#Wikimedia - A user story for libraries

The primary user story for libraries is something like: As a library we maintain a collection of publications so that the public may read them in the library or at home.

Whatever else is done, it is to serve this primary purpose. In the English Wikipedia you will find at the bottom for many authors a reference to WorldCat. WorldCat is to entice people to come to their library.

It does not work for me.

My library is in Almere. I have stated in my profile in WorldCat that I live in Almere and I have indicated that my local library is my favourite. WorldCat indicates that the Peace Palace Library is nearby. It isn't.

When it does not work for me, it does not work for other people reading Wikipedia articles and consequently it needs to be fixed. So what does it take to fix WorldCat for the Netherlands, for me? WorldCat is used by a worldwide public and all the libraries of the world may benefit when WorldCat gets some TLC.
Thanks,
     GerardM

Monday, October 02, 2017

#Wikipedia - A user story for WikipediaXL: an end to the Cebuano issue

The user story for #Wikimedia is something like: As a Wikimedia community we share the sum of all knowledge so that all people have this available to them. 

As an achievable objective it sucks. The sum of all knowledge is not available to us either. To reflect this, the following is more realistic: As a Wikimedia community we share the sum of all knowledge available to us so that all people have this available to them.

When all people are to be served with the sum of all knowledge that is available to us, it is obvious that what we do serve depends very much on the language people are seeking knowledge in. What we offer is whatever a Wikipedia holds and this is often not nearly enough.

To counter the lack of information, bots add articles on subjects like "all the lakes in Finland". This information is not really helpful for people living in the Philippines but it does add to the sum of available information in Cebuano.

The process is as follows: an external database is selected. A script is created to build text and an infobox for each item in the database. This text is saved as an article in the Wikipedia. From the article information is harvested and it is included in Wikidata. One issue is that when the data is not "good enough", subsequent changes in Wikidata are not reflected in the Wikipedia article.

Turning the process around makes a key difference. An external database is selected. Selected data is merged into Wikidata. This data is used to generate only new article texts that are cached in all languages that have an applicable script. As the quality of the data in Wikidata improves, the cached articles improve.

With Wikipedia extended in this way, WikipediaXL, we become more adept at sharing the sum of our available knowledge. With caching enabled in this way, any language may benefit from all the data in Wikidata. It is important to consider the quality of new data. Data may come from a reputable source or from a source we collaborate with on the maintenance of the data. What is to be preferred is for another blogpost.

Saturday, September 30, 2017

#Wikipedia - #Wikidata user stories

User stories are important. They indicate why a certain functionality exists or the purpose of a project. A "user story" has a fixed format:
As a <insert a role> I would like to <insert an activity> so that I <insert a purpose>.
One user story is: As a Wikipedia editor, I can link an article to articles in other language(s) so that a Wikipedia reader can find an article in a language he or she can read.

Another user story:  As a Wikidata editor, I can maintain statements on Wikidata items so that Wikipedia readers always have the latest information available to them.

The first user story has been a resounding success. It is why Wikidata was relevant from the start. The second is very much a work in progress and it depends very much on how the current state of affairs is evaluated. There are dependencies for the efforts of so many to have an effect:
  • Readers of a Wikipedia can only see the result when the information has been included in Wikidata
  • Wikipedia readers will only see the result when the editors of their Wikipedia allow them to see it
The first dependency is with Wikidata editors but the second dependency is outside of the influence of Wikidata editors. For this reason it makes sense to formulate a different user story: As a Wikidata editor I can maintain statements on Wikidata items so that Wikipedia editors can take the responsibility to inform their public.

To help these Wikipedia gatekeepers there is a need for tools that make them aware of the information they do not provide.
Thanks,
      GerardM

Sunday, September 17, 2017

#Wikimedia and its #BLP approach


There is a huge controversy about the policies about the "Biographies of Living People". Central in all this is that there is no such policy at Wikidata. Many seasoned Wikipedians are of the opinion that, as a consequence, using Wikidata's data in Wikipedia is a violation of its BLP policy. At the same time there are seasoned Wikidatans who oppose a BLP policy similar to the one at Wikipedia. The problem is that Wikidata does need a BLP policy but it needs to be different for various reasons.

  • An item in Wikidata can be really rudimentary; Marian Latour, a Dutch author, was created because she won an award. This is allowed in Wikidata but the limited information is probably a violation of the English BLP policy. This information came from the Dutch Wikipedia
  • The initial data of Wikidata were the interwiki links. This was a huge improvement for the Wikipedias and there are still many items that have no statements. This is used as an argument not to accept information from Wikidata.
  • Wikidata data is retrieved from a Wikipedia, information like "who won an award". Given the BLP policy of that Wikipedia it should be faultless but it often is not due to disambiguation issues.
The first issue refers to a red link on the Dutch Wikipedia. When the red link is associated with the Wikidata item, there will not be a new disambiguation issue when a different Marian Latour is introduced. Currently there is only one Marian Latour known to Wikidata.
The second issue is one where Wikidata statistics indicate that statements are slowly but surely being added. They also prove that there is still so much to do...
The third issue is the main one. When an article is linked to Wikidata, articles in other languages should link to the same item or to a red link. Solving these issues requires coexistence and preferably collaboration. 

What we need in a Wikipedia is the ability to link a blue or red link to a Wikidata item. Obviously changing links is either blatantly obvious like for Manuel Echeverria or it requires a source. Technically the necessary change in the MediaWiki software may be "opt in" so that only people who care about this approach to quality make use of it. 

As far as I am concerned, when some Wikipedians find fault elsewhere and do not reflect on this proposal and the improvements it brings them, that is fine. What is relevant is that this approach allows for the best Wikidata practices and at the same time improves the BLP quality in all Wikimedia projects.
Thanks,
       GerardM

Saturday, September 09, 2017

The Manuel Echeverría "revenge"

When there are mistakes in a Wikipedia, it follows that once information is copied from that Wikipedia these mistakes find their way into Wikidata. So Manuel Echeverria did not receive the Xavier Villaurrutia Award; Manuel Echeverría did.

So the edit that made Mr Echeverria a recipient of the award was reverted. I fixed things by using the Spanish Wikipedia as a resource instead. The dates were added when people received the award and a few missing people in Wikidata are now known as well.

I cannot be bothered to fix the English Wikipedia. There is no structural solution at this time and as far as I am concerned, there is no interest in one that has been proposed.

There is one additional reason why a solution would be advantageous; reverting edits is a hostile act when edits are made with the best intentions. By actively linking red links and black links to Wikidata, such reversions will become unnecessary.

The problem is that Wikipedians need to understand a problem that as far as they are concerned is elsewhere, and is only caused by the lack of quality of their project. It is with grim satisfaction that I know it serves them well.
Thanks,
     GerardM

Saturday, September 02, 2017

#Wikimedia - Where I make a stand / where I stand for

I was told that my priorities are not the shared priorities of our movement; this by a pivotal person in the WMF. I consider this a personal affront and I will spell out what I stand for and where I make a stand. When you want to personally verify the veracity of my commitment; read my blog and check out my involvement. I have blogged for over 10 years and the basics/citations are all there to find. I consider my position very much in line with what our movement is there for.

==Share in the sum of all knowledge==
This is the overarching aim of our movement. At this time we are congratulating ourselves with what we have achieved so far. There is a lot to celebrate particularly for the English reading world.

===Everything but English===
Given that only 40% of the world population can read English, our successes need to be measured for what we do for all the people in the world. I do not care for good intentions, I care for what can be observed. Financially there is no breakdown available of the amount spent on English versus the amount spent on all the rest. This is imho a diversity issue as potent as the gender gap. All the arguments why "English first" are structurally no different from any other "my group first" arguments. Just compare the amounts given to US American chapters versus the Indian chapter. In addition you may or may not consider the cost of the software that is developed with English Wikipedia in mind.

===Internationalisation and localisation===
I have searched briefly for "internationalisation" in the 2030 strategy papers. Could not find it. It is however the bedrock of Wikipedia. It is vital for any and all of the individual features of MediaWiki.

When you consider Wikimedia partners like the Internet Archive and their Open Library, we do not even consider how much we will achieve when together we reach out to the other 60% as well. Our internationalisation platform is open to our open source partners and translatewiki.net is in my opinion a strategic resource.

===Partners===
The successes of our GLAM partnerships prove collaboration serves mutual interests. There are plans to improve Commons, a key part is the Wikidatification that will open up Commons, not only in English but also in any and all other languages. Where we could make more of a difference is help where our partners indicate what is relevant to them. We can show them the effect of the cooperation in any language. At this time what we show is limited to images. This is something we should expand on.

====Internet Archive====
The Internet Archive provides a vital service to our Wikipedias. Its Wayback Machine allows us to prove that references that used to be on the Internet existed. Effectively it is an important tool when the aim is to prevent misinformation. Its Open Library has two parts. The part I am interested in is making free e-books available to readers. We would do better when we collaborate just a bit more and help them with their internationalisation and localisation.

====OCLC====
The libraries of this world collaborate in the OCLC and share their links in one system; the Virtual International Authority File. In its WorldCat system, the idea is that people can find books in the library near to them. Thanks to the references to local libraries, it is always possible to know if a book or an author is known in whatever country. It is important for us to improve cooperation and the visibility of this collaboration for our readers and editors.

===Bringing things together===
I have helped bring data from Wikidata, OCLC and Open Library together. I am seeking the disambiguation of Open Library content using existing links to the Library of Congress, to the VIAF and consequently to Wikidata. I am adding award winners because they provide arguments for which articles to write or improve. Currently I am adding Dutch literature awards to show the Dutch National Library that this information exists and can be used. Recently I added botanical awards to show a group of botanists how small tasks like this add relevance.

===Outspoken stuff===
  • I am not a Wikipedian and consequently arguments specific to any Wikipedia are problematic, mostly irresponsible.
  • I care about diversity; issues around the gender gap do get extra attention from me but it is a secondary consideration.
  • I care about usability and use Reasonator and tools like Petscan and Awarder. The necessity to use Reasonator for so many years is proof perfect that usability does not have much of a priority. Having seen previous attempts at usability, I will consider it once it is available.
  • I expect that there will be more use for our data. Quality is key and collaboration on a meta scale is what will make this possible.
  • Wikidata is particularly useful in English. Theoretically other languages may profit from its multilingual nature. Institutional (WMF) interest is needed to improve this use of Wikidata. 
  • While I respect many efforts of the WMF, I find that its concentration on English Wikipedia has a very negative effect on a micro scale. It is not all bad but it is this division of labour and money that prevents us from having the most bang for our buck.
Thanks,
      GerardM

PS I resent that I felt the need to write this blogpost.

Sunday, August 27, 2017

#Wikidata - surge of new items

Lately there has been a surge of new items coming into Wikidata. They must be quite good when you consider the number of statements. The items with no statements are mainly part of the original load, the Wikipedia articles, and their number is slowly but surely decreasing (1.35% the last month).

With more items in Wikidata, there is more data to support, to edit. As it is, limits are put on the amount of edits. This can be appreciated because of the current performance problems but it is obvious that as this upward trend continues, more people and more data will come to Wikidata to edit as well as to query.

There is plenty of data waiting in the wings to be added. The big challenge is promoting the data that is of use and will enable more collaboration both with people and with organisations.
Thanks,
      GerardM

Saturday, August 26, 2017

#OpenLibrary - Charles Horn and its other volunteers

There are several reasons why Open Library and Internet Archive deserve attention. They provide downloadable books in many languages and their Wayback Machine comes to the rescue when links in references in Wikipedia go stale. Have a look at the presentation from Wikimania 2017 (from 11:46).

The Internet Archive is officially one of the partners of the Wikimedia Foundation. When you ask who in the Wikimedia Foundation is the go-to person for contacts with Internet Archive, there is no answer. It is as if there is no structure in contacts with our partners even when it pays dividends to collaborate in a more structured way. When you consider the "Coleman Boat" it is just as if the macro elements are totally missing and it is left for the micro elements to make the difference.

Macro effects of collaboration with the Open Library would be:
  • references are made to downloadable eBooks from Wikipedia - People read books
  • localisations are made at translatewiki.net - People read books in "other" languages
  • books at Open Library are in Wikidata - links to eBooks are available
  • identifiers are widely shared and widely curated -  work of volunteers has the biggest impact
At a micro level, collaboration is happening. Charles Horn, a volunteer at Open Library is a stellar example. Charles added identifiers to Wikidata and VIAF in the Open Library database. He provided us with a large file of redirects and was instrumental in removing multiple identifiers to Open Library for authors.  He recently produced a Wikidata query to find duplicates and the Wikidata community was made aware of this maintenance work. 

Many of the macro opportunities become possible when conditions at Open Library are met. One big issue is the need for disambiguation and de-duplication. This is not helped with the massive amounts of data involved and the lack of data on the individual author level. While individuals like Charles have an immense effect, it is in the collaboration on a macro level where even bigger differences can be made. Consider; many books include identifiers like an ISBN or a link to the Library of Congress. So it is possible to leverage a tool developed at the Wikimedia Foundation to retrieve associated meta data or to find associated data at the OCLC.

It takes just a bit of friendly prodding from the macro people at the associated organisations, some reassurance that there is support for these efforts and there will be a lot of talent at the micro level making a big difference. Cooperation and coordination is what the organisations are to provide and we will share more of the knowledge that is available to all who come looking.
Thanks,
       GerardM

Sunday, August 20, 2017

#Wikidata - Martin Reints and {{Authority control}}

Martin Reints received the Herman Gorter Award in 1993. There is a Wikipedia article about him and consequently he was known in Wikidata. There was no "authority control" information for Mr Reints in Wikidata yet and this was quickly remedied.

The most interesting part is that the VIAF registration for Mr Reints already included a link to Wikidata. Proof perfect that librarians are actively working on keeping their house in order. There was an Open Library entry for Mr Reints and the Dutch article had a link to the DBNL-website for Dutch language authors.

Open Library I found is very much about books. Their data on the books they have is great; identifiers like ISBN-10 or ISBN-13 and links to the online catalog of the Library of Congress. This makes a lookup at the OCLC for identifiers of all the authors easy and disambiguation becomes more effective.

Wikidata is very much about data. You can query Wikidata for all the winners of the Herman Gorter Award and to the results you can add the links to VIAF or to the Open Library. This ability to query makes all kinds of applications possible like: "what books written by authors who won the Nobel Prize are available in your library?"
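As an illustration, here is a small Python sketch of such a query. It uses the Wikidata properties "award received" (P166), "VIAF ID" (P214) and "Open Library ID" (P648) and the public query service. The function names are made up and the award item is a parameter; "Q0" below is a placeholder, not the item for the Herman Gorter Award.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def award_winners_query(award_qid: str) -> str:
    """Build a SPARQL query for award winners with optional VIAF and Open Library links."""
    return f"""
SELECT ?person ?personLabel ?viaf ?openLibrary WHERE {{
  ?person wdt:P166 wd:{award_qid} .
  OPTIONAL {{ ?person wdt:P214 ?viaf . }}
  OPTIONAL {{ ?person wdt:P648 ?openLibrary . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en". }}
}}
""".strip()


def run_query(query: str) -> list:
    """Send the query to the Wikidata query service and return the result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "award-winners-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Placeholder award item; replace "Q0" with the item of the award you are after.
print(award_winners_query("Q0"))
```

The OPTIONAL parts matter: winners without a VIAF or Open Library link still show up, and those are exactly the ones where work is waiting.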
Thanks,
      GerardM

Saturday, August 19, 2017

#OpenLibrary and winners of the Herman Gorter Award

If you want to know if the Open Library is of relevance in other languages, you have to do some research. I wanted to find out if there are publications by the authors who won the prestigious Herman Gorter Award.

This award was conferred from 1945 to 2002 often to multiple authors. The first author not known to Open Library is H. C. ten Berge. He received the Herman Gorter award in 1964. There were several authors where Wikidata did not have a link yet for Open Library.

Now consider this: what if we could query Wikidata for all the authors and their publications in Open Library? 

Just a little bit more metadata about books and publications is what we need. It is not really a big deal, only a few million additional records...

Many if not most of the books at Open Library have links to authorities like the Library of Congress. This makes it possible to link these books through the OCLC to "your library system". It knows about authors and that is what makes it possible to use tools instead of people to enrich Wikidata and open up all that is in the Open Library for all of us.
Thanks,
       GerardM

Wednesday, August 16, 2017

#Wikipedia - #BlackLunchTable / Brooklyn Hip Hop

The Black Lunch Table project has an editathon on August 20th. It will focus on important but underrepresented New York Hip Hop/rap artists.

In preparation they have created entries in Wikidata for artists with and without a Wikipedia article. In this way they can prepare information for the editors to use in their articles.

Magnus created a new tool and it shows who edited Wikidata. As a result we can create a query for the edits for the New York Hip hop event for the month of August.

It shows who has been doing all the work.
Thanks,
      GerardM

Monday, August 14, 2017

#Wikimedia - Women in blue

Dear Rosie, I saw your presentation. You want women in blue. In it you mention 300 lists of women. That is a lot of lists. In the mean time the biggest list of women with no article in a Wikipedia can be found on Wikidata.

There has been research in suggesting subjects to people and it works. Leila Zia, one of the WMF researchers wrote about a project they did. So the mechanism is there and you know, Wikidata has oodles of women with no article in "your" Wikipedia that have enough relevance given.

So how about a generator for ideas for articles to write? Leila knows many algorithms and Wikidata knows about many if not most of the women that are on your lists. Come to think of it, why not add all the lists in Wikidata in the first place?
Thanks,
       GerardM

Sunday, August 13, 2017

#Wikidata - Three award winners of the #ASBA

The ASBA or the "American Society of Botanical Artists" started off in the USA only to become a truly international organisation. They are an important player in the revival of botanical art, they have many local chapters and they have a number of awards.

The three ladies to the right are the winners of three awards. They now have their Wikidata entries.

I was introduced to people at the New York Botanical Garden and they indicated to me the relevance of illustrations. After that I got into contact with a lady from New Zealand who created a Google list of women scientific illustrators and artists. Her objective is to collect information for Wikipedia articles and many of them already do have an article.

The NYBG is planning future events and for their preparation they would like to include information about awards, including awards for botanical illustrators. When the information in the spreadsheet is entered from the start in Wikidata, there is no need for Google lists; Wikidata can play its role instead.
Thanks,
      GerardM

Saturday, August 05, 2017

#Wikidata - Harriet Martineau and some social opportunities

When you do not already know about Mrs Martineau, do read one of the many Wikipedia articles. She is considered to be the first female sociologist and introduced many subjects into sociology that were up to that time not considered.

The picture is a crop of a painting at the National Portrait Gallery by Richard Evans. The picture is known at Wikidata, at Commons the Creator template is missing.

At the Biodiversity Heritage Library Mrs Martineau was known for her book "A Complete Guide to the English Lakes". It was the only book known for her at Open Library. Given the relevance of Mrs Martineau this was strange and sure enough she was known as "Martineau, Harriet" and changing the link to the book was easily done.

At Wikidata meanwhile, there was a hidden link to Mrs Martineau to Open Library thanks to all the good work of the Freebase volunteers. Approving the change was obvious.

At Wikidata there are now links to VIAF, to the BHL and to OL for Mrs Martineau, and to over 20 more sources. The BHL has links to both Open Library and VIAF. When the links differ, it becomes obvious where work needs to be done.
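Spotting where links differ is easy to automate. The Python sketch below is only an illustration: the function name, the shape of the input and the identifier values are all made up, and it assumes the identifiers of each source have already been collected.

```python
def differing_links(records: dict) -> dict:
    """Return the identifier types on which the sources disagree.

    records maps a source name to {identifier type: value}.
    The result maps each conflicting identifier type to {source: value}.
    """
    seen = {}
    for source, identifiers in records.items():
        for id_type, value in identifiers.items():
            seen.setdefault(id_type, {})[source] = value
    return {
        id_type: by_source
        for id_type, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


# Made-up identifier values, only to show the shape of the data.
records = {
    "Wikidata": {"viaf": "111", "openlibrary": "OL1A"},
    "BHL": {"viaf": "111", "openlibrary": "OL2A"},
}
print(differing_links(records))
```

In this made-up example both sources agree on VIAF, so only the Open Library link is reported as needing attention.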

The result is a better service for all the people who make use of any or all of these resources. We truly should collaborate and strengthen our partners, the partners we share data with.
Thanks,
      GerardM

#Standards - the International Plant Names Index

#IPNI is a collaborative project between three august bodies in the taxonomy of plants. They are the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium.

There are three areas where IPNI sets the standards: plants, authors and publications. The objective is to disambiguate any taxonomic reference to a plant in scientific literature to the correct taxon given the taxon name, its author information, publication information and date.

IPNI publishes several graphs indicating the success of their work. I have been involved in this work as a consequence of a database project I did for my father who loved his cacti and succulents.

One example of what information IPNI provides can be found in this page for the "genus" Echinocactus. In my understanding, the correct full taxonomic name is: "Echinocactus Link & Otto Verh. Vereins Beford. Gartenbaues Konigl. Preuss. Staaten 3: 420. 1827". It has all the required information, it has type information, it has links, all as you would expect of a standard like this.

To appreciate the work of IPNI; instead of "Link & Otto", there may have been: "Link and Otto" or "Link et Otto" or ... obviously the information for the publication is easily made into a different abbreviation.
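To give an idea of the kind of normalisation involved, here is a tiny Python sketch. It is not how IPNI does it; it is my own illustration and it only handles the three spellings mentioned above, with a made-up function name.

```python
import re

# Matches "&", "and" or "et" when they stand between two names.
_CONJUNCTION = re.compile(r"\s+(?:&|and|et)\s+", flags=re.IGNORECASE)


def normalise_authors(author_string: str) -> str:
    """Rewrite the conjunction between author names to a single form, ' & '."""
    return _CONJUNCTION.sub(" & ", author_string.strip())


for variant in ("Link & Otto", "Link and Otto", "Link et Otto"):
    print(normalise_authors(variant))
```

Abbreviated publication titles are a much harder problem; that is where a curated standard earns its keep.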

Wikidata included only a subset of the full taxon information. It is easy enough to understand why; Wikipedia only needed the most current one. It is an easy model; works relatively well and it breaks in the corner cases. With the development of WikiCite there is a great and possibly easy opportunity to expand on the current work given the expanding collaboration with botanical partners like the Biodiversity Heritage Library.
Thanks,
      GerardM

Sunday, July 30, 2017

#Wikidata - Mrs Helen M. Duncan is not the only geologist

There are many ways of updating Wikidata. Individual statements for individual items are made. They are worthwhile but on the grand scale of things they have little impact. Another approach is to seek sets of data that can be updated all at the same time.

Mrs Duncan is, among others, relevant to the Smithsonian Institution. The approach of adding loads of data for many people has the advantage that an issue, like Mrs Duncan not being identified as a geologist, is fixed for many people at the same time.

To do this, I identified a category that implied the missing statement and I used PetScan to add all of the missing data in one go. Together with Mrs Duncan I made 1005 humans a geologist.
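For those who prefer to see what such a batch looks like, here is a minimal sketch in Python that writes one line per item in the tab-separated format that QuickStatements accepts. It is not what PetScan does internally. P106 is the Wikidata property for occupation; the identifier I use for "geologist" and the item identifiers are assumptions for illustration and must be checked before use.

```python
def occupation_statements(item_ids, occupation_id, property_id="P106"):
    """Return one tab-separated statement line per unique item identifier."""
    unique_ids = list(dict.fromkeys(item_ids))  # drop duplicates, keep order
    return ["\t".join((item_id, property_id, occupation_id)) for item_id in unique_ids]


# Placeholder item identifiers, standing in for the members of a category.
category_members = ["Q4115189", "Q13406268", "Q4115189"]
# Assumption: Q520549 is the item for "geologist"; verify before running a batch.
batch = occupation_statements(category_members, "Q520549")
```

The point of the sketch is that the work is in finding the right set; once the set is known, the statements themselves are trivial.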

These are small numbers, they hardly register. But as it is, there are Wikidata administrators actively preventing edits because Wikipedia cannot cope with the volume of changes in its recent changes. 

There is no plan, no timetable for the underlying problem to be solved. Wikidata people are told not to make mass edits. It is however the only way to make a real difference and make Wikidata halfway usable.

There are two options:
  • improving Wikidata as fast as we can and in the best way possible - as a consequence changes at Wikidata will not all be visible in some Wikipedias
  • allow Wikidata to edit to the extent that Wikipedias can keep up with the volume of changes - as a consequence people will go away and new projects will not start
There is a prima facie case to be made for the edits to be seen in the Wikipedias. Its efficacy has not been studied and some say that the user interface sucks too much to be useful. Arguably keeping these changes is based on beliefs/assumptions and not on established facts. 

We should imho make all the edits we can make and when the Wikipedia recent changes are to be salvaged, give it the highest priority particularly at the Wikipedia end. It sucks that we can not provide all changes to them but hey that's life. 
Thanks,
      GerardM

Wednesday, July 26, 2017

#Wikidata - in #defence of Erika Herzog

On Facebook, Erika made a few comments that were not well received. A few really positive things did come out as a result but there is a need to defend Erika and her central argument. She asked if there had been a process of consulting the English Wikipedia community because the user interface of Wikidata is so poor. She said:
"... But I am pretty sure a lot of En Wikipedia editors are going to be sort of upset about this shift that requires them to actually edit Wikidata without a form input method (on WikiMarkup). Is there a form input on Visual Editor for this?"
On Facebook she is attacked for all the wrong reasons. A Wikimedia functionary asks: "How is this a Wikidata matter? English Wikipedia is where you want to discuss this." Erika's answer is spot on: "Actually no it's not. I'm tired of this response. It's not helpful or realistic. This is a Wikidata item in terms of buy-in and outreach to incorporate more Wikipedia editors. It's disingenuous to posit otherwise. This needs to be a discussion on both sides, and I think the onus is more on the Wikidata side as the usability and UX is poor at best."

One positive outcome of the Facebook thread is that it is mentioned that there is a method under development to edit Wikidata from Wikipedia templates. However welcome, it is going to introduce its own problems because the primacy of the data remains at Wikidata. The user interface of Wikidata is indeed awful. As one of the more prolific Wikidata editors I only use it for editing. For displaying the data I use Reasonator exclusively. Compare this with this for instance and you will see why.

The reason for this is the applicable priorities. The WMF has too many concurrent ambitions for Wikidata and the staff is overextended. When the question is if Wikidata is sufficiently user friendly for an average Wikipedian, the answer is no. At this time Wikidata cannot cope with all the changes committed to it as it is; the wise words of Johan Cruyff apply: every disadvantage has its advantage.
Thanks,
      GerardM

Sunday, July 23, 2017

#Wikidata - Franziska Michor and #notability

Because of Facebook I read something about Franziska Michor. What triggered me was that she received an award. Her occupation, biomathematician, does not even exist (yet) on Wikidata.

To understand what a biomathematician does, it is great to watch the TedMED presentation by Mrs Michor. It gets me to the question of notability; I was amazed that Mrs Michor did not have a presence on Wikidata. I do not know if TEDMed is part of the TED project we had and I have no clue how to add this presentation.

With an ever increasing scope of Wikidata, the challenge becomes less one of introducing data and more one of maintaining data. This is particularly true when you look at Wikidata from a mathematical point of view. With Mrs Michor there are several datasets that gained notability and can do with some tender loving care: biomathematicians, TEDMed talks and the Vilcek Prize for Creative Promise.
Thanks,
     GerardM

Saturday, July 22, 2017

#Wikidata - Prix de Coincy and Raymond Benoist


The Prix Coincy is an award conferred by the French Botanical Society. According to the French article it was first awarded in 1904, but the first botanist who is known to have received it got it in 1906. He was Edmond Gustave Camus, a red link in the French article, but he has articles in several Wikipedias.

Botany is one of those subjects that have appeal; people care about plants, how they are named and consequently many botanists have articles in multiple Wikipedias. This became obvious when all the red links and black links in the article were entered in Wikidata. Like Mr Camus most already existed and just had to be associated with this award.

There are a few items that are not that obvious; Raymond Benoist is one. The French article has it that he received the award but there is no source, and for that matter the only source for the award is the French article. Another issue is with the 1949 award; it likely went to three people, one is Louis Quentin, the others Henri and Madeleine Stehlé. Nothing wrong with being bold I suppose.
Thanks,
     GerardM


Sunday, July 16, 2017

#Wikidata Tool - The #Awarder

The Awarder is a tool I use every day to add people known to have received an award to Wikidata. Its use is straightforward:
  • find a list of award winners, a list that includes the person and the year it was conferred
  • copy the source text into the awarder
  • identify the wiki the data is from
  • identify the award by its Wikidata identifier.
  • open the results in "quick statements" for processing. 
Easy. When done properly the result is as good as the information from the Wikipedia it came from.
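The last step hands the results to "quick statements". As a minimal sketch in Python, assuming the tab-separated QuickStatements format, this is roughly what such a batch could look like. It is not the Awarder's own code. P166 (award received) and P585 (point in time) are Wikidata properties; the item and award identifiers are placeholders.

```python
def award_statements(winners, award_id):
    """Build one statement line per (item identifier, year) pair.

    Each line says: item -> award received (P166) -> award,
    qualified with point in time (P585) at year precision (/9).
    """
    lines = []
    for item_id, year in winners:
        point_in_time = f"+{year:04d}-00-00T00:00:00Z/9"
        lines.append("\t".join((item_id, "P166", award_id, "P585", point_in_time)))
    return lines


# Placeholder identifiers; in practice they come from the list being processed.
winners = [("Q4115189", 2016), ("Q13406268", 2015)]
batch = award_statements(winners, "Q15397819")
```

Adding the year as a qualifier is what makes the result as good as the list it came from; without it the statement only says that someone received the award at some time.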

There are a few points. Some lists, like the one on the John Wesley Powell award, have the year on a line and the year is implied for the text that follows. The result is ten people identified. There are a few red links in there, for instance for "George M. Hornberger", and Awarder has identified him so that I can click on a button to find him in Wikidata. As I did not find him, I added him in Wikidata for later processing. Awarder does not identify organisations as award winners so I had to add the identifier for, for instance, the "California Department of Transportation". John Galetzka is the award winner for 2016. He is a "black link" so I identified him in the tool with brackets and as a result I could add him as well.

For fifteen award winners it is now known that they won the award. Slowly but surely it adds to the relevance of these people in Wikidata and the missing award winners become easier to identify for the implied notability.
Thanks,
       GerardM

PS thank you Magnus for a great tool

Friday, July 14, 2017

#Wikidata VS #Wikipedia - the issue with input, output

I was told that I should not talk about quality because "on the basis of my work I did not give a good example". Basically I was told to stop what I am doing. As I have written a lot about quality and argued how we can achieve greater quality it is not funny nor is it appreciated but the guy has a point.

With 2,304,191 edits there must be a lot that is wrong in what I have done. No matter how careful I am, a certain percentage of errors is to be expected; at 6% that means some 138,252 errors that I introduced. The problem is that depending on your outlook this is acceptable or it is not. When instead of me 100 people had done the same work, the result would have been the same; together they would have introduced around 138,252 errors as well.
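The arithmetic is easily checked. A minimal sketch in Python, where the 6% error rate is the assumption from the text and the even split over 100 editors is mine:

```python
import math
from fractions import Fraction

edits = 2_304_191
error_rate = Fraction(6, 100)  # the assumed 6% error rate

# Exact expected number of errors, and the same value rounded up.
expected_errors = edits * error_rate
rounded_up = math.ceil(expected_errors)

# Divide the same edits as evenly as possible over 100 editors.
editors = 100
shares = [edits // editors + (1 if i < edits % editors else 0) for i in range(editors)]
errors_when_shared = sum(share * error_rate for share in shares)
```

With the same error rate the expected total does not change when the work is divided; only the number of errors per person does.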

I totally agree that we need to bring our errors down. There are three steps where errors have their origin; input, process and output.
  • My input is based on the Wikipedias; their content all have their own issues. They all operate on their own little islands; there is no or little coordinated effort to make the quality of the information we provide a collective ambition.
  • My process is based on identifying what I want to work on; typically awards, often the enrichment of data around one person. For tools I mainly use what Magnus provides; they provide superior usability. Reasonator makes Wikidata statements intelligible, it provides superior disambiguation and automated descriptions. Awarder adds both the year and the person who received an award. It allows me to effectively cover a lot of ground. They are the tools I use most, others like PetScan are also invaluable.
  • There is too much output I generate and consequently I do not care for individual edits. I justify them all for the process, the routines I follow. I added "Claudia Wills" based on the information in the article of the eponymous award. Like other notable birdwatchers, Mrs Wills does not have her own article and I added her to complement the information on the award.
We share in the sum of knowledge and when the quality of what we provide is to improve, our movement has to become dedicated to the quality of all our information. The typical Wikipedian does mostly care about his or her own project and that is fine; we do not need all of them in an effort to improve our overall quality. The effort I propose can be hidden from view.

A Wikipedia article contains many links; they are blue, red or black. All the blue links are implicitly linked to Wikidata items. Many issues become evident when they can be compared with the links in articles in other Wikipedias or Wikidata. Some Wikis have additional links and they can be mapped to red links and black links. This prevents problems when articles are written with the name suggested in this link.

Once articles on the same subject in many Wikipedias are linked, all kinds of additional functionality become easier; one that is close to my heart is when a new award winner becomes known.
Thanks,
      GerardM

Saturday, July 08, 2017

#Wikimedia project - #PlantsAndPeople

#Wikidata is a great way to encourage collaboration and reporting for Wiki projects. The results of projects like the Black Lunch Table have been encouraging so far; reports for articles in multiple languages and gender ratios were possible because of the Wikidata link.

A new initiative is PlantsAndPeople. There have been editathons in the past and more are planned. It is about both people and plants so the kind of questions that may be asked will be quite interesting. For instance how many taxons were described by the people in the project and how many people were honoured in taxon names.

At this moment the people who are the subject of editathons are added. This list will grow slowly but surely and only once it is done can it replace the list in Wikipedia. It will take quite some time to get there because it makes sense to add additional data as well. This is the best way to quickly improve the quality of the data involved. So far quite a number of mycologists and ethnobotanists have been added. A question has been raised in Wikidata about people named in taxons and a picture that should be in Commons is waiting for someone else to transfer it.

When you are interested; join in the fun.
Thanks,
      GerardM

Wednesday, July 05, 2017

#Wikipedia - there once was a lady from #Estonia

Once upon a time there was a Wikipedian from Estonia. He decided to write about a fellow countryman, Kersti Kaljulaid. When your Estonian is as good as mine, it is not a name you remember or a person you are likely to have come across.

At the time this was the same for the English Wikipedians; she could not be notable because there were not enough sources in English. So for all the good reasons the article was in danger. Our Estonian Wikipedian said: "wait a week". A week later Mrs Kaljulaid was the president of Estonia.

I have taken the liberty to add additional data in Wikidata. Mrs Kaljulaid received two awards and other award winners have been added. No sources for them in English either. To be brutally honest, incidents like this prove why English Wikipedia is only a subset of the sum of all knowledge. Because of this insistence on English sources, English Wikipedia can not cover the sum of all knowledge. People who seek reputable information on foreign subjects will not find it.
Thanks,
       GerardM

Sunday, July 02, 2017

Comparing #Wikipedia using blue, red and black links

There are reasons to compare Wikipedia articles on the same subject in multiple languages. When you just want to read, you may find additional information in another language but as you can imagine, the content should be largely the same. Consequently, the links in an article should go to articles that are about the same topic.

One problem with "blue" links is homonymy. You write the name of a subject in the same way but the subjects are not the same; John Doe is one example. Finding these issues, issues that are surprisingly common, can be done by a bot using the Wikidata identifiers for the linked articles.

When there is no article to link to, there is no implicit link to Wikidata. There are two options; we can fake a link by accepting the red or a "black" link as synonymous or we can link a red or a "black" link to Wikidata. The latter is precise and has additional benefits.

When all links are associated with Wikidata items, it is obvious what links in what language are missing or are additional. They are of interest because they may imply potential information to be added to articles or they may point to errors even vandalism. Another benefit is that it helps establish a baseline for a NPOV or neutral point of view without a need to understand the language.
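As a minimal sketch in Python of the comparison described here: once every link in an article is known by its Wikidata identifier, finding what is missing in a language is a matter of set arithmetic. The language codes and identifiers are made up for illustration, and how the links get resolved to identifiers is left out.

```python
def missing_links(links_by_language):
    """For each language, return the items linked elsewhere but not there."""
    all_items = set().union(*links_by_language.values())
    return {
        language: all_items - items
        for language, items in links_by_language.items()
    }


# Hypothetical links for one subject, already resolved to Wikidata identifiers.
links = {
    "en": {"Q1", "Q2", "Q3"},
    "nl": {"Q1", "Q3", "Q4"},
    "et": {"Q1"},
}
report = missing_links(links)
```

No understanding of Dutch or Estonian is needed to read the report; whether a missing link is an omission, an error or vandalism is for a person to decide.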
Thanks,
      GerardM

Saturday, July 01, 2017

#Wikipedia - Blue, red and black links

Lists in Wikipedia, like this list of award winners of the Tony Kent Strix award on the right exist as blue, red and "black" links. At the moment only an article in English exists about the award and based on past experiences it is likely that other award winners are known in other Wikipedias.

Based on the information in the article, it was easy enough to add the missing information in Wikidata for all the "black links". When you now compare the information in Wikidata with the Wikipedia article, it is feasible to link fixed text to a Wikidata item. This makes it feasible to trigger a warning once a blue link is possible based on new  Wikidata information. In this way a link to Jack Mills is already likely.

When we can compare the information in an article with data in Wikidata, there is an additional way to compare the information and prevent errors and vandalism. Wikidata is after all superior in its use as a tool for disambiguation.
Thanks,
     GerardM


Friday, June 30, 2017

#Wikidata - Rina Spigt en een prijs van de #VARA

Two years ago I blogged about the J.B. Broekszprize and the VARA. It mentioned a Mr Hof and the fact that it is assumed that this Mr Hof received the award in 1995.

Mrs Spigt was awarded the prize for a radio series about Mary Zeldenrust-Noordanus according to the Dutch Wikipedia article. She is the only one for whom it was not known in what year she received the award.

The best way of finding out? I asked her. Mrs Spigt is on Facebook. It is an effective way of digging up some facts.

I always illustrate a blogpost so I googled for images and found this image of Mrs Spigt with her father. Another fact established. I did ask Mrs Spigt if she wanted additional information in Wikidata. A picture is welcome for instance. When you do ask, you may get confirmation about facts. Maybe not sourced in the Wikipedia way but nevertheless correct.
Thanks,
      GerardM

Thursday, June 29, 2017

#Wikimedia - Funding #Wikieducation is problematic

How can you not love WikiEducation.. It has a great reputation bringing the editing of Wikipedia to curricula. When you check out their Twitter account, their blog, their results they do an awesome job.

You can once you realise it is instrumental in maintaining the existing bias that favours English and English Wikipedia. Realistically you cannot even blame them for it because they define their operations as limited to the USA and Canada.

Still there are problems.
* People assume that the example of using university students to write articles is to be followed. For most of the Wikipedias there is too little content. The type of articles is not what is needed, more basic articles are needed.
* The Wikimedia Foundation has a huge bias for the English Wikipedia; consequently the 70% of the world population who do not speak English are underserved, even though less than 50% of the traffic of the WMF is English Wikipedia.

The solution is not to defund Wiki Education. A solution will only come once the WMF acknowledges that they have a diversity problem, a problem they do not acknowledge.
Thanks,
      GerardM

PS Does Wiki Education support French?

Saturday, June 24, 2017

#Wikipedia - Sister projects in search results

The Wikipedia Signpost informs that the discovery team extended the results for search on Wikipedia. New is that English Wikipedia now includes results from Wikisource, Wiktionary, Wikiquote and Wikivoyage and that is indeed welcome news.

There is one puzzling part in the information; "Wikidata and Wikispecies are not within the scope of this feature." It is puzzling because including Wikidata search results is where search has been augmented for years in many Wikipedias including the English Wikipedia by the people who added this little bit of magic Magnus provided.

As you can see in the screenshot of the search for Wilbur R. Leopold, an award was conferred on him and the origin of this factoid is the article on the award. Thanks to Wikidata, information is available for Mr Leopold. There are so many references in Wikidata that have no article in a Wikipedia or any other project that from a search perspective it is probably the next frontier.

When wiki links, red links and even black links can be associated with Wikidata items, it becomes even easier to add precision to the search results. Adding these links is the low hanging fruit to improved quality in Wikimedia projects anyway. 
Thanks,
     GerardM


Sunday, June 18, 2017

#Wikidata - John P. A. Ioannidis and his awards

I am a self confessed award junkie. They are imho important because they are an indication of who is notable and who is less so.

Three awards are associated with professor Ioannidis in Wikidata. One award was also conferred on Hans Rosling and this gives me added confidence in Mr Ioannidis and other recipients of the Chanchlani Global Health Research Award.

Professor Ioannidis throws cold water on much of scientific practice and consequently on its practitioners. One of his papers has the title "Why most published research findings are false" and it is inherently a challenge as well to what we write in the Wikipedias and Wikidata.

At Wikidata a wholesale import is happening of papers, science facts and their authors. This is a great idea, particularly when papers that dismiss much of the nonsense get a prominent place. The result will be that the Neutral Point Of View gets another twist; it balances what we include with actual science.
Thanks,
     GerardM

Saturday, June 17, 2017

#Wikidata vs #GeoNames - the first to throw a stone

Wikidata has some vocal people vilifying GeoNames. They insist that no data from GeoNames is included in Wikidata because "the quality is so bad". In my last post I wrote down assertions about Wikidata. One of them is that "Never mind how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration".

I wrote an email to Marc Wick, the founder of GeoNames, and with his permission I can publish our mail exchange.

Hoi,
The import of data from GeoNames into Wikipedia has been controversial. People say that the quality of the GeoNames data is not "good enough". It resulted in the deletion of thousands of articles from the Swedish Wikipedia. I am not Swedish, I did not follow their discussions but the problem is it sours collaboration with other parties because "their data might not be 100%".
This happened in the past, I care for the future. In Wikidata we do link to GeoNames (example Almere [1]).
There are several ways in which we can help each other and potentially even benefit from a collaboration. Wikidata is licensed with a CC-0 license and therefore GeoNames can have all our data and do with it as they please.
My initial proposal is for a comparison of the shared data. The data where GeoNames differs from Wikidata is potentially problematic. Concentrating on these differences together will improve both our and your data.
Would you be interested?
Thanks,
       GerardM
       Gerard Meijssen
His answer is everything I could hope for:
Hi Gerard
Thanks a lot for your email. A couple of weeks ago I have started to parse the wikidata extract and look for the matching attributes. Unfortunately I got interrupted and have not yet looked at the result of the parsing. I will continue as soon as I find the time.
The goal is to add the wikidata identifier to the alternatenames table with pseudos language code 'wkdt'. What I have noted so far is that sometimes the geonameids in wikidata go the wrong concept. For instance going to the city feature when the article is speaking about the administrative division or vice versa. This is one of the things I would like to check before adding the wikidataid as alternatename. GeoNames also has links to wikipedia.
I don't think wikipedia should import all geonames features, not all of them are relevant enough to justify a wikipedia article.
Best Regards 
Not only is there an interest to collaborate; Marc is checking the links in Wikidata referring to GeoNames and as can be expected he finds issues. As I asserted, this is to be expected and collaboration is the only way forward for optimal results.
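What concentrating on the differences could look like is shown in this minimal sketch in Python. The record layout, the field names and the identifiers are assumptions of mine; neither GeoNames nor Wikidata is queried here.

```python
def find_differences(wikidata, geonames, fields):
    """Compare records present in both sources, field by field.

    Returns (identifier, field, our value, their value) for every mismatch.
    """
    differences = []
    for key in sorted(wikidata.keys() & geonames.keys()):
        for field in fields:
            ours = wikidata[key].get(field)
            theirs = geonames[key].get(field)
            if ours != theirs:
                differences.append((key, field, ours, theirs))
    return differences


# Hypothetical records keyed by a shared identifier.
wikidata = {
    "id-1": {"name": "Almere", "kind": "city"},
    "id-2": {"name": "Example", "kind": "city"},
}
geonames = {
    "id-1": {"name": "Almere", "kind": "city"},
    "id-2": {"name": "Example", "kind": "administrative division"},
    "id-3": {"name": "Elsewhere", "kind": "city"},
}
differences = find_differences(wikidata, geonames, ["name", "kind"])
```

The second record is the kind of issue Marc describes: a link that goes to the city where the administrative division is meant, or the other way around. Only the differences need human attention; where both sources agree there is nothing to do.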
Thanks,
      GerardM

Tuesday, June 13, 2017

#Wikidata some assertions

Wikidata is no different from any community, there are differences of opinion. Everybody has his or her own perspective but there are assertions that can be made that have a more universal resonance. 

The assertions below represent the underlying arguments I use in my blog posts and in the discussions I take part in. They are the ones I feel are not necessarily "political" or have a negative impact.
Thanks,
       GerardM
  1. There is no data store without problems, this includes Wikipedia and Wikidata.
  2. The data we hold is best understood by applying set theory. The data in Wikidata consists of many subsets; probably the most valuable subset for the WMF are the interwiki links.
  3. The error rate in each subset can be assessed and is by definition different from the overall Wikidata error rate
  4. The absence of data often indicates a bias in the data Wikidata holds. A good example is the lack of data relevant to the global south.
  5. Given the huge influx of data from Wikipedia, the biggest imports have been from English Wikipedia and it is one reason for the existing biases in Wikidata.
  6. An absence of data prevents the application of tools. Tools may suggest writing a Wikipedia article, tools may compare data with other sources.
  7. Concentrating on the differences between Wikidata and any other data source is the most optimal way of improving the quality of existing data in either data set.
  8. Having an application for the data in Wikidata is the best way for improving the usefulness for a subset of data.
  9. Each contributor to Wikidata works on the data set(s) of his/her own choice, these data sets interact in the whole of Wikidata. This may raise issues and this can not always be avoided.
  10. Examples of problematic data must be seen in the light of the total of the data set they are part of. Statistically they may be irrelevant.
  11. Never mind how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration
  12. Wikidata improves continually and as such it is "purrfect" but it will never be perfect.

Monday, June 12, 2017

#Causegraph, an other way of looking at #Wikidata


Causegraph is a tool to visualize and analyze cause/influence relationships using Wikidata. If you have not seen it yet, give it a spin.

Randomly looking at the galaxy of relations, I found a Charles Frédéric Bassenge, he is in Wikidata because he is the father of Pauline Runge. He is in Wikidata because she has an entry in WikiTree. What amazes me most is the quality of the data for the father and his absence in WikiTree. 

Causegraph works on the basis of there being a direct relation between two persons. For Jacob Palis, the doctoral students and doctoral advisers are included and not the other TWAS award winners.

What is really good is that it is regularly updated. It would be even better when it was a Labs tool. This might enable real time updates .. <grin> there is always a wish for more and better </grin>
Thanks,
       GerardM

Sunday, June 11, 2017

How #Wikipedia gets into @Africa


This is a map showing how fiber is getting into Africa. The blind spots are where the Internet does not go. The red lines are where the future for Wikimedia lies.
Thanks,
        GerardM

#Wikidata - Premio Almirante Álvaro Alberto

The Premio Almirante Álvaro Alberto is named after admiral Álvaro Alberto da Mota e Silva. They are both notable for their own reasons.

The award was mentioned in an article on the German Wikipedia for César Camacho. The award was not known to Wikidata and was added. The website of the conferring organisation gives me the impression that it is the "National Council for Scientific and Technological Development" and part of the Brazilian ministry of sciences. When you look for it in Wikidata, it is embarrassing.

The admiral is probably a child of his time. He was military and also a very relevant scientist. As a military man he held the rank of vice admiral and as a scientist he was twice the president of the academy of scientists. He was also very much involved in the Brazilian nuclear program.

When you consider the notability of Brazil, it is astounding how little is known in Wikidata. Many politicians have been added for Brazil; national senators and deputies. 

Brazil is one of the top twenty countries in the world, I think; when you consider any and all of the "lesser" countries it is obvious that we know even less. When Wikipedia and by inference Wikidata is about the sum of all knowledge, there is a lot of white space where all our tools have no impact.
Thanks,
     GerardM