Tuesday, November 28, 2017

#Wikidata - Disambiguating for the Biodiversity Heritage Library

Tatiana Carneiro is an entomologist. Her work is known at the Biodiversity Heritage Library (BHL). When you check her "authors page", two other identifiers are listed, and the same two identifiers are shown for "Tatiana R. Carneiro" as well.

When you google Mrs Carneiro, all kinds of information can be found, but you do not want to do this for all 177,271 BHL authors that are waiting in Mix'n'Match. It is no fun, and only a few people take up a task like this.

So the question is: how do we make it more rewarding, and how do we bring the many Brazilian papers to Wikidata as well? What is there to achieve, and how does it benefit the people reading Wikimedia content?

For readers of our content, there is little merit in the fact that all these authors published papers. Many of the papers have been published with a DOI, and many of them are freely available to read. For readers, it is the papers that are important. So, contrary to a more conventional database approach, it is not the authors we should concentrate on but their publications. In addition, the BHL actively promotes the use of illustrations and publishes them on Flickr. Thanks to the fine work of people like Fae, these illustrations end up on Commons as well. It will be a challenge to link them to all this metadata.

There are millions of illustrations and far fewer publications, and many authors are known for not one but multiple publications. To complicate matters further, an illustration has an illustrator, and many publications are found exclusively in archives. Many publishers are no longer active, and all of this information is, or may be considered, relevant.

So what to do? First, import all the publications that are freely readable: the publications with a DOI, with the author information included as "author name string". When an author is known to Wikidata, we can always add the link to the author as well. The benefit of this approach? People can read now.
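
As an illustration of what such an import could look like, here is a minimal sketch that turns one publication record into QuickStatements (V1) commands. The shape of the record (`title`, `doi`, `authors`, optional `lang`) and the `known_authors` lookup are my own assumptions, not something the BHL provides in this form, and treating every publication as a scholarly article is a simplification. The Wikidata properties used are DOI (P356), title (P1476), author (P50), author name string (P2093) and series ordinal (P1545).

```python
# Wikidata identifiers used in the statements.
P_INSTANCE_OF = "P31"
Q_SCHOLARLY_ARTICLE = "Q13442814"  # assumption: every record is an article
P_TITLE = "P1476"
P_DOI = "P356"
P_AUTHOR = "P50"
P_AUTHOR_NAME_STRING = "P2093"
P_SERIES_ORDINAL = "P1545"


def quickstatements_for(pub, known_authors=None):
    """Build QuickStatements V1 lines for one publication.

    pub is a hypothetical record: {"title", "doi", "authors", optional "lang"}.
    known_authors maps an author name to a Wikidata item id, when one is known.
    """
    known_authors = known_authors or {}
    lang = pub.get("lang", "en")
    # Double quotes delimit strings in QuickStatements, so avoid them in titles.
    title = pub["title"].replace('"', "'")
    # DOIs are stored in upper case on Wikidata.
    doi = pub["doi"].upper()

    lines = [
        "CREATE",
        f"LAST\t{P_INSTANCE_OF}\t{Q_SCHOLARLY_ARTICLE}",
        f'LAST\tL{lang}\t"{title}"',
        f'LAST\t{P_TITLE}\t{lang}:"{title}"',
        f'LAST\t{P_DOI}\t"{doi}"',
    ]
    for ordinal, name in enumerate(pub["authors"], start=1):
        qid = known_authors.get(name)
        if qid:
            # The author is known to Wikidata: link the item.
            statement = f"LAST\t{P_AUTHOR}\t{qid}"
        else:
            # Otherwise keep the name as plain text for later disambiguation.
            statement = f'LAST\t{P_AUTHOR_NAME_STRING}\t"{name}"'
        lines.append(f'{statement}\t{P_SERIES_ORDINAL}\t"{ordinal}"')
    return lines
```

The point of the sketch is that the publication becomes available right away; the "author name string" can be replaced by a proper author link whenever somebody does the disambiguation.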

To make it interesting, we can run a bot using the APIs of the BHL. We add missing books for authors and add the authors to the books where this information is missing. Running this regularly will keep it interesting for anyone who follows the work of the BHL. But most importantly, people can read now.
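
The core of such a bot is a comparison between what the BHL knows and what Wikidata already has. A minimal sketch of that step follows. Fetching the data from the BHL API and from Wikidata is left out on purpose; the two inputs and their shapes are assumptions of mine, and publications are matched on DOI only.

```python
def plan_updates(bhl_pubs, wikidata_pubs):
    """Work out what a bot run would have to add.

    bhl_pubs: list of hypothetical records {"doi": str, "authors": [str]}.
    wikidata_pubs: dict mapping an upper-case DOI to the set of author
        names already recorded on the matching Wikidata item.
    Returns (missing_pubs, missing_authors).
    """
    missing_pubs = []     # publications Wikidata does not have yet
    missing_authors = []  # (doi, author name) pairs still to be added

    for pub in bhl_pubs:
        doi = pub["doi"].upper()
        if doi not in wikidata_pubs:
            # New publication: it will be created with all its authors.
            missing_pubs.append(pub)
            continue
        recorded = wikidata_pubs[doi]
        for name in pub["authors"]:
            if name not in recorded:
                missing_authors.append((doi, name))
    return missing_pubs, missing_authors
```

When nothing has changed since the previous run, both lists come back empty, so running this regularly is safe.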

