Sunday, August 11, 2019

How to value open data and why Wikidata won't go stale

The data in Wikidata is data everyone knows or could know. A lot of awful things could be said of its content and quality, and all of it misses one important point: it is being used. Its use is increasing, particularly by Wikipedias, and that provides an incentive to maintain the data.

What Wikipedia indicates is that most data is stable, not stale. A date of birth, a place of birth; so much remains the same. When we bury data in text, it is always a challenge to get the data out. When we bury data in Wikidata, it just takes a query to bring it back to life. Who was a member of multiple "National Young Academies, Similar Bodies and YS Networks", for instance? You do not find it in the texts of those organisations, but you will increasingly find it in Wikidata. Once the data is in there, it is stable and available for query.
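
As a sketch of what such a query could look like, the Python below builds a SPARQL query for people who are a "member of" (P463) more than one of a given set of organisations. The function name is made up for illustration and the item ids passed in are placeholders; the real Wikidata items for the academies have to be looked up first, and the query is only built here, not sent to the Wikidata Query Service.

```python
import re

# Wikidata item ids look like "Q42": a capital Q followed by a positive number.
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def multi_membership_query(academy_qids):
    """Build a SPARQL query for people who are members of 2+ of the given items.

    academy_qids: Wikidata item ids of the organisations (supplied by the caller).
    """
    qids = list(academy_qids)
    if len(qids) < 2:
        raise ValueError("need at least two organisations to compare")
    for qid in qids:
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")

    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?person ?personLabel "
        "(COUNT(DISTINCT ?academy) AS ?academies) WHERE {\n"
        f"  VALUES ?academy {{ {values} }}\n"
        "  ?person wdt:P463 ?academy .\n"  # P463 = member of
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "GROUP BY ?person ?personLabel\n"
        "HAVING (COUNT(DISTINCT ?academy) > 1)\n"
        "ORDER BY DESC(?academies)"
    )


if __name__ == "__main__":
    # Q1 and Q2 are placeholders only, they are NOT young academies.
    print(multi_membership_query(["Q1", "Q2"]))
```

The resulting text can be pasted into the query service as is; the HAVING clause is what restricts the result to people with more than one membership.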

As GLAMs make their content available under a free license, their collections gain relevance as they gain an audience. Just consider that only a small part of a collection is available to the public in the GLAM itself, while on Commons it is there for all to find. Commons is being wikidatified, and those collections become available in any language, gaining additional relevance in the process.

The best example is what the Biodiversity Heritage Library does. It is instrumental in the digitisation of books, it makes them publicly available and it gains an audience for the collections they come from. Volunteers prove themselves in this process and both professionals and the wider world benefit. From a data perspective the data is new because it is only now available.

When a publisher mocks open data, it is self-serving. It is in their interest that data is inaccessible, only there for those who pay. There are plenty of examples of great data initiatives that went to ground, and obviously, when the data does not pay the rent, publishers will pull the plug. It is different for the data at Wikidata. It is managed by an organisation that has as its motto "share in the sum of all knowledge". The audience the WMF has makes it a world top ten website; it is not for sale and it is not going anywhere. As long as there are people like me who care about the availability of information, the data at Wikidata may go stale in places, but only until another volunteer picks up the slack.
Thanks,
      GerardM

Saturday, August 10, 2019

#Statistics or how many researchers are a #physicist

At @Wikidata most "researchers" are given this "occupation" out of convenience. We do not know how to label them properly and there are too many of them, so since all scholars must be researchers, that is what we make them.

Nothing inherently wrong here; it is better to know them for what they also are than to know nothing about them at all. One issue though: we do not know the physicists from the chemists, from the behaviorists or from any other specialism in science. We can query for physicists anyway, but we will not catch them all.

Queries that show the numbers for a profession are easy enough to make. The value of such one-time wonders is minimal and the results are fleeting; any moment now another scientist like Walter Hofstetter may become known to be a physicist, and the numbers are no longer true. They are useful when we run queries like these regularly, save the results and present them like Magnus does for Wikidata itself.

What it takes is a mechanism that mimics Magnus's approach. We gain an insight into how Wikidata is performing over time, and it provides motivation for people who care, for instance about physicists.
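
A minimal sketch of such a mechanism in Python, assuming the count itself comes from running a query like the one in PHYSICIST_COUNT_QUERY against the Wikidata Query Service (not done here). The file layout and the function names are made up for illustration, and this is not how Magnus's statistics are actually produced. Each run appends one dated row to a CSV file; the change between runs is what shows how Wikidata performs over time.

```python
import csv
from datetime import date
from pathlib import Path

# Counts humans (Q5) whose occupation (P106) is physicist (Q169470).
PHYSICIST_COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P106 wd:Q169470 .
}
"""


def append_snapshot(path, label, count, day=None):
    """Append one dated count to the CSV file, writing a header for a new file."""
    day = day or date.today()
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["date", "label", "count"])
        writer.writerow([day.isoformat(), label, count])


def growth(path, label):
    """Return (date, change since the previous snapshot) pairs for one label."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        rows = [row for row in csv.DictReader(handle) if row["label"] == label]
    rows.sort(key=lambda row: row["date"])  # ISO dates sort chronologically
    return [
        (current["date"], int(current["count"]) - int(previous["count"]))
        for previous, current in zip(rows, rows[1:])
    ]
```

Run weekly, for instance, a call like append_snapshot("counts.csv", "physicist", n) is all it takes to keep the history, and growth("counts.csv", "physicist") gives the numbers to present.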
Thanks,
       GerardM

Tuesday, August 06, 2019

#Statistics for National Young Academies

The Global Young Academy is linked to many national young academies. It and they represent many relevant scientists. They represent all of science, and they are interested in representing science to the public. The question: how can we make them visible?

First you add the orgs to Wikidata and then the scientists to the orgs. When you then add Listeria lists based on this data to Wikipedias, each list shows a picture when we have one, and we may notice who has a local Wikipedia article.

There are many interesting statistics possible:
  • the gender ratio
  • the different professions in the mix
  • awards received 
  • the known number of publications per person
  • the organisations they are employed at
However, first things first. It is my intention to include all the current members and the alumni of the Young Academy of Sweden before Wikimania. Second, these scholars are bright :) Once they put their mind to it, they will help themselves to nice statistics based on the info we accumulate in Wikidata. These statistics can be linked to on the Wikipedia pages.
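
Once the members are in Wikidata, a statistic like the gender ratio is a matter of counting. A minimal sketch in Python, assuming the query results are already at hand as a list of records; the field names and the sample rows are invented for illustration and are not real data.

```python
from collections import Counter


def share_by(records, field):
    """Return the share of records per value of `field`; missing values count as 'unknown'."""
    counts = Counter(record.get(field) or "unknown" for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


if __name__ == "__main__":
    # Invented sample rows, NOT real members of any academy.
    members = [
        {"name": "A", "gender": "female", "occupation": "physicist"},
        {"name": "B", "gender": "female", "occupation": "chemist"},
        {"name": "C", "gender": "male", "occupation": "physicist"},
        {"name": "D", "gender": None, "occupation": "researcher"},
    ]
    print(share_by(members, "gender"))
    print(share_by(members, "occupation"))
```

The same function covers the professions in the mix; awards and publication counts need a query of their own, since a person can have many of them.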
Thanks,
     GerardM

Sunday, August 04, 2019

Helping @Wikipedia readers find their read, one author, one publication at a time

Reading is what the public of Wikipedia does and in a way, every Wikipedia is an invitation to further reading. Wikipedia is an encyclopedia and by definition, its coverage of a subject is limited. Its reliability is defined by its sources and they themselves are typically a subset of what may be read about that subject.

The quality of the invitation for further reading differs. How do we invite people to read Shakespeare in Dutch, German, Malayalam, Kannada or even English?

The primary partner in this quest for further reading is the local library. We can put all of them on a map and invite people to go there or to probe its website for further reading. Having them all in Wikidata with their coordinates puts libraries and what they stand for on the map. We can invite them to use services like OpenLibrary or WorldCat. The bottom line: people read.

In this people-first approach, the user interface is in the language people want to read their book in. It follows that the screen may be sparse. For it to be a success, it is run like a business: we have statistics on libraries, people seeking, books found, and a perspective in time. It is about people reading books, not about transliterating books. Our business model: people reading. Funding comes from people and organisations who care about more reading by more people. The numbers entice people to volunteer their efforts, making more books and publications available in the language they care about.

To make this happen, the WMF takes the lead enabling and maintaining such a system and partnering with any and all organisations that care about this, organisations like the OCLC and the Internet Archive.

We will succeed when we make the effort.
Thanks,
       GerardM

Saturday, August 03, 2019

Competing with the PayWalls? Hell no!!

Competition is about business models, and the business models of the Wikimedia Foundation and publishers are utterly different. At the WMF we do better when people read more. The business model of publishers is that people pay before they read. When people like our service, they share their data and their money, enabling us to do more.

Notions of "professional results" for our readers are outside of either business model. Terminology like "professional results" is interesting, but to some extent it describes a fringe benefit.

When a professional adds 278 ORCiD identifiers to Wikidata, he and all his colleagues benefit professionally because he put in the effort. It follows that Roderic Page is a member of our community and his professional work benefits us all. These 278 scholars need to have their work known at Wikidata, and when Roderic and others want to work on other scientists as well, they may.

There is no point in competing with paywalling publishers. Whether people need to use a document that is behind a paywall is of no real concern to us. When we point to a free version of a paywalled document, we do a better job because more people read. The business model of a publisher is of no further concern to us; our aim is for people to read.
Thanks,
     GerardM

Sunday, July 28, 2019

Daniel Pomarède or a method to include an interesting paper in @Wikidata

I read about an interesting astronomical phenomenon; this is where you find the abstract. Daniel Pomarède, one of the authors, is open in ORCiD about his work. I found that he has a presence on Wikidata by searching for his ORCiD id: "0000-0003-2038-0488".
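
The lookup by ORCiD id can also be done with a query. A small sketch in Python: it first checks the id with the ISO 7064 MOD 11-2 check digit that ORCID identifiers use, then builds a SPARQL query on P496, the Wikidata property for the ORCID iD. The function names are made up for illustration, and the query is only built here, not sent to the query service.

```python
import re

# Four groups of four characters; the very last character may be a digit or X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True when the id has the right shape and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


def orcid_lookup_query(orcid):
    """Build a SPARQL query for the Wikidata item that carries this ORCID iD."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid!r}")
    return (
        "SELECT ?person ?personLabel WHERE {\n"
        f'  ?person wdt:P496 "{orcid}" .\n'  # P496 = ORCID iD
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


if __name__ == "__main__":
    print(orcid_lookup_query("0000-0003-2038-0488"))
```

Checking the id first catches typing errors before a query comes back empty for the wrong reason.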

Marking Mr Pomarède for an update is what it takes to give the article I think is interesting more of an audience. In the process there will be more to read about this part of astronomy.
Thanks,
       GerardM

Hey @GlobalYAcademy this blogpost is for you II

Now that I have added all global young academics to Wikidata, here is an update. What I like best about this academy is its global reach and the spread among the sciences. I am pleased that you may be found in ORCiD, Google Scholar and VIAF.

The most vital of them all for Wikidata is ORCiD; when your data "can be seen by everyone", we can retrieve your data, import it in Wikidata and make a "Scholia" for you. This is the Scholia of one of my favourite young academics. The import process (SourceMD) is broken at this time and this is my backlog of jobs to run.

Running a process for you will import co-authors and papers we do not know about. Given your global spread, it follows that your co-authors will have a similar global spread, and this is an antidote to the Anglo-American bias we have in Wikidata and the Wikipedias. Particularly when I run a second job, one that runs for your co-authors with public ORCiD information as well, the subset of the data you are part of improves.

There are things you can do that have an impact on what we do:
  • You can check your data, add what is missing and correct what is wrong on your Wikidata item
    • You can create/improve your ORCiD data and make it visible to everyone
    • You can trust organisations like CrossRef to update your data in ORCiD on your behalf
    • Please add your "name in native language" and indicate using the ISO 639 code the language it is in.
    • Check if the authorities that are linked to you are indeed correct and do not link to a false friend
    • Add your occupation
    • Please add other authorities that know you, ISNI for instance
  • We love to have a (freely licensed) picture, it helps with disambiguation. You can upload it to Commons.
    • Having a picture on the GYA website and on Google Scholar is why there are so many links to Google Scholar
So what is in it for us?
  • We want people to know about science and learn about the scientific record
  • We want people to write Wikipedia articles and your papers may be used as references.
  • There are many gaps in our coverage of science. We know, and it is improved one paper, one scientist at a time. There is even the option to work on a specific subject, like this one
As a member of the GYA, you are part of an outreach program. We happily invite you to work with us and together do the best job possible for science.
Thanks,
      GerardM

Friday, July 26, 2019

Authorities relieve us from the tedium of completeness and enable functionality

At Wikidata we do not rely on any one authority and we refer to many. As a consequence we bring many links to authorities to our users, and these links are only of value when users know how to value them.

The link for William Shakespeare to the Open Library gains you access to his work. It links those works to the Library of Congress to indicate that each is indeed a work of the bard.

When authors we know are linked to the Open Library, it does not really matter if we know their books; people find them regardless. When we want people to read, all we need to do is promote these links to Open Library and to local libraries. Such promotion could be done in the Wikipedias, like we do for WorldCat. WorldCat itself could be so much better, and have more of a purpose, if it paid attention to what is local to the user.

One project on Wikidata has been to include scholarly works that are free to read. Being free to read gives those works and their authors an additional audience and increased relevance. However, among all the works we represent, we do not know which works were added. That makes it a fail. There is an authority for that: Unpaywall. However, even when we have a link to Unpaywall, it only makes a difference when people use it and read articles. This effect is something we can measure when people go to the free version of an article.

We can get the database of Unpaywall and add just another authority. Next is the issue of maintenance. We could partner with Unpaywall and have a hybrid system where we import the database and regularly check those articles we do not know to be open.

In this way we still do not see the effect of more reads of open science. To achieve that, we should mark free articles with an Unpaywall icon in Scholia and Reasonator. Measuring the number of reads then becomes possible, and we positively acknowledge authors with free to read articles.

Next could be an Unpaywall icon in Wikipedia for all free to read references.
Thanks,
      GerardM