Sunday, September 22, 2019

Comparing datasets: bigger, better, or does it not matter?

When Wikidata was created, it was created with a purpose: it replaced the Wikipedia-based interwiki links, it did a better job, and it still does the best job at that. Since then the data has been expanded enormously; Wikidata can no longer be defined by its links to Wikipedia, as those links are now only a subset of what it holds.

There are many ongoing efforts to extract information from the Wikipedias. The best organised project is DBpedia; it continuously improves its algorithms to get more and higher-grade data, and it republishes the data in a format that is both flexible and scalable. Information is also extracted from the Wikipedias by the Wikidata community: plenty of tools like PetScan and the awarder, and plenty of people working on single items one at a time.

Statistically, on the scale of Wikidata, individual efforts make little or no impression, but in the subsets the effects may be massive. Take for instance Siobhan, who works on New Zealand butterflies and other critters. Siobhan writes Wikipedia articles as well, strengthening the ties that bind Wikidata to Wikipedia. Her efforts have been noticed, and Wikidata is becoming increasingly relevant to and used by entomologists.

There are many data sets; because of its wiki links, every Wikipedia is one as well. The notion that one is bigger or better does not really matter. It is all in the interoperability, it is all in the usability of the data. Wikipedia wiki links are highly functional and not interoperable at all. More and more Wikipedias accept that cooperation will get better quality information for their readers. Once the biggest Wikipedias accept shared data as a resource to curate, the act of comparing data sets means improved quality for all.
Thanks,
      GerardM

Saturday, September 07, 2019

Language barriers to @Wikidata

Wikidata is intended to serve all the languages of all the Wikipedias for starters. It does in one very important way; all the interwiki links or the links between articles on the same subject are maintained in Wikidata.

For most other purposes Wikidata serves the "big" languages best, particularly English. This is awkward because it is precisely people reading other languages who stand to gain most from Wikidata. The question is: how do we chip away at this language barrier?

Giving Wikidata data an application is the best way to entice people to give Wikidata a second look. Here are two:
  • Commons is being wikidatified and it now supports a "depicts" statement. As more labels become available in a language, finding pictures in "your" language becomes easy and obvious. It just needs an application.
  • Many subjects are likely to be of interest in a language. Why not have projects like the Africa project with information about Africa shared and updated by the Listeria bot? Add labels and it becomes easier to use, link to Reasonator for understanding and add articles for a Wikipedia to gain content.
Key is the application of our data. Wikidata includes a lot; the objective is to find the labels, and we will find them when the results are immediately applicable. It will also help when we consider the marketing opportunities that help foster our goals.
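As a minimal sketch of such an application: a label-driven picture lookup. P18 ("image") and the query.wikidata.org endpoint are real; the helper names and the shape of the query are our own illustration, not a finished tool.

```python
import urllib.parse

WDQS = "https://query.wikidata.org/sparql"

def pictures_query(label: str, lang: str) -> str:
    """SPARQL for items labelled `label` in `lang` that carry an image (P18)."""
    return f"""
SELECT ?item ?image WHERE {{
  ?item rdfs:label "{label}"@{lang} ;
        wdt:P18 ?image .
}} LIMIT 10
"""

def query_url(query: str) -> str:
    """URL that returns the result as JSON from the public endpoint."""
    return WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})

# Example: find pictures via the Dutch label for "cat".
print(query_url(pictures_query("kat", "nl")))
```

Fetching that URL (with a descriptive User-Agent, per the endpoint's usage policy) returns the matching items and their images as JSON.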

Thanks,
      GerardM

@Wikidata - #Quality is in the network

What amounts to quality is a recurring and controversial subject. For me quality is not so much in the individual statements for a particular Wikidata item, it is in how it links to other items.

As always, there has to be a point to it. You may want to write Wikipedia articles about chemists, artists, or award winners. You may want to write to make the gender gap less in your face, but whom to write about?

Typically, connecting to small subsets works best. However, we want to know about the distribution of genders, so it is very relevant to add a gender. Statistically it makes no difference in the big picture, but for subsets such as the co-authors of a scientist, a profession, or an award, additional data helps us understand how the gender gap manifests itself.

The inflation of "professions" like "researcher" is such that it is no longer distinctive; at most it helps with disambiguation from, for instance, soccer stars. When a more precise profession is known, like "chemist" or "astronomer", both subclasses of researcher, it is best to remove researcher as it is implied.
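The cleanup described here can be found with a query. A hedged sketch, assuming Q1650915 is the "researcher" item; P106 ("occupation") and P279 ("subclass of") are the real Wikidata properties:

```python
def redundant_occupation_query(generic_qid: str = "Q1650915") -> str:
    """SPARQL for people who hold both a generic occupation and a more
    specific occupation that is a subclass of it, so the generic value
    can be reviewed for removal."""
    return f"""
SELECT DISTINCT ?person ?specific WHERE {{
  ?person wdt:P106 wd:{generic_qid} ;
          wdt:P106 ?specific .
  ?specific wdt:P279+ wd:{generic_qid} .
}} LIMIT 100
"""

print(redundant_occupation_query())
```

The `P279+` path operator walks the subclass hierarchy any number of steps, so "chemist" and "astronomer" both match.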

Lists like the members of the "Young Academy of Scotland" have their value when they link as widely as possible. Considering only Wikidata misses the point; it is particularly the links to the organisations and the authorities (ORCiD, Google Scholar, VIAF), but also Twitter, as for this psychologist. We may have links to all of them, the papers, the co-authors. But do we provide quality when people do not go down the rabbit hole?
Thanks,
      GerardM

Sunday, August 25, 2019

There is much more to read; introducing the "one page wonder"

Given that our aim is to share in the "sum of all knowledge", realistically we will not have it all at our disposal to share. It is also fairly likely that we will not know about all subjects.

When you google for a given subject, it is as likely as not that you will drown in too much data or too many false friends, or find nothing at all when there is nothing to find in "your" language.

Increasingly, what we know about in the Wikiverse is linked to a Wikidata item. Pictures may depict a subject, articles may be written about a subject, and all of them refer to a Wikidata item that may have labels in any language, and that may even have links to references.

When we are to find more to read for Wikipedia readers, we need a mechanism, a place where we can link a subject to external resources: resources like the Internet Archive, "your" library, the papers we know in WikiCite, linking to the free versions of these papers. The page will show the label in "your" language and a picture. It links to all the pictures depicting the subject as well.

Putting the "one page wonder" in production is easy. It is all on one page and is fully internationalised. The localisation is done at translatewiki.net and when people want to make it useful in "their" language, they will add the missing labels for the Wikidata items.

With the "one page wonder" in place it becomes interesting:
  • Is "your" local library known to us, and do we get your permission to find it for you? How do we supply "your" library with a search string?
  • The Internet Archive's wayback machine may have content in "your" language, but can you navigate its English-only user interface?
  • What other organisations do we want to partner with to provide you with more to read?
  • Will we be able to show local pictures? A Dutch cow looks different from an Indian cow.
  • What other issues will there be?
  • Oh, and yes, we can include the Reasonator, queries and what have you; we just have to think about what to show.
Thanks,
       GerardM

#Translatewiki.net - the @Wikimedia movement infrastructure most people do not even know


Just consider this: there are more than 200 functioning Wikipedias, and this is only possible because people localise the MediaWiki software in over 280 languages. That makes translatewiki.net, the website where all this work happens, a strategic resource for the Wikimedia movement.

Internationalisation (i18n) and localisation (l10n) are an integral part of software development. They are part of a continuous process and require constant attention. The day-to-day jobs are well in hand. The localisation itself is a community effort, and with developers continually expanding the software base, a continuous effort is needed from the translators to keep up with their language. This is hard, and for many languages it is a struggle to keep up with even the "most used messages".

Managing this effort is itself continuous; it is essential to maintain the i18n and the localisation optimally. It follows that it should be obvious which messages have the biggest impact, first on the readers and then on the editors of a Wikipedia. What should be in the "most used messages" changes over time, and when it is considered strategic, such maintenance is to be considered a Wikimedia/MediaWiki undertaking.

Translatewiki.net has always been an independent partner of the Wikimedia Foundation, and it has always been firmly part of the Wikimedia movement. Given that partnerships are a key part of the strategic plans of the WMF, the proof of the partnership pudding is very much in how it interacts with translatewiki.net. TWN does not need to be part of the WMF organisation for the WMF to fund TWN; it is clearly a quid pro quo. The WMF should even encourage TWN and other partners to collaborate on their i18n and l10n and enable this for strategic purposes, strengthening these partners globally.
Thanks,
     GerardM

Sunday, August 11, 2019

How to value open data and why Wikidata won't go stale

The data in Wikidata is data everyone knows or could know. A lot of awful things could be said of its content and quality and all of it misses one important point. It is being used, its use is increasing, it is increasingly used by Wikipedias and that provides an incentive to maintain the data.

What Wikipedia indicates is that most data is stable, not stale. A date of birth, a place of birth: so much remains the same. When we bury data in text, it is always a challenge to get the data out. When we put data in Wikidata, it just takes a query to bring it back to life. Who was a member of multiple "National Young Academies, Similar Bodies and YS Networks", for instance? You do not find it in the texts of those organisations, but you will increasingly find it in Wikidata. Once the data is in there, it is stable and available for query.
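The membership question is exactly the kind of query that brings buried data back to life. A sketch; P463 is the real Wikidata "member of" property, while the Q-ids passed in are hypothetical placeholders for the academies of interest:

```python
def multi_membership_query(org_qids: list[str]) -> str:
    """SPARQL for people who are member of (P463) more than one of the
    given bodies, counting how many memberships each person holds."""
    values = " ".join(f"wd:{q}" for q in org_qids)
    return f"""
SELECT ?person (COUNT(DISTINCT ?org) AS ?memberships) WHERE {{
  VALUES ?org {{ {values} }}
  ?person wdt:P463 ?org .
}}
GROUP BY ?person
HAVING (COUNT(DISTINCT ?org) > 1)
"""

# Placeholder Q-ids; substitute the real academy items.
print(multi_membership_query(["Q1", "Q2"]))
```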

As GLAMs make their content available under a free license, their collections gain relevance as they gain an audience. Just consider that only a small part is available to the public in the GLAM itself, while on Commons it is there for all to find. Commons is being wikidatified, and those collections become available in any language, gaining additional relevance in the process.

The best example is what the Biodiversity Heritage Library does. It is instrumental in the digitisation of books; it makes them publicly available and gains an audience for the collections they are from. Volunteers prove themselves in this process, and both professionals and the wider world benefit. From a data perspective the data is new, because it is only now available.

When a publisher mocks open data, it is self-serving. It is in their interest that data is inaccessible, only there for those who pay. There are plenty of examples of great data initiatives that went to ground; obviously, when the data does not pay the rent, publishers will pull the plug. It is different for the data at Wikidata. It is managed by an organisation that has as its motto "share in the sum of all knowledge". The audience the WMF has makes it a world top ten website; it is not for sale and it is not going anywhere. As long as there are people like me who care about the availability of information, the data at Wikidata may at worst go stale in places, waiting for another volunteer to pick up the slack.
Thanks,
      GerardM

Saturday, August 10, 2019

#Statistics or how many researchers are a #physicist

At @Wikidata most "researchers" are given this "occupation" out of convenience. We do not know how to label them properly, there are too many, so as all scholars must be researchers we make them so.

Nothing inherently wrong here; it is better to know them for what they also are than to know nothing about them at all. One issue though: we do not know the physicists from the chemists, from the behaviorists, or any other specialism in science. We can query for physicists anyway, but we will not catch them all.

Queries that show the numbers for a profession are easy enough to make. The value of such one-time wonders is minimal and the results are fleeting; any moment now another scientist like Walter Hofstetter may become known to be a physicist and the numbers are no longer true. They are useful when we run queries like these regularly, save the results and present them like Magnus does for Wikidata itself.
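A minimal sketch of such a regular, saved query, assuming Q169470 is the "physicist" item (P106, "occupation", is the real property; the Q-id and the snapshot file format are assumptions to verify):

```python
import csv
import datetime
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

# Count everyone with occupation (P106) physicist (assumed Q169470).
COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?p wdt:P106 wd:Q169470 . }
"""

def parse_count(data: dict) -> int:
    """Pull the single COUNT value out of a WDQS JSON response."""
    return int(data["results"]["bindings"][0]["n"]["value"])

def fetch_count(query: str) -> int:
    """Run the query against the public endpoint (needs network access)."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "snapshot-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return parse_count(json.load(resp))

def append_snapshot(path: str, count: int) -> None:
    """Append a dated row; repeated (e.g. daily) runs build a time series
    that can be charted, Magnus-style."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), count])
```

Run from cron, `append_snapshot(path, fetch_count(COUNT_QUERY))` turns the fleeting number into a trend.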

What it takes is a mechanism that mimics Magnus's approach. We gain an insight into how Wikidata is performing over time, and it provides motivation for people who care, for instance, about physicists.
Thanks,
       GerardM

Tuesday, August 06, 2019

#Statistics for National Young Academies

The Global Young Academy is linked to many national young academies. It and they represent many relevant scientists. They represent all of science, and they are interested in representing science to the public. The question: how can we make them visible?

First you add the organisations to Wikidata and then the scientists to the organisations. When you then add the same Listeria lists to Wikipedias, we will see a picture when we have one, and we may notice who has a local Wikipedia article.

There are many interesting statistics possible:
  • the gender ratio
  • the different professions in the mix
  • awards received 
  • the known number of publications per person
  • the organisations they are employed at
However, first things first. It is my intention to include all the current members and the alumni of the Young Academy of Sweden before Wikimania. Second, these scholars are bright :) once they put their minds to it, they will help themselves to nice statistics based on the info we accumulate in Wikidata. These can be linked to on the Wikipedia pages.
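The first statistic on the list, the gender ratio, can be sketched as a query. P463 ("member of") and P21 ("sex or gender") are real Wikidata properties; the academy Q-id is a placeholder to fill in per academy:

```python
def gender_ratio_query(org_qid: str) -> str:
    """SPARQL giving the gender breakdown of the members of one academy."""
    return f"""
SELECT ?genderLabel (COUNT(DISTINCT ?person) AS ?n) WHERE {{
  ?person wdt:P463 wd:{org_qid} ;
          wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?genderLabel
"""

# Placeholder Q-id; substitute the academy's real item.
print(gender_ratio_query("Q12345"))
```

The other statistics (professions, awards, publication counts, employers) follow the same pattern with P106, P166, and so on.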
Thanks,
     GerardM

Sunday, August 04, 2019

Helping @Wikipedia readers find their read, one author, one publication at a time

Reading is what the public of Wikipedia does, and in a way, every Wikipedia article is an invitation to further reading. Wikipedia is an encyclopedia and by definition, its coverage of a subject is limited. Its reliability is defined by its sources, and they themselves are typically a subset of what may be read about that subject.

The quality of the invitation for further reading differs. How do we invite people to read a Shakespeare in Dutch, German, Malayalam, Kannada, or even English?

The primary partner in this quest for further reading: the local library. We can put all of them on a map and invite people to go there or to probe their websites for further reading. Having them all in Wikidata with their coordinates puts libraries and what they stand for on the map. We can invite them to use services like Open Library or WorldCat; the bottom line: people read.

In this people-first approach, the user interface is in the language people want to read their book in. It follows that the screen may be sparse. When it is to be a success, it is run like a business. We have statistics on libraries, people seeking, books found, and a perspective in time. It is about people reading books, not about transliterating books. Our business model: people reading. Funding is by people and organisations who care about more reading by more people. The numbers entice people to volunteer their efforts, making more books and publications available in the language they care about.

To make this happen, the WMF takes the lead, enabling and maintaining such a system and partnering with any and all organisations that care about this, organisations like the OCLC and the Internet Archive.

We will succeed when we make the effort.
Thanks,
       GerardM

Saturday, August 03, 2019

Competing with the PayWalls? Hell no!!

Competition is about business models, and the business models of the Wikimedia Foundation and publishers are utterly different. At the WMF we do better when people read more. The business model of publishers is that people pay before they read. When people like our service, they share their data and their money, enabling us to do more.

Notions of "professional results" for our readers are outside of either business model. Terminology like "professional results" is interesting, but to some extent it is a fringe benefit.

When a professional adds 278 ORCiD identifiers to Wikidata, he and all his colleagues benefit professionally because he put in the effort. It follows that Roderic Page is a member of our community and his professional work benefits us all. These 278 scholars need to have their work known at Wikidata, and when Roderic and others want to work on other scientists as well, they may.

There is no point in competing with paywalling publishers. Whether people need to use a document that is behind a paywall is of no real concern to us. When we point to a free version of a paywalled document, we do a better job because more people read. The business model of a publisher is of no further concern to us; our aim is for people to read.
Thanks,
     GerardM

Sunday, July 28, 2019

Daniel Pomarède or a method to include an interesting paper in @Wikidata

I read about an interesting astronomical phenomenon; this is where you find the abstract. Daniel Pomarède, one of the authors, is open in ORCiD about his work. I found that he has a presence on Wikidata by searching for his ORCiD id: "0000-0003-2038-0488".
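That lookup can also be done as a query; P496 is the Wikidata property for an ORCID iD, and the function name is our own:

```python
def orcid_lookup_query(orcid: str) -> str:
    """SPARQL to find the Wikidata item carrying ORCID iD (P496) `orcid`."""
    return f'SELECT ?item WHERE {{ ?item wdt:P496 "{orcid}" }}'

# The ORCID iD from the text above.
print(orcid_lookup_query("0000-0003-2038-0488"))
```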

Marking Mr Pomarède for an update is what it takes to give the article I find interesting more of an audience. In the process there will be more to read about this part of astronomy.
Thanks,
       GerardM

Hey @GlobalYAcademy this blogpost is for you II

As I have added all Global Young Academy members to Wikidata, an update. What I like best about this academy is its global reach and the spread among the sciences. I am pleased that you may be found in ORCiD, Google Scholar and VIAF.

The most vital of them all for Wikidata is ORCiD; when your data "can be seen by everyone", we can retrieve your data, import it into Wikidata and make a "Scholia" for you. This is the Scholia of one of my favourite young academics. The import process (SourceMD) is broken at this time, and this is my backlog of jobs to run.

Running a process for you will import co-authors and papers we do not know about. Given your global spread, it follows that your co-authors will have a similar global spread, and this is an antidote to the Anglo-American bias we have in Wikidata and the Wikipedias. Particularly when I run a second job, a job will run for your co-authors with public ORCiD information as well, improving the subset of the data you are part of.

There are things you can do that have an impact on what we do:
  • You can check your data, add what is missing, and improve what is wrong on your Wikidata item
    • You can create/improve your ORCiD data and make it visible to everyone
    • You can trust organisations like CrossRef to update your data in ORCiD on your behalf
    • Please add your "name in native language" and indicate, using the ISO 639 code, the language it is in.
    • Check if the authorities that are linked to you are indeed correct and do not link to a false friend
    • Add your occupation
    • Please add other authorities that know you, ISNI for instance.
  • We love to have a (freely licensed) picture; it helps with disambiguation. You can upload it to Commons.
    • Having a picture on the GYA website and on Google Scholar is why there are so many links to Google Scholar
So what is in it for us?
  • We want people to know about science and learn about the scientific record
  • We want people to write Wikipedia articles and your papers may be used as references.
  • There are many gaps in our coverage of science. We know, and it is improved one paper, one scientist at a time. There is even the option to work on a specific subject, like this one.
As a member of the GYA, you are part of an outreach program. We happily invite you to work with us and together do the best job possible for science.
Thanks,
      GerardM

Friday, July 26, 2019

Authorities relieve us from the tedium of completeness and enable functionality

At Wikidata we do not rely on any one authority and we refer to many. As a consequence we bring many links to authorities to our users, and it is only of value when they know how to value them.

The link for William Shakespeare to the Open Library gains you access to his work. It links those works to the Library of Congress to indicate that it is indeed that work of the bard.

When authors we know are linked to the Open Library, it does not really matter if we know their books; people find them regardless. When we want people to read, all we need to do is promote these links to Open Library and to local libraries. Such promotion could be done in the Wikipedias, like we do for WorldCat; and WorldCat could be so much better if it were about local attention for the user, and consequently had more of a purpose.

One project on Wikidata has been to include scholarly works that are free to read. Free to read gives those works and their authors an additional audience and increased relevance. However, among all the works we represent, we do not know which works were added. That makes it a fail. There is an authority for that: it is Unpaywall. However, even when we have a link to Unpaywall, it only makes a difference when people use it and read articles. This effect is something we can measure when people go to the free version of an article.

We can get the database of Unpaywall and add just another authority. Next is the issue of maintenance. We could partner with Unpaywall and have a hybrid system where we import the database and regularly check those articles we do not know to be open.
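The "regularly check" step could be sketched against the Unpaywall REST API (v2, which requires an email parameter). The helper names are ours, and the response fields used (`is_oa`, `best_oa_location`) should be verified against the API documentation:

```python
import json
import urllib.parse
import urllib.request

def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 request URL for one DOI; the email parameter
    is required by the API's usage policy."""
    return ("https://api.unpaywall.org/v2/" + urllib.parse.quote(doi)
            + "?" + urllib.parse.urlencode({"email": email}))

def free_version(doi: str, email: str):
    """Return the URL of a free copy of the article, or None if the record
    reports no open-access location (needs network access)."""
    with urllib.request.urlopen(unpaywall_url(doi, email)) as resp:
        record = json.load(resp)
    loc = record.get("best_oa_location")
    return loc.get("url") if record.get("is_oa") and loc else None
```

Run over the DOIs we do not yet know to be open, this would keep the hybrid system current between database imports.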

In this way we still do not see the effect of more reads of open science. To achieve that we should mark free articles with an Unpaywall icon in Scholia and Reasonator. Measuring the number of reads then becomes possible, and we positively acknowledge authors with free-to-read articles.

Next could be an Unpaywall icon in Wikipedia for all free-to-read references.
Thanks,
      GerardM

Sunday, July 21, 2019

@Wikimedia, when we do "science outreach" what audience do we reach out to and why?

A recent tweet said "If you are an outstanding woman then you have a 1 IN 6 chance of having a @Wikipedia article. If you are an #African woman then you have a 1 in 300 chance." This bias exists for all Africans and all of Africa.

In WikiCite, with all respect for what has been achieved, we find a professional approach by scientists. Their profession, their data, and this is all well and good. However, as the University of California no longer has access to the Elsevier papers, business is no longer "as usual", and consequently the relevance of access to readable papers has gained priority.

We need to know if papers known to Wikidata are available, and we may have all the papers known at Unpaywall, but as long as we do not indicate availability, it is irrelevant.

We need to make it easy for scientists to gain a presence for their science. At this time there are too many hoops to jump through. We can make it easy by putting scientists in the driving seat.
  1. Make an ORCiD identifier for yourself and open up the data
  2. Enable common-sense organisations like "your" university or CrossRef to update your profile
  3. Have a button that runs a SourceMD process importing the data into Wikidata.
  4. Enjoy and improve on your Scholia
By enabling people to update their data and the data of others, you create value. When we run the API of Unpaywall as part of the SourceMD process, we help the University of California and we help the rest of the world that is facing the insurmountable obstacle that exists because of the likes of Elsevier. Science becomes easier, scientists gain relevance for their science, and Wikidata establishes another purpose.

NB Wikipedia gains as a fringe benefit an objective criterion for the establishment of notability (it is in the science, the Scholia).
Thanks,
      GerardM

Thursday, July 18, 2019

@librarycongress and @gndnet link to @OCLC's #Viaf and beyond

In 2015 it was news that in VIAF, Wikipedia was replaced by Wikidata. In quick succession, both the American Library of Congress and the German Deutsche Nationalbibliothek recently announced that they are linking to Wikidata.

That is awesome enough. Awesome because as a result, Wikidata is easier to link to VIAF, as every entry of the LoC and DNB results in a VIAF registration. The only thing needed to make this a reality in Wikidata is a dedicated bot, for us to know all the good work done in the US and Germany.

Another improvement that is of particular relevance to scientists like linguists is that it is now possible to authorise the GND to automatically update the ORCiD record. It will be truly awesome when this is the example other authorities follow.

It is a small step for VIAF to include ORCiD as it links to other scientific publications. For librarians and library systems this is most relevant. For Wikidata it will help with disambiguation, and it allows us to populate our information with even more papers and co-authors.
Thanks,
     GerardM

Wednesday, July 03, 2019

Hey @GlobalYAcademy this blogpost is for you

I am adding members of the Global Young Academy to Wikidata. This was requested on Twitter, and I was asked to describe the process of how they are added. With 100 members added, it is high time to take the time for this.
In this blog many of the pointers are for Matthew Levy; all edits are done in Wikidata itself. This is the item for Mr Levy.
Thanks,
      GerardM

Saturday, June 22, 2019

Bulk uploads linked to @ORCID_Org and others, then what

Bulk uploads to @Wikidata happen all the time, for instance the latest medical publications. They result in links to existing scholars and new authors. The question "then what" was raised on Twitter, and in the question was the assumption of a quantitative reply.

When such data is imported into Wikidata, it does not fall into a vacuum. Many notable scientists are already known because they have a Wikipedia article and because they are linked to "authorities" like ORCiD, VIAF, Google Scholar and many others. The result is a "Scholia" for a scholar, and it includes all the known papers, the co-authors, and dates of awards. This is one example of a scholar without a Wikipedia article.

Scholia is a very important tool as it enables more work on scholars. The display of co-authors, for instance, shows their gender: orange for women, blue for men and white when it is not known. Many people are involved in "Women in Red", writing new articles about women scientists. On the project page of Women in Red you will find lists that are the result of queries run on Wikidata. This is why adding gender info is so important. Notability may be inferred from the awards people received; notability gains relevance when it does not stand alone. This is why links to "authorities" establish the necessary notability for a Wikipedia article. Objectively this is best presented in a Scholia, like the example of Elizabeth Barrett-Connor.

When attention is given to a scholar like Mrs Barrett-Connor, the "ungendered" co-authors are arguably relatively new to Wikidata and typically incomplete. There is a tool for that: SourceMD adds missing papers and links to existing papers. It also adds links to known authors and adds missing authors. The effect is a network of information that is increasingly rich. Arguably this is a bulk upload in its own right, but the origin is a different one.

Presentations on topics like awards, organisations, subjects and much more are available from the Scholia tool. Such a presentation shows what we have, and given that Wikidata is a wiki, there is more to know. Award winners may be enriched with authority information; they may be linked to papers. Frequent publishers on a topic may have co-authors that could do with some TLC.

In answer to the original question: bulk uploads invite additional work; the data is enriched and becomes increasingly relevant.
Thanks,
       GerardM

Saturday, June 15, 2019

@Wikipedia - could give a clue to #deleted articles

Even deleted Wikipedia articles have "false friends". In a list of award winners, a Mr Markku Laakso used to have an article. This Mr Laakso was actually a conductor and not the diabetes researcher the list was there for. For whatever reason, the article for the conductor was deemed not notable and it was deleted.

When you are NOT a Wikipedia admin, there is no way to know what was deleted.

One solution is for all blue, red and black links to refer to Wikidata items. When an article is deleted, the Wikidata item is still there, making it easy to prevent cases of mistaken identity like with Mr and Mr Laakso.

A more expanded proposal you may find here.
Thanks,
     GerardM

Monday, June 10, 2019

@Wikipedia: #notability versus #relevance

I had a good discussion with, imho, a deletionist Wikipedia admin. For me the biggest takeaway was how notability gets in the way of relevance.

With statements made like "There are only two options, one is that the same standards apply, and the other is the perpetuation of prejudice" and "I view our decisions of notability as primarily subjective--decisions based on individual values and understandings of what WP should be like", no or little room is given for contrary points of view.

Notability has as its problem that it enables such a personal POV, while relevance is about what others want to read. For professor Bart O. Roep there is no article. Given two relevant diabetes-related awards he should be notable, and as he started a human study of a vaccine for diabetes type 1, he should be extremely relevant.

A personal POV ignoring the science that is in the news has its dangers. It is easy enough for Wikimedians to learn about scientific credentials, the papers are there to read, but what we write is not for us but for our public. Withholding articles opens our public up to fake facts and fake science. An article about Mr Roep is therefore relevant and timely, particularly because people die as they cannot afford their insulin. Articles about the best of what science has to offer about diabetes are of extreme relevance right now.

At Wikidata, there is no notability issue. Given the relevance of diabetes, all that is needed is to concentrate effort for a few days on a subject. New authors and papers are connected to what we already have, genders are added to authors (to document the gender ratio), and as a result more objective facts are available for the subjective Wikipedia admins to consider, particularly when they accept tooling like Scholia to open up the available data.
Thanks,
      GerardM

Sunday, June 09, 2019

#Wikidata - Exposing #Diabetes #Research

People die of diabetes when they cannot afford their insulin. There is not much that I can do about it but I can work in Wikidata on the scholars, the awards, the papers that are published that have to do with diabetes. The Wikidata tools that are important in this are: Reasonator, Scholia and SourceMD and the ORCiD, Google Scholar and VIAF websites prove themselves to be essential as well.

One way to stay focused is by concentrating on awards; at this time it is the Minkowski Prize, conferred by the European Association for the Study of Diabetes. The list of award winners was already complete, so I concentrated on their papers and co-authors. The first thing to do is to check if there is an ORCiD identifier and if that ORCiD identifier is already known in Wikidata; I found that it often is, and merges of Wikidata items may follow. I then submit a SourceMD job to update that author and their co-authors.

The next (manual) step is about gender ratios. Scholia includes a graphic representation of co-authors and for all the "white" ones no gender has been entered. The process is as follows: when the gender is "obvious", it is just added. For an "Andrea" you look them up in Google and add what you think you see. When a name is given as "A. Winkowsky", you check ORCiD for a full name and iterate the process.

Once the SourceMD job is done, chances are that you have to start the gender process again because of new co-authors. Thomas Yates is a good example of a new co-author, already with a sizable amount of papers (95) to his name but not complete (417). Thomas is a "male".

What I achieve is an increasingly rich coverage of everything related to diabetes. The checks and balances ensure a high quality. And as more data is included in Wikidata, people who query will gain better results.

What I personally do NOT do is add authors without an ORCiD identifier. It takes much more effort and chances of getting it wrong make it unattractive as well. In addition, I care for science but when people are not "Open" about their work I am quite happy for their colleagues to get the recognition they deserve.
Thanks,
      GerardM

Thursday, June 06, 2019

Perspectives on #references, #citations

Wikipedia articles, scientific papers and some books have them: citations. Depending on your outlook, citations serve a different purpose. They exist to prove a point or to enable further reading. These differing purposes are not without friction.

In science, it makes sense to cite the original research establishing a fact. Important, because when such a fact is retracted, the whole chain of citing papers may need to be reconsidered. In a Wikipedia article it is imho a bit different. For many people references are next-level reading material and therefore a well written text expanding on the article is to be preferred; it helps bring things together.

When you consider the points made in a book to be important, like the (many) points made in Superior, the book by Angela Saini, you can expand the Wikidata item for the book by including its citations. It is one way to underline a point because those who seek such information will find a lot of additional reading and confirmation for the points made.

Adding citations in Wikidata often means that the sources and their authors are to be introduced. It takes some doing and by adding DOI, ORCiD, VIAF and/or Google Scholar data it is easy to make future connections. When you care to add citations to this book with me, this is my project page.
Thanks,
     GerardM

Sunday, May 26, 2019

Be #Excellent - science in Europe / Holland - #wetenschapper2030

At the "Wetenschap 2030 / Evolutie of revolutie" conference in The Hague it was all about excellence. Confusion set in when the question was raised: what is excellence.

Two big Dutch science funding organisations invited young scientists to consider scientific practice in 2030. It was a great gathering, more than half of the public were women, one prominent speaker told them she had two children and we were reminded that a more diverse team is a more successful team, as shown in a recent paper.

One of the introductions set the tone. In science itself there is no room for all of us. It would take exponential funding, funding that is not available. In a few panels the subject of the daily practice was discussed and indeed, it is cut-throat. Many people do not share expertise or results; there is no common good, everything to win in a rat race up the ladder towards tenure. Some say these practices are on the way out, others indicate that it depends on the field you research.

And then there is this guy from Europe who says, only excellence will get you funding from the EU..

What helps: scientists indicating what their primary concern is to be, what they want to be evaluated on; education, research.. Counter-intuitively, such focused reviews have the effect that results outside the chosen track benefit as well. In this way it is the university itself that finds excellence and improves its processes.

Another perspective on excellence: what value does research offer? Not to the scholars, nor the universities, but to the ones bearing the burden of the costs and expecting results. Are these results truly the best that can be achieved, do they reflect cooperation, are the numbers reproducible and the papers readable? For Europe to fund, the proposal must have merit.

My take-away message: all those scientists that do not collaborate, backstab and think it acceptable because that is the way it is done, do not deserve my tax euro going forward towards 2030. The good news: thanks to Open Science and everything related, change is underway but we are not there yet. Important: it does not follow that getting funding is a sign of quality; there is too little money to go around for all proposals with merit.
Thanks,
       GerardM

Sunday, May 19, 2019

#Scholia: on the "requirement" of completeness

Scholia, the presentation of scholarly information on authors, papers, universities, awards et al, is at this time not included in the "Authority control" part of a Wikipedia article. The reason, I understand, is that Wikipedians "that matter" insist that its information be complete.

That is imho utter balderdash.

The first argument is the Wiki principle itself. Things do not need to be complete, in the Wiki world it is all about the work that is underway. The second is in the information that it provides: its information is arguably superior to what a Wikipedia article provides on the corpus of papers written by an author. The third is that with the prospect of all references of all Wikipedias ending up in Wikidata, value is added when a paper can be seen in relation to its authors and citations. It matters when it is known what citations a paper is said to support. It matters that we know the papers that are retracted. The fourth argument is in the maths of it all; typically scientific papers have multiple authors. It takes only one author with an ORCiD identifier to get their papers included. The other authors have not been open about their work, it is their own doing why they are not known in the most read corpus on the planet. They still exist but as "author strings". When a kind soul wants to remove them from obscurity they can.

As to the "Katie Bouman"s among them? There are many fine people that are equally deserving, that have not been recognised yet for their relevance. Fine people that have a public ORCiD record. For them it is feasible to have their Scholia ready when they are recognised. For the others, well it is not a Pokemon game, it is a Wiki.
Thanks,
      GerardM

Sunday, May 12, 2019

@Wikidata Women in science - Lesley Wyborn

For Lesley Wyborn a Wikipedia article exists. She "built an international reputation for innovative leadership in geoinformatics and global e-research, particularly in the geoscience area" according to the motivation for the "Outstanding Contributions in Geoinformatics" award. Notability, no issue.

When the article was written in 2016, no attention was given to the "authority control" and consequently in 2018 an additional item was created with an ORCID identifier. In 2019 additional work was done and the two items were merged. A Google Scholar identifier and the award were added, potentially addressing the issues raised on the Wikipedia article.

Arguably both the Wikidata and the Wikipedia information could be more informative. However, given that both are Wikis that is quite acceptable. It is quite likely that many more papers are already on Wikidata and just need attribution. That is something for others to do.. we are a community remember.
Thanks,
     GerardM

Thursday, May 09, 2019

How and when I trust science, when would you trust science?

When something drops on my head, it is gravity that brings it down. When I travel to the USA, the shortest route is over Iceland; the world is round. I did not get polio, measles or whooping cough; my parents had me vaccinated. I worked in computing and most of the women were better than the men; my observation, and I am happy working for women.

When I read articles in Wikipedia, I know that I can trust it up to a certain level because there are citations indicating that something is true or that a given opinion is held. Its neutral point of view means that equal weight is to be given to opinions but not when it flies in the face of proven facts, the science about a subject. The best news: when scientific papers are retracted, we start to know about this and act upon it in Wikipedia. The nonsense, the preconceptions, the paid-for science is to be removed once it is retracted.

In the Netherlands a prominent scientist has been tasked to root out those medical practices that are proven not to work. His work will be hard; he will have to deal with vested interests, ingrained practices and a public that wants everything to be as expected. People will still be vaccinated, some medications will no longer be available, some treatments will not be there; they do not work even when you are desperate for them to work..

That is me; now you, when can you trust.. Well, it is good to be wary, just consider the numbers. When a politician says he was effective because many drug dealers went to jail, ask yourself why they should be in jail; did your community end up safer? If not, not much was achieved. When scientific papers show that the number of addicts goes down when substance dependence is treated as a medical and not as a criminal issue, wonder what this means for the communities these people come from. Seek out the numbers and you are no longer talking politics but considering the science of it.

A lot of so-called science defends points of view that do not fit facts on the ground. This can be tricky to understand because the difference may be local versus global. Worldwide, temperatures go up. Our climate is no longer stable and yes, in the USA it has been cold lately.. not so in Europe, Africa, Asia. One thing to consider: is it truly science, peer reviewed and everything, or is it there to shore up a point of view.. A telltale sign is when it comes from a "research institute" / "policy institute" paid for by an interested party.
Thanks,
      GerardM

Tuesday, April 23, 2019

Scopus is "off side"

At Wikidata we have all kinds of identifiers for all kinds of subjects. All of them aim to provide unique identifiers and the value of Wikidata is that it brings them together, allowing us to combine the information of multiple sources about the same subject.

Scientists may have a Scopus identifier. In Wikidata Scopus is very much a second-rate system because to learn which identifier goes with which person requires jumping through proprietary hoops. Scopus is behind a paywall, it has its own advertising budget and consequently it does not need the effort of me and volunteers like me to put the spotlight on the science it holds for ransom. When we come across Scopus identifiers we include them, but Scopus identifiers are second-class citizens.

At Wikipedia we have been blindsided by scientists who gained awards and became instant sensations because of their accomplishments. For me this is largely the effect of us not knowing who they are and what their work is. Thanks to ORCiD, we increasingly know about more and more scientists and their work. When we don't know of them, when their work is hidden from the real world, I don't mind. When we know about them and their work in Wikidata it is different; that is when we could and should know their notability.
Thanks,
      GerardM

Sunday, April 14, 2019

The Bandwidth of Katie Bouman

First things first, yes, many people were involved in everything it took to make the picture of a black hole. However, the reason why it is justified that Katie Bouman is the face of this scientific novelty is because she developed the algorithms needed to distill the image from the data. To give you a clue about the magnitude of the problem she solved; the data was physically shipped on hard drives from multiple observatories. For big science, the Internet often cannot cope.

There are eternal arguments why people are notable in Wikipedia. For a lot of that knowledge a static environment like Wikipedia is not appropriate and this environment is causing a lot of those arguments. To come back to Katie, or rather every scientist: their work is collaborative and much of it is condensed into "scientific papers". One of the black hole papers is "First M87 Event Horizon Telescope Results. I. The Shadow of the Supermassive Black Hole". There are many authors to this paper, not only "Katherine L. Bouman". When a major event like a first picture of a black hole is added, it is understandable that a paper like this is at first attributed to a single author..

Wikimedia projects have to deal with the ramifications of science for many reasons. The most obvious one is that papers are used for citations. To do this properly, it is science that defines what is written, not selected papers to support an opinion. The public is invited to read these papers and the current Wikipedia narrative is in single papers, single points of view. This makes some sense because the presentation is static. In Wikidata the papers on any given topic are continuously expanded; the same needs to be true for papers by any given author. Technically a Wikipedia could use Wikidata as the source for publications on a subject or by an author. The author could be Katie Bouman, and proper presentations make it obvious that the pictures of a black hole were a group effort with Katie responsible for the algorithms.
Thanks,
       GerardM

Tuesday, April 09, 2019

@Wikidata is no relational #database

When you consider the functionality of Wikidata, it is important to appreciate that it is not a relational database. As a consequence there is no implicit way to enforce restrictions. Emulating relational restrictions fails because it is not possible to check in real time what it is that is to be restricted.

An example: in a process, new items are created when no item with a given external identifier is available. A query indicates that there is no item in existence and a new item is created. A few moments later the existence of an item with the same external identifier is checked using a query. Because of the time lag that exists, what is known to be in the database and what actually is in the database differ, so the query indicates there is no item and a new but duplicate item is created.
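The stale-read scenario can be simulated in a few lines. A sketch only: the in-memory store and the identifier are stand-ins, not the real Wikidata API.

```python
# Sketch: two jobs consult the same stale snapshot before creating an item,
# so both create an item for one external identifier -- a duplicate.
items = []                 # the "live" store
snapshot = list(items)     # stale query result, taken before either job ran

def create_if_missing(external_id, seen):
    """Create an item unless the (possibly stale) view already has one."""
    if not any(i["external_id"] == external_id for i in seen):
        items.append({"id": len(items) + 1, "external_id": external_id})

create_if_missing("ORCID:0000-0002-1825-0097", snapshot)  # job 1
create_if_missing("ORCID:0000-0002-1825-0097", snapshot)  # job 2: duplicate

# A later pass can spot items sharing an external identifier and merge them.
by_ext = {}
for item in items:
    by_ext.setdefault(item["external_id"], []).append(item["id"])
to_merge = {k: v for k, v in by_ext.items() if len(v) > 1}
```

The final dictionary is exactly what a clean-up batch job would work from: every external identifier that ended up on more than one item.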

Implications are important.

Wikidata is a wiki and the implications are quite different. In a wiki things need not be perfect, and the restrictions of a relational model are in essence recommendations only. In such a model duplicate items as described above are not a real problem; batch jobs may merge these items when they occur often enough. Processes may keep track of the items they created earlier and thereby minimise the issue.
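The mitigation of a process keeping track of its own creations can be sketched like this; the identifiers are illustrative.

```python
# Sketch: a batch process keeps its own set of identifiers it has created,
# so a lagging query result cannot trick it into a duplicate within one run.
created_this_run = set()

def create_once(external_id, stale_view):
    """Return True when it is safe to create the item."""
    if external_id in created_this_run:
        return False           # we made it moments ago; the query just lags
    if external_id in stale_view:
        return False           # the database already had it
    created_this_run.add(external_id)
    return True
```

A second call for the same identifier is refused even though the stale view still reports the item as missing.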

Important is that we do not blame people for what Wikidata is not and accept its limitations. Functionality like SourceMD enables what Wikidata may become: a link to all knowledge. Never mind if it is knowledge in Wikipedia articles, scholarly articles or in sources used to prove whatever point.
Thanks,
      GerardM

Sunday, March 24, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective II

When we are to share in the "sum of all knowledge" we share what we know about subjects; articles, pictures, data. We may share what knowledge we have, what others have and that is what it takes for us to share in the sum of all knowledge. The question is why should we share all this, how to go about it and finally how will it benefit our public and how will it help us share the sum of all knowledge.

At the moment we do not really know what people are looking for. One reason is that search engines like the ones by Google, Microsoft and DuckDuckGo recommend Wikipedia articles and as a consequence the search process is hidden from us. However, some people prefer the "Wikipedia search engine" in their browser. We can do better and present more interesting search results. From a statistical point of view, we do not need big numbers to gain significant results.

When we check what the "competition" does we find their results in many tabs; "the web" and "images" are the first two. The first is text based and offers whatever there is on the web. What we will bring is whatever we, and the organisations we partner with, have to offer. It will be centered on subjects and their associated factoids, presented in any language.

One template to consider is how Scholia presents; it differs depending on whether the subject is a publication, a university, a scholar, a paper. Large numbers make specific presentations feasible and thanks to Wikidata we know what kind of presentation fits a particular subject. A similar approach is possible for sports, politics. It takes experimentation and that is what makes it a Wiki approach.

Thanks to this subject-based approach, language plays a different role. Vital is that labels, potentially differing per language, are available or become available for finding the subjects. One important difference with the Google, Microsoft or DuckDuckGo approach is that as a Wiki, we can ask people to add labels and missing statements. This will make our subject-based data better understood in the languages people support. Yes, we can ask people to have a Wikimedia profile and yes, we may ask people to support us where we think people looking for information have to overcome hurdles.
Thanks,
       GerardM

Saturday, March 16, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective I

Sharing the sum of all knowledge is what we have always aimed for in our movement. In Commons we have realised a project that illustrates all Wikimedia projects and in Wikidata we have realised a project that links all Wikimedia projects and more.

When we tell the world about the most popular articles in Wikipedia, it is important to realise that we do not inform about what the most popular subjects are. We could, but so far we don't. The popularity of a subject is the sum of all traffic of all Wikipedia articles on that subject. Providing this data is feasible; it is a "big data" question.

We do have accumulated data for the traffic of articles on all Wikipedias and we can link the articles to the Wikidata items. What follows is simple arithmetic. Powerful, because it will show that English Wikipedia is less than fifty percent of all traffic. That will help make the existing bias for English Wikipedia and its subjects visible, particularly because it will be possible to answer a question like: "What are the most popular subjects that do not have an article in English?" and compare those to popular diversity articles.
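The "simple arithmetic" is just a sum per Wikidata item. In this sketch the items, wikis and view counts are made up purely to show the shape of the calculation.

```python
# Per-article traffic, keyed by wiki and (assumed) Wikidata item.
pageviews = [
    ("enwiki", "Q42", 6000), ("dewiki", "Q42", 5000),
    ("frwiki", "Q42", 4000), ("eswiki", "Q42", 3000),
    ("enwiki", "Q1", 2000),
]

by_subject, total = {}, 0
for wiki, qid, views in pageviews:
    by_subject[qid] = by_subject.get(qid, 0) + views  # sum per subject
    total += views

# Share of English Wikipedia in the overall traffic of this toy sample.
en_share = sum(v for w, _, v in pageviews if w == "enwiki") / total
```

Sorting `by_subject` and filtering for subjects without an English sitelink would then answer the "most popular subjects missing in English" question.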

In Wikidata we know about the subjects of all Wikipedias but it too is very much a project based on English. That is a pity when Wikidata is to be the tool that helps us find the subjects people are looking for that are missing in a Wikipedia. For some there is an extension to the search functionality that helps find information. It uses Wikidata and it supports automated descriptions.

Now consider that this tool is available on every Wikipedia. We would share more information. With some tinkering, we would know what is missing where. There are other opportunities; we could ask logged-in users to help by adding labels for their language to improve Wikidata. When Wikidata does not include the missing information, we could ask them to add a Wikidata item and additional statements, a description, to improve our search results.

This data approach is based on the result of a process, the negative results of our own Search, and it is based on active cooperation of our users. At the same time, we can accumulate negative results of search where there has been no interaction, link them to Wikidata labels and gain an understanding of the relevance of these missing articles. This fits in nicely with the marketing approach to "what it is that people want to read in a Wikipedia".
Thanks,
      GerardM

Saturday, March 09, 2019

A #marketing approach to "what it is that people want to read in a @Wikipedia"

All the time people want to read articles in a Wikipedia, articles that are not there. For some Wikipedias that is obvious because there is so little and, based on what people read in other Wikipedias, recommendations have been made suggesting what would generate new readers. This has been the approach so far; a quite reasonable approach.

This approach does not consider cultural differences, it does not consider what is topical in a given "market". To find an answer to the question what people want to read, there are several strategies. One is what researchers do: they ask panels, write papers and once that is done there is a position to act upon. There are drawbacks:
  • you can only research so many Wikipedias
  • for all the other Wikipedias there is no attention
  • the composition of the panels is problematic particularly when they are self selecting
  • there are no results while the research is being done
The objective of a marketing approach is centered around two questions: 
  • what is it that people are looking for now (and cannot find) 
  • what can be done to fulfill that demand now
The data needed for this approach: negative search results. People search for subjects all the time and there are all kinds of reasons why they do not find what they are looking for.. Spelling, disambiguation and nothing to find are all perfectly fine reasons for a no-show.

The "nothing to find" scenario is obvious; when it is sought often, we want an article. Exposing a list of missing articles is one motivator for people to write. Once they have written, we do have the data of how often an article was read. When the most popular new articles of the last month are shown, it is vindication for authors to have written popular articles. It is easy, obvious and it should be part of the data Wikimedia Foundation already collects.. In this way the data is put to use. It is also quite FAIR to make this data available. 

For the "disambiguation" issue, Wikidata may come to the rescue. It knows what is there and, it is easy enough to add items with the same name for disambiguation purposes. Combine this with automated descriptions and all that is requires is a user interface to guide people to what they are looking for. When there is "only" a Wikidata item, it follows that its results feature in the "no article" category.

The "spelling" issue is just a variation on a theme. Wikidata does allow for multiple labels. The search results may use of them as well. Common spelling errors are also a big part of the problem. With a bit of ingenuity it is not much of a problem either.

Marketing this marketing approach should not be hard. It just requires people to accept what is staring them in the face. It is easy to implement, it works for all the 280+ languages and it is likely to give a boost not only to all the other Wikipedias but also to Wikidata.
Thanks,
        GerardM

Sunday, February 17, 2019

@WikiResearch - Nihil de nobis, sine nobis

There is this wonderful notion of how Research is going to tell us what to do in light of the strategic Wikimedia 2030 plans. Wonderful. There is going to be a taxonomy of the information we are missing.

Let me be clear. We do need research, and the data it is based on is to be available to us. There is no point in a future taxonomy of missing knowledge when we have been asking for decades: "what articles are people looking for that they cannot find?" If there is to be a taxonomy, what else should it be based on?

When we are to fill in the gaps of what Wikipedia covers, we can stimulate new articles by indicating what traffic they get in the first month. Stimulate our readers to learn more by showing what Wikidata has to offer and show its links to texts in other languages. It may even result in new stubs, even articles, in "their" language. This technology has been available for years now.

WikiResearch is full of arguments on the importance of citations and of Wikidata as the platform for all Wikipedia sources; why then are the WikiResearch papers not in Wikidata from the start? Is it that WikiResearchers consider that Wikidata is not about them, just as it is about any other subject Wikidata covers? What is it that makes their work less findable (FAIR) than what is known to have been published as open content by the NIH?

The point I want to make is that no matter how well intended what WikiResearch aims to achieve is, they lose the interest, involvement and commitment of people like me, the people they need to get the results they aim for.

Yes, do research, but we should not wait for its results; we know how to stimulate people to write new articles.
Thanks,
      GerardM

Sunday, February 10, 2019

#Wikidata - A quick and dirty "HowTo" to improve exposure of a subject in Wikidata

When you want to expose a particular subject, any subject, in Wikidata, this is the quick and dirty way to expose much of what there is to know. There are a few caveats. The first is that the aim is not to be complete, the second that it is biased towards scientists who are open about their work at ORCiD.

You start with a paper or a scientist. They have a DOI / ORCiD identifier and they may already be in Wikidata. First there is the discovery process of the available literature and the authors involved. The SourceMD tool is key; with a SPARQL query or with a QID per line, you run a process that will update publications by adding missing authors or will add missing publications and missing authors to known publications.

When you treat this as an iterative process, more authors and publications become known. When you run the same process for (new) co-authors, more publications and authors become known that are relevant to your subject.
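The iteration amounts to expanding a co-author graph until no new authors turn up. A minimal sketch; the graph here is a toy stand-in for what SourceMD actually discovers from ORCiD records.

```python
# Sketch: iterative expansion over co-authors, starting from one author.
coauthors = {  # hypothetical co-author graph keyed by author
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
}

known, frontier = set(), ["A"]
while frontier:
    author = frontier.pop()
    if author in known:
        continue                 # already processed in an earlier round
    known.add(author)
    frontier.extend(coauthors.get(author, []))
```

Each round corresponds to one SourceMD run: new co-authors go on the frontier, and the process stops by itself when the subject's neighbourhood is covered.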

To review your progress, you use Scholia. It has multiple modes that help you gain an understanding of authors, papers, subjects, publications, institutions.. You will see the details evolve. NB mind the lag Wikidata takes to update its database; it is not instant gratification.

A few observations: your aim may be to be "complete" but publications are added all the time and the same is true for scientists. People increasingly turn to ORCiD for a persistent identifier for their work. The real science is in assigning a subject to a paper. Arguably the subject may be in the name of the article but as an approach that is a bit coarse. I leave that to you, as your involvement makes you a subject "specialist".
Thanks,
       GerardM

Tuesday, February 05, 2019

#Wikidata - Naomi Ellemers and the relevance of #Awards

In a 2016 blogpost, I mentioned the relevance of awards. At the time Professor Ellemers received an award and it was the vehicle to make that point in the story.

Today, in an article in a national newspaper, Mrs Ellemers makes a strong point that the perception of awards is really problematic. What they do is reinforce a bias that American science is superior. It leads to a perception by European students that it is the USA "where it is all happening". A perception that Mrs Ellemers argues is incorrect.

NB Mrs Ellemers is the recipient of the 2018 Career Contribution Award of the Society for Personality and Social Psychology.

Wikidata reinforces this bias for American science by including a rating for "science awards". This rating values awards by comparing them. The rating is done by an American organisation and the whole notion behind it is suspect because the assumptions are not necessarily / not at all beneficial for the practice of science.

How to counter such a bias? As far as I am concerned there is no value in making a distinction between awards and "science awards" and biased information like this should be removed. Just consider, when European science is considered less than American science... how would African science be rated?
Thanks,
     GerardM

Sunday, February 03, 2019

Dr Matshidiso Moeti, an exception to my rules

When I add scientists to Wikidata, I really want something to link to, an external source like ORCID, Google Scholar, VIAF.. When I link publications it is the data at ORCID I link to; I don't do manual linking.

From the sources I have read, Dr Moeti is the kind of person who deserves a Wikipedia article. Her work, the people she works with and the cases she works on not only deserve recognition; it is imho vitally important that they get it, that you learn about them. This is why I made an exception to my rules.

This is her Scholia, this is her Reasonator and please, take an interest.
Thanks,
      GerardM

The case for #Wikimedia Foundation as an #ORCID member organisation

The Wikimedia Foundation is a research organisation. No two ways about it; it has its own researchers who not only perform research on the Wikimedia projects and communities, they coordinate research on Wikimedia projects and communities, and it produces its own publications. As such it qualifies to become an ORCID member organisation.

The benefits are:
  • Authenticating ORCID iDs of individuals using the ORCID API to ensure that researchers are correctly identified in your systems
  • Displaying iDs to signal to researchers that your systems support the use of ORCID
  • Connecting information about affiliations and contributions to ORCID records, creating trusted assertions and enabling researchers to easily provide validated information to systems and profiles they use
  • Collecting information from ORCID records to fill in forms, save researchers time, and support research reporting
  • Synchronizing between research information systems to improve reporting speed and accuracy and reduce data entry burden for researchers and administrators alike
At this time the quality of information about Wikimedia research is hardly satisfactory. As is the standard, announcements are made about a new paper and, as can be expected, the paper is not in Wikidata. The three authors are not in ORCID, as is usual for people who work in the field of computing, so there is no easy way to learn about their publications.

What will this achieve? It will be the Wikimedia Foundation itself that pushes information about its research to ORCID and consequently at Wikidata we can easily update the latest and greatest. It is also an important step towards its documentation becoming discoverable. It is one thing to publish Open Content; when it is then hard to find, it is still not FAIR and the research does not have the hoped-for impact. It also removes an issue that some researchers say they face: they cannot publish about themselves on Wikimedia projects.

Another important plus: by indicating the importance of having scholarly papers known in ORCID we help reluctant scientists understand that yes, they have a career in open source and open systems, but finding their work is very much needed for it to be truly open.
Thanks,
       GerardM

Sunday, January 27, 2019

@Wikidata #quality - one example: Leonardo Quisumbing

Quality happens on many levels. Judge Leonardo Quisumbing passed away and a lot of well-meant effort went into his Wikidata item. The data is inconsistent with our current practice so in the Wikidata chat people were asked to help fix the data.

Judge Quisumbing held many positions; one of them was "Secretary of Labor and Employment". This is a cabinet position and it follows that Mr Quisumbing was also a "politician". It is one thing to add this position and occupation to a person; from a quality point of view it is best to include a "start date", a "replaces", an "end date" and a "replaced by". The problem: the predecessor and successor do not exist in Wikidata.

Many a secretary of Labor does have a Wikipedia article and they are included in a category. Using the "Petscan" tool it is easy to import all those mentioned. Typically the quality of the info is good, however there is always the "six percent" error rate. Indeed, one person was erroneously indicated as a "secretary of labor". The problem is that people who only care about quality on the item level are really hostile to such imported issues. They are best ignored for their ignorance/arrogance.

A next level of quality is to complete the list with all missing secretaries. This can be done, warts and all, from the Wikipedia article. It results in a Reasonator page that includes all the red and black links of the article. Many new items are created in the process and having automated descriptions is vital in finding as many matches as possible.

Judge Quisumbing became an "Associate Justice of the Supreme Court of the Philippines" and became the senior associate justice in 2007. Adding associate justices from a category was obvious; adding senior associate justices is a task similar to the secretaries of labor. However, a senior is the first among the many and consequently it requires a judgment call on how to express this.

Given that Wikidata is a wiki, you do the best you can to the level that has your interest. There is still a need to improve the Wikidata item for judge Quisumbing but that is for someone else.

Thanks,
       GerardM