Friday, December 27, 2019

The value of incomplete data - Fellows of the Ecological Society of America

This is about understanding data in Wikidata. The article is about what you can and cannot do with incomplete data; it is not so much about the Ecological Society of America.

The most recent work started with the news of a new Wikipedia article. Prof Cottingham is a 2015 fellow of the ESA and there is a category for fellows. Adding her and other missing fellows to Wikidata showed that for one fellow there was no Wikipedia article. At the time there were 90 known fellows and for only two it was known when they became a member.

I expected that new fellows would be known to Wikidata not just as an "author string" but as an "item". So I added 14 of the 2019 cohort and found this not to be the case. I then looked up the known fellows on the ESA webpage and added their dates to Wikidata, because I wondered whether it is particularly the older fellows that are represented in Wikipedia.

While adding the dates, I added many alternate names to aid disambiguation, removed one item and found two false friends: fathers mistaken for their sons. When I was done, I had a good impression of the data on the website and, even though I do not have the full numbers, I feel confirmed in my belief that it is the old ecology/ecologists that are represented in Wikipedia.

When you scrutinize the list of fellows, you will find "Early Career Fellows" included; they are "elected for advancing the science of ecology and showing promise for continuing contributions" and they take part for a limited amount of time. Programs like these are known from all over the world and from many science organisations. This time I did not spend time on them but from previous experience I can safely say that promising is putting it mildly.

Wikidata is a wiki and as such, the work that I did is of value even though it is incomplete. I did not add all the missing fellows, for instance. The ESA is very much an organisation for America (check the employment of its fellows) and it takes pride in global attention and solicits membership fees from all over the world. It takes a lot of additional data when you want to assess whether its subject matter is biased towards America and in what way.

For many of the fellows I added, there are papers with "author strings" waiting to be linked to an author. The same can be said for the fellows that are still missing. The ESA can be compared to other ecological organisations, but how to deal with the differences takes a completely different understanding. It takes more data to make this possible but the data does not need to be complete; that is the beauty of averages.
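
A minimal sketch of why an average can be informative before the data is complete: it computes the share of a group that has some property from the part that is known, together with a rough margin of error. The function name and the counts are hypothetical, and the normal-approximation margin assumes the known part behaves like a random sample, which data in Wikidata often does not.

```python
import math


def share_with_margin(hits: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Return (observed share, approximate margin of error) for a sample.

    Uses the normal approximation; z=1.96 corresponds to roughly 95%.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    share = hits / sample_size
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return share, margin


if __name__ == "__main__":
    # Hypothetical counts, not taken from Wikidata.
    share, margin = share_with_margin(hits=50, sample_size=100)
    print(f"{share:.0%} +/- {margin:.1%}")
```

The margin shrinks as more items are added, so each additional fellow makes the estimate a little firmer without the list ever having to be finished.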
Thanks,
       GerardM

Thursday, December 26, 2019

Why didn’t @Wikidata have an item on Margaret Nakakeeto, a champion for living babies?

Ed Erhart famously wrote in 2018 "Why didn’t Wikipedia have an article on Donna Strickland, winner of a Nobel Prize?" A year later we can say that it is extremely likely that a Donna Strickland or a Margaret Nakakeeto is known in Wikidata, if only because she is a co-author of a paper (technically: an "author string").

When Ed wrote his article, it was to highlight the gender gap we have in Wikipedia. It is arguably relevant and important and it needs the attention it gets. However, it does not follow that it is the only "gap" that needs addressing; it does not even follow that the gender gap is the gap with the biggest impact.

When you consider Africa and particularly science in Africa, the subjects that matter in Africa most are reflected in for instance the Scholia for the members of the South African Academy of Science. As far as I now know, its gender ratio is 27% and this is a list with a mix of Wikipedia articles and Wikidata items. It shows the attention African science gets in Wikipedia nicely.

In Africa there is a huge amount of attention for maternal and neonatal care (e.g. Uganda) and as programs impact the health and survival of women, it follows that more women will become notable, notable for Wikipedia.

By giving attention to female African scientists, the subjects they are known for gain relevance. Their Scholias are developed, including links to co-authors and papers. It will improve the likelihood that when African science awards are announced, we will at least know the recipients in Wikidata.
Thanks,
       GerardM

Saturday, December 21, 2019

#Science and America first

Several US American science organisations are quite adamant that for them, it is America first. Stupidity has its place and these days the United States has a lot of it particularly as those same science organisations expect people from the rest of the world to accept "pre-eminence" of the USA.

There may be good reasons to be a member of these organisations but from my perspective, it is one thing to be with stupid, it is another to have these organisations argue their case on "your" behalf. So when you are a scientist, chances are that we already know you at Wikidata. We may even know about your science, your co-authors, your memberships.

Take for instance Prof Lise Korsten; she is probably South African and this is her Scholia. She has many co-authors and for some we do not know their gender and for most we do not know their nationality. We do not know if she is a member of any science organisation and we do not know that for her co-authors either. So you may add your professional memberships at Wikidata, your nationality and when you do know the nationality of your co-authors, you may add that as well.

In this way we make obvious to US American stupid that science is global.
Thanks,
       GerardM

Thursday, December 12, 2019

Disseminate science says @EstherNgumbi, @Wikimedia projects have the power to do just that

In this day and age science is of the utmost importance. When I am pointed to a conference where an African scientist gives the plenary lecture, the message is on display in the picture. I take an interest.

When you want to disseminate research, when you want the science to be known by society, you have to pick your platform. You can do worse than choosing the Wikimedia projects.

Professor Esther Ngumbi is employed at the University of Illinois at Urbana–Champaign. Her ORCiD profile has only one paper but at Wikidata we knew of others. As she is now known at Wikidata with her papers, she has a Scholia. At first there was only one co-author, a bit sparse, so others were added. They were linked to the papers they have on Wikidata. The same was done for some authors who cited Professor Ngumbi.

When you and your science are known in Wikidata, you are more likely to get a Wikipedia article and yes, working for an American university helps. An ORCiD profile that is open will be even more potent when you trust organisations like your university or CrossRef to update your ORCiD when they know about your papers, your new papers.

In this day and age where our ecology is no longer stable, it is vital to know and respect the science. While we aim for the best we have to be prepared for the worst; we have to see it coming. It is why our Wikimedia projects should inform about all the science and not just what a Wikipedia article has as a reference.
Thanks,
       GerardM

Wednesday, December 11, 2019

Jack needs help, so do we and, so do our audiences

Jack penciled his aspirations for Twitter in a tweet. In it he states: "... Second, the value of social media is shifting away from content hosting and removal, and towards recommendation algorithms directing one’s attention. Unfortunately, these algorithms are typically proprietary, and one can’t choose or build alternatives. Yet."

It is good news that Jack seeks a way out; he intends to hire a "small independent team of up to five open source architects, engineers, and designers" and "Twitter is to become a client of this standard".

In the Wikimedia projects we have similar challenges and opportunities. For all kinds of reasons we cannot expect there to be a Wikipedia article for every scientist who is very much in the news (aka relevant); Dr Tewoldeberhan is a recent example. But there is no reason why we cannot have her, her work and the work of any other scientist in Wikidata. With tools like Scholia we already have a significant impact by making more known than just what may be found in a Wikipedia. Jack, we do know many scientists by their Twitter handle; they already make the case for their science on Twitter. This makes it easy for you to link to and expand on Scholia. What we give our readers is more to read so that they can find confirmation for what they read.

Jack, Wikidata is not proprietary, Scholia is not proprietary and the Wikimedia motto is "to share in the sum of all knowledge". Together we can shift focus from what we have read before in the Wikipedias to what there is to read on the Internet. Put stuff in context and bring the scientists who care to inform about their science in the limelight.

What we do not have is the pretense that we cover everything well. We do aim to cover everything notable well. What we provide is static, Twitter is much more dynamic and together we will change the landscape. Great technology combined with both the Twitter and Wikimedia communities has the potential of being awesome.
Thanks,
      GerardM

Thursday, December 05, 2019

What is it about Jess Wade?

It is not only that Jess writes Wikipedia articles. Others do as well. It is not only that she engages girls with science; it is why she enthuses about female (STEM) scientists. Others do as well. It is not only that her tweets engage us with, for instance, the #PhotoHour, or that she wants us to read the (fabulous) books by Angela Saini; she also organises for schools to have Inferior in their library for girls to read and become scientists as well. What makes her special is that she engages people to be part of what she communicates so well.

Take me for instance, Jess is on Twitter and I read her daily new article. For the person she writes about I enrich the information on Wikidata and ensure that the "authority control" is set in the Wikipedia article. What I add is award information, authorities, employment and education info. I often add awards and depending on how interesting an award is to me I add other recipients as well.

It is not only me; there are many more people inspired by Jess who get involved. They read the books she champions, donate so that more girls read Inferior, follow her on Twitter, write articles and also get involved, are involved. It all happens because of the enthusiasm that Jess brings to us all. This enthusiasm, the involvement is what I so cherish. When the inevitable naysayers come along it dampens the positivity, the sense that we are making a difference.

When you want to know how important the women she writes about can be, consider Joy Lawn; she tweets really effectively as well. It shows how effectively women scientists communicate the relevance of science. It is vitally important for us to know about the science, the subjects they champion. At that it may be our Jess but actually, it is Dr Jess Wade; she is a scientist first, she promotes science and Wikipedia is a vehicle to get the message out.
Thanks,
        GerardM

Friday, November 29, 2019

It is not a list when it is the result of a query

A list is a presentation of data. When a list is maintained manually, the list IS the data; when the list is the result of a query, it REPRESENTS the data.

The difference is quite important. Changing the information in a queried list is done in the definition of the query; updating the data is a matter of re-running the query. Changing the information in a manual list is a lot of work and therefore there is no integrity in the data itself; it is always potluck what quality the data is.
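
A minimal sketch of the difference, assuming a tiny in-memory data set instead of Wikidata itself. The `Person` record, the `missing_articles` function and the award name are hypothetical; Listeria works with queries against Wikidata, not with Python code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    award: str
    has_article: bool


def missing_articles(people: list[Person], award: str) -> list[str]:
    """The 'query': recipients of an award who do not have an article yet."""
    return sorted(p.name for p in people if p.award == award and not p.has_article)


if __name__ == "__main__":
    data = [
        Person("A", "Example Medal", False),
        Person("B", "Example Medal", False),
    ]
    # A manual list is a copy made once; from then on it is maintained by hand.
    manual_list = missing_articles(data, "Example Medal")
    # Someone writes the article for "A": only the data changes.
    data[0] = Person("A", "Example Medal", True)
    print("manual list:", manual_list)
    print("queried list:", missing_articles(data, "Example Medal"))
```

The manual copy still names "A" until somebody edits it, while re-running the query drops "A" by itself.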

In the Wikipedia world, Listeria is king of the queried lists. For some its use is controversial but things are changing for the better. Projects like Women in Red use Listeria a lot; their work is possible because people add notable women in Wikidata. The queries work on the basis of awards, professions and nationality, enabling volunteers to write the articles they care to write. This works because once an article is written its subject is automagically removed from the lists.

On the English Wikipedia consensus has it that manual lists are to be preferred. However, empirically the quality of automated lists is better {{REF}} and as data in Wikidata does not suffer from "false friends" even the support for "red links" is vastly superior.

There is no point in anecdotal evidence about which is best. When the English Wikipedia has a black link for Stephen Fleming on its page for the Spearman medal first, it is an obvious start for a new item on Wikidata that is more than just a person who won the Spearman medal. It then becomes a target for lists of the special interest groups who aim to cover "their" subject matter well.

The next stage of the acceptance of lists relies on the realisation that "consensus" does not serve us well particularly when it trumps established facts. It will serve us well in politics and, in what Wikimedia projects could be.
Thanks,
      GerardM

Wednesday, November 27, 2019

Please let us support #Science at @Wikidata

When the BBC informs us about reforestation in Ethiopia, it is Dr Tewolde-Berahan who informs BBC's Justin Rowlatt about the work that is done in preparation of planting trees.

It is a humorous piece of information that gets the message across; you can plant where trees were absent for generations and make the (local) climate change.

Consider: you now want to seriously know more about reforestation in Ethiopia. Where do you go? Wikipedia, in all its magnificence, is rooted in its articles and thereby dated. Through its references however, there are links to its authors, to many more authors and their publications. Every article has in this way its concept cloud and it could be translated into a Scholia for an article.

The current Scholias are themselves already a rabbit hole that leads in many directions and a Scholia for an article would be something different again. The article links to subjects, has its papers and by inference authors; they may link to newer papers, more papers, contradicting papers. They may lead to scientists who research similar notions for another locality. Why not reforest Spain or France? When reforestation is possible in Ethiopia, what would be different to make this unfeasible in Europe?

And all this becomes possible when you consider Wikipedia as the jumping off point in any and all directions, not just within Wikipedia.
Thanks,
     GerardM

NB I know there are two fellows of the Ethiopian Academy of Science related to this subject. Who are they and how are they connected to Dr Tewolde-Berahan?

Thursday, November 14, 2019

@wikidata - I don't scale, help me scale

At Wikidata there is always more to do and as a volunteer you make the biggest impact when you concentrate on specific subjects. I do not scale enough to do everything I would like to do.

There are a few areas where I aim to make a difference; of particular concern is where we do not represent a body of knowledge/information in Wikidata. At this time I favour scientists, particularly women, young scientists and scientists from Africa.

To make my work scale, I tweet and blog. I latch on to the great work done by Dr Jess Wade. She writes articles on well deserving scientists and I aim to add value for those scientists on Wikidata. Typically I add professions, alma maters, employers and awards. In addition I add "authorities" like ORCiD, Google Scholar and VIAF. This is important because it enables the linking of scholarly papers already in Wikidata or known at ORCiD. I can more or less keep up with Jess and I happily add information for any and all scientists I come across on Twitter.

While doing this I learned of the Global Young Academy and as a side project started adding scientists who are members of the GYA or one of its affiliated organisations to Wikidata. I am so pleased I got into contact with Robert Lepenies. Robert is happy with the opportunity that a Scholia provides for an organisation like the GYA, for him and for all the young scientists involved. We collaborated on completing the lists on many Wikipedias; Robert added many scientists to Wikidata and is now battling to keep the pictures of these young scientists on Commons...

What is crucially important for me is that Robert advocates an open ORCiD profile to scientists worldwide so that they may have their Scholia. Neither Robert nor I scale, and what would help us most is an easy and obvious way that enables any scientist to start a process that will include all their papers from ORCiD, update the known co-authors and instruct them in what they can do to enrich their Wikidata / ORCiD / Scholia profile even more.

I am now working on African scientists and yes, I would appreciate some help.
Thanks,
     GerardM

PS my wife would like this scale to be enough for me

Tuesday, November 12, 2019

Instant gratification at @Wikidata

As I write this, it is 11:46am; at 09:26am I added papers to Prof Hafida Merzouk. The edits are picked up by Reasonator but not by Scholia. In a similar way, edits are not picked up by Listeria.

Instant gratification is now a thing of the past, the work done at Wikidata may eventually be picked up in a Scholia or Listeria but it is not funny. Can I tweet about the things I find or have done when Wikidata no longer reflects the relevant changes?

This may sound trivial but it does mean that when I look back at my work, there is no longer a timely way to do so.

Instant gratification motivates and it is a factor in maintaining quality. We are losing it.
Thanks,
      GerardM

Saturday, November 09, 2019

Put (modern) #science of #Africa on the map

A young African scholar commented that the info on websites of African scholarly organisations was all about its past. There is a point to recognizing those who did good and consequently making obvious that the science of today is rooted in the past.

African scientists as well as any other scientist have a place in Wikidata with their affiliations, papers, co-authors and also with their scholarly advisors. My proposal is for all scholars to check if they are on Wikidata, check if their doctoral thesis is on Wikidata. Then add their doctoral advisor to their item and reciprocate themselves as a doctoral student.

Do not forget to include where you studied and for what university you work(ed). Check if your ORCiD profile includes trusted organisations like CrossRef that will update your profile when appropriate. When many of you do this at Wikidata you will be surprised what the impact will be.
Thanks,
      GerardM

Friday, November 08, 2019

Bias in @Wikidata and a SMART approach

When quality was presented at the WikidataCon, it was rated from 1 to 5. This approach has its own bias because it does not consider what may not be there. What is not there can be made visible using assumptions like "a university has more than one employee" (employee includes professors) and "every country has at least one university".
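
A minimal sketch of how such assumptions make gaps visible, assuming the data has already been retrieved into plain Python structures. The function name and the country and university names are hypothetical; in practice the input would come from a query on Wikidata.

```python
def find_gaps(
    countries: set[str],
    universities: dict[str, str],
    employee_counts: dict[str, int],
) -> tuple[list[str], list[str]]:
    """Apply two assumptions and report where the data violates them.

    universities maps a university to its country;
    employee_counts maps a university to the number of known employees.
    """
    # Assumption 1: every country has at least one university.
    countries_with_university = set(universities.values())
    countries_without_university = sorted(countries - countries_with_university)
    # Assumption 2: a university has more than one employee.
    thin_universities = sorted(
        name for name in universities if employee_counts.get(name, 0) <= 1
    )
    return countries_without_university, thin_universities


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    no_university, thin = find_gaps(
        countries={"Country A", "Country B"},
        universities={"University X": "Country A"},
        employee_counts={"University X": 1},
    )
    print("countries without a known university:", no_university)
    print("universities with at most one known employee:", thin)
```

Both results list what is missing rather than rating what is present, which is the part a 1 to 5 score does not show.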

The bias in Wikidata starts with the way it is mostly used and consequently how it is taught. People are shown what Wikidata looks like, immediately followed up with training in the use of query and the use of tools. At every level it takes considerable skills to make use of Wikidata. The first hurdle to overcome is to understand the data in a single item. When your language is not English you are toast. This is Cape Town in Newari and this is a useful presentation using Reasonator. With Reasonator the information is easy to digest and adding missing labels is just one click away.

The second hurdle is knowing what bias it is you want to remedy. For a known bias like the gender gap, the Women in Red have lists of missing Wikipedia articles. A Wikidata gap is expressed by the absence of data. Listeria lists are great at that. These are all the universities of Africa. If you do not get the extent of what we miss, you have some thinking to do. When you apply this principle to the science of Africa, you find a lot of lists and the biggest issue remains: missing lists.

When you tackle a missing subject like I did for the "Affiliates of the African Academy of Sciences", you will find a source as a reference for the group and a reference on every affiliate. To ensure that the data is relevant and actionable, I added all of them, linked them to ORCiD and/or Google Scholar enabling SourceMD to link them to their papers. I added nationality because this may trigger inclusion on the Women in Red lists and when it was obvious, I added employers so that they may be included as a scholar on African University lists.

When we as a movement want to fight bias, we have to consider the use of lists and particularly Listeria lists to show the developments of a subject. With lists available on many Wikipedias, it becomes possible to gain traction on what we miss. This approach is distinctly different as it acknowledges the need for more support for item based editing and it makes the point that missing data is a quality issue that needs to be addressed as a fundamental issue.
Thanks,
      GerardM

Thursday, November 07, 2019

@Wikipedia talks about @Wikidata

"WD is unreliable. WP:V and WP:RS are completely ignored (from any editors). International NPOV is a problem too." It is so SMART that the best I can do is ignore it. Then again, it is an open invitation to talk about Wikipedia. There is no single Wikipedia; there are over 300 Wikipedia language editions, so even the acronyms are lost on me as there is no one Wikipedia to rule them all.

So forget about acronyms and let's talk Wikidata and by inference raise issues particularly for the English Wikipedia where appropriate. First, Wikidata includes more items than there are subjects raised in any and all Wikipedias. Its quality can be considered in many ways and verifiability is largely ensured because of the association with other "authorities" about a subject. Thanks to the increased use of open data, it is possible to verify that specific statements are shared, increasing the likelihood that they are correct. For some information, like for scientists who are a member of the AAS Affiliates Programme, we have/may have references to the authoritative source. Such references may be on a project or on an item level; it makes verifiability easy and obvious.

Wikidata has an issue with all kinds of gaps in its coverage. For many African countries no universities are known and there are hardly any scholars associated with them. Thanks to Listeria functionality we can monitor if and when data is added. Many Wikipedias do not have such tools because of the aversion to Wikidata by some. At the same time projects like Women in Red rely on Listeria lists and by inference Wikidata to know what to work on.

In tools like Reasonator and Listeria, lists are generated and, when you compare them with Wikipedia lists, the quality is measurably better. I published frequently in the past about the Polk award. In its lists Wikipedia has a likely error rate of six percent. When they fudge the record by not linking at all, the quality of a Wikidata list is even better because it is much better at linking items than Wikipedia is at linking red links. There is a solution; it just requires a willingness by Wikipedians to cooperate.

I understand what is meant by "international NPOV" and it is where Wikidata is by definition better than an individual Wikipedia. By definition because Wikidata represents data from ALL Wikipedias. Thanks to the people of DBpedia, there is a potential to highlight where Wikipedias differ and it is more likely that the fruit of their labour will enrich Wikidata than Wikipedias.

So a Wikidatan walks into a bar...
Thanks,
       GerardM

Monday, October 21, 2019

Adegoke O. S. - Fellow of the African Academy of Sciences

It is easy enough to add "O.S. Adegoke" to Wikidata and mark him as a fellow of the AAS. With only initials there is no way to know the gender and to me that is quite unsatisfactory. This is where Google becomes your friend: you find Mr Adegoke is addressed as "Silvester".

There are some 384 fellows and slowly but surely they find their way into Wikidata. If there is a point to it, it is the same point why there are fellows of the African Academy of Sciences; "they provide Advisory and Think Tank functions and help to develop strategies that promote science in Africa and that are relevant to the continent".

The objective of Wikipedia and, by inference Wikidata, is to share in the sum of all knowledge. As we do not really consider what is relevant for our public in Africa and for those interested in Africa, the AAS in its choices of its fellows at least points in the right direction. Adding the AAS fellows to Wikidata is a puzzle because the format of names differs. Some 240+ fellows are known at Wikidata as data but for it to become informative there is a need for supplemental data and, even better, Wikipedia articles.
Thanks,
     GerardM

Saturday, October 05, 2019

Rebecca R. Richards-Kortum

A text on the Internet read: "She’s Rice’s first-ever MacArthur grant winner. But her real claim to fame? Her clever medical inventions might just save your life." It is not as if I know her even though I added to her Wikidata item in the past.

I looked her up because she approves of the NEST360° organisation on Twitter. It is an organisation committed to reducing neonatal mortality in sub-Saharan hospitals by 50 percent.

Such organisations deserve a place in Wikidata, it has members I am adding. I consider it part of my "Africa project" even though it does not have a place there yet.

Yesterday I added an item for "neonatal care" and all the papers that are already included in Wikidata about neonatal care need to be associated with the subject. Scientists like Prof Joy Lawn are to be marked for their specialty.

How is it possible that it takes a 60 year old white male from the Netherlands to add something this basic to Wikidata? We are talking about more yearly deaths than Ebola.
Thanks,
       GerardM

Tuesday, October 01, 2019

What data is wrangled is obvious when its presentation is considered

When you watch a game, you want to know the score. When you have a favourite author, you want to know all his/her publications and when you hear about a place you want to know where it is. Easy.

Such data may be included in a repository like Wikidata and, in essence the data is still simple. You still want to know the score, the publications or the location, the question is how do you get the data in a format that makes sense.

People are really good at understanding data when it is in an agreeable format. These are three formats for the same data: a scientist in Wikidata. This is how Wikidata presents its data and imho the data is really hard to understand. This is the same data in Reasonator; it is a general purpose tool that shows data and its relations. It can be used for all kinds of data and it is my go-to tool to get to grips with data related to one item. Finally Scholia presents data formatted in a way that makes sense for this scientist.

Given how awful the default presentation of Wikidata is, it is obvious why everyone teaching the use of Wikidata focuses on querying the data and therefore people seek/work on the results provided in what is their default tool. I typically focus on particular subjects, today it was Dr Shima Taheri, I added a reference, some publications and genders for her co-authors. To do this I am triggered by the presentation of the data in the tools I use.

The holy grail for Wikidata is the use of its data in Wikipedia info boxes. However, people are taught to query data and that approach does not align well with the data items you find in info boxes. So when the purpose of Wikidata is in Wikipedia info boxes, presentation needs to become a priority.
Thanks,
      GerardM

Thursday, September 26, 2019

The lowest hanging fruit in #DBpedia

What I hate with a vengeance is make-work. DBpedia as a project retrieves information from all the Wikipedias, wrangles it into shape and publishes it. In one scenario they have unanimous support from one or more Wikipedias agreeing on the same fact and they all may have their own references.

We should import such agreeable data without further ado. An additional manual step to import to Wikidata is not smart because manual operations introduce new errors. Arguably when there is no unanimous support manual intervention may improve the quality but given the quantity of the data involved, it means that a lot of data will not become available. THAT in and of itself has a negative impact on the quality of available data as well.

So what to do? Harvest all the data that is of an acceptable quality, that is, the data DBpedia accepts for its own purposes. Enable an interface where people verify the data where their project is challenged.
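
A minimal sketch of that split, assuming every fact arrives as a mapping from source to the value that source reports. The function name, the fact keys and the source names are hypothetical; this is not how DBpedia or Wikidata implement an import.

```python
def triage(facts: dict[str, dict[str, str]]) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Split facts into those to import as-is and those that need a human.

    facts maps a fact key to {source: value}.
    """
    auto_import: dict[str, str] = {}
    needs_review: dict[str, dict[str, str]] = {}
    for key, by_source in facts.items():
        distinct_values = set(by_source.values())
        if len(distinct_values) == 1:
            # All sources agree: import without a manual step.
            auto_import[key] = next(iter(distinct_values))
        else:
            # Sources disagree (or none reported): leave it for verification.
            needs_review[key] = dict(by_source)
    return auto_import, needs_review


if __name__ == "__main__":
    # Hypothetical facts for illustration only.
    ready, disputed = triage({
        "person-1/date of birth": {"enwiki": "1950", "dewiki": "1950"},
        "person-2/date of birth": {"enwiki": "1961", "frwiki": "1962"},
    })
    print("import:", ready)
    print("review:", disputed)
```

The review part is what a verification interface could present per subject, so that people only see the disputed facts for the data they care about.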

When we truly aim to engage people, we enable them to target the data they want to work on. I will happily work on scientists but do not expect me to work on "sucker stars". More than likely there will be people who care about soccer stars but not about "crazy professors".
Thanks,
      GerardM

Wednesday, September 25, 2019

With #DBpedia to the (data) cleaners

The people at DBpedia are data wranglers. What they do is make the most of the data provided to them by the Wikipedias, Wikidata and a generous sprinkling of other sources. They are data wranglers because they take what is given to them and make the data shine.

Obviously, it takes skill and resources to get the best result and obviously, some of the data gathered does not pass the smell test. The process the data wranglers use includes a verification stage as described in this paper. They have two choices for when data that should be the same is not; they either have a preference or they go with the consensus ie the result that shows most often.
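
A minimal sketch of the two choices, assuming each source reports a single value for a fact. The function and the source names are hypothetical and this is not the code described in the paper; it only illustrates "preference" versus "consensus". It also reports whether the sources disagreed, so that the discrepancy is not lost.

```python
from collections import Counter
from typing import Optional


def fuse(values_by_source: dict[str, str], preferred_source: Optional[str] = None) -> tuple[str, bool]:
    """Return (chosen value, is_disputed) for one fact.

    With a preferred source that reported a value, that value wins.
    Otherwise the value reported most often wins; a tie between values
    is not resolved in any meaningful way here.
    """
    if not values_by_source:
        raise ValueError("at least one source is required")
    is_disputed = len(set(values_by_source.values())) > 1
    if preferred_source is not None and preferred_source in values_by_source:
        return values_by_source[preferred_source], is_disputed
    value, _count = Counter(values_by_source.values()).most_common(1)[0]
    return value, is_disputed


if __name__ == "__main__":
    # Hypothetical claims for illustration only.
    claims = {"enwiki": "1950", "dewiki": "1950", "frwiki": "1951"}
    print(fuse(claims))
    print(fuse(claims, preferred_source="frwiki"))
```

Either way a value is chosen; the disputed flag is what marks the fact as still worth a look.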

For data wranglers this is a proper choice. There is another option for another day: these discrepancies are left for the cleaners.

With the process well described, the data openly advertised as available, the cleaners will come. First people akin to the wranglers, they have the skills to build the queries, the tools to slice and dice the data. When these tools are discovered, particularly by those who care about specific subsets, they will dive in and change things where applicable. They will seek the references, make the judgments necessary to improve what is there.

The DBpedia data wranglers are part of the Wikimedia movement and do more than build something on top of what the Wikis produced; DBpedia and the Wikimedia projects work together improving our movement's qualities. With the processing data generally available this will become even more effective.
Thanks,
        GerardM

Sunday, September 22, 2019

Comparing datasets, bigger or better or it does not matter?

When Wikidata was created, it was created with a purpose. It replaced the Wikipedia based interwiki links, it did a better job and, it still does the best job at that. Since then the data has been expanded enormously, no longer can Wikidata be defined by its links to Wikipedia as it is now only a subset.

There are many ongoing efforts to extract information from the Wikipedias. The best organised project is DBpedia; it continuously improves its algorithms to get more and higher grade data and it republishes the data in a format that is both flexible and scalable. Information is also extracted from the Wikipedias by the Wikidata community. There are plenty of tools like petscan and the awarder and plenty of people working on single items one at a time.

Statistically, on the scale of Wikidata, individual efforts make little or no impression but in the subsets the effects may be massive. There is for instance Siobhan working on New Zealand butterflies and other critters. Siobhan writes Wikipedia articles as well, strengthening the ties that bind Wikidata to Wikipedia. Her efforts have been noticed and Wikidata is becoming increasingly relevant to and used by entomologists.

There are many data sets; because of its wiki links every Wikipedia is one as well. The notion that one is bigger or better does not really matter. It is all in the interoperability, it is all in the usability of the data. Wikipedia wiki links are highly functional and not interoperable at all. More and more Wikipedias accept that cooperation will get them better quality information for their readers. Once the biggest Wikipedias accept data as a resource and help curate the shared data, the act of comparing data sets results in improved quality for all.
Thanks,
      GerardM

Saturday, September 07, 2019

Language barriers to @Wikidata

Wikidata is intended to serve all the languages of all the Wikipedias for starters. It does in one very important way; all the interwiki links or the links between articles on the same subject are maintained in Wikidata.

For most other purposes Wikidata serves the "big" languages best, particularly English. This is awkward because it is particularly people reading other languages who stand to gain most from Wikidata. The question is: how do we chip away at this language barrier?

Giving Wikidata data an application is the best way to entice people to give Wikidata a second look. Here are two:
  • Commons is being wikidatified and it now supports a "depicts" statement. As more labels become available in a language, finding pictures in "your" language becomes easy and obvious. It just needs an application
  • Many subjects are likely to be of interest in a language. Why not have projects like the Africa project with information about Africa shared and updated by the Listeria bot? Add labels and it becomes easier to use, link to Reasonator for understanding and add articles for a Wikipedia to gain content.
Key is the application of our data. Wikidata includes a lot, the objective is to find the labels and we will when the results are immediately applicable. It will also help when we consider the marketing opportunities that help foster our goals.

Thanks,
      GerardM

@Wikidata - #Quality is in the network

What amounts to quality is a recurring and controversial subject. For me quality is not so much in the individual statements for a particular Wikidata item, it is in how it links to other items.

As always, there has to be a point to it. You may want to write Wikipedia articles about chemists, artists, award winners. You may want to write to make the gender gap less in your face but who to write about?

Typically connecting to small subsets is best. However we want to know about the distribution of genders so it is very relevant to add a gender. Statistically it makes no difference in the big picture but for subsets like: the co-authors of a scientist or a profession, an award, additional data helps understand how the gender gap manifests itself.

The inflation of "professions" like "researcher" is such that it is no longer distinctive, at most it helps with the disambiguation from for instance soccer stars. When a more precise profession is known like "chemist" or "astronomer", all subclasses of researcher, it is best to remove researcher as it is implied.
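
As a small illustration of "researcher is implied", here is a sketch that drops an occupation when a more precise subclass of it is also present. The tiny hierarchy is a hand-made assumption for the example; it is not read from Wikidata and the function names are my own.

```python
# Hypothetical, hand-made hierarchy: occupation -> its parent occupation.
PARENT_OF = {
    "chemist": "researcher",
    "astronomer": "researcher",
}


def ancestors(occupation, parent_of):
    """Return every occupation that is implied by the given one."""
    seen = set()
    current = parent_of.get(occupation)
    while current is not None and current not in seen:
        seen.add(current)
        current = parent_of.get(current)
    return seen


def prune_implied(occupations, parent_of):
    """Keep only occupations that are not implied by a more precise one."""
    implied = set()
    for occupation in occupations:
        implied |= ancestors(occupation, parent_of)
    return [o for o in occupations if o not in implied]


print(prune_implied(["researcher", "chemist"], PARENT_OF))  # ['chemist']
print(prune_implied(["researcher"], PARENT_OF))             # ['researcher']
```

When "researcher" is all we know, it stays; it only goes when something more precise makes it redundant.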

Lists like members of "Young Academy of Scotland", have their value when they link as widely as possible. Considering only Wikidata misses the point, it is particularly the links to the organisations, the authorities (ORCiD, Google Scholar, VIAF) but also Twitter like for this psychologist. We may have links to all of them, the papers, the co-authors. But do we provide quality when people do not go into the rabbit hole?
Thanks,
      GerardM

Sunday, August 25, 2019

There is much more to read; introducing the "one page wonder"

Given that our aim is to share in the "sum of all knowledge", realistically we will not have it all at our disposal to share. It is also fairly likely that we will not know about all subjects.

When you google for a given subject, it is as likely as not that you will drown in too much data, too many false friends or find nothing at all when there is nothing to find in "your" language.

Increasingly, what we know about in the Wikiverse is linked to a Wikidata item. Pictures may depict a subject, articles may be written about a subject and all of them refer to a Wikidata item that may have labels in any language. Items that may even have links to references.

When we are to find more to read for Wikipedia readers, we need a mechanism, a place where we can link a subject to external resources. Resources like the Internet Archive, "your" library and the papers we know in WikiCite, but then linking to the free versions of these papers. The page will show the label in "your" language and a picture. It links to all the pictures depicting the subject as well.

Putting the "one page wonder" in production is easy. It is all on one page and is fully internationalised. The localisation is done at translatewiki.net and when people want to make it useful in "their" language, they will add the missing labels for the Wikidata items.

With the "one page wonder" in place it becomes interesting:
  • Is "your" local library known to us and do we get your permission to find it for you? How do we supply "your" library with a search string?
  • The Internet Archive's Wayback Machine may have content in "your" language but can you navigate its English only user interface?
  • What other organisations do we want to partner with to provide you with more to read?
  • Will we be able to show local pictures, a Dutch cow looks different from an Indian cow..
  • What other issues will there be..
  • Oh and yes, we can include the Reasonator, queries and what have you.. we just have to think about what to show.
Thanks,
       GerardM

#Translatewiki.net the @Wikimedia movement infrastructure most people even do not know


Just consider this, there are more than 200 functioning Wikipedias and this is only possible because people localise the MediaWiki software in over 280 languages. It makes translatewiki.net, the website where all this work happens a strategic resource to the Wikimedia movement.

Internationalisation (i18n) and localisation (l10n) are an integral part of software development. It is an integral part of a continuous process and it requires constant attention. The day to day jobs are well in hand. The localisation itself is a community effort and with developers continually expanding the software base a continuous effort is needed of the translators to keep up with their language. This is hard and for many languages it is a struggle to keep up with even the "most used messages".

Managing this effort is a continuous effort; it is essential to maintain the i18n and the l10n optimally. It follows that it should be obvious what messages have the biggest impact, first on the readers and then on the editors of a Wikipedia. What should be in the "most used messages" changes over time and when it is considered strategic, such maintenance is to be considered a Wikimedia/MediaWiki undertaking.

Translatewiki has always been an independent partner of the Wikimedia Foundation and it has always been firmly part of the Wikimedia movement. Given that partnerships are a key part of the strategic plans of the WMF, the proof of the partnership pudding is very much in how it interacts with translatewiki.net. TWN does not need to be part of the WMF organisation for it to fund TWN, it is clearly a quid pro quo. The WMF should even encourage TWN and other partners to collaborate for their i18n and l10n and enable this for strategic purposes, strengthening these partners globally.
Thanks,
     GerardM

Sunday, August 11, 2019

How to value open data and why Wikidata won't go stale

The data in Wikidata is data everyone knows or could know. A lot of awful things could be said of its content and quality and all of it misses one important point. It is being used, its use is increasing, it is increasingly used by Wikipedias and that provides an incentive to maintain the data.

What Wikipedia indicates is that most data is stable, not stale. A date of birth, a place of birth so much remains the same. When we bury data in text, it is always a challenge to get the data out. When we bury data in Wikidata it just takes a query to bring it back to life. Who was a member of multiple "National Young Academies, Similar Bodies and YS Networks" for instance; you do not find it in the texts of those organisations but you will increasingly find it in Wikidata. Once the data is in there, it is stable and available for query.

As GLAMS make their content available under a free license, their collections gain relevance as the collection gains an audience. Just consider that only a small part is available to the public in the GLAM itself and on Commons it is there for all to find. Commons is being wikidatified and those collections become available in any language gaining additional relevance in the process.

The best example is what the Biodiversity Heritage Library does. It is instrumental in the digitisation of books, it makes them publicly available and gains the collections they are from an audience. Volunteers prove themselves in this process and both professionals and the wider world benefit. From a data perspective the data is new because only now available.

When a publisher mocks Open data, it is self serving. It is in their interest that data is inaccessible, only there for those who pay. There are plenty of examples of great data initiatives that went to ground and obviously when the data does not pay the rent, publishers will pull the plug. It is different for the data at Wikidata. It is managed by an organisation that has as its motto "share in the sum of all knowledge". The audience the WMF has makes it a world top ten website, it is not for sale and it is not going anywhere. As long as there are people like me who care about the availability of information, the data at Wikidata may go stale in places waiting for another volunteer to pick up the slack.
Thanks,
      GerardM

Saturday, August 10, 2019

#Statistics or how many researchers are a #physicist

At @Wikidata most "researchers" are given this "occupation" out of convenience. We do not know how to label them properly, there are too many, so as all scholars must be researchers we make them so.

Nothing inherently wrong here; it is better to know them for what they also are than to know nothing about them at all. One issue though; we do not know the physicists from the chemists, from the behaviorists or any other specialism in science. We can query for physicists anyway but we will not catch them all.

Queries that show the numbers for a profession are easy enough to make. The value of such one time wonders is minimal, the results are fleeting; any moment now another scientist like Walter Hofstetter may become known to be a physicist and the numbers are no longer true. They are useful when we run queries like these regularly, save the results and present them like Magnus does for Wikidata itself.

What it takes is a mechanism that mimics Magnus's approach. We gain an insight in how Wikidata is performing over time and it provides motivation for people who care, for instance about physicists.
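
A minimal sketch of such a mechanism, under my own assumptions: the query itself is left out, the counts below are invented, and the function names are hypothetical. The point is only that each run is saved with its date so that the change between runs becomes visible.

```python
def record(history, day, count):
    """Save the result of one run as (ISO date string, count)."""
    history.append((day, count))


def changes(history):
    """Return (date, difference with the previous run) for each later run."""
    ordered = sorted(history)
    return [(d2, c2 - c1) for (_, c1), (d2, c2) in zip(ordered, ordered[1:])]


# Invented numbers, standing in for "how many physicists do we know today".
physicists = []
record(physicists, "2019-08-10", 1000)
record(physicists, "2019-08-17", 1012)
record(physicists, "2019-08-24", 1030)
print(changes(physicists))  # [('2019-08-17', 12), ('2019-08-24', 18)]
```

A single number is a one time wonder; the saved series is what shows whether the work on a subset makes a difference.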
Thanks,
       GerardM

Tuesday, August 06, 2019

#Statistics for National Young Academies

The Global Young Academy is linked to many national young academies. It and they represent many relevant scientists. They represent all of science and, they are interested in representing science to the public. The question: how can we make them visible.

First you add the orgs to Wikidata and then the scientists to the orgs. When you then add the same Listeria lists to Wikipedias, we will see a picture when we have one and we may notice who has a local Wikipedia article.

There are many interesting statistics possible:
  • the gender ratio
  • the different professions in the mix
  • awards received 
  • the known number of publications per person
  • the organisations they are employed at
However, first things first. It is my intention to include all the current members and the alumni of the Young Academy of Sweden before Wikimania. Second, these scholars are bright :) once they put their mind to it, they will help themselves to nice statistics based on the info we accumulate in Wikidata. They can be linked to on the Wikipedia pages.
Thanks,
     GerardM

Sunday, August 04, 2019

Helping @Wikipedia readers find their read, one author, one publication at a time

Reading is what the public of Wikipedia does and in a way, every Wikipedia is an invitation to further reading. Wikipedia is an encyclopedia and by definition, its coverage of a subject is limited. Its reliability is defined by its sources and they themselves are typically a subset of what may be read about that subject.

The quality of the invitation for further reading differs. How do we invite people to read a Shakespeare in Dutch, German, Malayalam, Kannada even English?

The primary partner in this quest for further reading is the local library. We can put all of them on a map and invite people to go there or to probe its website for further reading. Having them all in Wikidata with their coordinates puts libraries and what they stand for on the map. We can invite them to use services like OpenLibrary or WorldCat; the bottom line is that people read.

In this people first approach, the user interface is in the language people want to read their book in. It follows that the screen may be sparse. When it is to be a success, it is run like a business. We have statistics on libraries, people seeking, books found and a perspective in time. It is about people reading books not about transliterating books. Our business model: people reading. Funding is by people, organisations who care about more reading by more people. The numbers entice people to volunteer their efforts making more books, publications available in the language they care about.

To make this happen, the WMF takes the lead enabling and maintaining such a system and partnering with any and all organisations that care about this, organisations like the OCLC and the Internet Archive.

We will succeed when we make the effort.
Thanks,
       GerardM

Saturday, August 03, 2019

Competing with the PayWalls? Hell no!!

Competition is about business models and, the business models of the Wikimedia Foundation and publishers are utterly different. At the WMF we do better when people read more. The business model of publishers is that people pay before they read. When people like our service, they share their data, their money enabling us to do more.

Notions of "professional results" for our readers are outside of either business model. Terminology like "professional results" is interesting but to some extent such results are a fringe benefit.

When a professional adds 278 ORCiD identifiers to Wikidata, he and all his colleagues benefit professionally because he did put in the effort. It follows that Roderic Page is a member of our community and his professional work benefits us all. These 278 scholars need to have their work known at Wikidata and when Roderic and others want to work on other scientists as well, they may.

There is no point in competing with paywalling publishers. Whether people need to use a document that is behind a paywall is of no real concern to us. When we point to a free version of a paywalled document we do a better job because more people read. The business model of a publisher is of no further concern to us, our aim is for people to read.
Thanks,
     GerardM

Sunday, July 28, 2019

Daniel Pomarède or a method to include an interesting paper in @Wikidata

I read about an interesting astronomical phenomenon, this is where you find the abstract. Daniel Pomarède, one of the authors, is open in ORCiD about his work. I found that he has a presence on Wikidata by searching for his ORCiD id: "0000-0003-2038-0488".
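
For those who prefer a query over the search box, here is a sketch of the same lookup. It assumes the Wikidata property for the ORCID iD is P496 and that the query is run by hand at the Wikidata query service; the function names are my own. Before building the query it checks the last character of the identifier, which ORCID defines as an ISO 7064 MOD 11-2 check digit, so that a typo is caught early.

```python
def orcid_check_digit(first_15_digits):
    """Compute the ISO 7064 MOD 11-2 check character for an ORCID iD."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True when the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


def orcid_lookup_query(orcid):
    """Build a SPARQL query that finds the item with this ORCID iD (P496)."""
    if not is_valid_orcid(orcid):
        raise ValueError("not a valid ORCID iD: " + orcid)
    return 'SELECT ?item WHERE { ?item wdt:P496 "%s" . }' % orcid


print(orcid_lookup_query("0000-0003-2038-0488"))
```

The query text can be pasted into the query service; the sketch itself does not contact Wikidata.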

Marking Mr Pomarède for an update is what it takes to give the article I think is interesting more of an audience. In the process there will be more to read about this part of astronomy.
Thanks,
       GerardM

Hey @GlobalYAcademy this blogpost is for you II

As I have added all global young academics to Wikidata, an update. What I like best about this academy is its global reach and the spread among the sciences. I am pleased that you may be found in ORCiD, Google Scholar and VIAF.

The most vital of them all for Wikidata is ORCiD; when your data "can be seen by everyone", we can retrieve your data, import it in Wikidata and make a "Scholia" for you. This is the Scholia of one of my favourite young academics. The import process (SourceMD) is broken at this time and this is my backlog of jobs to run.

Running a process for you will import co-authors and papers we do not know about. Given your global spread, it follows that your co-authors will have a similar global spread and this is an antidote to the Anglo-American bias we have in Wikidata and the Wikipedias. This is particularly so when I run a second job for your co-authors with public ORCiD information as well, improving the subset of the data you are part of.

There are things you can do that have an impact on what we do:
  • You can check your data, add what is missing and improve what is wrong on your Wikidata item
    • You can create/improve your ORCiD data and make it visible to everyone
    • You can trust organisations like CrossRef to update your data in ORCiD on your behalf
    • Please add your "name in native language" and indicate using the ISO 639 code the language it is in.
    • Check if the authorities that are linked to you are indeed correct and do not link to a false friend
    • Add your occupation
    • Please add other authorities that know you.. ISNI for instance
  • We love to have a (freely licensed) picture, it helps with disambiguation. You can upload it to Commons.. 
    • Having a picture on the GYA website and on Google Scholar is why there are so many links to Google Scholar
So what is in it for us?
  • We want people to know about science and learn about the scientific record
  • We want people to write Wikipedia articles and your papers may be used as references.
  • There are many gaps in our coverage of science. We know and it is improved one paper, one scientist at a time.. There is even the option to work on a specific subject.. like this one
As a member of the GYA, you are part of an outreach program. We happily invite you to work with us and together do the best job possible for science.
Thanks,
      GerardM

Friday, July 26, 2019

Authorities relieve us from the tedium of completeness and enable functionality

At Wikidata we do not rely on any one authority and we refer to many. As a consequence we bring many links to authorities to our users and only when they know how to value them, it is of value.

The link for William Shakespeare to the Open Library gains you access to his work. It links those works to the Library of Congress to indicate that it is indeed that work of the bard.

When authors we know are linked to the Open Library, it does not really matter if we know their books. People find them regardless. When we want people to read, all we need to do is promote these links to Open Library and to local libraries. Such promotion could be done in the Wikipedias, like we do for WorldCat. WorldCat could be so much better if it were about local attention for the user; it would consequently have more of a purpose.

One project on Wikidata has been to include scholarly works that are free to read. Free to read enables for those works and their authors an additional audience and increased relevance. However among all the works we represent, we do not know what works were added. That makes it a fail. There is an authority for that. It is Unpaywall. However even when we have a link to Unpaywall it only makes a difference when people use it and read articles. This effect is something we can measure when people go to the free version of an article.

We can get the database of Unpaywall and add just another authority. Next is the issue of maintenance. We could partner with Unpaywall and have a hybrid system where we import the database and regularly check those articles we do not know to be open.
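
The "regularly check" part of such a hybrid system could be as simple as the sketch below. Everything in it is an assumption for illustration: the record layout, the field names and the DOIs are invented, and the actual lookup at Unpaywall is left out. It only selects the articles that are not known to be open and that have not been checked recently.

```python
from datetime import date, timedelta


def due_for_recheck(records, today, interval_days=30):
    """Return the DOIs not known to be open whose last check is too old.

    records: dict mapping a DOI to {"is_open": bool, "last_checked": date}.
    """
    cutoff = today - timedelta(days=interval_days)
    return sorted(
        doi
        for doi, rec in records.items()
        if not rec["is_open"] and rec["last_checked"] <= cutoff
    )


# Invented records, as they might look after importing a database dump.
imported = {
    "10.1000/example.1": {"is_open": True, "last_checked": date(2019, 1, 1)},
    "10.1000/example.2": {"is_open": False, "last_checked": date(2019, 5, 1)},
    "10.1000/example.3": {"is_open": False, "last_checked": date(2019, 7, 20)},
}
print(due_for_recheck(imported, today=date(2019, 7, 26)))
# ['10.1000/example.2']
```

Articles already known to be open are never asked about again; the ones that were closed get another chance after the interval has passed.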

In this way we still do not see the effect of more reads of open science. To achieve that we should mark free articles with an Unpaywall icon in Scholia and Reasonator. Measuring the amount of reads is now possible and we positively acknowledge authors with free to read articles.

Next could be an Unpaywall icon in Wikipedia for all free to read references..
Thanks,
      GerardM

Sunday, July 21, 2019

@Wikimedia, when we do "science outreach" what audience do we reach out to and why?

A recent tweet said "If you are an outstanding woman then you have a 1 IN 6 chance of having a @Wikipedia article. If you are an #African woman then you have a 1 in 300 chance." This bias exists for all Africans and all of Africa.

In Wikicite, with all respect for what has been achieved, we find a professional approach by scientists. Their profession, their data and this is all well and good. However, as the University of California no longer has access to the Elsevier papers, business is no longer "as usual" and consequently the relevance of access to readable papers has gained priority.

We need to know if papers known to Wikidata are available. We may have all the papers known at Unpaywall but as long as we do not indicate availability, it is irrelevant.

We need to make it easy for scientists to gain a presence for their science. At this time there are too many hoops to jump through. We can make it easy by putting scientists in the driving seat.
  1. Make an ORCiD identifier for yourself and open up the data
  2. Enable common sense organisations like "your" university, CrossRef to update your profile
  3. Have a button that runs a SourceMD process importing the data into Wikidata.
  4. Enjoy and improve on your Scholia
By enabling people to update their data and the data of others, you create value. When we run the API of Unpaywall as part of the SourceMD process, we help the University of California and we help the rest of the world that is facing the insurmountable obstacle that exists because of the likes of Elsevier. Science becomes easier, scientists gain relevance for their science and Wikidata establishes another purpose.

NB Wikipedia gains as a fringe benefit an objective criterion for the establishment of notability (it is in the science, the Scholia)
Thanks,
      GerardM

Thursday, July 18, 2019

@librarycongress and @gndnet link to @OCLC's #Viaf and beyond

In 2015 it was news that in VIAF, Wikipedia was replaced by Wikidata. Recently, in quick succession, both the American Library of Congress and the German Deutsche Nationalbibliothek announced that they are linking to Wikidata.

That is awesome enough. Awesome because as a result, Wikidata is easier to link to VIAF as every entry of the LoC and DNB results in a VIAF registration. The only thing needed to make this a reality in Wikidata is a dedicated bot for us to know all the good work done in the US and Germany.

Another relevant improvement that is of particular relevance to scientists like linguists is that it is now possible to authorise the GND to automatically update the ORCiD record. It will be truly awesome when this is the example other authorities follow.

It is a small step for VIAF to include ORCiD as it links to other scientific publications. For librarians and library systems this is most relevant. For Wikidata it will help with disambiguation and it allows us to populate our information with even more papers and co-authors.
Thanks,
     GerardM

Wednesday, July 03, 2019

Hey @GlobalYAcademy this blogpost is for you

I am adding members of the Global Young Academy to Wikidata. This was requested on Twitter and I was asked to describe the process how they are added. With 100 members added, it is high time to take the time for this.
In this blog many of the pointers are for Matthew Levy, all edits are done in Wikidata itself. This is the item for Mr Levy.
Thanks,
      GerardM

Saturday, June 22, 2019

Bulk uploads linked to @ORCID_Org and others, then what

Bulk uploads to @Wikidata happen all the time, for instance the latest medical publications. They result in links to existing scholars and new authors. The question: "then what" was raised on Twitter and in the question was the assumption of a quantitative reply.

When such data is imported in Wikidata it does not fall into a vacuum. Many notable scientists are already known because they have a Wikipedia article and because they are linked to "authorities" like ORCiD, VIAF, Google Scholar and many others. The result is a "Scholia" for a scholar and it includes all the known papers, the co-authors, dates of awards. This is one example of a scholar without a Wikipedia article.

Scholia is a very important tool as it enables more work on scholars. The display of co-authors for instance shows their gender: orange for women, blue for men and white when it is not known. Many people are involved in "Women in Red" writing new articles about lady scientists. On the project page of Women in Red you will find lists that are the result of queries run on Wikidata. This is why adding gender info is so important. Notability may be inferred from the awards people received; notability gains relevance when it does not stand alone. This is why a link to "authorities" establishes the necessary notability for a Wikipedia article. Objectively this is best presented in a Scholia like the example of Elizabeth Barrett-Connor.

When attention is given to a scholar like Mrs Barrett-Connor, arguably the "ungendered" scholars are relatively new to Wikidata and typically incomplete. There is a tool for that: SourceMD adds missing papers and links to existing papers. It also adds links to known authors and adds missing authors. The effect is a network of information that is increasingly rich. Arguably this is a bulk upload in its own right but the origin is a different one.

Presentations on topics like awards, organisations, topics and much more are available from the Scholia tool. In such a presentation it shows what we have and given that Wikidata is a wiki, there is more to know. Award winners may be enriched with authority information, they may be linked to papers. Frequent publishers to a topic may have co-authors that could do with some TLC.

In answer to the original question; bulk uploads invite additional work, the data is enriched and becomes increasingly relevant.
Thanks,
       GerardM