Thursday, July 30, 2015

#Wikidata - What is a Radcliffe fellow?

Mrs Mehrangiz Kar is a Radcliffe fellow. It is just one of the things there is to know about her. I think this fellow thing is a fancy way of saying that she was employed at Radcliffe College. When you read the article about her, there are several quite relevant awards mentioned.

Registering awards, any and all awards, is a way of establishing relevance. Several of them so far did not merit their own article and therefore they are just registered as text. When you look closely at all the Radcliffe fellows, you get an idea of what they have in common. The same is true for people who received an award.

Every so often, for their own political reasons, people will deny what is in their face when they look at it. The merit of organisations that are of foreign origin is easy to dismiss. "Gross interference" or "not patriotic" is easy to utter, particularly when the nation is said to be equal to the current incumbent.

The point of Wikipedia and Wikidata however, is not to take a point of view. Articles are there to be read, to inform. They may be the basis for an informed opinion. That is fine, that is what they are there for.

PS The point of Mrs Kar is that there are so many awards, relevant awards in the Wikipedia article that lack weight because they are not backed up with more information.

Sunday, July 26, 2015

#Wikidata - Sydney Hollander award

Some awards merit extra attention. One of them is the Sydney Hollander award. It was awarded to people and organisations that were instrumental in bringing an end to segregation in the United States. The award brought attention to "best practices" and reinforced them. The first recipient in 1946 was the Baltimore Sun. It received the award because it finally ended the practice of indicating what race was desired in the "help wanted" section. At the time there was an argument about whether it deserved the award in the first place. Later the award proved to be a catalyst in bringing further changes to the Sun. After the award was received, the Sun began to cover the black community and interview notable people of colour.

The Sydney Hollander award had its end in 1964 because it was recognised that desegregation was now covered on a higher level. The need for the award was no longer so urgent.

Arguably, when Wikipedia is to document the history of the United States, an award like this and the achievements it celebrates deserve attention. One issue may be the lack of sources. There is not much to find on the Internet, there is not much to read in the Wikipedia article.

Saturday, July 25, 2015

#SignWriting #Symposium - including #Wikipedia

For a second year now the SignWriting Foundation organises its online symposium. For those who do not know, SignWriting is all about writing sign languages. It has been under development for over 40 years and it was founded by Valerie Sutton.

When a language can be written, it may have a Wikipedia, and this has been wished for for a long time. The problems are many: the characters have to show, and the text has to be written online for it to be truly a wiki.

In a presentation at this year's symposium, Yair Rand informs us about a keyboard that has been developed to bring the reality of a Wikipedia for a sign language even closer.

If there is one thing the people at SignWriting org teach us, it is that perseverance matters. With an input method, a Wikipedia in the American Sign Language is that much closer.

Thursday, July 23, 2015

#Wikimedia - #portal to all #knowledge

The Dutch Royal #Library donated, yet again, a wonderful collection of material to #Commons. This time 3100 images were uploaded using the GLAM-wiki uploadtool for the first time.

When you read the announcement, it is really interesting to find what is known in Wikidata and by inference in Wikipedia. One fun fact is that the old image for Mr Schoemaker has his name in a way that makes more sense in Dutch: schoenmaker means shoemaker or cobbler.

When you visit Wikidata for the subjects mentioned in the mail, you find not that much information but often a rich source of external sources. Some of it is really informative and well worth a visit. For Wikipedia articles, we provide badges for excellent material. It highlights quality where we find it. Maybe something like this can be done for external sources as well.

Having attention for the external sources we link to makes Wikidata more of a portal to all knowledge. It would extend what Wikidata already is: the portal to all available knowledge in the Wikimedia Foundation.

Tuesday, July 21, 2015

#Wikidata - what William O. Douglas Award?

#Wikipedia has an article about the William O. Douglas Award and it is totally disappointing. When you Google for it, you find another William O. Douglas Award, and yet another William O. Douglas Award and maybe yet another...

The last one was awarded to Hillary Clinton in 2014, a fact that until now escaped the attention at both Wikipedia and Wikidata. There is no problem in having items for any and all awards. There is a minor problem when an article is incomplete in the way the Wikipedia article is.

Then again, it is probably an article nobody sees or reads. Arguably, such statements of facts (the award exists and is conferred by) probably have an easier life at Wikidata.

Monday, July 20, 2015

#Wikidata - Inge Genefke, and #torture

Torture is a crime where the victim is expendable, without any rights, someone who is not given much consideration. When Amnesty International asked for physicians to assist those who had been tortured, the Danish Inge Genefke responded to this request. She started locally and then founded the International Rehabilitation Council for Torture Victims.

An award was named in her honour, the Inge Genefke Award, and expanding the information on all these subjects, including the award winners, is easy enough.

When you care about a given subject, like torture, and you want to expose the forces of good or evil, you can at Wikidata by adding information. It is all part of sharing in the sum of all available knowledge. It is all part of caring about what information is available in our world.

Saturday, July 18, 2015

#Wikidata - collaborating on data #sources II

#Freebase had a vested interest in the MacRobert Award. Its recent laureate is Artemis Intelligent Power. As Wikidata continues to update its information, Freebase will become increasingly incomplete and, as a consequence, of less relevance to the people who use its data at :BaseKB.

When you do not know about the MacRobert Award, read about it. When this is the first time you hear about Artemis Intelligent Power, read the BBC article about it. News like this gives us hope for our future.

We can support :BaseKB with our data, we can support any other source that is interested in what Wikidata has to say. It all starts with acknowledging them as sources and with recognising that quality has everything to do with it.

It is all in being open to quality. It is all in being open to reevaluating quality.

#Wikidata - collaborating on data #sources

#Freebase's data found a home at :BaseKB and that is a great opener to an alternate approach to the Freebase data for Wikidata. This proposal has many parts that make for great cooperation between multiple sources.
  • items always have a link to their source(s)
  • statements always state their source
  • an indicator for the status of a statement is added
Linking to external sources is something Wikidata does a lot. It allows us to have a look at the source itself. When it is the origin of the data, it follows that the statements from that origin have to be exactly the same. When they are not, it should be indicated with a status.

When a statement differs from a source, we have identified something that needs work. In essence, such statuses have a function in a workflow as well, because this is where intervention makes a difference.
  • we set the flag for people to investigate the difference
  • we can flag the source itself that we found an issue 
  • we can find alternate sources to find what is likely correct
  • we can find corroboration in literature
  • we change the statement when needed and set a flag to indicate the status
In this way, we spend our human capital wisely. We do not blindly spend time on "approving" any and all statements. We do it only where we know some research is needed. 
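To make the workflow above a bit more concrete, here is a minimal sketch of comparing our statements with those of a source and giving each one a status. Everything in it is an assumption for illustration: the `Status` names, the `compare` and `needs_research` helpers, and the made-up item and property names are not how Wikidata or any existing tool does it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status flags; the posts do not prescribe a fixed set.
    MATCHES = "matches source"
    DIFFERS = "differs from source"
    MISSING_HERE = "missing in Wikidata"
    MISSING_THERE = "missing in source"


@dataclass(frozen=True)
class Finding:
    item: str
    prop: str
    ours: object
    theirs: object
    status: Status


def compare(ours, theirs):
    """Compare two {(item, property): value} mappings and flag every pair."""
    findings = []
    for key in sorted(set(ours) | set(theirs)):
        mine, other = ours.get(key), theirs.get(key)
        if mine is None:
            status = Status.MISSING_HERE
        elif other is None:
            status = Status.MISSING_THERE
        elif mine == other:
            status = Status.MATCHES
        else:
            status = Status.DIFFERS
        findings.append(Finding(key[0], key[1], mine, other, status))
    return findings


def needs_research(findings):
    """Only the differences are worth human attention."""
    return [f for f in findings if f.status is Status.DIFFERS]


if __name__ == "__main__":
    # Made-up data, purely for illustration.
    wikidata = {("item-1", "year of birth"): 1931, ("item-1", "country"): "India"}
    source = {("item-1", "year of birth"): 1932, ("item-1", "award"): "some award"}
    for finding in compare(wikidata, source):
        print(finding.item, finding.prop, finding.status.value)
```

The design choice is that nothing is "approved" one statement at a time; only the findings returned by `needs_research` ask for a human to look at them.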

As far as I am aware, you can not copyright facts. We compare our facts against facts known elsewhere. We include our missing data where we may and we always investigate differences. We signal the work we have done and, in this way we not only improve our quality, we also provide a path to the sources we work with for them to work on their quality.

In this way we spend the time and effort of our community wisely and, we optimise the amount of information available in Wikidata. In this way everyone is a winner.

Thursday, July 16, 2015

#Wikidata - the failure of the "primary sources tool"

I add data all the time. My tools do not see data that is waiting to be curated. So I add data.

I find I am actually wasting my time.

#Wikidata - failing the #Freebase community

#Google in its infinite wisdom assumed that as Wikidata has a bigger and growing community, it will go beyond what Freebase could do. It had the power to end Freebase and it did in the expectation that the two communities would work together. An assumption that is quite reasonable.

When you work together, you share resources and you have respect for each other's achievements. Wikidata may have the bigger community but Freebase has the most data and, for whatever reasons, the data from Freebase was not accepted at Wikidata.

The consequence is that the people who rely on this data now lost their source. It was lost because the service at Freebase was discontinued. It is therefore great news that the last Freebase dump has a new home at :BaseKB.

Some of the Freebase data is waiting for "approval", hidden in Wikidata. It is part of the Primary sources tool experiment and if you care to "curate" this data, you first have to enable a gadget. When this is not done, it is likely that the same data will be added again because there is no other way to learn that the data is waiting to be approved. To add insult to injury, when data is added in this way, it does not report Freebase as the source.

This would not have happened if Wikidata had been true to the Wikimedia mantra of "sharing in the sum of all knowledge". I urge the powers that be at Wikidata to consider the following remedial steps:
  • add Freebase as the source of data imported from Freebase
  • report on the process of "curating" Freebase data
  • when a link between Wikidata and Freebase is known, add missing statements
  • add data from Freebase where there is none at Wikidata, data can be merged later when need be
  • consider tools that allow for the selection of data based on its source for further curation and conversion
  • invest in tools for people working on the inclusion of data from the "Primary sources tool" and from other sources
  • compare data from sources and report on where there are differences. This is where the time of our community is well spent
  • consider a tool where the statements of Wikidata are easily available to Wikipedia editors. It may help them with additional information and it will help us in them curating this data
Most important, Wikidata is of particular importance because it links articles from Wikimedia projects. This is where it shines. The quality of its statements is debatable but improving. By including the lovingly maintained data from Freebase, we may expand our community and we do expand both the quantity and quality of the statements. 

Finally Wikipedians love their sources and they are important. When Wikipedians can easily compare the information that is in statements and in the text, we will find them more involved when there is more to curate. Given that Wikipedians get involved in this way, the sources are implicitly known through Wikipedia. This is yet another argument to make haste by including missing data, available from Freebase and other sources.

Wednesday, July 15, 2015

A #Wikidata and a #Wikipedia perspective

It may seem as if there is this divide between Wikipedia and Wikidata and, there is and there is not. It is just a matter of perspective. When you look at a Wikipedia article, there may be many links to other articles and all of them are Wikidata items as well.

The relation between Wikipedia articles is described in text and, the relation between Wikidata items is defined in statements. As Wikidata becomes more complete, more and more of the links in an article will be defined in Wikidata as well.

Consider for instance the article on the Gruber Prize for Women's Rights and compare it with its Reasonator page. You will already find a large overlap and, you will find in the link to its "concept cloud" all the article links in Wikipedia articles in any language.

What you might want is yet another perspective; a list of all the links in an article combined with the statements to the corresponding articles. This would be relatively simple to cobble together. The next step would be an option to add missing statements.

The point of such a gadget would be to engage Wikipedians. They could add statements one at a time adding the relations that they have already defined. It would be an obvious next step once links and redlinks have additional functionality possible because of Wikidata. However, it would work already as a gadget.
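A rough sketch of the list such a gadget could show: every link in an article together with the statements that already connect the two items. The function name `link_report`, the triple layout and the example names are assumptions of mine, not an existing gadget or API.

```python
def link_report(article_item, linked_items, statements):
    """Pair every linked item with the properties that already connect it
    to the article's item, in either direction.

    statements is a list of (subject, property, value) triples.
    """
    rows = []
    for target in linked_items:
        props = sorted(
            prop
            for subject, prop, value in statements
            if {subject, value} == {article_item, target}
        )
        rows.append((target, props))
    return rows


if __name__ == "__main__":
    # Made-up names standing in for Wikidata items.
    statements = [
        ("laureate-a", "award received", "the-award"),
        ("the-award", "conferred by", "the-organisation"),
    ]
    links = ["laureate-a", "laureate-b", "the-organisation"]
    for target, props in link_report("the-award", links, statements):
        print(target, props or "no statement yet")
```

Links that come back with an empty list are the ones where a Wikipedian could be offered the option to add the missing statement.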

Tuesday, July 14, 2015

#Wikidata - appeasing the #Wikipedia crowd

To understand #quality, you have to make assumptions. For Wikipedians, quality is in the source. This source is a statement somewhere that a given fact is true. This makes sense in the text based Wikipedia.

One problem Wikidata faces is that many Wikipedians are sceptical about the quality of the information at Wikidata. "There are no sources" is the oft repeated mantra. Who checked the fact, who approved... Why should we trust the information? This resulted in a culture at Wikidata where "because of the Wikipedians" things that should be obvious are no longer obvious. When the Freebase data is imported, it is not really imported; it is kept in purgatory, waiting for someone to say OK.

The consequence is an epic fail when you use tools to add information. Data that is in purgatory is not seen by the tools and consequently it is added again, without the sources and without the additional qualifiers known in purgatory.

Google and the Freebase community spent an inordinate amount of time and effort on making this information right. It was used by Google in its products and WHY do we need to doubt its quality? Why should we think ourselves superior just because of the uneasy sentiments of Wikipedians?

The results are an epic fail because:
  • the newly added information is not seen in tools
  • there is no way to automatically compare it with other sources and thereby verify it
  • it demonstrates mistrust where it is not needed.
When we want to import data from external sources, it makes sense to verify its quality. However, it is best done at the gate, before information is added to Wikidata. When a source is of such a quality that we want it, we should import it. Just like we did for all the settlements in China, just like we are doing all the time with incomplete data from many Wikipedias.

We should do better and when we forget about Wikipedians for a moment, we know better.

Monday, July 13, 2015

#Wikidata - more great statistics

When you want to know what Wikidata data is about, you find it in the statements "instance of" and "subclass of". The illustration is part of a set of new statistics by Zolo. It concentrates on these instances and statistics and a big surprise is that there are still so many more items without one (32.6%). The statistics by Magnus show that only 19.07% of all items have no statement.

The great thing about the new statistics page is that it shows pie charts for many Wikipedias. English Wikipedia, for instance, does not know about 1.5 million people Wikidata knows about. When you analyse this, you get into the domain of set theory. These people can already be subdivided in the following way:
  • people only en.wp knows about
  • people en.wp knows about as well as others
  • people en.wp does not know about but another wp does
  • people no Wikipedia knows about
  • items Wikidata does not recognise as people
People and unidentified items are half the pie for Wikidata. In all cases they are pretty well connected to articles in whatever project and provide a perfect opportunity to provide information. I blogged about it before, wikilinks and red links are obvious targets for links to other information not only for people but also for people.
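The subdivision above really is plain set arithmetic. Here is a minimal sketch, assuming we have the set of all items, the set of items recognised as people and, per item, the set of Wikipedias with an article. The function name `subdivide` and the toy data are made up; `enwiki` and `nlwiki` are used as site codes.

```python
def subdivide(all_items, humans, sitelinks, wiki="enwiki"):
    """Split items into the five groups listed above.

    sitelinks maps an item to the set of Wikipedias that have an article on it.
    """
    with_wiki = {i for i in humans if wiki in sitelinks.get(i, set())}
    with_any = {i for i in humans if sitelinks.get(i)}
    only_wiki = {i for i in with_wiki if sitelinks[i] == {wiki}}
    return {
        "only this Wikipedia": only_wiki,
        "this Wikipedia and others": with_wiki - only_wiki,
        "other Wikipedias only": with_any - with_wiki,
        "no Wikipedia": humans - with_any,
        "not recognised as people": all_items - humans,
    }


if __name__ == "__main__":
    # Toy data; the item names are placeholders.
    sitelinks = {
        "a": {"enwiki"},
        "b": {"enwiki", "nlwiki"},
        "c": {"nlwiki"},
        "d": set(),
        "e": {"enwiki"},
    }
    humans = {"a", "b", "c", "d"}
    groups = subdivide(humans | {"e"}, humans, sitelinks)
    for name, members in groups.items():
        print(name, sorted(members))
```

The third group, "other Wikipedias only", is the one that matters for the idea below: these are people an English reader could already be shown in another language.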

Many people have #Babel information on their user page. They thereby indicate their proficiency in a language. When a Wikipedia does not have an article, why not provide an article in a known language? When there is no article why not provide at least the information that is available? It will help an editor write a new article, it will provide a reader with at least some information.

The point is that we should share in the sum of all available knowledge because we can.

Sunday, July 12, 2015

#Wikidata - an invitation to add missing #data to the #AAAS

The American Association for the Advancement of Science confers several awards. You can read more about it on Wikipedia or on their website. It is not unlikely that the Wikipedia article is read as much as the official website. The Wikipedia article of the AAAS Award for Scientific Freedom and Responsibility, for instance, is available for improvement. For many laureates there is no article, for many of them there is not even a red link and there is only an article in English.

At this time, Wikidata is not much better. It knows about the laureates there is an article for. A few additional facts have been added but that is about it. The information is available in the most relevant languages.

When an organisation like the AAAS so chooses, they can add additional data: data about the laureates, when the award was conferred on them, and they may even add people missing in Wikidata. Or they may find that Kyoshi Kurokawa is the 2012 laureate. The benefits to the AAAS will grow in time. Japanese people may learn about the AAAS, Wikidata will become more relevant as a hub for Open Data and consequently people will become even more exposed to the AAAS.

Obviously, this invitation is open to all organisations that confer awards. It is open to people who care about such things.

Saturday, July 11, 2015

#Wikidata - Don't fence #Wikipedia in!

There is a technical discussion at Wikidata about the text used on properties. The idea is that the label used is fixed. My understanding is that, as a consequence, once a property is given a label it keeps that label always and there is no way to change it, both in Wikidata and Wikipedia.

Best practice at Wikipedia is that you can override text as the editor sees fit. For instance: P570 has a label of "Date of death"; when it is used in a template for people who were executed, an editor may choose to use "Date of execution" instead.

When you consider that there are over 280 languages with a Wikipedia, the notion that both the labels and their aliases in a language are unique has been demonstrated to be problematic. In addition there are Wikipedias without labels in that language. Once people start adding labels, it is not obvious that the first label chosen is correct. It is obvious that the property will not be recreated because someone made a booboo.

In conclusion, it is best when properties can be selected based on a label and a description. Internally the property identifier is used and the label is provided as a default for use in Wikipedia. In this way there is no problem for editors. The only issue left is that some developers have to scratch their heads on how to do this.
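What this amounts to can be sketched in a few lines: the identifier is what is stored, the label is only a default, and an editor may override it. The function `display_label`, its parameters and the fallback order are my assumptions, not how MediaWiki or Wikibase resolves labels.

```python
def display_label(prop_id, labels, language, override=None, fallback_language="en"):
    """Pick the text to show for a property.

    Order (an assumption for this sketch): the editor's override, then the
    label in the reader's language, then a fallback language, then the bare
    property identifier.
    """
    if override:
        return override
    by_language = labels.get(prop_id, {})
    return by_language.get(language) or by_language.get(fallback_language) or prop_id


if __name__ == "__main__":
    # Only the English label from the example above; other languages left out.
    labels = {"P570": {"en": "Date of death"}}
    print(display_label("P570", labels, "en"))
    print(display_label("P570", labels, "en", override="Date of execution"))
    print(display_label("P570", labels, "nl"))  # no Dutch label in this toy data
```

Because the template stores `P570` and not the text, a label can be corrected later, or a language can get its first label, without touching any article.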

Friday, July 10, 2015

#Wikipedia - is it about articles or about information? II

Dear Wikipedians. Red links are good. They are an invitation to write more articles and, in them olden days, they were the main way to many, many more articles. The drawback for readers is that behind those links there is nothing. As a result articles look unfinished and many Wikipedias removed the red links.

Enter, Red Links 1.1. Currently it is just a template and it should be properly embedded in MediaWiki. What it does is link to Wikidata and as a consequence it can do multiple things. It could provide you with an article in a language you can read. It could show you information in the Reasonator. In all cases it does provide an editor with data to base an article on.

We could even change the behaviour of normal links and allow for a "left click" on a link. This could provide a list of all the articles in other languages and a link to Reasonator. The beauty of such an innovation is that "disambiguation pages" become redundant because the disambiguation is in the link itself. When the Wikidata item is enriched with statements, it means that the subject may pop up in associated lists, in categories.

The beauty is that this is a winner for everyone. Readers get access to more information. Editors get access to more information to write new or improve articles. Wikidata becomes used for a task that it is ready for; link articles through links and red links.

Thursday, July 09, 2015

#Wikipedia - is it about articles or about information?

To be blunt, Wikipedians write articles in one language. Many are quite good at it. They think that their way is the only way. They scorn what others do when they do not do it in their image.

Wikidata is a centralised hub for information on everything that is in most Wikimedia projects, that includes all Wikipedias. It proved to be vastly superior to the old Wikipedia way of "interlanguage links" from its start.

From the start items and statements have been added about all the facts at a fantastic rate. It is obvious that Wikidata knows about more facts than all Wikipedias combined. When Wikipedia has an article, it is typically better presented, but when there is no article Wikidata may inform where Wikipedia does not.

Except that Wikipedians do not accept information from Wikidata. In many ways their arguments are similar to the arguments against Wikipedia in "them olden days". Their call for sources is in many ways understandable but wrong. Take for instance the article on the Wateler Peace Prize. It exists in three languages: French, Dutch and English. It is an old and respectable peace prize and all three articles list the laureates. They are people and institutions. Slowly but surely more information is added about all the laureates, linking them to the award. It does not take long for Wikidata to have superior information over all three articles.

It becomes superior because there are no redlinks and they are always linked. The information can be seen in any language, it just takes transliteration or labels to be always understood. It is information, it stands on its own and as Wikidata matures, it will become more obvious why information like this is solid.

Wikidata is not beholden to Wikipedians. They may choose not to use it in the articles they work on. But denying the value of Wikidata as a source of information for everyone puts them in a position where history will prove them wrong. The tragedy is that they deny it to the people seeking information and that is why their attitude hurts what the Wikimedia Foundation stands for.

Monday, July 06, 2015

#Wikidata - Inspired Teacher of #India

Subjects and relevant subjects taught in university are not necessarily the same. When medicine is taught in India, diseases that are prevalent in India are relevant while diseases prevalent in Europe and North America are not so much. An article in the Metro India suggests that Indian textbooks do not have their priorities right and therefore subjects that matter are not taught or researched in University.

The Inspired Teacher Award is a new award for India and as it means that professors are honoured when they teach subjects that matter, it is quite wonderful. 

As a reward it is quite special; it is not the money, it is time with the president of India and top bureaucrats that provides them with a platform to have their voice heard. At this time there are no articles nor items for the individual professors, something that is bound to change.

Sunday, July 05, 2015

#Wikidata - Dr B.R. Barwale, winner of the World Food Prize

The English article on the World Food Prize mentions that Dr B.R. Barwale is the 1998 recipient of that award. Google has it that Mr Barwale was born in 1931 in India.

Because Wikipedia does not know that Mr Barwale was born in 1931, he is not in a category "Born in 1931" and consequently Wikidata did not know that Mr Barwale was a human and therefore he was not automatically awarded the World Food Prize. This was added by hand.

The World Food Prize has a website; it mentions Mr Barwale in plenty of detail. Arguably, as a source the website of the World Food Prize provides many details for its laureates. It is, however, like Wikipedia, not a source that makes for easy automated comparisons.

#Wikidata - #verifiability

Verifiability for Wikidata is different from verifiability for Wikipedia. One sentence like "Mr X was born on 7-5-1959 in Zwaag, he became known for activities in Y" contains multiple statements; Wikipedia could use one source while Wikidata needs the same source multiple times. The sources for Wikipedia are nicely out of the way and for Wikidata they are in your face.
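A tiny sketch of what that means in practice: one sentence with one reference turns into several statements that each have to carry that same reference. The dictionary layout, the function name and the reference string are assumptions for illustration; this is not Wikidata's data model.

```python
def statements_from_sentence(subject, facts, source):
    """Turn the facts found in one sourced sentence into separate statements,
    each repeating the same source."""
    return [
        {"subject": subject, "property": prop, "value": value, "source": source}
        for prop, value in facts.items()
    ]


if __name__ == "__main__":
    facts = {
        "date of birth": "7-5-1959",
        "place of birth": "Zwaag",
        "known for": "activities in Y",
    }
    # "one-reference" stands in for whatever the sentence cites.
    for statement in statements_from_sentence("Mr X", facts, "one-reference"):
        print(statement)
```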

Yet again there is a discussion about verifiability and to be honest, it is boring. On a typical day the vast majority of new statements come without any sources. To be brutally honest, I have never added sources and I do not intend to either. I do remove sources when I update information that is wrong and is sourced.

Wikidata is hardly the only source of linked data and it is relatively easy to compare databases. This is when the idiosyncrasies come out. It is where you have to map data from one database to another. Once this is done, you can compare multiple sources and find how they match and mismatch.

Arguably this is more powerful than individual sources because there is little interest in adding missing sources per statement. There is a lot more interest in finding out why there are differences between the data in databases. It leads to a finetuning of the mapping or it leads to changes in the data on either end.

Wikipedia does not need sources for each Wikidata statement. What it needs is confidence. Confidence in best practices that ensure the data is as good as we can make it.