Wednesday, December 30, 2015

#Wikidata - Wilma Boevink and the #stigma of #psychiatry

A subject like psychiatry is all too often ignored, neglected and not a topic people spend equal effort on. When people rely on Wikipedia as a primary source for information it is vital that they find concepts like recovery. It is what gives them hope. It is what tells them that even when they suffer from psychiatric ailments there is hope. They can learn to manage their situation, they can become more integrated, achieve the goals they strive for.

Mrs Boevink is a leading light in the Netherlands. She has been pivotal in a movement that empowers people to pick up their lives and make the most of them. She is a published author, published often and published with many others. A person this notable deserves an article in Wikipedia but for now, Wikidata will do.

When people talk about quality, it easily becomes abstract. Mrs Boevink was instrumental in the development of the HEE method. It has been proven to be an effective intervention for aiding people in their process of recovery. This "intervention" has no link yet in Wikidata and I have no clue how to indicate that it has been certified as such.

By including information like this, it is easy to learn about recovery, about HEE, about Mrs Boevink and, it enables people to inform themselves. Insurance companies make it really hard to fund the people best positioned to provide HEE. They manipulate information by informing the public just so. Such manipulations become hard when Wikipedia provides its NPOV about this topic.

Both Wikipedia and Wikidata are works in progress. They do improve their quality and it does not take a genius to understand how Wikipedias in a language are manipulated. It takes some sober reflection to understand that an existing lack of information enables manipulation as easily. Subjects that are associated with discrimination and stigmatisation are exactly where people are vulnerable and where basic, encyclopaedic information is really needed.

Thursday, December 24, 2015

#Wikidata - The Anne Vondeling Prize

Mr Vondeling was a Dutch politician. A Dutch journalism award was named after him and, this award is still active. Currently there are three articles about this award; one in English, one in German and one in Dutch. Arguably there could be one in Frisian, Limburgian, Sealandish..

The problem with list articles is that they need maintenance. Having all the data in Wikidata helps. Given that many of the winners are "red links" in all languages, new items had to be created in Wikidata.

All the lists include a reference to the newspaper these people were writing for. For some of them multiple employers are known and for others I did not add the newspaper.

With a list like this, it is possible to include an award like this. It could have been any award. Maintenance is largely left to Wikidata and that is one headache less.

Friday, December 11, 2015

#MissingBassel - What to do for him and, for #Syria

#Bassel is a Wikipedian. He has been sentenced to death and according to some, there is not much that we can do. They are wrong. We can do so much even within the confines of what they think is our neutral point of view.

Bassel is from Syria, and in many ways people think that Syrians are the enemy. They are wrong. Bassel is a Syrian and he is one of the 735 Syrians in all of Wikipedia. Writing about Syria, about Syrians is what we can do. Without adequate coverage of Syria there is no neutral point of view. We need to know about Syria and its history, and about the people that make up the Syrians. Without making information available in Wikipedia, it is all too easy to persist in the idea that they are the enemy.

When we write articles, we need illustrations. We need illustrations about modern Syria and the Syria that once was. Particularly for this last part, we can ask our friends in the many GLAM institutions for help. When we explain our need, they are quite happy to rise to the occasion, after all it is for a Wikipedian..

At the Erasmus award ceremony we got to talk and people from the Tropenmuseum came up with the idea of checking their collection. They have a small collection of 356 images readily available for upload and, they need some assistance.. It is such a small collection because it is not easy to ask the often Syrian photographers for their permission.. the fog of war you know..

We can make a difference for Bassel, make no mistake. We can ask our GLAM friends, we can ask Syrians who live abroad, we can ask anyone for illustrations about everything Syria. We can ask ourselves what we can do and the least we can do is make a difference and write about Syria and the Syrians.

Thursday, December 10, 2015

#Wikidata - #quality and the Nansen #Refugee Award III

When a lot of effort is spent on adding data to Wikidata, when the data is correct, it is important to share the resulting information. This is why I am happy to share the information about the Nansen Refugee Award in this way.

Having a list like this on Wikipedia in this way is wonderful because updates will occur; they may come in the form of more illustrations, or of a later addition in 2016. When this happens, the new award winner will be at the top. It is a manner of presentation and arguably at this time and in this context Mrs Asifi is more relevant than Mrs Roosevelt.

Two big improvements in this list are that SPARQL is used and that a new statement is available in Listeria. Magnus was so gracious to add this new statement and, particularly for lists, it is important to have it.

A lot has been said and done about quality and Wikidata lately. Quality is also in it being used. Lists like this represent obvious quality in Wikidata but there is little point when it cannot be used. Tools like Reasonator and Listeria enable the use of data. It is how it may reach a public in any language. This exposure entices more people to come to Wikidata to add data that is relevant to them.

Wednesday, December 09, 2015

#Wikipedia Signpost? Yeah right!

First have a read of the op-ed on the Wikipedia Signpost. Then come back.

I love its notion of "one ring that binds them all", they apparently did not watch the movie; the ring dissolved in Mount Doom and that is the end of it.

The article suggests a lot of things, it is all based on what others had to say and it asserts that the quality of Wikidata is bad. This is only based on the assumption that because there are so few sources it must be bad. In the whole article it is not acknowledged that Wikidata is a wiki and consequently its implication is ignored. There is no notion that quality is anything but "it has a source" and doom is predicted not only by using the "one ring" but also in "the tower of Babel" as an analogy. For me it qualifies as FUD.

When you strip away all that has nothing to do with Wikidata and its quality, there is not much left. There is no definition of quality, there is no notion of the current quality, it is suggested that it will get worse but not why. It is a sad piece of prose pretending to have an answer. It is all about elsewhere and others.

Quality can be many things. It can be an error rating in percentages, it can exist in comparison to what other sources hold, it can be in the way errors are dealt with, it can be in the completeness of the data. It can be in the way you connect to others and in how you deal with the work of others. It can be in how your data is considered by others. Finally there is this most restricted form of quality where everything has to be perfect. Quality of this type is so far away from what a wiki stands for that it only deserves a shrug, it is patently foolish even when it is what we aspire to. Ironically when Wikidata came into being, it started as an important quality improvement for Wikipedia. It largely solved its interwiki mess and made it manageable.

When people add data to any project, they make mistakes. This has been studied a lot and depending on the manner of the edit, quality is up or down. When people or processes check on what has been done some of the errors are easily spotted and remedied. This is one way of improving quality. Once data is fairly complete, it is the context of the other data that gives an indication of the likelihood of new edits. It is for instance unlikely that an American Football player received an international refugee award.
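To make the idea of "context" a bit more tangible, here is a minimal sketch of such a check. It is an illustration only and not how any Wikidata tool actually works: the function name `unusual_members`, the 5% threshold and the placeholder recipients with their occupations are all assumptions made up for this example.

```python
from collections import Counter


def unusual_members(members, min_share=0.05):
    """Return the names whose occupation is rare within the set.

    `members` is a list of (name, occupation) pairs. An occupation held by
    less than `min_share` of the set is treated as a hint to double check.
    """
    counts = Counter(occupation for _, occupation in members)
    total = len(members)
    return [
        name
        for name, occupation in members
        if counts[occupation] / total < min_share
    ]


# Hypothetical set of 30 award recipients; names and occupations are made up.
members = (
    [(f"recipient-{i}", "humanitarian") for i in range(18)]
    + [(f"recipient-{i}", "politician") for i in range(18, 29)]
    + [("recipient-29", "American football player")]
)

flagged = unusual_members(members)
print(flagged)
```

A flag like this does not say that the statement is wrong, only that it is a good place for a person to have a look.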

Wikidata is severely incomplete. The statistics show that things are improving. The biggest problem is with the number of items that are not identified for what they are about. When this is known a lot of additional work can be done.

One of the qualities of Wikidata is that people already find an application for it. VIAF, the database of the OCLC, for instance links to Wikidata in preference to the English Wikipedia. In this way they link books and authors in any language to all Wikipedias.

Originally, Wikipedia had operational values like "be bold". It was ok to stand on the shoulders of giants and make incremental changes or wait for other giants to continue where you stopped. It is these values that enabled the growth of Wikipedia. As a wiki, Wikipedia has degenerated and it now has policy wonks determined to impose their notion of quality. Wikidata is immature and it needs to fight for remaining a wiki. If anything it is this anti wiki sentiment that is holding Wikidata back.

When Wikidata is to have more quality, what can we do? We can import more data, for instance from Freebase. Like in Wikidata, volunteers have spent considerable time adding data. It has been a sincere effort and it deserves appreciation. It may have its issues but these are the problems we have to deal with anyway. By doing this we finally reach out to the people from Freebase and recognise their effort. The least we can do is recognise their contributions that make it into Wikidata by citing Freebase as the source.

We can compare data from other sources. This is the most obvious way of finding what may be in error. As a rule, the differences are where an error can most likely be found. This is where it makes sense to go find a source to identify what is likely correct. This is where collaboration from Wikipedians is helpful because the source for Wikidata is most likely a Wikipedia. Adding sources while curating differences is most effective and makes a real difference.
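Comparing two sources can be sketched in a few lines. This is only an illustration under assumptions: the function name `compare_sources`, the made-up identifiers and the birth years are hypothetical, and a real comparison would have to match items on a shared external identifier first.

```python
def compare_sources(ours, theirs):
    """Compare two mappings of identifier -> value.

    Returns the values that disagree, plus the identifiers that only one
    of the two sources knows about.
    """
    shared = ours.keys() & theirs.keys()
    differences = {
        key: (ours[key], theirs[key])
        for key in sorted(shared)
        if ours[key] != theirs[key]
    }
    missing_here = sorted(theirs.keys() - ours.keys())
    missing_there = sorted(ours.keys() - theirs.keys())
    return differences, missing_here, missing_there


# Hypothetical birth years keyed by made-up identifiers.
wikidata = {"person-a": 1901, "person-b": 1950, "person-c": 1977}
other_source = {"person-a": 1901, "person-b": 1951, "person-d": 1988}

differences, missing_here, missing_there = compare_sources(wikidata, other_source)
print(differences)    # the statements worth finding a source for
print(missing_here)   # known elsewhere, not yet here
print(missing_there)  # known here, not yet elsewhere
```

Only the differences need human attention; everything the two sources agree on can be left alone for now.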

The argument for adding data is easily explained from set theory. When there is no data to show, we have a 100% failure. When new data is 90% correct and when we have processes in place to compare sources, we have a 90% improvement and work to do. When of the remaining 10% we can identify 80% as statements with issues, we can flag those and work on improving the data.
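The arithmetic of this argument can be worked through as a small sketch. The batch size of 1,000 statements and the function name `triage` are assumptions for illustration; the 90% and 80% figures are the ones used above.

```python
def triage(total, correct_pct, detect_pct):
    """Split a batch of new statements into correct, flagged and undetected.

    Percentages are whole numbers so the arithmetic stays in integers.
    """
    correct = total * correct_pct // 100
    wrong = total - correct
    flagged = wrong * detect_pct // 100
    undetected = wrong - flagged
    return {"correct": correct, "flagged": flagged, "undetected": undetected}


# Assumed batch of 1,000 new statements, 90% correct, 80% of errors detectable.
batch = triage(total=1000, correct_pct=90, detect_pct=80)
print(batch)
```

With these numbers 900 statements are correct, 80 are flagged for follow-up and 20 errors remain unnoticed for now, compared with showing nothing at all.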

The beauty of Wikidata is that it may be used in many languages. This is not without its issues but as a consequence, any item in Wikidata can be found in for instance the Tamil Wikipedia. By only adding the label in Tamil the information may be nicely presented in Reasonator. In this way it is easy to start building information in a language and consequently work towards a Wikipedia article.

Given that Wikidata is based so much on Wikipedia data, it is obvious that Wikipedia has most to gain from quality improvements in Wikidata. Those American Football players are still in the Wikipedia article, there is more information in Wikidata for all the red linked winners. There are 14 Wikipedias that could already benefit from the data at Wikidata and for other Wikipedias it makes it easier to include this award because there is no longer this maintenance requirement.

The same is true for mayors: is the mayor of your town correct on every Wikipedia? Does it show who he or she is on every Wikipedia? It is not for mine.. Is the number of inhabitants still correct? Wikipedia has a problem with such facts, there are too many of them and given that facts may exist in 290-ish Wikipedias who is going to do it all for all the cities, municipalities wherever in the world?

What Wikidata has to offer is collaboration. It is growing fast, its data is improving constantly and arguably as more data becomes available, more eyes will see what is good and what is bad. As our tools become increasingly sophisticated, it is not that we do not need people to make a difference, it is that we are increasingly able to point out to them how they can make a difference. This will increasingly make Wikidata a place where we improve the sum of all available knowledge and share it widely.

Monday, December 07, 2015

#Wigi - are these gender indicators about #Wikidata or #Wikipedia?

Wigi is a wonderful project. As you can see, it stands for "Wikipedia Gender Indicators" and when you think it is Wikipedia as you know it, it is not. The data is from Wikidata and the data included represents all the Wikipedias and all the other projects that include people.

Its purpose is to inform about the gender gap and, it relies on the quality of Wikidata to do so. This is not a bad thing because a lot of effort goes into the information about people, their gender, their "ethnicity" and the dates of birth and death. This information is mostly based on information from Wikipedia and by giving it this name, WIGI gets the attention it deserves.

The aim of this project is to show what progress is made regarding the gender gap. It shows the change between periods. In this way we get a feeling how much effort goes into writing articles and how it affects significant numbers regarding the gender gap. When some articles are missed, it does not make much of a statistical difference because as the quality goes up, it is unlikely to change the variance much.

It is however important to understand that in this data other biases are hidden. The number of people dying in Syria or Iraq is suspiciously low. It demonstrates a lack of awareness of the culture and the people who live there. With the approach of WIGI it is fairly easy to produce these biases as well and it is obvious that implicitly the NPOV of all Wikipedias is in doubt.

Sunday, December 06, 2015

#Wikimedia and what it could do with money

Some say that the Wikimedia Foundation has enough money and does not need as much as it asks for. Some who say such things are fairly influential and are so convinced of their opinion that they happily give their opinion to the Washington Post. It is one way of making their opinion count most and it is sad for numerous reasons.

It is as easy to argue that the Wikimedia Foundation deserves more money. It could do with more money and it would be obvious if the point of view differs just a little. These people are active in the English Wikipedia and they are involved in policies, they do not take it kindly when they are ignored or opposed. The Wikimedia Foundation has as its aim to share in the sum of all knowledge. When we are to evaluate if the WMF has enough money, it should be in the light of what it aims to achieve.

I can imagine that Wikipedia suffices for some Americans, America is relatively well described. I can imagine that when young professionals find that their subject is well described in English, they are content because of this and do not see the need for more activity. If that describes their point of view, then they are not aware of the discrepancies with other countries, with other cultures and in other languages. When we are to be afraid of Syrians, Wikipedia shares the blame because it does not cover Syria and Syrians with the same quality it does for Americans.

People may express their point of view. However, when are they held to account for their actions? When Joe Public reads that these fine Wikipedians arguably find that WMF is a fat organisation not worth the money it asks for, the WMF and its fundraising effort suffer. From the point of view of these precious Wikipedians, it is fine. They voiced their opinions before and they may have evaluated what others have to say.

However, they are in their own circle, they will not suffer the consequences. So my question to them is can you convincingly prove, not suggest but prove that we are done, that we do not need more money to do a better job, a job that is to deliver the sum of all knowledge.

In this blog I have said it time and again, we do not even share in the sum of all knowledge available in the Wikimedia Foundation. So I am sad to say that these precious Wikipedians cost us a lot of money, money that we could spend effectively in the quest to bring information to the world.

Thursday, December 03, 2015

#Wikidata - #quality and the Nansen #Refugee Award II

My objective is to add high quality data for the Nansen Refugee award to Wikidata. At this time there are 70 people and organisations registered as receivers of the award. One way of understanding quality is in bringing people closer together.

In the picture you see Mr Ruud Lubbers and Mr Akio Kanai. Mr Lubbers was the United Nations High Commissioner for Refugees and he is presenting the Nansen Refugee Award to Mr Kanai. The only article for Mr Kanai is in Japanese, I do not read Japanese and thanks to Wikidata and Google translate, I was able to learn that the item in Wikidata does represent the recipient of the award.

Mr Kanai and his company have provided glasses to refugees for decades. It is just one of those things people need and refugees are people.

Mr Lubbers is just one of the UN High Commissioners for Refugees. All the others are now also known to Wikidata. Because of this Mr Kanai and Mr Lubbers are easily linked through two steps of separation.

This is a work in progress :)

PS I wish Bassel was a refugee, it would be safe to discriminate against him but he would be alive and well.

Tuesday, December 01, 2015

#Wikidata - #quality and the Nansen #Refugee Award

Many people consider quality to be of a paramount importance and, for all I care I like what John Ruskin had to say: "Quality is never an accident. It is always the result of intelligent effort". For this reason I like to work on things that connect Wikidata items. It is very much the "steps of separation".

At the Wikipedia awards & prizes project people are interested in cooperating on awards in Wikidata. We agreed that I would make the Nansen Refugee Award an example of what can be done. The starting point is the article. The wiki links enabled me to import information using the Linked Items tool. I did this on two Wikipedias, French and English. After this I was left with a number of red links. In Wikidata I started adding dates and nationalities to the award.

All this improved the information as seen in Reasonator significantly. I had to curate two items because two American Football players were incorrectly identified as recipients of the award. As I searched for "red links" in Wikidata, I found one Wikipedia redirect and added information for Princess Princep Shah of Nepal. Arguably she is notable enough for her own article. When I could not find an item, I added an item for the missing person or organisation.

This is a work in progress :)

PS I wish Bassel was a refugee, it would be safe to discriminate against him but he would be alive and well.

Sunday, November 29, 2015

#Wikidata, its fucking amazing quality

Finding this illustration captures in many ways Wikidata. When we are to convince about quality we first have to be totally fucking amazing.

Quality is not an absolute. Quality is in being the best of the pack, in being better than what went before. Some say quality is in making no errors but for me that is a dead end. Quality is in acknowledging errors and dealing with them.

The first thing Wikidata did was bring some needed quality to Wikipedia. It brought all the interwiki links into one place, it made for much better interlanguage links and it bought us time to improve on the mess that it was. The other Wikimedia projects followed and, with two projects to go, Commons and Wiktionary, it is fucking amazing what a difference it made.

Once Wikidata existed, many people feverishly started to include statements. Its progress can be followed in the statistics set up by Magnus. In a way it is similar to the wild west. Every Tom, Dick and Harry moved in and what has been accomplished is totally fucking amazing. It happened in the wiki way and some watch it in shock because there is no controlling it. Some of what has been done is good, some is bad and some is totally fucking amazing.

What Wikidata offers is to take much of the drudgery out of Wikipedia. Bassel for instance received the Index Award. The latest prize winners listed are from the batch of 2013. If this data came from Wikidata, it would be easier to update all fifteen Wikipedias when new data becomes available. It would improve quality and, would it not be fucking amazing to make that happen?

The chair of the Dutch Wikimedia chapter remarked at its annual conference: "So much is about Wikidata". Several of the presentations were about new ways of collaborating with GLAMs, the potential they offer is huge. The one person with the biggest impact on Wikidata was mentioned often. It is fucking amazing to hear people from GLAMs say that Magnus's work enables them to contribute to Wikimedia.

All this is happening before your eyes. You have to see it to believe it.

Is #Wikidata a #Wiki?

Initially, wiki wiki was meant to be quick. It was meant to bring you where you wanted to be without much of a fuss. Then it became a metaphor for quick and easy editing; you edit and someone else who knows how to improve it may do exactly that. It was ok to be incomplete, it was ok to be wrong in part.

Nowadays, Wikipedia is celebrating its 15th anniversary and it has received the Erasmus prize for past performance. Wikidata is in its third year.

Not much of the original notion of a wiki is left. It has benefits, it has drawbacks. The argument for a wiki is inclusion but currently all the excess baggage makes Wikipedia and Wikidata an environment where you can easily feel excluded. It is in practices and it is in language; I am supposed to understand a sentence like: "I think I recall you showing P-hardness of RDFS proper a while ago, which would obviously preclude translation into single SPARQL 1.1 queries (unless NL=P).". I do not and I am not inclined to study sufficiently in order for this to make sense. I am assured that it is not difficult, but hey, do I really need to know this stuff, am I really supposed to make such an effort to be part of it all? Do we need university education or do we need universal tools?

It is the same with the data itself; it has to be immaculate when it is to be included. When it is not "good enough", it is either excluded or it goes to the data hell that is the "primary sources" environment. When people enter data a statement at a time, such discussions do not take place and it is very much done in the wiki way; quick and dirty at first with improvements following later. Quality is not considered but hey, it is a wiki and we are Wikipedians right?

Poor quality does not have to be a problem when it is seen as an opportunity. It is an opportunity when people are invited and enabled to improve quality. It is an opportunity when our tools are about inviting change in Wikidata itself and consequently bring improved embedded data to Wikipedia.

People do not mind learning the use of tools when what they learn is directly applicable. People will not mind a challenge when the challenge is realistic and relevant. The problem is very much with the high priests of Wikidom who will sacrifice anything for their perceived consensus. These people fail to consider arguments and have become as bad as the people who denounced Wikipedia in the day because it would never work.

#MissingBassel - #Wikidata as a tool II

At the Dutch Wikimedia conference I demonstrated how at Wikidata the data for Bassel Khartabil was enriched. The presentation shows how I added other people who have been awarded the Index award.

There are many ways to measure quality. Bassel is a member of Wikipedia. He deserves to be recognised as such because I hate to think that others like him will see an end to their life for our values. Through the index award, he is linked to for instance Malala. Given the steps of separation, how far off is Bassel from you?

Saturday, November 28, 2015

#MissingBassel - #Wikidata as a tool

We fear for the life of our fellow Wikipedian Bassel Khartabil. Rumour has it that he has been sentenced to death in Syria. We would dearly love to know that he is safe and we hope for his release.

I will present at the Dutch Wikimedia conference and my topic is sets and quality. Bassel received the Index award and I will present how to add the award to people who received the award as well.

Some people say that we should not get involved in the case of Bassel because we are .. then they fail to convince me because for me Bassel is not political, he is one of us. Some Wikipedians talk endlessly about what Wikipedia stands for and how important our NPOV is. They however fail to grasp how much Wikimedia fails Syria. Only 732 Syrian citizens are known to Wikidata. Arguably we do not inform about Syria at all particularly not because even Caracalla is seen as a Syrian. As Wikipedia fails to properly inform about Syria, is it that we fail Bassel because he is from Syria as well?

Thursday, November 26, 2015

#Wikipedia - #Freebassel and #Erasmusprijs

Yesterday our Wikipedia community received the Erasmusprijs. It was a wonderful occasion where the king of the Netherlands handed the prize to three Wikipedians :).

It was wonderful. I met with many old friends, it was the first time that I was in the presence of our king and queen, and princess Beatrix.. AWESOME is the word.

One burning topic, something that tempered the spirits, was Bassel, our fellow Wikipedian from Syria who has been sentenced to death. He embodies our values and the values of the Erasmus society. One question that was considered was what to do.

We can take a page out of the book of Amnesty International; the director of the Dutch branch suggested calling embassies. It did work for Amnesty in the past. We may ask attention for Bassel on the Wikipedia central notice and this is being worked on.

More ideas came up. I met with old friends from the Tropenmuseum. They are the original GLAM partners and they will consider providing us with material about Syria or Iraq. Thinking along these lines, it could become a "snowball project" where every GLAM is asked to participate and donate from their collection.

I met an old friend from Germany, she came up with the idea of asking refugees to edit Wikipedia in whatever language about their country, their topics. It would keep them occupied, not happy but busy in a positive way and make them Wikipedians like Bassel.

The only thing we can really do as Wikipedians is do what we do best. It is to provide information from a neutral point of view. Arguably because of the lack of information about Syria we are not doing what we should do. Recognising this, it is fitting when we support Bassel by writing in our own way in our own project about Syria, its people, its history. It is the best message that we can give. Like Bassel we dream of a world where we can all share in the sum of all knowledge.

Friday, November 20, 2015

What kind of box is #Wikipedia

At #Wikidata we know about likely issues at #Wikipedia. The problem is that #Wikipedia does not seem to care. When Wikipedia is about quality at some stage likely issues are important to tackle, they are the easiest way of improving quality.

There are three scenarios:
  • Wikipedia is incorrect, and Wikidata knows about a correct alternative
  • Wikidata is wrong and needs improvement
  • Both Wikidata and Wikipedia have it wrong

At present, Wikipedia is a black box: communication may go in, but it is neither obvious nor visible that quality improvement suggestions are taken seriously. It follows that when Wikipedia sees Wikidata as a foreign body, at some stage all the quality suggestions become toxic and it gets out of the box. Such a box has a name.

Sunday, November 15, 2015

#Wikipedia on #Syria and #Iraq in the light of #Paris

There is no excuse for what happened, for what happens. Now that the news of Paris has sunk in, let's consider the other side. The other side are the people of Syria, Iraq.. Countries where many people suffer beyond belief. They are from places that have a rich history, that brought us many notable people and, even when we look for it, we will not be able to find it in Wikipedia or Wikidata.

If there is one thing that is most often true about an "enemy", it is that you do not know them for who they are. Our true enemy is not the people from Syria or Iraq, it is the people that describe themselves as Daesh. By their own definition they are apart from Syrians and Iraqis.

This distinction is important and, it does not help that we know so little in any language about Syria, Iraq, the notable people, the history, the culture. The lack of knowledge is often seen as a necessary component of discrimination and the associated belief that the other is the enemy.

The war is now uncomfortably close, it hit Paris and who is next? Refugees have arrived in Europe and they have their story to tell. To understand these stories, it is important that Wikipedia has enough information to fill in the background. It is vital that Wikipedia and Wikidata know about those who are not the enemy.

Saturday, November 14, 2015

#Wikidata - #India and the #Peshwa culture

CIS announced in its newsletter a large donation of Marathi books about the Peshwa culture. It is hard to overestimate the relevance of this gift. It makes knowledge available to 73 million people. It provides sources to the history of a large part of India. This is the text:
1000 Marathi books by Marathi-language non-profit to come online on Marathi Wikisource with Open Access

As the Maharashtra Granthottejak Sanstha (MGS), a non-profit organization working for the preservation of the "Peshwa" culture in Maharashtra, and based in Pune, India, celebrated its 121st anniversary recently, the organization relicensed 1000 books for Marathi Wikisource under CC-by-SA 4.0 license so that the books could be digitized and be made available for millions of Marathi readers. Avinash Chaphekar from the organization signed a document permitting Wikimedians to digitize the books on the Wikisource. On this special occasion of the anniversary, a three-day book exhibition was organized starting October 30.

Answering our question "Could you please share with us your ideas of opening these invaluable books for Wikisource? How they are going to be useful for the online readers to learn about the Peshwas?", Mr Chaphekar says:

“These books are of historical importance and cover topics that are rarely covered well anywhere else. This information should reach to more people. Right after our Prime Minister Narendra Modi recommended to read the autobiography of Benjamin Franklin as it contains a lot of messages for a common man, a lady walked to us once and asked if she can read this Marathi. Such books that were published by the Sanstha should not be kept closed as a lot many readers are searching for such books. We might not have a very great presence in the media or the Internet. How does any reader who does not know us buy a book? If these books are available online they could at least find and read them”

As I follow what is new, I often check what Wikidata has to say. What I find is often a lack of information. There is a wealth of data about minor nobility from the Netherlands. Given the major relevance of a nawab of Awadh or indeed a Peshwa, many improvements can be made to acknowledge the relevance of a major culture.

Sunday, November 01, 2015

#Wikipedia versus the sum of all knowledge II

A question was raised again: "Whatever happened to Wikipedia, the encyclopedia anyone can edit?" It was meant to be a rhetorical question. It assumes that everyone can edit Wikipedia and do "all" things necessary. It is funny because practically it has never been true.

It never mattered. Wikipedia had people operating bots, add sources, add images, add templates and only because of this cooperation Wikipedia was functional. When an editor did not know how to do something, he had to learn new skills or had someone else do the job for him.

Wikipedia is a living project. Things change and consequently the skills needed evolve as well. Sometimes new technology is disruptive and old technology is grandfathered; no longer potent, no longer relevant. 

Three years ago Wikidata made its first appearance. From the start it was disruptive. It replaced the old interwiki links and we all benefited from a much more robust technology. This is however a niche area of Wikipedia so nobody complained.

Wikidata has ambitions; it has the potential to serve the sum of all available knowledge. To achieve this, data from many sources, often Wikipedias, was harvested over the years and found its place in one integrated environment. At this stage, selected areas of information may be served to Wikipedia from Wikidata.

We are at a stage where Wikidata is increasingly, and objectively, the best place for particular fields of information and where a local Wikipedia becomes a backwater, becomes stagnant. People who care about external sources, for instance, moved to Wikidata a long time ago because it was much more inclusive. It allowed for easy cooperation and comparison with external sources. It had VIAF link to Wikidata instead of Wikipedia.

The issues we will face will be similar to the ones at Commons. Wikidata is a project separate from Wikipedia. It has its own set of rules, its own set of priorities. Bluntly speaking, its user interface sucks bigtime for newbies and it is hard to grasp many concepts. Have a look at a page like this. It may prove disruptive to Wikipedians in a big way.

The problem we face is that for "grey beards" like me, them olden days are gone. New technology that is obviously superior will replace the current crop of tools. It must do so because expectations of service and quality change. Wikipedia is increasingly used from a mobile phone and we are stuck in so many ways with desktop (not even laptop) technology. 

The sum of all knowledge may be edited by anyone who cares to in the Wikisphere. It may become increasingly easy to do so when we care about the user experience for our editors and are willing to let go of all the cruft we accumulated over the years.

Thursday, October 29, 2015

#Wikipedia versus the sum of all knowledge

Today #Wikidata is celebrating. Its achievements are great. The developments over the last three years have made a difference. Expectations for the new year are good; the glass is filling and half full and the air above is buzzing with ideas.

I comment often on things Wikidata. Most often I have a point, sometimes I am wrong but that is the price for independent thought. <grin> Blazegraph for instance IS the SPARQL software Wikidata uses. </grin> It is like templates, it is another language to learn and I am spoiled having used Magnus's wonderful toolset featuring WDQ.

As Wikidata matures, it will be a challenge for Wikipedia to adapt. Business is not as usual. It is not the WMF introducing technical novelties, it is another community that knows Wikipedia intimately well and knows where the skeletons are hidden. Wikipedia has never been about the sum of all knowledge. At most it is the knowledge that is deemed notable in one language at a time. Wikidata knows about all knowledge that seems notable in all Wikipedia languages. Wikidata is expected to indulge in "article placeholders". Guess what, they will include what is deemed not to be notable but what is notable elsewhere. Article placeholders, as I understand them, will be generated texts. They will not always be that easy or obvious to create. Thankfully we still have Reasonator to fill in the gaps.

The future for Wikidata will include more of the sum of all knowledge. The notion of aiming for the sum of all knowledge gave focus to the effort that built Wikipedia. With Wikidata we already have the sum of all available knowledge in the Wikisphere. It will be surpassed as more linked data pours into Wikidata. Information that will be new to the Wikisphere, information that Wikidata will happily share with all takers.

The future will be great particularly when Wikipedia is able to adapt.

Sunday, October 25, 2015

#Wikidata - #Blazegraph does it matter?

In an argument that is hardly of relevance for all of us, issues about Wikidata centre around a tool called Blazegraph. The tool is server based and, as we do not all have a spare server to run a tool like this, what does it take to make this relevant?

When issues with a tool have an impact on Wikidata, it is hardly a problem for the rest of us. When people make points like this they are in "insularly thinking mode", it is self centred on them and their ivory tower peers.

This mode of thinking can also be found in their tools, which put them in places where mere mortals do not venture. When asked, Wikimedia Labs is maybe for later. At issue is that integration of such tools for the rest of us is not considered. Labs is exactly there to have all the tools, share results and integrate for better results.

When tools like Blazegraph are only available in proprietary settings, arguments about this kind of proprietary use should be irrelevant. This does not need to be. All it takes is the installation of Blazegraph in a Wikimedia Labs setting for the "rest" of us.

Tuesday, October 13, 2015

#Histography - History explored with #Wikipedia

If you have not seen Histography yet, go and see it first. It is really great fun. Chances are that you do not get back to this blogpost :). Histography is a wonderful user interface created as a final project in Bezalel Academy of Arts and Design.

There are several things that pique my interest; it says: "The site draws historical events from Wikipedia and self-updates daily with new recorded events." They may have written their own software, they may have used Wikidata for it. Suppose that the software is freely licensed and suppose that it uses Wikidata.

Straight away it becomes a potential user interface for all kinds of tasks. It may use labels from other languages, making it multilingual. It may show the items where there is no such label in a different colour, inviting you to add labels. It may even invite you to write a Wikipedia article in that language.
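The label idea can be sketched in a few lines of Python. This is only an illustration, assuming that items arrive in the shape of Wikidata's entity JSON, where "labels" maps a language code to a record with a "value". The function names and the sample data are hypothetical, and nothing here says how Histography itself is built.

```python
def label_for(entity, lang, fallback="en"):
    """Return (text, is_missing) for an entity in the requested language."""
    labels = entity.get("labels", {})
    if lang in labels:
        return labels[lang]["value"], False
    # No label in this language: fall back so there is something to show,
    # and flag the item so an interface could colour it differently.
    if fallback in labels:
        return labels[fallback]["value"], True
    return entity["id"], True


def items_needing_label(entities, lang):
    """List the ids of entities that have no label in `lang`."""
    return [e["id"] for e in entities if lang not in e.get("labels", {})]


# Hypothetical sample data, for illustration only.
sample = [
    {"id": "Q_A", "labels": {"en": {"language": "en", "value": "polder"},
                             "nl": {"language": "nl", "value": "polder"}}},
    {"id": "Q_B", "labels": {"en": {"language": "en", "value": "dike breach"}}},
]
```

Calling items_needing_label(sample, "nl") gives the list of items where a reader could be invited to add a Dutch label.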

Histography is a wonderful approach. The beauty is not only in what it does, it is also in what it can do.

Thursday, October 08, 2015

#FREEBASSEL: Free culture advocate who built 3D renderings of Palmyra missing in Syria

When the Wikimedia Foundation asks on its blog to campaign for a Wikipedian who has gone missing, how can I not express my regret that this is necessary, and how can I not mourn the destruction that is happening in Syria?

Free Bassel Khartabil !!

Monday, September 28, 2015

#Wikidata - ten questions about #Kian

The quality and quantity of Wikidata relies heavily on technology. When the people who develop their tools collaborate, the results increase exponentially. Amir has made his mark in the pywikibot environment and now he spreads his wings with Kian.

I have mentioned Kian before and I am happy that Amir was willing to answer some questions now that it has over 82,000 edits.

What is Kian
Kian right now is a tool that can give probability of having certain statement based on categories but the goal is to become a general AI system to serve Wikidata.
Why did you write Kian
Huge number of items without statement always bothered me and I thought I should write something that can analyse articles and take out some data out of articles.
How is Kian different from other bots
It uses AI to extract data, I have never seen something like this in Wikipedia before. The main advantage of using AI is adaptability. I can now run Kian on languages that I have no idea about them. 
Another advantage of using AI is having probability which can be useful in lots of cases such as generating list of mismatches between Wikipedia and Wikidata that shows possible mistakes in Wikidata or Wikipedia.
Is there a point in using Kian iteratively
With each run of Kian Wikidata becomes better. After a while we would have so much certainty in data that we can assure Wikipedia and other third party users using our data is a good thing
What can Kian do other bots cannot do
First is generating possible mistakes and building a quality assurance workflow. 
Another one is adaptability of adding broad range of statements with high accuracy.
What can Kian do for data from Wikipedias in "other" languages
We can build a system to create these articles in languages such as English since using Kian now we have data about those articles.
Let me give you an example: Maybe there is an article in Hindi Wikipedia, we can't read this article but Kian can extract several statements out of that article. Then using Reasonator or other tools we can write articles in English Wikipedia or other languages.
What question did I fail to ask
Plans about Kian. What I'm doing to make Kian better. Hopefully we would have a suggesting tool using Kian very soon.
What does it take for me to use Kian
We have instructions on GitHub; you only need an account on Wikimedia Labs.
Does Kian use other tools
Yes, right now it uses autolist which makes it up-to-date.
What is your favourite tool that is not a bot
Autolist, Wikidata can't go on without this tool.

Sunday, September 27, 2015

#Wikidata - primary sources tool statistics

It is a good thing that there are statistics for the primary sources tool. It is a dashboard that shows the current state only.

Given the discussion on the usefulness of this tool, this is not really helpful. It does not help any argument because everyone will be given different numbers at a different time.

Compare this with useful statistics for Wikidata. Here values are available that show trends over time. Consequently action can be undertaken based on the numbers. It would be really welcome that as part of the creation of these statistics, current numbers for the primary sources tool would be included.

Either way, success or failure, statistics help when people agree that numbers are relevant.

Saturday, September 26, 2015

#Wikidata - #Freebase atrophies in the Primary sources status

It is a good thing that there are statistics for the primary sources status. It demonstrates clearly how dysfunctional it is. Only 18K statements have been approved. After all the time that the tool exists, it is not even one percent.

For a "serious" power user it is quite possible to add this number of statements to Wikidata in a day, any day. The sad thing is that there is every reason to believe that the quality of a power user's work is just as good as anything that is in this dump in the first place.

Mathematics show that it is easy to check and verify the data that is in Wikidata with other sources. When such a process is well designed, it is iterative and consequently adding data that is deemed useful for inclusion in Wikidata will be processed in every iteration.

These sad statistics demonstrate one thing and one thing only; the failure that is in this approach. It would be wise to abandon it and concentrate on workflows instead that leverage the value that is in the huge community that may serve fixing issues.

Sunday, September 13, 2015

#Wikidata - What is the #Buikslotermeer

In the history of the Netherlands, land was steadily disappearing. The peat that was the land was replaced by water and this process increased in speed as lakes increased in size. One solution was to make a polder out of a lake. It worked well and it resulted among many others in the polder of the Buikslotermeer.

As the city of Amsterdam grew in size, a new part was called after the old polder.

Wikidata needs disambiguation between the two. One of the reasons is that a picture like this one is about the polder and not at all about the neighbourhood of Amsterdam.

The polder will have statements about things like when the dikes broke.

Saturday, September 12, 2015

#Wikidata's embarrassment of riches

Wikidata is improving its content constantly. Proof may be found in people pointing to issues and the follow up it generates. They add data, change data and remove data; Wikidata is better for it.

With the official Wikidata Query being live, it is even easier for people who understand SPARQL to query, compare and comment on Wikidata's content. As mentioned before, it is in the comparison of data that it is easiest to improve both quality and quantity.

For this reason it is an embarrassment how a rich resource like Freebase is treated; it might as well not exist. It lingers in the "primary sources tool", where a lot of well intentioned work is done. In Q3/2015 there may even be a workflow to include even more data in there.

Probably, this tool is only relevant for static data and that is not necessarily the best. Actively maintained data is much to be preferred. If I understand things well, people may tinker with the data in this data dungeon and it is then for the "community" to decide upon inclusion in Wikidata. It is not obvious what its arguments could be. It is not even obvious how any of this data will compare to the quality of Wikidata itself. Its quality is not quantified either.

Once data is included, there are many ways to curate the data. It is done by comparing it against other sources. It is obviously a wiki way because it invites people to collaborate.

Monday, September 07, 2015

#Wikimedia - more #contributors or more #editors?

The foundation of the Wikimedia projects is its people. The more effort these people put in, the more data it generates. The data may be in the form of text, images, sources, software or statements but it is all about mangling it into information. The question is not so much what has value for us all, as it all has its own value, its own merit. It gets its value from the people who take an interest.

The question is how to generate more merit. How do we get people involved to do their "own" thing? One way of doing this is by not seeking the perfect solution. Yes, we can do a lot in an automated way. However, this will mostly get us more of the same and not necessarily more of what is of interest to some.

Consider, there are people who demand better quality. When all they can do is look helplessly from the sidelines, they get frustrated. When you give them something to do, they have a choice; to put up or to shut up. Personally I care about human rights so I enrich content related to human rights. The data is not perfect but I notice improvements. I notice when other people contribute as well. It feels positive.

The problem with many tools is that they are great for what they aim to do but once they get into the grey area of doubt and uncertainty they flounder. Technically the negative results from Kian are perfect. It is just that it does not make it easy for people to work on these results. It is not obvious what result will be enough for Kian.

What we really want is tools that people can use, tools that are as obvious as we can make them, tools that have descriptions and workflows. Tools that do not need nerds or developers to use. Tools that can be used by you and me. Tools that get us more contributors. Contributors that like me work on a subject they care about.

Sunday, September 06, 2015

#Wikimedia - improving #search

One "key performance indicator" for search is the number of people who get zero results for a search. The objective is to make the number of people who do as small as possible. [1]

In the early days of the Dutch Wikipedia, a librarian was always happy to explain how he improved search results at his library.

His first observation was that he needed to know what people could not find. The observations were aggregated in timeslots. In this way he knew what people were looking for. His favourite observation was "People are stupid; they do not know how to spell". Allowing for the most prevalent spelling errors improved the results a lot. The other part was that people were looking for things the library did not provide.

The message for the Wikimedia search team: consider publishing the known errors aggregated over time. Have the community mark the spelling errors as such and use that to serve content anyway. For the other part, where we do not have data, consider that Wikidata has more information than any Wikipedia. When results do not exist as articles, publish what people seek and there might be a community out there adding missing articles.
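What the librarian did can be sketched as follows. This is a minimal illustration in Python, assuming a search log of (timestamp, query, result count) entries and a community-maintained list of known misspellings; the function names, the log format and the sample entries are all hypothetical and do not describe the actual Wikimedia search infrastructure.

```python
from collections import Counter, defaultdict
from datetime import datetime


def aggregate_failed_searches(log, slot_format="%Y-%m"):
    """Count zero-result queries per time slot.

    `log` is an iterable of (timestamp, query, result_count) tuples.
    """
    slots = defaultdict(Counter)
    for timestamp, query, result_count in log:
        if result_count == 0:
            slot = timestamp.strftime(slot_format)
            # Normalise case and whitespace so variants are counted together.
            slots[slot][query.strip().lower()] += 1
    return slots


def apply_corrections(query, corrections):
    """Map a known misspelling to the spelling that does give results."""
    return corrections.get(query.strip().lower(), query)


# Hypothetical log entries and a hypothetical list of marked misspellings.
log = [
    (datetime(2015, 9, 1, 10, 0), "Amsterdm", 0),
    (datetime(2015, 9, 2, 11, 30), "amsterdm", 0),
    (datetime(2015, 9, 3, 9, 15), "Amsterdam", 12),
    (datetime(2015, 10, 1, 8, 0), "Buikslotermeer polder", 0),
]
corrections = {"amsterdm": "Amsterdam"}

failed = aggregate_failed_searches(log)
```

The counts per month show what people could not find; the entries that are not spelling errors are the candidates for missing articles.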

#Wikidata - Tiberius, Modestus and Florentia

Tiberius, Modestus and Florentia are three Catholic martyrs. There is a Dutch article by that name and Kian has it on its "problems list".

Lists like these "lists of possible mistakes" are necessary. At best they are an evolutionary step between having no awareness and being aware of issues. It would be wonderful when there are workflows for fixing issues in a way that prevents them from reappearing on such a list.

For these three martyrs new items were added and they are linked to the item for a "group of people". In the Dutch article, Joseph Guislain is also a museum in Ghent. It is easy enough to add an item for the museum and link it to the person. But will it fix this issue for Wikidata?

Workflows for the issues that we face would be wonderful:
  • when done, an item should disappear from a "to do" list
  • it should be more obvious what it is that will fix an issue
  • when we identify martyrs or whatever, we should involve people who are into the related subjects
Magnus has many tools that have people fix things. They are workflows we could adopt and by doing this make them even easier to use.

#Wikidata - #StrepHit, the package damaged the message

When a good idea is posted, the message of the announcement can completely blow it away.

First the good news. StrepHit has the potential of becoming a valuable tool for new content for Wikidata. It is all about Natural Language Processing and consequently it is all about harvesting facts from text. The idea is to harvest structured facts and provide references for statements and harvest references for existing statements. This is really welcome, it may prove to be important.

For the bad news, the plan is based on a number of awful assumptions that prevent it from being taken seriously at first glance.

The best thing the authors can do is appreciate that what they are building is a tool. A tool that analyses text, a tool that can be trained to do a good job. A tool that can be integrated with other tools. A tool that is not defined by particular use cases or assumptions.

When it runs in an optimal way, it is much like Kian. Kian runs and makes changes to Wikidata directly. This week it added 21,426 statements with a very high rate of certainty. Problematic data is identified and lists are created and this is where people are invited to make a difference.

Kian works in the Wiki way, it does its thing and it invites people to collaborate. It does not assume that people have to do this that or the other. Contrast this with StrepHit where the author suggests that people should not be allowed to add statements without references. If that is not enough, it will not even add data to Wikidata but considers the data it generates a "gift" and condemns its data to the "Primary sources tool". It is a sad place where valuable data lingers that is not finding its way into Wikidata.

StrepHit and tools like it may become valuable. Its value will be in a direct relation to how it integrates in other tools.  When it does it will be great, otherwise it will sit in its corner gathering dust.

Thursday, September 03, 2015

#Wikidata - #Wikimedia Public Policy

To make its point about its public policies absolutely clear, the Wikimedia Foundation dedicated its own website to it. It is well worth a visit, it is well worth it to give this subject a good think.

In Wikidata the discussion was started on one of the more important Wikipedia policies; its "BLP" or Biography of Living Persons. Obviously, Wikidata does not have a BLP because it does not have biographies. We do however have data on living people and data on people can be as libelous as text. Talk about "hard data"...

With data on people, there are all kinds of potential issues. It may be incomplete, it may be wrong, it may be problematic. It is obvious that Wikidata has its problems with quality; this blog has mentioned them before.

When Wikidata is to have a DLP or Data on Living Persons, there are two parts to it. The first is having a way of addressing issues. The second is a way to prevent issues arising.

When issues arise, many of the best practices of the BLP can be adopted. Yes, have sources. Yes, investigate the sources. But first things first, have a policy, have a place where issues can be reported.

The question of quality has two parts. Typically Wikidata does not have enough data to be balanced. This can be remedied in many ways. We should be more aggressive in adding data; this can be done by cooperating with other sources and by investing in tools like Kian. The other part is in being sure about the veracity of the available data. This is also something where tools will make a difference.

Both a BLP and a DLP are important aspects of a Wikimedia Public Policy. Wikidata shows its maturity by not having had reason to have its DLP. Something to be grateful for.

Monday, August 31, 2015

#Wikidata - Kian and #quality

Last time Kian was very much a promise. This time, after the announcement by Amir, Kian is so much more. Kian is a tool that can be trained to identify items for what they are. Training means, that parameters are provided whereby the software can act on its own and based on likelihood will make the identification or list it as a "maybe".
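The "identify or list as a maybe" step can be pictured with a small Python sketch. It assumes that a trained model hands back a probability per item; the two thresholds, the function name and the item ids are hypothetical and are not taken from Kian's own code.

```python
def triage(predictions, accept_at=0.9, reject_at=0.1):
    """Split (item_id, probability) pairs into accepted, maybe and rejected."""
    accepted, maybe, rejected = [], [], []
    for item_id, probability in predictions:
        if probability >= accept_at:
            # Likely enough for the software to act on its own.
            accepted.append(item_id)
        elif probability <= reject_at:
            # Unlikely enough to be left alone.
            rejected.append(item_id)
        else:
            # The grey area: list it so that people can decide.
            maybe.append(item_id)
    return accepted, maybe, rejected


# Hypothetical predictions, for illustration only.
predictions = [("Q_A", 0.97), ("Q_B", 0.55), ("Q_C", 0.03)]
accepted, maybe, rejected = triage(predictions)
```

Where the thresholds sit is a judgment call: moving them closer together means fewer "maybe" items for people to review, at the cost of more wrong edits.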

Obviously once it is known what an item, an article is about, so much more can be deduced. That is something Kian will do as well.

The thing that pleases me most is that Kian makes use of autolists for its learning; it means that Kian has become part of the existing ecosystem of tools. Even though hard mathematics is the background of Kian, it is relatively easy to train because prior knowledge is of value.

In the announcement mail Amir asks for collaboration. One area where this will be particularly relevant is where people are asked to decide where Kian has its doubts. It currently uses reports in the Wiki but it would be awesome if such questions could be asked in the same environment where Magnus asks for collaboration.

Yes, Kian makes use of hard scientific knowledge but as it is structured in this way, it makes a real difference. It is possible to learn to train Kian and when ambiguous results can be served to people for a result, Kian will be most glorious. Its bus factor will not be Amir.