Friday, November 29, 2013

Isabella Karle - A notable scientist

A nice article about Mrs Karle indicated that while her husband was awarded the Nobel Prize, and he felt that she should have shared in it, she received more prizes than he did.

This controversy is in itself not that interesting. More relevant is what Wikidata has to say about her. It turned out that many of the awards she received were missing and that is easily enough rectified. What you learn in the process is how to indicate an "honorary degree", that Gregori Aminoff does have his Wikidata item and that the full name of the "Bower award" is the "Bower Award and Prize for Achievement in Science".

One award she received that still needs to be added is the "Women in Science and Engineering's Lifetime Achievement Award". Multiple sources indicate that Mrs Karle received this award, but there is no Wikipedia article about this prize and the sources do not indicate who awarded it. Google does not really help either.

Diversity, and particularly women in science, is considered to be important. Making sure that the information about notable women in science is of the highest quality is one way to celebrate their accomplishments. It would be really good if someone is able to add the lady scientists who received this lifetime achievement award.

Thursday, November 28, 2013

A #brilliant idea barnstar

User:West.andrew.g provides really neat information about the English #Wikipedia. He shows the "popular redlinks".
This is a weekly list of the most requested red links on Wikipedia. It was developed as a corollary of the Top 5000. While this list contains many non-human views, this list can help identify notable topics and search requests that should be created or redirected. This list displays redlinks that have 1000+ views in the prior week interval (run on Sunday mornings, UTC locale). 
Information like this is so valuable; it shows what people are looking for and cannot find. It is highly likely that adding information about these subjects will result in many more page views. As a Wikipedia becomes better able to provide information, it will at some stage reach the tipping point where it is really useful. For many Wikipedias this is still a goal we are aiming for.

Wednesday, November 27, 2013

An #encyclopaedia in #Konkani

The #Wikimedia blog announced that the Vishwakosh was re-released under a free license. This is really welcome news for Konkani speakers and it allows for the inclusion of its 3632 pages in a Konkani Wikipedia.

If the Vishwakosh is to stay in its current format, it is not a Wikipedia but a Wikisource. Given the text in the blog, they want a Wikipedia and by definition the texts will change. There will be updates that reflect more recent developments.

But first the Wikipedia has to be released from the Incubator. With so much involvement of so many people, it should be easy to have all the localisations done in next to no time.

The language committee can expedite it when there is movement in this direction.

Tuesday, November 26, 2013

Fixing #Wikidata labels using a concept cloud

The new "concept cloud" is great; you may have read about it elsewhere and, as I have known about it for some time, I have been using it a lot lately. It is ever so addictive.

It is probably the most obvious way to make sure that information can be found on a subject and all the subjects that surround it. Contrary to Wikipedia where it is wise to stay clear of subjects that are dear to you, there is nothing stopping you from making sure that information on your subject is well established.

Here are the things I have learned so far:
  • the spelling of a lot of labels needs improvement; things are not capitalised
  • many items have few or no statements
  • refreshing a concept cloud gives a sense of accomplishment
  • many items have no labels in a language I know.
    • Please help with Korean, Farsi, Arabic and add English
There is "method to the madness"; as more items gain labels in your language, it becomes more important to utilise this information in search requests. Particularly for the smaller languages this is the only feasible way to get to the tipping point quickly where we provide a reasonable service.
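A minimal sketch of the underlying idea in Python: given simplified entity records, list the items that still lack a label in a given language. The records are illustrative stand-ins, much reduced from what the Wikidata API actually returns.

```python
def missing_labels(entities, lang):
    """Return the ids of entities that have no label in the given language."""
    return [e["id"] for e in entities if lang not in e.get("labels", {})]

# Illustrative items, not real Wikidata content.
items = [
    {"id": "Q1", "labels": {"en": "universe", "fa": "گیتی"}},
    {"id": "Q2", "labels": {"en": "Earth"}},
    {"id": "Q3", "labels": {"ko": "지구"}},
]

# Q2 and Q3 lack a Farsi label and would be candidates for attention.
print(missing_labels(items, "fa"))
```

A tool like the concept cloud makes exactly this kind of gap visible while you browse a subject.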

Sunday, November 24, 2013

#Wikidata HOWTO - #Commons categories

The basic building blocks for a multi lingual #search engine for Commons already exist. They exist in #Wikidata.

Realising this is actually quite relevant. Consider the example of Q726, or horse. "Horse" is known in 178 languages and consequently there is already a word for it in each of these 178 languages. To find it in even more languages someone just has to add the right label.

It works by linking Q726 to the Commons category "Equus ferus caballus". It is the item for the concept itself that needs to connect to the Commons category, not other categories nor lists nor whatever.

The Commons category hopefully provides you with loads of images and, while everything that is written on that page may be unintelligible, it gets you more horsies. When you are adventurous you will find that Q146 gets you kitty cats and Q144 will get you doggies...
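As a sketch of how this enables a multilingual search, here is a toy Python version. P373 ("Commons category") is a real Wikidata property, but the Q726 record below is heavily simplified: only a few of its 178 labels, and a flattened claim structure.

```python
# A much reduced stand-in for the Q726 ("horse") entity.
horse = {
    "id": "Q726",
    "labels": {"en": "horse", "de": "Hauspferd", "nl": "paard"},
    "claims": {
        "P373": [{"mainsnak": {"datavalue": {"value": "Equus ferus caballus"}}}]
    },
}

def commons_category(entity):
    """Return the Commons category an item is linked to via P373, if any."""
    claims = entity.get("claims", {}).get("P373", [])
    return claims[0]["mainsnak"]["datavalue"]["value"] if claims else None

def search_word(entities, word):
    """Find entities whose label in any language matches the search word."""
    return [e["id"] for e in entities
            if word.lower() in (l.lower() for l in e["labels"].values())]

# A Dutch speaker searching for "paard" lands on the same Commons category.
print(search_word([horse], "paard"), commons_category(horse))
```

Every label someone adds makes the same set of images findable in one more language.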

Friday, November 22, 2013

More of that heady stuff

The blogpost about the "heady stuff of Wikidata" proved popular. Many people read it and recently there was a lengthy reaction from Mike Bergman. It is quite interesting because he indicates that the work he does may be used for Wikipedia and Wikidata as well.

So have a read of what Mike had to say. <grin> Maybe there will be a follow up </grin> ...

Hi Gerard,
Sorry my comments on this are coming in a bit late; I only recently stumbled upon your thoughtful post. 
For some time, as one of the editors of UMBEL, I have been convinced that much could be done to improve the category structure of Wikipedia. In fact, our most recent formal release of UMBEL has mappings to about half of the content of Wikipedia, most done by hand or using heuristics. Our hope in doing this work is to provide consistent organization, faceting, inference and semantic search over Wikipedia's contents. The extension into Wikidata is only raising the importance of a tractable upper structure. 
Though there are literally hundreds of hours in our effort to date, it is not easily repeatable nor scalable. Further, there are many of the longer-tail concepts in Wikipedia for which heuristics and manual approaches simply are not appropriate. 
Our efforts in this area have led us to look at many alternative mapping approaches, many of which we have tried ourselves and tested internally. 
The best that we encountered that moved in the right direction was Aleksander Pohl's work with OpenCyc and Wikipedia. After about a year of discussions, we have just reached a formal agreement with Aleksander and his team to extend his prior efforts, only focusing on UMBEL as the mapping target this time. (Though the same new approaches may again be extended to OpenCyc.) 
The market will likely tell us if small structures (as you cite), medium structures (such as the 28 K concepts in UMBEL), larger structures (the 300K or so in OpenCyc), or other structures (such as SUMO or whatever), will best fit the bill of better organizing Wikipedia content and making it tractable. We are committed to placing our approach into the public domain soon. 
We should have some results to share on our UMBEL-Wikipedia mappings shortly (weeks, not months). We will do what we can to announce this availability, and we look forward to your and the community's comments and response to this effort. We think we will have a winning approach, but only the market will tell! 
Good luck on your own efforts, and let me know if we can be of any assistance. 
Best, Mike

#Wikimaps loves #Wikidata

Wikidata has potential. That much is clear. What is not so clear is when a tipping point is reached for a new use of its data.

The image to the right is from a presentation by Susanna Ånäs about the "Wikimedia Old Map project". In it Wikidata is already very well connected. It makes sense because it may include maps and, by using qualifiers, it can be obvious when a map was drawn. Wikidata is also where you can add old names for everything that is on the map. Last but not least, Wikidata has always been able to include names in other languages.

Wednesday, November 20, 2013

#Wikidata & #Freebase - an #interview with Denny Vrandečić

When Denny signalled the availability of data that links Wikidata to Freebase, there was a lot of follow up and several questions were left unanswered. So I asked my questions and I am happy with the answers I received. So enjoy this interview about the Wikidata and Freebase connection.

By moving to California you embrace yet another culture and language. What does it do to you and your family?
My wife is from Uzbekistan, I am from Croatia. We met in Israel. She has lived the last few years in Estonia, I in Germany. We have moved to California, and we are very excited about this move. Finally to live in a country where we both speak the language! And also, we have high hopes with regards to the weather.
The US in general, and San Francisco in particular, is a real melting pot - or rather fruit salad, as they say these days - of people from all over the world. This seems to be a good match to our own background, so we are looking forward to see what this will mean for us.
What is your job description and title at Google (and what is it you actually do)?
My job title is "Ontologist". I am working on Google's Knowledge Graph. The Knowledge Graph for Google is basically what Wikidata is for the Wikimedia projects: a repository of structured knowledge about the world, that is used in many different ways in various different applications. I am working, together with a great team, on the schema and ontology of the Knowledge Graph: its data model, schema, and the way it captures and represents knowledge.
What is Freebase and how does it compare with Wikidata?
Freebase is the publicly available and editable part of the Knowledge Graph. Freebase and Wikidata are very similar. At the same time, there are quite a few differences: the user communities, the incentive architecture, the license, the way sources and knowledge diversity are handled, to name the big differences. There are also minor differences in the data model, the way classes, properties, types interact with each other, the UI, the workflows, the prominence of internationalization, currently also the size and scope of the knowledge bases, etc. Wikidata currently has 22 Million statements, Freebase has 2 Billion facts. Now one should always be careful with such simple metrics, and especially these numbers are not comparable, but it hints at some differences in the knowledge base.
Do you work on FB or is this only an outward facing project?
Freebase is a part of the Knowledge Graph, and I work on the Knowledge Graph, so yeah, I also work on Freebase.
With a link between FB and WD, it will be possible for data to flow from WD because of its license. How about a flow in the other direction?
I am not a lawyer, but unfortunately it seems that the licenses used by Freebase and Wikidata would not allow for that direction. We are aware that this is not good, and we are working on ways to fix this.
But even without actually letting data flow from Freebase to Wikidata, such an alignment can be valuable for the Wikidata editors: bots could compare the data, and flag inconsistencies. Bots could do simple comparisons, like, go through all the countries, compare the capitals, and report on the differences. And this can be done not only with Freebase, but with any other structured or semi-structured base that Wikidata connects to. One example is the work that Maximilian Klein did, comparing the sex/gender of authors connected through the VIAF ID.
The data for the FB<->WD connection is based on a WD dump. How will it be maintained in the future?
This is not decided yet. In principle, we could create this dump regularly, but I am unsure if this will be needed. We will have to see how things develop before deciding on what to do next.
When info exists in both WD and FB and it differs, what would you like to see done?
The difference to be fixed, obviously! Both knowledge bases may and do contain errors, and the ability to compare them can lead to an increased quality level in both. It is obvious that the respective communities should take care of such errors in their knowledge bases. By providing links between the knowledge bases, this should get easier to automate.
Would that be a model for collaboration with other sources like VIAF?
Yes, in many cases that should work. It is clear that some sources and their communities are more open to corrections than others. Wikidata can become a central hub for identity on the Web, collecting and reconciling IDs from many different sources. What makes Wikidata so interesting in itself is its commitment to let everyone share in the sum of all knowledge - both in participating in creating it as well as in accessing it. The barriers are so much lower than almost anywhere else. In which other knowledge base can you easily fix an error?
Does FB have sources for its statements?
Yes, but this is quite a different notion from what Wikidata does with sources. When a dataset is being uploaded to Freebase, the source for this upload is usually recorded. It is closer to what is usually called "provenance", whereas Wikidata's notion of a source is closer to what is called a "reference", an external authority that makes the given claim. A strength of Wikidata is the diversity of the sources, and that they can be refined later by the community. A statement in Wikidata can have several sources (which makes sense if you think of them as references supporting the statement), but in Freebase every statement only has one source (which again makes sense if you think about it being the provenance of that statement, from where this statement came from).
spot the differences
What do you prefer, no diffs or sourced diffs?
(I don't understand the question, and rephrase it to: do you prefer an unsourced statement over no statement?)
In most cases, yes, I'd rather have an unsourced statement than no statement at all. In a perfect world, all statements in Wikidata would have great, authoritative sources. But for now, I really think that Wikidata should be lenient with regards to unsourced statements in most cases. There are obvious cases where this is not true: data about living persons has to be more carefully sourced, especially when it has the opportunity to hurt the person. Also, we have to remember that a Wikipedia community can decide to use a statement from Wikidata only if it is sourced, and to drop it otherwise. Once Wikidata has matured a bit, I expect it to move towards a stricter policy, and to see tools develop that help with getting there. That will be a quite exciting time.
What FB feature would you LOVE to have in Wikidata?
The expressive query features of Freebase. The Wikidata team is hard at work on enabling this, and I am very much looking forward to seeing it happen: this will open a whole new world of possibilities for everyone using Wikidata's data, and it will lead to much more visibility of the data and help to refine the Wikidata properties.

Tuesday, November 19, 2013

#Wikidata HOWTO - Some properties are best used as a qualifier

Some Wikidata properties are practically by definition qualifiers. Consider time for instance: there is the "point in time", the "start date", the "end date" and all of them are in relation to something else. For instance when a Nobel Prize was awarded, when a king ascended the throne and when, for whatever reason, he was no longer the ruler ... "Le roi est mort. Vive le roi !!".

Working on Wikidata items is very much a learning process. As you go, you learn and come to conclusions. For me it is now obvious: properties about time and dates are qualifiers... always.
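A minimal sketch of the idea in Python. P39 ("position held"), P580 ("start time") and P582 ("end time") are real Wikidata properties, but the value and the flattened structure below are illustrative; real Wikidata time values are structured objects, not plain strings.

```python
# A "position held" claim that carries its dates as qualifiers rather
# than as free-standing statements: the dates qualify the office.
claim = {
    "property": "P39",          # position held
    "value": "king of France",  # illustrative value
    "qualifiers": {
        "P580": "1774",         # start time
        "P582": "1792",         # end time
    },
}

def reign(c):
    """Summarise a position-held claim together with its time qualifiers."""
    q = c["qualifiers"]
    return f'{c["value"]} from {q["P580"]} to {q["P582"]}'

print(reign(claim))
```

Attached this way, the dates cannot be mistaken for anything but the period the office was held.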

Sign language in #Tunisia can have a #Wikipedia too

On #Facebook I noticed what looked like an encyclopaedic article. It was about the poet Aboul-Qacem Echebbi.

When asked, they were interested in having a Wikipedia. Once a request has been made on Meta, the procedure to determine eligibility will start. In this case it is a formality as both native "speakers" and an ISO 639-3 code are present.

Once this procedure has been completed, it will be permitted to edit Wikidata as well.

In the Language Committee a discussion has started to consider if the completion of a set number of fully labelled items can be accepted as part of fulfilling the current requirement of 200 Wikipedia articles. Reducing this number to 50 would make it easier to complete the writing requirement. Currently many of these articles are not much more than sub-stubs.

The Case for Localizing Names, "part 3"

My friend Amir wrote twice ([1], [2]) about the need for the localisation of names. The need for localisation is rather obvious when the original name is in another script. But names written in the Latin script can be just as foreign. Take for instance the Czech author and screenwriter Jiří Růžička: I do not know how to type his name and I am sure 98% of the users of Wikidata will not be able to do so either.

When a name uses characters that are not in use in a language, it follows that the name is unlikely to be found in that language and consequently transliteration is in order. The name as originally written is not an alias. It is something different. It probably needs an attribute of the type "multilingual text" for support in Wikidata.

Monday, November 18, 2013

#Ask and it will be #given to you II

Sharing in the sum of all knowledge is the aim of the #Wikimedia Foundation. The challenge is to bring it to all people of this world. Given that its projects are Internet based, it is reasonable to concentrate on the people connected to the Internet.

With one extra line in your USER/common.js it is possible to expand the search by adding results from Wikidata. This has been mentioned before.

What is new is that it will now include references to articles in Wikipedia and pictures in Commons. Just click on the Commons logo and you go to the Commons category for the subject, when it is known in Wikidata. When you click on the Wikipedia logo, you get a box with the Wikis you can choose from.

Really cool is that the Reasonator will provide you with information in YOUR language. It is probably the best incentive to add the labels missing for your language. Remember, labels are used everywhere.

Talking about Reasonator, we are looking for a nice logo that we can use...

#Ask and it will be #given to you

Reading about Ask 1.0 gives the sensation of falling through a rabbit hole and ending up in Wonderland. You read that it is the query language already used by Semantic MediaWiki; it allows for the creation of lists of any item in a Wikipedia article.

Semantic MediaWiki was great but this, this allows for the use of Wikidata !! It is written by the developers of Wikidata!!

Please can we have it... like yesterday ???



Sunday, November 17, 2013

#Wikimedia #HOWTO - Supporting dyslexic people

One small detail that deserves much more attention is the ability to support people who are dyslexic. This support is thanks to a wonderful little project called OpenDyslexic. It is a font that is designed in such a way that most people with dyslexia find it a lot easier to read.

There are several reasons why our support for dyslexic people deserves more attention:

  • seven percent of a population is dyslexic
  • people who have found how to enable this font are happy
  • we do not know how many people use OpenDyslexic
  • we are told that people find it hard to enable the font
  • OpenDyslexic could be used for languages like Polish
  • some languages cannot use OpenDyslexic because characters are not supported

The CEE conference in Modra was really constructive; so many subjects were discussed, including OpenDyslexic. The bug to enable OpenDyslexic for Polish indicates that many of the things discussed are actionable and are being acted upon.

The #Wikidata workshop

At the #CEE meeting in #Modra, it was a pleasure to present the latest about Wikidata. At the conference the idea of a workshop was launched. The question posed was: "show us what is already there, show us what we can already do".

There is a lot to show. There is the progress made by the development team, there are the many tools created by Magnus and there are the templates used on several Wikipedias. People are impressed but for many purposes it is not good enough. This is understandable: Wikidata is not feature complete. Many things we want to do are under development.

If there is one thing I wanted to achieve, it is that people understand that while Wikidata is incomplete, it is already functional. There are great tools that emulate what people are looking for. There are things that do make a difference today... Tipping points have been reached for several uses and, as more people make use of Wikidata and include the data that is relevant to them, more tipping points will be reached.

Friday, November 15, 2013

#Wikidata HOWTO - Lists are evil

Ivan Gašparovič is the current president of #Slovakia. During a demonstration of Wikidata in Slovakia, it became clear that for the office he holds he was linked to the "list of presidents of Slovakia". This was considered to be wrong as there is an article, and therefore an item, for the "president of Slovakia".

It did not take long to fix this, and to fix it for the former presidents of Slovakia as well. So now the labels associated with the list are wrong in many languages and there is no label in languages like French and German for the office Mr Gašparovič is holding.

All in all, many Wikipedias refer to lists. In Wikidata this does not really work.

Thursday, November 14, 2013

#Wikidata HOWTO - micro tasks

When you have some spare time, there is always something that does not take long to do. The main page of the English Wikipedia mentioned the scientist Michael E. Brown. In 2012 he was awarded the Kavli Prize for astrophysics. It does not take much time to make a statement about this prize on the "item" for Mr Brown and on the items for the other recipients as well.

Yes, many of them received other prizes, but that is another job well worth doing.
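For the technically inclined, here is a sketch of the parameters such a micro-task would send to the Wikidata API's wbcreateclaim action. P166 is the real "award received" property; the Q-ids below are placeholders, not the actual items, and the exact value encoding is simplified to what I understand the API expects.

```python
import json

def award_claim_params(person_qid, award_qid):
    """Build wbcreateclaim parameters stating that a person received an award."""
    return {
        "action": "wbcreateclaim",
        "entity": person_qid,
        "property": "P166",  # award received
        "snaktype": "value",
        # Item values are passed as a JSON-encoded entity reference.
        "value": json.dumps({
            "entity-type": "item",
            "numeric-id": int(award_qid.lstrip("Q")),
        }),
        "format": "json",
    }

# Placeholder ids; a real call also needs an edit token and a POST request.
params = award_claim_params("Q123", "Q456")
print(params["value"])
```

A bot looping over the list of recipients turns one such micro-task into a complete set of statements.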

#Wikidata HOWTO - generated descriptions

#Disambiguation in Wikidata is typically done based on the descriptions of items. Most items do not have descriptions, so it is typically a struggle to select the right items. Enter a nice little hack by Magnus called "autodesc"; it automagically creates descriptions on the fly based on existing statements.

It is one of the most valuable hacks I know of. All it takes to activate it is adding this one line to your common.js page.
  • mw.loader.load('//');
Many hardcore Wikidatians do not yet know about this little tool. Once they find out how useful it is, they will not go back to adding descriptions; instead they will add the missing statements that identify the items in any language. Like me they will consider this an obvious improvement in the Wikidata user experience.
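A toy Python version of what such a tool might do (autodesc itself is far more sophisticated): derive a description from "country of citizenship" (P27) and "occupation" (P106) statements, with "instance of" (P31) as a fallback. The label table, which maps Q30 to the demonym "American" for readability, is an illustrative simplification.

```python
def auto_description(item, labels):
    """Build a short description on the fly from an item's statements."""
    parts = []
    if "P27" in item:   # country of citizenship
        parts.append(labels[item["P27"]])
    if "P106" in item:  # occupation
        parts.append(labels[item["P106"]])
    # Fall back to the "instance of" label, or a generic word.
    return " ".join(parts) or labels.get(item.get("P31", ""), "item")

labels = {"Q30": "American", "Q901": "scientist"}
brown = {"P31": "Q5", "P27": "Q30", "P106": "Q901"}  # Q5: human
print(auto_description(brown, labels))
```

The point of the hack is exactly this: once the statements exist, a usable description exists in every language that has the labels.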

#Facebook knows about #SignWriting

I do not know any Tunisian Sign Language, what I do know is that it is only one of the sign languages that are written in real life in the SignWriting script. Valerie Sutton developed this script. There are many publications that corroborate this. Valerie also developed DanceWriting to document the Danish ballet tradition.

These are two reasons why she should be notable enough for a Wikipedia article in the English language. When you compare this with the many politicians and sportspeople who are considered notable enough for an article, it is incredible that she does not have one.

#divcon - #UNESCO "#women in African history"

#Gender equality is one of the global priorities of UNESCO as well. As such, the teaching of history has a crucial role to play since it enables the understanding of cultural dimensions, and highlights the social, political, and economic conditions in the lives of women in past societies.

"Women in African history" highlights twenty women. They should all have Wikipedia articles in all the languages used in Africa.

Obviously there are many more African women notable enough for our attention but as these women have been picked out by UNESCO, they and subjects associated with them, deserve the tender love and care that makes them high quality items in Wikidata.

Wednesday, November 13, 2013

#Russian #Wikipedia presents #English #Wikipedia

A real surprise can be experienced on the Russian Wikipedia; their article about the English Wikipedia has a template that shows a nice info-box. What makes it so special is that the fourth Wikipedia in page views is using Wikidata to present its information.

An often repeated set of questions has been: "When will the bigger Wikipedias start using Wikidata for their information? When will the tipping point be reached where they will update information at Wikidata? When will any and all other languages benefit from the effort of the big boys?"

When big players start to rely on Wikidata, the items they are interested in will get more attention. The resulting high quality information will become available for all these subjects on all the other languages as well. It now becomes possible that the Russian point of view will be supported by the highest quality information.

For Wikidata to support what some may consider a neutral point of view, it is important that all items gain high quality attention from all our communities.

Tuesday, November 12, 2013

#divcon - The #Wikidata presentation

I had the privilege to explain how Wikidata can be leveraged. Wikidata is different in so many ways. There are potential benefits for any language, any special interest and any platform. The easiest win is by providing information about subjects that cannot be found in any other way. However, the biggest benefits will be realised when the data that we include is as good as we can make it.

#Wikidata HOWTO - Add value by labelling what is in the #news

If there is a purpose to Wikidata, it is to share in the sum of all knowledge. Wikidata provides a service to over 280 languages and, if they are all to share in the same sum of all knowledge, we have a challenge on our hands.

Take Typhoon Haiyan for instance. A lot of specific information can be given: the date it became recognised as a typhoon and the date when it was no longer a typhoon, the country or countries it passed, maybe even maps that show this progression. When everything is said and done, there is the potential for a substantial number of statements and qualifiers.

Obviously there are other items that are still of interest; typhoon, Philippines and United Nations are all related to the aftermath of this typhoon. When we ask for attention to the information we can currently provide in Wikidata and to the associated items, we will be better prepared for the next tropical storm.

Only a few people are needed to set up the data properly, but we need attention 280 times over to make sure that people can find our information in all our languages. That attention is our biggest challenge.

#Wikidata #HOWTO - Nelson Mandela

Nelson Mandela was the President of South Africa. Many statements have been made about him on Wikidata in the past. At the Diversity Conference Wikidata was a topic in several conversations. One question was how Wikidata could be leveraged in the many languages of South Africa. In that same conversation the question was asked how can we give people micro-tasks that have a big impact.

I asked the chairman of the South African chapter to add all the labels used on the item for Nelson Mandela in his native language of Tsonga. He did get some instruction but he was left very much to himself to do this. When he had a look at Jacob Zuma, he noticed that because of his earlier work several labels were already available in Tsonga.

What was really clear to him was that adding labels to statements and items is a micro-task and that kids can do it. Anyone can. Seeing the effects of the work done is really motivating.

As a thank you I wanted to present Nelson Mandela looking good in Tsonga. To do this I had to learn from Magnus how to show the Reasonator in other languages. My choice was to use Chinese as the fallback language. This is just to make it obvious that there are still a few labels missing :)
        Gerard Meijssen

#Wikidata #HOWTO - Lilavati, the curious daughter

Lilavati was the daughter of the famous mathematician Bhāskarāchārya. The first part of his book was dedicated to her and from its text the love for his daughter is obvious, as is the wish for her to follow in his footsteps.

The book Lilavati's Daughters very much celebrates the daughters of India who became scientists. These fine ladies are the subject of an editathon in India. The objective is to write articles in the 20 languages of India that have a Wikipedia or an Incubator project.

It is not easy to coordinate what work has already been done. By adding "Lilavati's Daughters" as the subject of the book in Wikidata, the item becomes a place that links all the articles for them together. It does not take much time for someone to add the names of these fine ladies in their own language. It does not take much time to have a look at the statements made for them to see how much could be added.

Some of these ladies received awards like the Jnanpith Award and, when you think it is relevant for the whole world to know who won this award, you could add the other prize winners as well. When you add statements describing these ladies and their achievements, you create pointers to what could be in the article. When you state that they are a "scientist" and "female", people like Emily will find them and get a growing understanding of the lady scientists that have a Wikipedia article in any language.

Saturday, November 09, 2013

Ottoman #Turkish for #Wikidata

Labels in Wikidata provide an item with a name in a language. Historically many subjects have had different names during their lifetime. Often these names were in different languages and sometimes in different scripts as well. Using labels, statements and qualifiers it is possible to register all these names. Jakarta existed before the Dutch razed it to the ground to build Batavia. When the Dutch left Indonesia, Batavia was named Jakarta again. The original Jakarta was named in Javanese and the new Jakarta is named in Indonesian.

Original sources are in their original language and, when an original text is processed, it helps when it is made easy to link the text automatically to Wikidata items. When we have a source, we already know a lot about it. It was written at a specific time by a specific person in a specific environment, and this knowledge helps the matching of texts to concepts known in Wikidata.

With labels recognised as Wikidata items, labels in other languages can be used to translate a text. They can also be used as a key to enrich the experience by including maps, illustrations and Wikipedia articles in the language of the user.
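A sketch in Python of how such a registered name history could be used. The records mirror statements carrying a language and a validity period as qualifiers, with the values simplified to plain strings and the historical names of Jakarta as the illustration.

```python
# Each record stands for a name statement with language and
# start/end time qualifiers; None means "open ended".
names = [
    {"name": "Jayakarta", "language": "jv", "start": None,   "end": "1619"},
    {"name": "Batavia",   "language": "nl", "start": "1619", "end": "1942"},
    {"name": "Jakarta",   "language": "id", "start": "1942", "end": None},
]

def name_in(year, history):
    """Return the name that was current in the given year."""
    for n in history:
        if (n["start"] is None or n["start"] <= year) and \
           (n["end"] is None or year < n["end"]):
            return n["name"]

# A source written in 1800 should be matched against "Batavia".
print(name_in("1800", names))
```

This is exactly what matching an old text against Wikidata needs: the name that was in use when and where the source was written.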

The language committee of the Wikimedia Foundation changed its policy for Wikidata; all languages recognised in ISO 639-3 will be granted the status of eligibility when asked. This will make it easier to include the Javanese for Jakarta or the Ottoman Turkish for Istanbul.

Friday, November 08, 2013

#divcon - #Search, #statistics and the proof of the pudding

When you search #Wikidata, chances are that you have a better result for your language than in Wikipedia. As the proof of the pudding is in the eating, there are 5,801,862 items with a label in English and, there are 5,319,467 items with a link to Wikipedia. Based on this comparison, there are 482,395 more labels than links.

These numbers are provided, once again, by a new tool from Magnus. This is not to say that we cover the sum of all knowledge; Google does a better job at that. What these statistics show is that we are in a position to do a better job at providing information.

When we look at people with the Reasonator, you get a glimpse of what can be done with visualisation. It shows the information in an ordered way and it is easy to notice what is missing. When we present you with all the female painters, it does not take much to notice that women from Scandinavia, Eastern Europe or Asia are mostly missing. This is not even to say that the lady painters from the Anglo-Saxon cultures are well covered in Wikidata. That is however exactly the point; this list motivates us to find more lady painters. You are welcome to fiddle with the query to find data that motivates you.

These tools are a hack; they demonstrate a future that needs to be built for us all to use. As that future gets built, it needs to scale and it needs to be usable in any language. There are so many pieces to that puzzle, and the questions are many: what will be built first, who is going to do all that, and when will we be happy?

#divcon - The #train from #Amsterdam

#Wikidata supports coordinates. All the train stations in the Netherlands have been provided with information by the Railway Taskforce. I will be going by train to Berlin Hauptbahnhof and from there, I think, by S-Bahn and U-Bahn to my destination.
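Coordinates like these make simple calculations possible, such as the great-circle distance between two stations. A minimal sketch using the haversine formula; the station coordinates are my own rough approximations, not values taken from Wikidata:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Approximate station coordinates; illustrative values only.
amsterdam_centraal = (52.379, 4.900)
berlin_hbf = (52.525, 13.369)

distance = haversine_km(*amsterdam_centraal, *berlin_hbf)  # roughly 575 km
```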

When I have some spare time I will add information that will end up on this map.

Thursday, November 07, 2013

#divcon - An #anonymous #female #painter

At the Dutch #Wikimedia conference, Jane gave a presentation about female painters. This is the kind of presentation that would have done well at the diversity conference.

Female painters were a curiosity; they often painted with their fathers, husbands and brothers. Her presentation provides numbers for known female painters. They are few and far between.

To add insult to injury, as there are so few sources for these women, they are said not to be notable, and articles have been lost as a result.

So far Jane has done most of her work in spreadsheets at home. Wikidata offers her the opportunity to include all the data she has so painstakingly collected: names, aliases, dates of birth and death.

The illustration is of an Italian painter whose name is unknown.

#divcon - Valerie Sutton found on the #Occitan #Wikipedia.

Valerie is notable because she invented #SignWriting. SignWriting is a script that enables people to write their sign language. At this moment there is no article on the English Wikipedia for her. There is a shitload of sources in many languages mentioning her achievements, all the material on the SignWriting website is explicitly freely licensed, and I am for multiple reasons the wrong person to write her article.

Anyway, one of Magnus's latest hacks allows me to configure my Wikipedia profile with the additional functionality of including search results from Wikidata. The Occitan Wikipedia is my preferred project to experiment with language support from Wikidata. I am really pleased with the result and this hack allows me to find information on Valerie on the English Wikipedia as well.

#divcon - Ali Ekrem Bolayır, an official from the #Ottoman Empire and #Turkey

Arguably the Ottoman Empire is as relevant as the Roman Empire. The only argument why the Roman Empire gets more attention is that it has been the basis for many ideas in Western cultures.

Wikidata does cover more subjects and consequently it takes a better step towards representing the "sum of all knowledge". One of the people known to Wikidata is Mr Bolayır. It is certainly a challenge to type his name; I know how to copy and paste it...

The Turkish language changed from the Arabic script to the Latin script in the 20th century. It uses characters that are not in general use in Western languages and consequently people write names like Mr Bolayır's differently; typically it becomes Bolayir.

Without this transliteration, people will not find Mr Bolayır when they search for him. Without his name in the Arabic script, it will be hard to link him from original sources. For all these reasons it is important to include the transliterated names people use when they search for subjects like Mr Bolayır.
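A minimal sketch of generating such a search alias; the character table below covers only the Turkish-specific letters and is my own illustration, not how Wikidata actually stores aliases:

```python
# Map Turkish-specific characters to the ASCII letters people
# commonly substitute when typing on a non-Turkish keyboard.
TURKISH_TO_ASCII = {
    "\u0131": "i", "\u0130": "I",  # dotless i, dotted capital I
    "\u015f": "s", "\u015e": "S",
    "\u011f": "g", "\u011e": "G",
    "\u00e7": "c", "\u00c7": "C",
    "\u00f6": "o", "\u00d6": "O",
    "\u00fc": "u", "\u00dc": "U",
}

def ascii_alias(name):
    """Return a plain-ASCII variant of a Turkish name for search matching."""
    return "".join(TURKISH_TO_ASCII.get(ch, ch) for ch in name)

alias = ascii_alias("Bolay\u0131r")  # -> "Bolayir"
```

Registering "Bolayir" as an alias means a search typed on any keyboard still finds the item.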

Wednesday, November 06, 2013

#divcon - #Search beyond the tail

#Wikipedia aims to "share in the sum of all knowledge". It is an aspiration that works better for some languages than for others. Wikipedia will provide you with the articles it has, and Wikidata allows you to find items with a label in your language.

Wikidata knows an item for each article, and the name of an article serves as a label. As the number of articles is different for each Wikipedia, there is now a new report that indicates the number of links per language in Wikidata.

At this moment, the English Wikipedia has 5,317,999 articles; this is just 37.9% of the items in Wikidata. Many of the missing subjects may not be notable enough for the English Wikipedia but this does not mean that we do not know about them. Just add a label in English and another item can be found; just add an alias and it can be found in a different way as well.

We do not know what people are looking for but we do know that people are not properly served in most languages. When search is enriched with all the added labels in Wikidata, people will gain access to Wikipedia articles in other languages as well as pointers to Commons. They may even be interested in Wikivoyage. Our aim is to share in the sum of all knowledge and Wikidata provides a platform to serve data and choices.

Tuesday, November 05, 2013

#Archive #KIT saved by #Library of #Alexandria

When governments cut spending, the results are often disastrous. The subsidies of the Royal Tropical Institute (KIT) were slashed and, consequently, its activities and its archive were no longer funded. The archive is enormous: 7 kilometres of books, magazines and more.

If it was not for the Library of Alexandria, it would all go into the shredder. Much of the information in this archive is unique and it contains information that is relevant for the Netherlands and for many countries in the tropics as well.

The contract for the transfer has been signed. On the one hand it is a relief that all this information is kept; on the other hand it is a disgrace that an archive of such prominence is only saved at the very last moment.

The Library of Alexandria is well known for its ability to digitise books. With a little bit of luck they will decide to digitise the KIT archive. In this way the information that is in this archive is not completely lost to people doing research in the Netherlands.

#Wikidata - Date of death IV

Wikidata currently has only 175,586 people who are dead. We know about them because their "date of death" has been registered. Obviously, this number is pathetic. The records held by the grim reaper need to find their way into Wikidata as they did into the many Wikipedias.

When you consider that we know about 1,314,659 people, we find that our current death rate is 13.3%. As more than half the Wikidata items have no statements at all, it seems obvious that the number of people is likely to grow faster than the number of deaths.
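The quoted rate follows directly from the two counts; a quick check:

```python
# The counts come straight from the text above; the rate follows.
people_with_death_date = 175_586
people_known = 1_314_659

death_rate = people_with_death_date / people_known * 100  # about 13.3%
```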

A friend pointed out that, when you analyse many Wikipedia articles, the text is just words based on what can be entered in Wikidata. The example he gave was Mr Gamani Corea; he was a Sri Lankan economist, civil servant and diplomat and died on November 3rd. The Reasonator visualisation supports his argument.
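A minimal sketch of his argument: a sentence like the one above can be composed from a few structured statements. The dict layout here is my own invention and far simpler than what Reasonator does:

```python
def describe(item):
    """Compose an English sentence from a few structured statements."""
    occupations = item["occupation"]
    if len(occupations) == 1:
        listed = occupations[0]
    else:
        listed = ", ".join(occupations[:-1]) + " and " + occupations[-1]
    return (f'{item["label"]} was a {item["nationality"]} {listed}'
            f' who died on {item["date_of_death"]}.')

# Statements as in the example above; the dict layout is illustrative.
gamani_corea = {
    "label": "Gamani Corea",
    "nationality": "Sri Lankan",
    "occupation": ["economist", "civil servant", "diplomat"],
    "date_of_death": "3 November 2013",
}

sentence = describe(gamani_corea)
```

Because the sentence is generated, swapping in labels of another language would yield the same sentence in that language.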

At this time there is no article for Mr Corea in Tamil or Sinhala, the languages of Sri Lanka. Following his recent death it seems obvious that people will look up Mr Corea in their Wikipedia. When labels in Tamil and Sinhala are available, it is possible to do more than provide a "not found" screen. We could do a more reasonable job and provide something based on the Reasonator visualisation instead.

Monday, November 04, 2013

#Wikidata - Date of death III

As we are into #death, there are other websites that provide information about the dearly departed. One of them is "Find a Grave". It includes many different approaches to the people and their graves.

What is really cool is that they often know about people who are dead before we do. Our inimitable Magnus saw a great opportunity and he linked their information to Wikidata. The result is really nice; at first glance there seems to be a lot of information lacking in the WMF projects. However, when you drill deeper, you find that Wikipedia is aware of the comings and goings of the grim reaper. Tomorrow there are bound to be more of his visits to report.

Another friend indicated that the list of 2013 deaths on the German Wikipedia is number 13 among its most watched pages. It will be interesting to learn how popular the Wikidata list of deaths since October 1st is.

Lajos Kalános, a 2003 Academy award winner

Contrary to popular opinion, many subjects that are notable and relevant for people who read English have no article in the English Wikipedia. Winners of the Academy award qualify as having enough notability for an article.

Lajos Kalános won the Academy award in 2003. Wikidata did not have an item for him, even though there was an article about him on the Dutch Wikipedia.

Wikipedia and Wikidata are not in competition. Far from it. Both offer complementary information. Mr Kalános died October 8th in Naarden.

Not in #Wikidata

When a #Wikipedia article is not registered in Wikidata, it has an issue; for the world it is as if it does not exist. You are not aware of the statements that have been made on the subject, and the readers of your article cannot compare the facts with what is known in other languages.

PS Thanks for linking the Ovadia Yosef article :)

#Wikidata - Date of death II

#Notable people, like all people, die. The feedback on my blogpost was really positive; people added several recently deceased notables to Wikidata. People died in Germany, France and Indonesia that we know of.

All in all there are currently 154 people identified as having died after October 1 2013. This list is really interesting because it shows for the first time the distribution of interest in these people over the different Wikipedias.

Contrary to popular opinion, many subjects cannot be found on the English Wikipedia. It is perhaps as surprising that Nigel Davenport has no article on the nl.wikipedia and that Ovadia Yosef has no article on the en.wikipedia.

The beauty of this wonderful list created by Magnus is that the underlying data is regularly updated. When you include more recently departed from your Wikipedia in Wikidata, you may notice the growth of the list.

A list of people likely to be dead is this one: people with a date of birth before 1900 who are not known to be dead yet. All in all there are over 1300 of these zombies. Ovid, among others, makes the list.
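A minimal sketch of how such a list can be derived from item data; the records below are invented examples, not real Wikidata content:

```python
def likely_dead(people, cutoff_year=1900):
    """People born before the cutoff with no registered date of death."""
    return [p["label"] for p in people
            if p["birth_year"] < cutoff_year and p.get("death_year") is None]

# Invented example records, not real Wikidata content.
people = [
    {"label": "Ovid", "birth_year": -43, "death_year": None},
    {"label": "Person A", "birth_year": 1870, "death_year": 1935},
    {"label": "Person B", "birth_year": 1890, "death_year": None},
]

zombies = likely_dead(people)
```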

It is popular these days to decide on a mode of transportation to escape the walking dead. My advice is to make sure they are dead by registering the date of their demise.

Sunday, November 03, 2013

The #Wikimedia Report Card

The Wikimedia Report Card is the result of the analysis work done on the metrics of the WMF. When you read or know about its history, you will agree that it takes considerable effort to produce relevant information.

In the data for the report card, only the 25 biggest projects are considered, and the number of editors is de-duplicated to allow for people who contribute to multiple projects.

The one thing missing in these numbers is Wikidata as a project. It is sorely missed because it would already be number 5 in the list of active editors. Like Commons, its user base should be considered because many people join Wikidata as their prime project.

There are other reasons to consider Wikidata as well. The aim of the WMF is to share in the sum of all knowledge and, arguably, Wikidata is more representative of all knowledge than Wikipedia. Combine Commons to illustrate and Wikidata to provide facts on all items of knowledge and you have the basics for information dissemination in any language.

Another aspect of Wikidata is that it knows implicitly about Wikipedia and its coverage. Combine Wikidata with the accumulated traffic data for a subject and it will be known what the English reading public is missing out on.

It will be a great improvement when the report card recognises Wikidata for its success. The number of active editors is probably fairly easy to fit in.