Tuesday, December 31, 2013

Animated #SignWriting

I learned today about animated SignWriting. It is very much like an audio recording. The difference is that an animated gif like this will make it easier for someone who signs to learn SignWriting. The gif is an abstraction of real-life movements; SignWriting is fixed, not animated, and uses the same symbols.

If it were possible to have gifs for all the words used, you could present them in the order of a book, an article.

English #Wikipedia uses #Wikidata data

One of the tipping points for Wikidata usage is when the English Wikipedia starts using its data. It has. There is even a category for templates that use Wikidata data. This category is linked to Wikidata as well and there are more Wikipedias that have a similar category.

This list is obviously not complete; the Occitan Wikipedia for instance is not included.

It is exciting to learn about this at the very end of 2013. Next year will be even better.

Sunday, December 29, 2013

#Wikidata - how many ontologies do we need?

When you check out items in Wikidata, chances are that they have links to one or more external resources. The picture to the right shows how DBpedia is connected to even more external resources. As you may know, DBpedia and Wikidata also have a very tight connection.

In a way, this can be compared to the many connections Wikidata has to the categories in the many Wikipedias and in Commons. All of them have structure of some kind. Information is inferred and acted upon. The category "United States Army Medal of Honor recipients" refers to people who received the Medal of Honor.

When Wikidata is linked to external sources, these links have to be there for a reason. It can be to compare information or it can be to add missing information either way. Applications based on the ontology structure of the external sources can apply their logic on Wikidata.

Yes, there may be a purpose for yet another ontology structure in Wikidata. It may allow for implied information in tools native to Wikidata. But there is no point to all of them when we do not make use of them.

#Wikidata - how to reach its tipping points

If #Wikidata is to be useful, it has to be usable. What it is usable for depends on who is using it and in what way. The information about Muhammad Shah, a mogul from India, is easier to digest when formatted by the Reasonator than by looking at the raw information in Wikidata. There is more information in a Wikipedia article, BUT there are only 16 Wikipedias that have an article about him.

Currently there are labels in 19 languages for Muhammad Shah and consequently speakers of most of the Indian languages will not find him in their language.

Most people will not care that much about this mogul. They are more interested in what is in the news. The references shown are related to what is considered to be in the news in any language. Once people have added the labels missing in their language, items that are likely to be of interest can be found. At the same time, wherever the associated items are referenced, these labels will be available as well.

Doing this helps build enough information for Wikidata to become usable in a language. But it still needs to be used. That is at this time very much a tripping point.

Adding Wikidata Search to a Wikipedia search helps. It gives Wikidata an application. More applications are possible. A separate place where the Wikidata information is leveraged would help. In this way pictures from Commons can be found, information presented by the Reasonator and articles in the available languages.

The beauty of Wikidata is that it takes so little effort to have a big effect. However, this effect is only realised when Wikidata is used. The question is very much what are we going to do to reach the tipping points and bypass the tripping points.

Friday, December 27, 2013

#Wikidata handwaving II

Being an academic can be a blessing and a curse. What it implies is learning. This means that an academic has mastered much of the understanding of certain subjects. One of the biggest challenges for someone who has understanding on an academic level is being understood by "other" people. To achieve this, a clear understanding combined with simple sentences and words is needed.

This type of language is known as "Jip en Janneketaal" in the Netherlands. Expressing yourself in this way is as hard as expressing yourself in a foreign language.

The next challenge for an academic as expressed in Wiktionary, is "having no practical importance". The combination of unintelligible language and no practical relevance is an absolute killer in any conversation. This is when helpful people are so important; they point to the W3C when the Wikipedia article about OWL ass-umes too much.

The argument about upper ontologies and classes is for me academic. I understand the notions expressed in the W3C document but they are not implemented as a system in Wikidata. I understand the idea behind the database reports with "constraint violations" but there are no constraints in place and not much is done as a result of the reports.

The "Wikidata generic tree" shows how much effort has gone into building something that resembles an ontology. It has 17,612 entries! There are reports with constraints that people think are important. All this work needs to be championed.

Tools like Reasonator work better when an item is recognised as a human, male or female. There should be tools that help an editor add relevant information. The use of such tools should be obvious, it should be default and most importantly they should be up to date. When they are, everybody will use them. As a result we will get more data and we could have something like an ontology as well.

The Laurent Clerc #challenge

Laurent Clerc is credited as one of the developers of the American Sign Language. He is also a co-founder of the American School for the Deaf. Gallaudet University issues a yearly award in his name to a deaf person for his or her outstanding contributions to society.

Wikipedia has at this time articles for three of the forty-three recipients of the "Laurent Clerc award".

This can be improved upon and it is why there is this "Laurent Clerc challenge". The objective is to have more information available on the Internet about deaf people who are winners.

To take part in this challenge you can:

Thursday, December 26, 2013

#Wikidata handwaving

The Libris prize has been awarded 20 times and there are 20 links to the item for the Libris prize. Sounds good? Actually no. The prize was awarded twice to the same person. It appears that there is an issue: Wikidata:Database reports/Constraint violations/P166

What this amounts to is a lot of well intentioned hand waving. Reports like this are regularly generated and as far as I can tell nobody bothers. My understanding is that it is part of an approach to Wikidata that does not conform to what it is.

Wikidata is by design free of restraints. Reporting constraint violations only makes sense when restraints exist and are managed. As these lists are not updated when "constraints" change, they have all been invalidated.
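The Libris prize mismatch can be illustrated with a small sketch in Python. It is not how the Wikidata reports are generated; the function name, the field names and the sample data are all made up for illustration. It only shows why an equal count of awards and links proves nothing when one person has won twice.

```python
from collections import Counter


def award_link_report(award_events, linked_items):
    """Compare the times a prize was given with the items linking to it.

    award_events: list of (year, recipient) tuples, one per award event.
    linked_items: names of the items that carry a link to the prize.
    Both structures are hypothetical, chosen only for this sketch.
    """
    recipients = Counter(name for _, name in award_events)
    linked = set(linked_items)
    return {
        "events": len(award_events),
        "distinct_recipients": len(recipients),
        "links": len(linked),
        # People who received the prize more than once.
        "repeat_winners": sorted(n for n, c in recipients.items() if c > 1),
        # Items that link to the prize but are not a known recipient.
        "unexpected_links": sorted(linked - set(recipients)),
        # Recipients whose item does not link to the prize.
        "missing_links": sorted(set(recipients) - linked),
    }


# Made-up data: three award events, one person winning twice, three links.
events = [(2001, "Author A"), (2002, "Author B"), (2003, "Author A")]
links = ["Author A", "Author B", "Some Other Item"]
report = award_link_report(events, links)
print(report)
```

The number of events equals the number of links here, yet one link points at an item that never received the prize. A check like this is only worth running when somebody acts on the result.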

#Wikidata - The State prize of the #USSR

If Wikidata is to be relevant, it has to include sufficient data that is sufficiently structured. The question is very much how this relevance is to be achieved and as a consequence it has an impact on what data to add.

When Mr Kalashnikov died, a lot of effort went into adding to the Wikipedia articles about him. His Wikidata item gained many statements as well. This is great because as Mr Kalashnikov died recently, we know there is a lot of interest in him.

One of the awards he received was the State Stalin Prize. Obviously he was not the only person to receive it, and the English Wikipedia has a category with the recipients. But so does the Russian Wikipedia and it contains many more people.

The Stalin Prize became the USSR State Prize. Its Russian category again is the most complete. Using such categories we can use "Widar" to enrich the items for the winners of awards.

When you approach what is in the news with attention to specific details, it will make Wikidata more informative. It is the best that we can do given that there is no data that indicates what people are really looking for in Wikidata.

Wednesday, December 25, 2013

The #sense of a #year

There are many calendars. When a year is completed in a #calendar, a cycle is completed and a new year starts. Not all calendars are based on the earth orbiting the sun. Even when they are, they do not have the same starting point nor the same exact length. There is no standard defined by ISO for a year.

A good sense for the concept "year" is problematic. It is very much something that is culturally defined. The English article starts by defining a year as "the orbital period of the earth moving around the sun" to finally acknowledge that a year is more than that. The article is mainly about all the variety that makes a year hard to define.

There is a practical side to all this. The year 2014 could be the year 2014 HS or 2014 CE, 2014 AD, 2014 AH or 2014 AM... It is possible to define the period of this year in any calendar but there should be no doubt what calendar is in use.

The Wikidata item for "2014 AD" defines 2014 as a year that is part of the Gregorian calendar. In this way all years of any calendar can be defined; it is just a matter of qualifying what calendar is used.

Mikhail Kalashnikov

All that is too complex is unnecessary,
and it is simple that is needed 
A friend asked me for some advice; he wanted to know how to add some awards Mr Kalashnikov had received. I challenged him to write something about it. I am happy to publish it. :)
Mikhail Timofeyevich Kalashnikov is best known as the inventor of the AK-47 assault rifle. He died on December 23, prompting a host of volunteers to work on his Wikipedia entry in numerous languages.
Moreover Kalashnikov is an interesting subject for Wikidata: he received more than 50 decorations, awards, medals and other honours which allow us to perform queries or populate infoboxes.
But there is more to the Russian small arms designer. The 6 books he wrote, his participation in WW2 and basic information about his family have not been added to Wikidata so far.
Enhancing Kalashnikov's Wikidata entry allows us to find new connections that have not been visible before and helps anyone to gather basic information about Kalashnikov in their native tongue.

Reasons to be cheerful

#Wikimedia #Labs had its issues lately. There were problems with file systems, with tools that failed. There was not much that could be done about it.

The good news is that Labs does get attention. Support is given on IRC and the feeling I got was that there was a gentle progression from the big issues to the smaller ones.

Widar was not working for me. As things had become quiet, there was room to look at the code. There was an easy way to make it more efficient, to make the code perform better.

Such attention is a gift. It helps me and other users of the tool. Such service is a reason to be cheerful, to be grateful. It also makes Labs a pleasant environment.

Tuesday, December 24, 2013

#Wikidata - Statements per item

The two best #statistics for Wikidata show the statements per item and the labels per item. They are relevant because each in its own way indicates the usefulness of Wikidata. Items become useful when they are related to other items and as more items get connected they make up a network of knowledge. The network is still immature as most items have none or only one statement connecting them.

The items that are getting the most tender love and care are probably the ones that people are most interested in. At this time it is blind luck when these most popular items get the attention they need and become part of the best connected. Certainly the items that are most popular with the Wikidata contributors do get more connections. People similar to the Wikidata contributors will love what gets done in Wikidata. As our community becomes more diverse, the Wikidata data will become more diverse as well.

Who to recruit, what are we missing ... where is the data that allows us to be data driven?

The "labels per item" indicate how many languages have a word for an item. At this time most items are known to only one language. Many items have or should have additional labels; this is particularly useful when someone is known by multiple names.

The most popular subjects like countries and heads of state typically have over 100 links. Consequently they have over 100 labels because each article name does serve as a default label. A bot runs regularly to make sure that there is a label for each link.
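What such a bot does can be sketched in a few lines of Python. This is an assumption about the principle only, not the code of the actual bot; the item layout and the function name are invented for the example.

```python
def fill_default_labels(item):
    """Return the labels of an item, using article names as defaults.

    item: a hypothetical dict with "labels" and "sitelinks", both keyed
    by language code. Existing labels are never overwritten.
    """
    labels = dict(item.get("labels", {}))
    for language, article_title in item.get("sitelinks", {}).items():
        # Only add a label when the language does not have one yet.
        labels.setdefault(language, article_title)
    return labels


# Made-up item: three linked articles, but a label in English only.
item = {
    "labels": {"en": "Jordan"},
    "sitelinks": {"en": "Jordan", "nl": "Jordanië", "de": "Jordanien"},
}
labels = fill_default_labels(item)
print(labels)
```

An article name is not always a good label; a title with a disambiguation suffix in brackets, for instance, would need cleaning first. That is left out of this sketch.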

You will appreciate that items with more than 10 labels are the best known subjects. They are likely the most sought after subjects as well. All other subjects need labels in many more languages. Among them are the ones that indicate cultural bias. Items with labels in languages like Farsi, Hebrew, Georgian, Chinese only are the ones that I fail to give labels in Dutch. There is a need to use dictionaries to populate Wikidata with more labels. This is still a new frontier.

One function of the "concept cloud" is to find related items and their labels in "your" language. It is probably the best way of adding missing labels to relevant items.

What we need is more people adding labels in more languages. We have some tools to keep them occupied. What we do not know is which items are most used and sought after. They are probably the missing items that have the most impact.

Again,  where is the data that allows us to be data driven in optimising Wikidata?

Monday, December 23, 2013

Solar #hijri #year 1328

Many different #calendars are in use. We are approaching the year 2014 of the "common era". This year is however not in use everywhere. This is for instance reflected in the "concept cloud" for Jordan. You will find a reference to ۱۳۲۸ or 1328 HS. When you see a date with the addition of HS, it is a date in a solar calendar that starts with the hijri or Muhammad's migration to Medina (622 CE).

This solar hijri calendar is quite different from the "common era calendar". The first six months have 31 days, the next five have 30 days, and the last month has 29 days in usual years but 30 days in leap years. It begins on the vernal equinox as determined by astronomical calculations for the Iran Standard Time meridian (52.5°E or GMT+3.5h). This determination of starting moment is more accurate than the Gregorian calendar for predicting the date of the vernal equinox, because it uses astronomical calculation rather than mathematical rules.

At this time you can only enter "common era" based dates in Wikidata. This is sad because we do lose precision as a result. When something occurred in the year 1328 HS for instance, three months are in another "common era" year. When you look for illustrations in Commons for the solar hijri calendar, you find them in the same category as the moon based or common hijri calendar.
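The month lengths and the overlap with two "common era" years can be written down as a minimal Python sketch. The function names are mine. Whether a year is a leap year is passed in, because that depends on the astronomical calculation and is not computed here. The offset of 621 assumes that a solar hijri year starts around 21 March of the "common era" year.

```python
def solar_hijri_month_lengths(is_leap_year):
    """Days per month: six of 31, five of 30, and a last month of 29 or 30."""
    last_month = 30 if is_leap_year else 29
    return [31] * 6 + [30] * 5 + [last_month]


def common_era_years_spanned(hs_year):
    """The two common era years that a solar hijri year falls in.

    Assumption: the HS year starts around 21 March of CE year hs_year + 621
    and runs until around 20 March of the following CE year.
    """
    first = hs_year + 621
    return (first, first + 1)


print(sum(solar_hijri_month_lengths(False)))  # 365 days in a usual year
print(sum(solar_hijri_month_lengths(True)))   # 366 days in a leap year
print(common_era_years_spanned(1328))         # (1949, 1950)
```

A date known only as "1328 HS" can therefore not be stored as a single "common era" year without guessing which of the two it was.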

In Wikidata, a bot has run to add Latin labels for Dutch and English for the items for the years of this calendar. When you then check "2013 CE", you find its labels are problematic. Check out the statements for "year"; it presumes that a year is a "common era" year. It does not say in which calendar it is a year. Obviously a year like "1328" is not well defined, as it could be in several calendars.

Thursday, December 19, 2013

#Awards and #Wikidata

When people are awarded for their accomplishments, it is proof positive of their notability. The Breakthrough Prize in Life Sciences is in the news because the awards for 2014 have been announced.

At this time, Wikidata has a complete list of all the award winners. All these people are known to be human, their sex has been defined and obviously they are known to be the winner they are. Obviously there is more to be known about all of them.

There is no category for these award winners yet. The content of such a category could be defined by the result of a query; it might include local red-links but it does include all the winners of this award.


Happy birthday to fa.wikipedia.org; at this time it is the 15th Wikipedia in page views. When you consider that it is the 11th Wikipedia in growing these page views it is obvious that it has a fine future ahead of it.

Happy 10th anniversary and I hope many more people will share in the sum of all knowledge in Farsi as well.

Wednesday, December 18, 2013

Finding #diversity in the #Ottoman Empire

In #Wikidata everybody is more than welcome to work on the subjects that they consider relevant. There is so much to do that everybody can be the king of his hill. As you gain more experience, it becomes obvious that a previous approach may not really be that good. The information about the Sultans of the Ottoman Empire could do with improvement.

Indicating that the sultan was the sultan is elementary, adding relevant qualifiers makes it informative. One other bit of information that is really important is identifying the mother of each sultan. All these women are known as the "valide sultan" and they were among the most influential people in the Ottoman empire.

Their influence was obviously theirs because in Islam going to paradise relies heavily on having parents and particularly a mother that is happy with their offspring. It follows that learning about the women in the Ottoman empire will shed light on many connections. As a sultan had several wives and concubines in his harem there was no lack of offspring. The future for most of the boys was bleak; many died at the time when one of the brothers ascended to the throne. Many of the girls were married off and strengthened the position of their husband and became really influential as mothers themselves.

These women were well educated. Their position is in marked contrast with how some perceive Islam should treat women. It is well worth highlighting this aspect of the Ottoman treatment of women.

Tuesday, December 17, 2013

Hans #Rosling - about enriching #information II

There are sources and there are sources. In a previous blogpost I mentioned the many awards granted to both Gapminder and Mr Rosling.

On the Gapminder website they are sufficiently grateful and mention the many awards. This should make it easy to register them in Wikidata as well. It should, but it does not always.

Take for instance this one:
2011  - Grierson Award, Best Science Documentary - The Joy of Stats
The only Grierson award I can find is the "Dr. George Grierson Award" and it is an award for writers of outstanding works in Hindi Literature. One alma mater of Mr Rosling is St. John's Medical College, and it is based in Bangalore, but somehow I think it does not fit.

Or take the other award, the Gannon Award. It has been awarded at least twice, once to Mr Rosling, but the information referenced in the Wikipedia article is no longer there.

I trust that Mr Rosling appreciates that it is important to ensure that the data about him on Wikidata is correct. With a little bit of luck we can cooperate and make the item that is about him and about Gapminder of the highest quality.

Monday, December 16, 2013

Congratulations Sue !

Facebook has it that Sue Gardner will receive the "Knight Innovation Award". It was even referenced by a source as well. 

Congratulations Sue :) Have fun at the ceremony!

P.C. Hooft award 2014

In the #Netherlands it is breaking news that the premier award for the Dutch language has been awarded for 2014. This news results in an update to the Wikipedia articles about Mr Otten and it is a reason to add statements to his Wikidata item.

When you read the Dutch article, you will find that Mr Otten is well appreciated; many awards and even an honorary degree are mentioned. When you add this information you will find that for many awards there is no label in English. When you read the English article, you will find that the honorary degree is not for literature but for theology.

Most previous P.C. Hooft award winners are present in a category on the English Wikipedia. Thanks to Widar it is easy for Wikidata to know them as well.

Writing in more than one #Wikipedia

A small minority of editors write in multiple Wikipedias. Research was done on whether they write on similar subjects, and they do. The paper suggests that they may serve an important function in diffusing information across different language editions of the project, and prior work has suggested this could reduce the level of self-focus bias in each edition.

The paper is interesting but this conclusion deserves a healthy dose of cold water. In order to reduce a self focus, you have to look at the numbers involved. The largest edition, English, contains only 51% of the articles in the second-largest edition, German. English has 4,402,073 articles and German 1,662,589.

In order to have a noticeable impact on either the German or English Wikipedia many articles need to be affected and the number of multilingual editors is too small for that. This research concentrates on the top 46 language editions and the effect of writing in multiple Wikipedias is probably more pronounced in the smaller language editions. There are indications that acceptance of contributions on the English Wikipedia is particularly problematic for foreign writers. This leads people to write only in their own language or to give up.

Interesting is that Wikidata has been used as a tool. "Wikidata avoids some previous issues with out-of-date or conflicting interlanguage links. Further study of the impact of the Wikidata project on Wikipedia and its editors is not within the scope of this paper, but would be a fruitful area for future research".

As people start to appreciate Wikidata for its capabilities, Wikidata will be used for many applications, among them searching for images, finding articles in another language, visualisation of information and populating infoboxes. More relevant in the context of the topic of this research, it will open up the information that does not have an article yet because of what is called the "self focus".

Condensing the concept cloud

The concept cloud is a #Wikidata tool that indicates what #Wikipedia articles link to a given Wikipedia article in any and all languages. The idea is that it will help you decide what might be included in a language.

It also allows you to check the labels in your language. Make sure that the spelling is correct and add any missing labels. Quite often you find labels that have not been linked yet. In the screen shot you find items in Nepalese.

Linking them up is part of creating an improved concept cloud. What is really gratifying is when you find halfway that other people are working on merging items as well.

Saturday, December 14, 2013

Hans Rosling - about enriching #information

2012 - Time 100 most influential people
Mr #Rosling and #Gapminder are synonymous with bringing more awareness about the world we live in. If you do not know either of them, you should.

At Wikidata it is a given that you are free to work on the subjects you care for. There are many facts that can be included in Wikidata about Mr Rosling including the many awards he received.

The list at the English Wikipedia is impressive enough. When you want to add all this and related information to Wikidata, it quickly becomes interesting. Take for instance the Illis Quorum, it was introduced in 1784 by king Gustav III of Sweden. Currently it is awarded by the Government of Sweden and it is the highest award that can be conferred upon an individual Swedish citizen. It is awarded to seven people per year, on average.

Or consider the 100 most influential people according to Time. Mr Rosling is only one, who are all the others?

There are the awards the English Wikipedia is not aware of like the "Kunskapspriset"...

There is so much to do. It is good fun to do it. All in the name of "the sum of all knowledge".

#Wikidata - #Death

This tool, for the slightly morbidly inclined, informs you who "Find a Grave" knows to be deceased. The method in this madness is that when Wikidata has items by the same name, they should have the same expiration date when they are about the same person.

When you run the tool, you are provided with information as available in Wikidata. The kind of things you find are people not known to be dead. You find articles that should be merged. There are also people that do not have a Wikipedia article.

Tools have different applications, even a tool about the dead has life in it.

#Discrimination in #Wikipedia

A definition of discrimination is: prejudicial and/or distinguishing treatment of an individual based on their actual or perceived membership in a certain group or category.

When you analyse the screenshot above, you find a good example of a perceived membership in a certain group or category. You will find two categories that are mutually exclusive. Someone is either an "American people of Jordanian descent" or he is "Jordanian people". You cannot have it both ways.

There is a practical point to it as well. Because of this lack of common sense, a category like "Jordanian people" cannot be used to populate Wikidata with people from Jordan.

My challenge to the Wikipedia community is: explain to me why I am wrong in considering this discrimination?

Friday, December 13, 2013

#Wikidata HOWTO - It is in the news

In the "concept cloud" of what is in the news, the Right Livelihood Award is topical. According to the article, it is a prestigious international award. One purpose of a concept cloud is to make sure that the information relevant to a subject is well presented and that statements are added.

Awards like this one have been given over many years. Wikipedia often has a category with all the laureates. This is a perfect opportunity to use Widar to add the fact to all the laureates that they received this award.

Obviously you can use it for other categories as well.

Wednesday, December 11, 2013

#Wikidata - sorting sort off

The latest Wikidata functionality allows you to impose an arbitrary order on statements. Frankly, arbitrary order is not a good idea. Order and the ability to sort are a good idea.

The problem with arbitrary order is that what makes sense to some is stupid for others. It is also ground for endless bickering and implied points of view.

The other part that is suboptimal is that this implied order is on the level of individual items. Having a consistent order for items that are alike provides consistency.

When you consider data, order can be provided by sorting on a specific attribute. The name is an obvious candidate to bring order. It is however not static as alphabetical order relies on the alphabet used. When numbers or dates are available as qualifiers, they do provide order in an obvious and acceptable way as well.
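Sorting on a qualifier takes very little code. Here is a minimal sketch in Python, assuming statements are simple dicts with a "value" and a dict of "qualifiers"; this layout and the name "point_in_time" are invented for the example and are not the Wikidata data model.

```python
from datetime import date

# Made-up statements; one of them has no date qualifier.
statements = [
    {"value": "Award C", "qualifiers": {"point_in_time": date(2011, 5, 1)}},
    {"value": "Award A", "qualifiers": {"point_in_time": date(1999, 3, 12)}},
    {"value": "Award B", "qualifiers": {}},
]


def sort_statements(statements, qualifier="point_in_time"):
    """Dated statements first, oldest first; undated ones after, by value."""
    def sort_key(statement):
        when = statement["qualifiers"].get(qualifier)
        # Undated statements sort last; date.min is only a placeholder
        # so that dates and missing dates are never compared directly.
        return (when is None, when or date.min, statement["value"])
    return sorted(statements, key=sort_key)


ordered = [s["value"] for s in sort_statements(statements)]
print(ordered)  # ['Award A', 'Award C', 'Award B']
```

The same order comes out for every item that has the same kind of qualifier, and nobody has to argue about it. Sorting on the name would need a collation rule per language, which is why the dates are the easier candidate.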

If computers are good at anything, it is in sorting. If people are good at anything it is in making a mess. For me the order of things is best left to the natural order that exists in the result of sort algorithms.

#Wikidata - using #Wikipedia categories for best effect

Many people who study Wikipedia find its categories relevant. They certainly provide a great start. Once there were only 13 Eritrean nationals known to Wikidata. Adding thirteen more by hand did not take long. This was done by adding people mentioned on categories like the "Category:Eritrean people stubs".

That was cool. The latest tool by Magnus called Widar makes it even easier to make Wikidata aware of Eritreans. It is just a matter of selecting the right category and making sure that only Eritrean people are selected.

As you can see in the query part of Widar, you have to watch out for the fluff that is endemic in categories; the lists and even a few US ambassadors to Eritrea.

Now that it is possible to populate Wikidata from categories from any and all Wikipedias with "Widar", studying Wikipedia may go to the next level because studying Wikipedia updates Wikidata and it improves the use of Wikipedia in return. With more statements on more items automated disambiguation becomes ever more powerful. With more statements "Reasonator" becomes more informative. With more statements the Wikipedias that include info boxes will be more informative as well.

The one reason why Wikidata might not be used in this way is because the data acquisition changes the subject that is being studied. Then again, as the reflection of the information in Wikipedia improves, the conclusion based on the queries performed on Wikidata will be increasingly informative. Even better, the information will always be near real time and it reflects any and all Wikipedias.

Monday, December 09, 2013

The best #Sinterklaas #gift ever

Among the gifts at a Sinterklaas party are the #surprises. They may come with a rhyme, with special packaging and some surprises are beyond special, they are personal; they have an impact. The wife of my nephew is an emergency ward physician. Well educated, highly trained, with a great bedside manner, all in all a lovely lady.

She explained, when the topic turned to writing, why writing in the same way as her husband, who writes professionally, is impossible for her. The conversation moved in the direction of reading and how hard it is for her. Differentiating between "rr", "m" and "nn" is hard; it takes two or three tries to know for sure what word is in front of her.

This sounded like dyslexia, and I asked her to have a look at Wikipedia with the OpenDyslexic font enabled. This was an eye opener for her. Every word showed itself the first time. Reading was not the big effort it usually is.

The OpenDyslexic font is a free font. It is easy to download. You can use it in Windows, Linux or MacOS. It can be used with Chrome and Firefox. You can select it in many Wikipedias.

For me the surprise was in learning how much of a struggle it was for her to learn to read. My respect for her has grown considerably. Learning first hand how much of a difference a different font makes is different from knowing this intellectually.

The configuration of language settings is well hidden on a Wikipedia page. My concern is how dyslexic people will find the functionality that is there to help them.

Sunday, December 08, 2013

Saturday, December 07, 2013

#Wikidata - people known to be #African citizens

#Diversity is more than caring about the male female ratio. #Discrimination is not only about the colour of your skin versus the colour of my skin. As I have been working on the concept cloud of Nelson Mandela, I have been adding labels for many South Africans. I then realised that I could add their nationality. It proved not to be hard to add more than 1% of the people known to be South African.

Given the large number of English speaking people in South Africa, it is obvious that the number of people known to be citizens of Eritrea or Ivory Coast will be even smaller. Add only one person as a citizen of these countries and you have added more than 1%.

Friday, December 06, 2013

Nelson Mandela died II

this is NOT the Nobel prize
Nelson Mandela is one of the most admired politicians of our time. This picture has him with the trophy for the World Cup Football, a reward he did not win :). The number of awards, accolades, prizes, honorary degrees and citizenships he did win is staggering. So large that there is even an article listing them all.

You find these things when you work on the "concept cloud" for Mr Mandela. One of the many prestigious prizes he received is the Jawaharlal Nehru Award. This award was presented to him by the government of India. All of the people who were awarded the prize are now linked to this award. Given the relevance of the people who won the award, it felt weird to add a new Wikidata item for one of them.

One other item that is there to be found is "Qamata". He is a deity of the Xhosa, and I have no clue what statements to add for him. I do not know what is sensible and does justice to the subject.

Thanks to the concept cloud, you can learn what Mr Mandela is associated with. Improving the quality of that information is my way of showing respect to a great man.

Thursday, December 05, 2013

Nelson Mandela died

The news that Mr Mandela died is world news. Extra news bulletins inform about Mr Mandela; his relevance is explained to those who do not know. Wikipedia articles are read by many people.

It is what you do. If you are like me, you want to do something as well. At Wikidata it was already known that he died. So I have been thinking what else there is to do.

I am going to spend half an hour working on the data that is in Mr Mandela's concept cloud. I will be adding labels in Dutch for the items that do not have a Dutch label. I may explore some items and see if I can add some statements.

If you are like me, you want to do something; it is a way to grieve. Maybe you will follow my example and dedicate some work in his memory.

#Wikidata HOWTO - #disambiguation

The Wikidata #search results are sometimes not as informative as you would like. Above you find the result when you search for "ricerca". On many lines you get, in wonderful English, "Cannot auto-describe". It would probably make more sense to have something like "Nessuna descrizione automatica".

This translation into Italian is probably as helpful as the English message. As far as the software is concerned, it cannot create a description and it says so. What the software needs in order to create a description is statements on the individual items. These, combined with labels in YOUR language, enable automated descriptions.

One of the lines says: "Instituto nazionale di ricerca metrologica" .. possible statements could be:
  • country - Italy
  • is a - institution
The article indicates that this institution is based in Turin and there are probably many more statements that can be made. Once these are made, there will be one line less that the software cannot describe.
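How statements and labels combine into a description can be sketched in a few lines of Python. This is only an illustration and not how the Wikidata search software actually works: the `LABELS` table, the item layout and the `describe` function are all made up for this example.

```python
# Hypothetical label table: every value used in a statement needs a label
# in the language of the reader before it can be shown.
LABELS = {
    "Italy": {"en": "Italy", "it": "Italia"},
    "institution": {"en": "institution", "it": "istituzione"},
}


def describe(statements, lang):
    """Build a short description from 'is a' and 'country' statements.

    Returns None when there are no usable statements or when a label
    in the requested language is missing ("Cannot auto-describe").
    """
    parts = []
    for prop in ("is a", "country"):
        value = statements.get(prop)
        if value is None:
            continue
        label = LABELS.get(value, {}).get(lang)
        if label is None:
            return None
        parts.append(label)
    return ", ".join(parts) if parts else None


item = {"country": "Italy", "is a": "institution"}
print(describe(item, "it"))  # statements and Italian labels are present
print(describe({}, "it"))    # no statements, so nothing to describe
```

An item without statements, or with statements whose values have no label in your language, is exactly the case where nothing can be described.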
GerardM

For the love of #Statistics - Get the numbers, right?

In statistics, results depend on the data they are based on. The data is selected in many ways but it starts with the questions: "what is it that we want to know" and then "when we have this information what is it that it tells us".

Consequently, when it is the considered opinion that Wikidata will mainly be used by bots, its page views are not relevant. When Commons is considered to be the repository for use in WMF projects, we do not consider downloads and other usage from outside the Wikimedia Foundation.

When we change these assumptions, we will evaluate existing data differently and end up with statistics that will be different. Our own reports show where our assumptions result in a problematic representation of how things are. The "Report card" for instance does not include Wikidata as one of the biggest projects when we measure individual contributions.

An increasing number of Wikipedias are using Wikidata to find information, be it Wikidata based, Commons based or even Wikipedia based. Initial reports are that people like it. We do not know how often this functionality is used. We do not know to what extent people are adding statements or labels in order to improve the disambiguation. We do not know how many people reach Commons, find an image they are looking for and download it for their own use.

We do not know to what extent Wikidata provides the only results when searching a Wikipedia. We do not even know what people are looking for and fail to find.

Statistics are meant to be actionable. It is not that hard to change the existing software and identify which pageviews were the result of Wikidata based functionality. It makes sense to do this when the data is considered useful. The best motivation to do this is that statistics gain importance when their results are actionable.
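As a rough sketch of what "identifying" such pageviews could look like: assume every logged page view may carry a `source` tag and that views reached through the Wikidata search results are tagged `wikidata-search`. Both the tag and the log format are assumptions made for this example; they are not part of any existing WMF software.

```python
from collections import Counter


def count_by_source(pageviews):
    """Count page views per source tag; untagged views count as 'direct'."""
    return Counter(view.get("source", "direct") for view in pageviews)


# Made-up log entries, only there to show the idea.
views = [
    {"page": "Nelson Mandela"},
    {"page": "Alain Danilet", "source": "wikidata-search"},
    {"page": "Turin", "source": "wikidata-search"},
]

counts = count_by_source(views)
print(counts["wikidata-search"], counts["direct"])
```

Once the views are tagged, the existing reports only need an extra breakdown by source.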

This is why we want these numbers and improved statistics. This is why we need the WMF statisticians to share this journey with us.

#PanLex - its advisory board

PanLex is a system that is being developed by The Long Now Foundation. PanLex aims to help express any lexical concept (such as “democracy”, “elongate”, “à la carte”, or “Africa”) in any language. Its aim is to preserve language diversity.

I have been asked to be on its Advisory Committee and I have accepted. After an initial conversation they considered me to be affiliated with Wikidata. I am happy with that; they approached me originally because of my involvement with OmegaWiki.

My hope is that information that exists in the PanLex system will find its way into Wikidata. By including information in Wikidata, all this lexical wealth gets a new application. It will open up any and all information that Wikidata connects to. Given that the PanLex information is linked to DBpedia, it should not be that hard.

For me, language diversity is about enabling the use of languages. The objective of the Wikimedia Foundation is to share in the sum of all knowledge. The objective of the PanLex project is to promote language diversity. My hope is to bring these two organisations together because together they can do so much more.

For the love of #Statistics - Get the numbers right

When you search on the Italian and Polish #Wikipedia, a script will be loaded that provides additional Wikidata based information. This script exists on the English Wikipedia.

Are you as interested as I am in whether this counts as a pageview for the English Wikipedia?

The Tech #Barnstar

Yesterday, the #Wikimedia #labs had a problem; it was being hammered by what looked like a DDoS attack. It turned out that the script that provided the Wikidata search functionality needed to be optimised more than a bit.

The software was killed but soon after that a modified version went live. Definitely a reason to be cheerful, also a moment to be grateful. I would like to present this Tech Barnstar to

  • Magnus for writing the original software
  • The Ops / Labs team for having to deal with a nasty situation
  • Legoktm for modifying the software so that it has the approval of the Ops team

Wednesday, December 04, 2013

#Tamil does #Wikidata

In an embarrassment of riches, the Tamil Wikipedia is the third Wikipedia to complement its search with the Wikidata search functionality.

There is only one snag: I try to understand if it works properly and I don't. I search for "தமிழ்", the autonym for the Tamil language. I should be able to find it, but I do not get the expected results. I do when I search for "Tamil".

To make it even more complicated, I do not find "தமிழ்" on Wikidata itself either. Not even when I change the language to Tamil.

I am really happy, and I trust that it will be all right eventually. The one question I have is where the problem is.

Wyniki wyszukiwania w #Wikidata

On the #Polish #Wikipedia, search results are now complemented with Wikidata as well. Only one line of code has been added at the very end of their MediaWiki:common.js and the people of Poland have in addition to the WMF provided search:
  • a link to Commons categories for a subject
  • a link to Wikipedia articles in other languages
  • a link to the Wikidata item
  • visualisation courtesy of the "Reasonator"
  • more Wikidata items than there are Wikipedia articles
The kids in Poland can now search for their "horsies", "kitty cats" and "doggies" in Polish and they will find a link to Commons and to the category for horses, cats and dogs.

Congratulations to the people in Poland and I hope many more Wikipedias will add this extra functionality.

Tuesday, December 03, 2013

Will this be the "Reasonator" logo?

When you use the #Wikidata based search functionality, you have to click on an "R" to trigger the "Reasonator" for the visualisation of the information available for an item. Reason enough to look for something better.

The first proposal for a logo was created by CristianCantoro. It looks really smart and it is certainly useful. The only drawback is that we have a choice of one. Maybe there are other people interested in proposing a logo for the Reasonator. I understand that the winning design will get a Wikidata t-shirt...

#Wikidata HOWTO - #Lists

Lists are evil but Wikipedia contains many of them. The question is how to deal with them. The problem is that Wikipedia often uses a list as an item of a list; the "president of Armenia" is actually the "List of presidents of Armenia". To make it worse, the interwiki links bundle these together as if they are one and the same.

The solution is simple: create a singular item and split up the interwiki links as seems obvious. The next thing to do is to add "is a list of" on the list item and refer to the singular item.
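The steps can be sketched in Python. This is a toy model and not the Wikidata API: items are plain dictionaries, the wiki codes `wiki_a` and `wiki_b` are placeholders, and `split_list_item` is a name invented for this example.

```python
def split_list_item(mixed, list_sites, singular_label):
    """Split a mixed item into a list item and a new singular item.

    mixed          -- dict with 'label' and 'sitelinks' (wiki code -> title)
    list_sites     -- wiki codes whose article really is a list
    singular_label -- label for the newly created singular item
    """
    list_item = {
        "label": mixed["label"],
        "sitelinks": {
            wiki: title
            for wiki, title in mixed["sitelinks"].items()
            if wiki in list_sites
        },
        # The list item refers to the singular item.
        "statements": {"is a list of": singular_label},
    }
    singular_item = {
        "label": singular_label,
        "sitelinks": {
            wiki: title
            for wiki, title in mixed["sitelinks"].items()
            if wiki not in list_sites
        },
        "statements": {},
    }
    return list_item, singular_item


# Placeholder data: one wiki has a list article, the other an article
# about the office itself, and both were bundled in one item.
mixed = {
    "label": "List of presidents of Armenia",
    "sitelinks": {
        "wiki_a": "List of presidents of Armenia",
        "wiki_b": "President of Armenia",
    },
}

list_item, singular_item = split_list_item(
    mixed, {"wiki_a"}, "president of Armenia"
)
print(list_item["statements"])
print(singular_item["sitelinks"])
```

Deciding which interwiki links belong to the list and which to the singular item remains manual work; the sketch only shows the bookkeeping afterwards.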

As the Italians all have Wikidata search results available to them, I was disambiguating "Michael Skinner". An article about the biologist Michael Skinner had me check whether he was already known to Wikidata.

Monday, December 02, 2013

#Wikidata complements #search in the #Italian #Wikipedia

The search results of the Italian Wikipedia are the first to be complemented with results from Wikidata. When you search for "Alain Danilet" for instance, you will find that there is no article about him in the Italian Wikipedia. What you do get is the following:
Alain Danilet Francia politico (1947–2012) ♂  R
As Mr Danilet is known in Wikidata, it will be shown that he is a French politician, together with his birth date and the date of his demise. As more statements become available for more items, disambiguation will become easier when there are more Wikidata search results.

When you select the R the "Reasonator" will show you the information available in Wikidata.

Had there been a Commons category for Mr Danilet, a small icon would have been included as well.

The Italians are the first to have an improved search result. It will be interesting to learn how this will change the perception of Wikipedia. Obviously they are now served a bigger share of the sum of all knowledge.

What #priority? Applying #standards

Bug 8217 is old. It is about applying standards to the naming of Wikimedia projects. The bug was filed in 2006 and as time goes on, an increasing number of people are upset. Some have gone to the next stage and become bitter.

One upset Wikimedian changed the priority of the bug to "immediate" and, given the law of the land, it was changed back to "normal" by an administrator.

At this time, several Wikimedia languages are not properly identified. Applicable standards should be applied. A really long time ago the language committee was created to prevent future issues. We are still waiting for the old pain, the old wrongs to be righted.

Please Wikimedia powers that be, please intervene. Please make sure that these issues are resolved.