Saturday, May 31, 2014

#Wikimedia Foundation has its eyes firmly on one ball

In the meantime, other games are doing well. Given that #Wikipedia is the centre of attention, it is an improvement that other languages also exist and get attention from official WMF research.

This long-standing bias of no official attention means that "it"[1] does not really exist because "it" is not officially studied and therefore "it" is not considered.

"It" however attracts so many people that it gets mentioned in the Guinness book of records. "It" could also be something a researcher calls "some of the most exciting things to happen in Wikimedia wikis in a long time". "It" did not happen in a Wikimedia wiki as it is not a wiki.

What is studied is what it takes to attract new volunteers for Wikipedia. People who are willing and able to help us "share in the sum of all knowledge". That IS an aspiration. What we can achieve is "share in the sum of all available knowledge". That will not be delivered in any Wikipedia as we know it.

[1] "it" is not defined because it could be so many of the bright things happening in the Wikimedia ecosystem.

Friday, May 30, 2014

The #MediaWiki #GeoData extension is a #dinosaur

"This extension adds a new parser function, {{#coordinates:}}, that saves coordinates to the database. Function's input format is made as compatible as possible with GeoHack".

What it does is save data for one Wikipedia article at a time. It links it to another dinosaur, GeoHack, and hopes for the best.

For as long as such dinosaurs are aliveish and kicking, it may be good. At the same time, it provides an excellent target for harvesting all this geo-data and including it in Wikidata. That will make the same geo-data available to all articles in any Wikipedia on the same subjects.

GeoData saves its data to "the" database. When it decides to evolve and store its data in Wikidata, it may have some future after all.

Monday, May 26, 2014

#Wikidata - the game

With the Wikidata Game, Magnus did it again. He created a few games that are quite compelling. In a few days the games have generated more than 100,000 edits in Wikidata.

It is awesome. Many people make a big job light. We now know about 1,964,351 humans; this is up from 1,332,383 on April 20th. We know about 1,459,421 males, up from 760,616, and 260,521 females, up from 154,455.

When a game makes a difference in only a few days, it follows that making a game out of our issues will provide us with quick results. As important, it grows awareness of Wikidata, it shows people that we need their help and the numbers prove that games work.

The question is not only if people are interested in making games but as much if they find it in themselves to cooperate and produce more, better and integrated games.

Sunday, May 25, 2014

#Wikidata - Brazilians who died in 2014

How many people from Brazil died in 2014 according to Wikidata? The answer is obvious when you know that a human is a Brazilian. When that person dies, i.e. a date of death is available, it is just a matter of applying the right query and you get an answer.

Surely there are more than 41 notable Brazilians who died in 2014. It means that we need to know for more humans that they are Brazilian. The category Naturais do Brasil knows about some 26,520 Brazilians; 1,128 are not known to Wikidata and 14,393 are known to be human. Of these humans 7,481 are known to be Brazilians and consequently we can safely add a statement of a Brazilian nationality to 6,912 humans.

This process is under way and the number of deaths is on the rise. Not as much as you would expect because nationality is often added when someone is registered as dead.

What is left are 1,128 articles that need an item and 12,127 items that need to become both human and Brazilian.
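The arithmetic behind these figures can be checked with a short Python sketch. The variable names are illustrative labels for the numbers quoted above, not Wikidata terms. Taken at face value, the last figure equals the size of the category minus the known humans, so it appears to still include the 1,128 articles without an item.

```python
# Figures quoted in the post; the names are illustrative labels only.
in_category = 26_520      # entries in the category Naturais do Brasil
without_item = 1_128      # articles not yet known to Wikidata
known_human = 14_393      # items already marked as human
known_brazilian = 7_481   # of those humans, already marked as Brazilian

# Humans that can safely receive a Brazilian nationality statement.
to_mark_brazilian = known_human - known_brazilian

# Entries not yet known to be human (this still counts articles without an item).
not_yet_human = in_category - known_human

print(to_mark_brazilian)  # 6912
print(not_yet_human)      # 12127
```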

Saturday, May 24, 2014

#Wikimedia-#Labs, its script kiddies and its public

Labs is the home of many tools. These tools are often essential to workflows or they show people information that they cannot get in any other way. When you want to work on people who died in 2014, you use this script. When you want to have information about 土门岭村, a village in China, you use Reasonator.

Labs has three types of public; there are the heroes who develop tools, there are the people who use these tools to generate more data and there are the consumers of all the information that becomes available.

They all have one thing in common. For them to function Labs has to be operational.

When Labs is not always available, when the quality of service suffers, the tools will not get the exposure they seek. Script kiddies will not generate more data and the public will not share in the information that should be available to them.

Friday, May 23, 2014

#Wikidata - statement #statistics

The Wikidata statistics can be bewildering. Above you find a comparison of the last two dumps. You would expect that statements are added and that the number of items with more statements only grows. This time it is not the case. Yes, the total number of statements has gone up but the decrease of the number of items with 6, 7, 8 or 9 statements is more than the increase of the number of items with 10 or more statements.

The good news is that the number of items with no statements has gone down significantly.

Thursday, May 22, 2014

When #Google is my friend; translating from #Marathi

I am adding dates of death to many people who died in 2014. Included in my workflow are people with an article on the Marathi Wikipedia. It is well known that machine translation of Marathi articles is not great. For my purpose, identifying a date of death and a date of birth, it is good enough :)

#Wikidata - Jaime Lusinchi, president of Venezuela

Yesterday Mr Lusinchi died. Thanks to a ToolScript this was noticed. It was found that Wikidata was not aware that he was president of Venezuela.

Currently he is the only Venezuelan president for whom information is included such as when he became president and who preceded him. For many if not all of the others Wikidata is now at least aware that they held the office of President of Venezuela.

#Wikidata - Conflict of interest

Wikidata has many objectives and the one thing that makes Wikidata relevant is that from the start it is there to be used. The most important use cases for Wikidata are:
  • repository of interlanguage and interproject links
  • depository of data that may provide information in Wikimedia projects
There is an inherent conflict. In order to be a depository of data, all articles need an item in Wikidata. For a complete set of interlanguage and interproject links, multiple items on the same subject are not acceptable.

This issue becomes even more problematic when external use of Wikidata is to be considered as well. 

Wikidata is immature and it shows. When its problems are to be solved, more and better tools are needed. These tools will need room to evolve and most of all, the people who are involved in Wikidata need to talk with each other.

Tuesday, May 20, 2014

Authorising #oAuth in Widar

Tools like AutoList and Reasonator get a boost from their ability to apply changes to Wikidata. Reasonator shows labels in the languages you know and missing labels in your language can be added. AutoList enables you to add multiple statements to items that it shows in a list.

This is powerful stuff. It helps when you understand how it works. Widar is the tool that handles the authorisation through OAuth. It remembers the authorisation by using cookies. These cookies are associated with your browser. When you use multiple browsers, you have to authorise multiple times.

As Reasonator is used everywhere, people will authorise often. As the cookies persist, it is important to end the authorisation for Widar. The functionality to do so has been added recently.

#Wikipedia vs #Wikidata - a qualitative comparison

When a Wikipedia thinks it knows something, and Wikidata begs to differ, what to do? Wikidata has its information from a different source or someone actually looked it up or someone made a booboo. It is all possible and someone has to look into it. This is where sources become actually relevant.

The thing to do is actually pretty simple. You research it or you report it. For a bot that is looking for specific facts, reporting is the obvious way to go.

Two reports by Amir show the differences for the date of birth and the date of death between the English Wikipedia and Wikidata. By highlighting these issues, it becomes possible to raise quality in both Wikidata and Wikipedia.

These reports have now been generated for the first time. Some false positives have been found. The location of these reports is not optimal, and the communities of Wikidata and English Wikipedia are not yet aware of them.

Reporting on problems however is one obvious way of raising quality. It is hopefully the first of many similar reports.

Sunday, May 18, 2014

#Wikipedia - an alternative to #bots

What bots do is generate text from the data that is provided to them using an algorithm. When you have a lot of data, it will create a lot of text or a lot of articles. When data is added at a later date, the text of a bot generated article will not change. Typically an article is served from a cache.

As bot generated articles are a bad idea when they are in a fixed format, why should we have them? That is, why not generate them when they are requested and serve them from a cache, just like any other article.

Typically, the data of a bot generated article can be provided by Wikidata and Commons. Scripts that generate articles already exist. What is needed is to import the data in Wikidata and generate the text when requested. The number of articles will go down and this is actually a good thing. It means that the articles left are the ones that humans have edited, are involved in.
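The idea of generating an article on request and serving it from a cache can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DATA` dictionary stands in for Wikidata, `Q_example` is not a real item, and the one-sentence template is far simpler than any real script.

```python
from functools import lru_cache

# Hypothetical stand-in for statements that would live in Wikidata.
DATA = {
    "Q_example": {"label": "Example Village", "kind": "village", "country": "China"},
}


@lru_cache(maxsize=1024)
def render_article(item_id: str) -> str:
    """Generate a one-sentence stub from the current data for an item."""
    facts = DATA[item_id]
    return f"{facts['label']} is a {facts['kind']} in {facts['country']}."


def update_item(item_id: str, **changes: str) -> None:
    """Change the data and drop cached text so the next request regenerates it."""
    DATA[item_id].update(changes)
    render_article.cache_clear()


first = render_article("Q_example")    # generated on first request, then cached
update_item("Q_example", kind="town")  # new data arrives later
second = render_article("Q_example")   # regenerated from the updated data
print(first)
print(second)
```

Unlike a stored bot article, the text here follows the data: once the data changes, the next reader gets the updated sentence.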

So what we need are some people with language skills to create the scripts for all the languages we support. We need the WMF engineers to consider articles on demand and think through how such an article gets into the caches and statistics. We need to inform our communities about what people are reading and where a human touch can bring such articles to the next level of quality.

#Wikidata - Mukesh Gadhvi, used to be a member of the 15th Lok Sabha

Mr Gadhvi died in 2013. What makes him interesting is that there were two articles about him on the English Wikipedia. Both have him represent the Banaskantha electoral district in Gujarat. That article made it obvious that he died and was succeeded by Mr Haribhai Parthibhai Chaudhary.

Like so many members of the Lok Sabha, Mr Gadhvi had a father who was also into politics. Given that it provides more information, an item for Mr B.K. Gadhvi was created as well.

NB For many if not most members of the Lok Sabha, there is no picture yet. The picture is from an obituary you can find with Google.

Thursday, May 15, 2014

#Wikidata - 2014 deaths

In the many #Wikipedias, articles for people who are notable or noted indicate their demise. The deaths of 2014 are registered in Wikidata. Adding the deaths from the Ukrainian Wikipedia is rather depressing. Then again, if the death of the people who died in Ukraine is not noted, they may have died in vain.

Wednesday, May 14, 2014

#Wikidata - duplicate items

In the Matrix, every duplicate will fight you. In Wikidata you may kill off every duplicate. This is done by merging them. The result is good; statements are combined and articles are combined in the older of the two items.

Merging is easiest using the merge gadget. There are loads of duplicates; there could not be as many as 20% doubles.

A first tool to attack this multitude is the "Wikidata duplicate item finder". Based on the assumption that everything has a unique name, you will get a list with many possible doubles. With this tool Magnus provides a first weapon to fight the Agent Smiths of Wikidata. Other tools are needed, for instance to look for a combination of labels and date of birth.
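A search on a combination of label and date of birth could look roughly like the Python sketch below. It is not how the duplicate item finder works; the item ids, names and dates are invented for illustration.

```python
from collections import defaultdict

# Hypothetical items: (item id, label, date of birth). Not real Wikidata data.
ITEMS = [
    ("Q_a", "Jan Jansen", "1900-01-01"),
    ("Q_b", "Jan Jansen", "1900-01-01"),
    ("Q_c", "Jan Jansen", "1950-06-30"),
    ("Q_d", "Piet Pieters", "1900-01-01"),
]


def duplicate_candidates(items):
    """Group item ids that share both a label and a date of birth."""
    groups = defaultdict(list)
    for item_id, label, born in items:
        groups[(label.casefold(), born)].append(item_id)
    # Only groups with more than one item are worth a human look.
    return [ids for ids in groups.values() if len(ids) > 1]


candidates = duplicate_candidates(ITEMS)
print(candidates)  # [['Q_a', 'Q_b']]
```

The result is a list of candidates only; a person still has to decide whether to merge them.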

Tuesday, May 13, 2014

#Wikidata - the 3rd Earl of Mayo

Joseph Deane Bourke, 3rd Earl of Mayo has a Wikipedia article. It is largely filled with his genealogy and his ecclesiastic career.

Wikidata used to know him as a "list of" "person". An automated process added "instance of" "human" to Mr Bourke. These two claims are obviously mutually exclusive. Thanks to AutoList it is possible to find the items that have both claims.

At this stage you have to remove the wrong claim one at a time in Wikidata itself. The automated process operated on the presumption that an entry in a category with a name like "1794 deaths" is a human. As can be expected, this is mostly true.
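Finding the items that carry both claims comes down to a subset test, as in this minimal Python sketch. It does not show how AutoList works; the item ids are invented and plain strings stand in for the real property and item identifiers.

```python
# Hypothetical claims per item as (property, value) pairs; plain strings stand in
# for the real Wikidata property and item identifiers.
CLAIMS = {
    "Q_earl": {("list of", "person"), ("instance of", "human")},
    "Q_list": {("list of", "person")},
    "Q_human": {("instance of", "human")},
}

# The two claims that should never appear on the same item.
CONFLICT = {("list of", "person"), ("instance of", "human")}


def conflicting_items(claims):
    """Return the ids of items that carry both mutually exclusive claims."""
    return sorted(item for item, pairs in claims.items() if CONFLICT <= pairs)


conflicts = conflicting_items(CLAIMS)
print(conflicts)  # ['Q_earl']
```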

Monday, May 12, 2014

Tom Kettle, a UK MP

Preparations are under way for edit-a-thons about the Great War. The objective is that more quality information becomes available that can be associated with World War I.

As I am waiting for a plane home, I am adding images to humans who can be found in the category British Army personnel of World War I. The tool I use is WD-Fist, and this is a perfect way to wait, idling.

Many of the people who fought in World War I died. When you look at their images, they were so young. The ones with an article have to be notable in one way or another. Mr Kettle died at the Somme. He volunteered to fight, and he was not the only MP who died.

#WMHack #Maps and #Wikidata II

This hackathon had many people with an interest in maps come together in Zurich. There were several challenges they faced; how to represent maps in a wiki, how to store them and what do we need to know about them in Wikidata. In this mix of challenges the differences between contemporary maps and historic maps feature as well.

Wikidata needs to know several specific things; it needs to know that something is a map, it needs to know the four corners of a map, the location where that map can be found and finally it is nice to know what type of map it is. More attributes are possible but this was considered the minimum for Wikidata.

The thought process about Commons was forward looking; it is going to be "Wikidatafied" and this will surely affect current practices. Information that is currently in templates will move into Wikidata and many of the galleries and categories will surely become redundant because queries will provide a more reliable and complete result.

For wikis, current best practices were analysed, and it was found that information on a map exists in many layers. There is a base layer and on top of that you can show a contemporary or historic map. On top of it you may want to show the shapes of countries or districts. These may be sprinkled with pointers that reflect the result of a query. To finish it off, you may want to add even more that demonstrates a point made in a particular article.

All this information needs a place. It needs a special place because you may want to use a map several times. In Zurich we ended up with a working example of a map that included all these complications by inserting information in a namespace. The next challenges are to make it robust and user friendly enough.

#WMHack - #Wikidata and bias towards English #Wikipedia

English Wikipedia has more articles and it has loads of info boxes. As importantly, most of the people importing data know English and consequently a lot of data is imported into Wikidata from that source. According to some it must be that Wikidata is biased because information available in the English Wikipedia is better represented.

Uhm, a great argument. Let's analyse that. The current statistics show that 50% of all Wikidata items have zero or one statement. They typically refer to only one article and have only one label. Effectively it is garbage wherever it originated from. That leaves us with the other half. Less than one in thirty has 10 or more statements and for less than one in fifteen we have ten or more labels. Only a fraction of the subjects covered by the English Wikipedia has great information.

We assume good faith when we say that the English Wikipedia aims to share in the sum of all knowledge AND aims to provide a neutral point of view. It is easiest to retrieve information from the en.wp and when we add all that to Wikidata, we gain the most improvement in quality for the least amount of effort for most of us. Even then, it is surprisingly problematic to set up the effort to get data in. It takes time and an inordinate amount of bickering before data finally finds its place.

When you look at all that data once it found its place, you find that Wikidata itself looks best in English. As long as fall back languages for statements are not supported, its information looks reasonable only when seen in the Reasonator.

Wikidata aims to be a quality resource for information. When all it has is information from the English Wikipedia, you might as well go to the English Wikipedia. When Wikidata gets its house in order and compares its information with many sources and reports on the differences found, it helps improve information everywhere and reduces the existing bias.

When those sources include multiple Wikipedias, it will be biased towards those it compares with.

Sunday, May 11, 2014

#WMHack #Maps and #Wikidata

This hackathon and many maps put #Zurich on the map. When you consider maps, and particularly historic maps, they have four corners and a date. That is the minimal approach to a map. You can add to this what a map intends to show; it can be a thematic map or a generic map.

When you add the four corners of a map as properties to a map, you can query for the maps that include Zurich. When the maps are dated, you can show them in order.
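A minimal Python sketch of such a query follows. It assumes the four corners reduce to a simple bounding box; the `MapRecord` type, the map titles, years and box values are all invented for illustration, and the coordinates for Zurich are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapRecord:
    """Hypothetical record: a title, a year and the box spanned by the four corners."""
    title: str
    year: int
    south: float
    west: float
    north: float
    east: float

    def covers(self, lat: float, lon: float) -> bool:
        # Simple box test; it ignores maps that cross the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


MAPS = [
    MapRecord("Modern Switzerland", 2010, 45.8, 5.9, 47.9, 10.5),
    MapRecord("Old Swiss map", 1750, 45.5, 5.5, 48.0, 11.0),
    MapRecord("Iberia", 1800, 36.0, -9.5, 43.8, 3.3),
]

ZURICH = (47.37, 8.54)  # approximate latitude and longitude


def maps_covering(maps, lat, lon):
    """Return the maps whose box contains the point, oldest first."""
    return sorted((m for m in maps if m.covers(lat, lon)), key=lambda m: m.year)


titles = [m.title for m in maps_covering(MAPS, *ZURICH)]
print(titles)  # ['Old Swiss map', 'Modern Switzerland']
```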

It is really exciting that it has been decided what we need in a map on Wikidata.

#WMhack - #Wikipedia #Zero and #Reasonator

At a hackathon people consider things they did not consider before. Brion is now looking at some of Magnus's finest: Reasonator, ToolScript. What Magnus does brilliantly is create functionality that just works. The challenge for Brion is to make it usable in a WMF context.

Consider Reasonator for instance; it is one of the results when Wikidata search is enabled. It however has a different IP-address from a Wikipedia and consequently it is content that people may have to pay for when they use Wikipedia Zero. What we want is to "share in the sum of the available knowledge". For this reason it is wonderful that Brion has taken an interest.

Once he is done, it may be part of the service provided by Wikipedia Zero. Obviously, it may be subject to the contracts with all the telephone companies. But the best case scenario is that WMF loves it, Brion improves Reasonator for use on a mobile and we are even more in business to share in the sum of all knowledge.

Saturday, May 10, 2014

#WMhack - GLAM statistics

People of the Wikimedia analytics team are present in strength at the hackathon in Zurich. It makes a difference; Multichill reported on work done on page views of GLAM pictures.

The first results are available. It may need more attention but there is actually something to see.

#WMHack - This is not a hack, we can share in the sum of available knowledge

The Wikimedia Foundation wants to share in the sum of all knowledge. What we can do now is generate text on the fly based on the knowledge available in Wikidata. In that way we can share in all available knowledge in all languages on all subjects.

Wikidata may provide Wikipedia with services. When Wikidata was conceived, its first line of business was to replace all the interlanguage links of Wikipedia. As a result it knows about more subjects than any Wikipedia. It knows, for instance, about more US-Americans than the English language Wikipedia. Another objective is to include statements for each item so that information can be centralised in Wikidata for use in any and all projects.

When the statements have labels in a language, it is possible to provide information in that language. It could be any language even English. The current thinking is very much: "we can serve the information boxes in articles from Wikidata". What Reasonator and WD-Search prove is that those articles do not need to exist. Most members of the South African National Assembly do not have articles in any language but information could be found in any language spoken in South Africa.

We can use machine translation to translate articles but we can also use similar algorithms to generate text based on the information we have. This has been done often in our wikiverse; they are the bot generated articles. In the Reasonator we generate text about humans in English and a few other languages. It is not rocket science to improve on what is there. In essence it is exactly what we do in our localisation functionality. It follows that we have some ability for 280+ languages.

This is possible with current technology and the software comes with a great pedigree. It is brought to you by the same human who started with MediaWiki. He is a scientist and the functionality is for you to enjoy in so many ways.

The point is that we can. We can share in the sum of all the knowledge that is available to us. We can do more than aspire; we can share much more of the wealth that is hidden on our servers.

Thursday, May 08, 2014

#Wikidata - May 7th #Elections in South Africa part II

The elections have come and gone. When the results are in, it will be possible to make a list of the current members of the South African National Assembly.

At this time Wikidata knows about 405 of them. When you look at it with AutoList, you will find 408. This is because for three people it is not known that they are human.

One of the major languages of South Africa is English. There may be a lack of interest in politics in South Africa but the quality of Wikipedia suffers when it apparently has no interest in South Africa.

If you think this is bad, try countries like Ivory Coast, Libya, Uganda. When you want another perspective, check Wikipedia for the mayors of any moderately big US city in the 19th century.

NB Wikipedia knows about 170 members of the South African National Assembly

Wednesday, May 07, 2014

#Polio is back

Polio is a disease that can kill, can cripple. It is not pretty, there is no upside and it is not necessary. We had a chance to eradicate this most horrible disease and it did not happen because some abused the trust people need to have in those who immunise. This was followed by the targeting of the people who went out in the campaigns to immunise kids.

Pakistan has taken steps to contain polio. People who want to travel abroad will be immunised. This is to be applauded, and it should be compulsory to be immunised for contagious diseases.

Next to my passport, I carry a yellow immunisation passport. When asked, I can show what diseases I am immunised against. It would be really good if such a passport was mandatory for any travel that crosses borders, uses planes or ships.

To the people who talk about the "right to refuse" I have only one point they have to deal with. Do they take it upon themselves to take full responsibility for the consequences of all people that have not been immunised? Really !!

Monday, May 05, 2014

#Wikidata - Mr #Torvalds received an #award

Mr Torvalds was awarded the Computer Pioneer Medal from the IEEE Computer Society. It was all over the news so it was added to Wikidata.

As you may know, there is a lot that needs to be done at Wikidata. The award was made an award. There was no category of past winners so some winners were added to make it look better.

The English article did not mention a 2013 winner; Mr Stephen B. Furber was a winner according to the IEEE website. This information was available on the German article as well.

There are many more winners to add. Not just for the Computer Pioneer Medal. There are many more Prime Ministers, Ministers of Transport, of Railways to add. Once all these statements are made, labels have to be added as well.

No reason to be bored.

Sunday, May 04, 2014

#Wikidata - May 7th #Elections in South Africa

South Africa will elect members to its National Assembly. Some of its members have a Wikipedia article but most do not. They are however "known" because of red links in a list and as red links in a template.

Adding them to Wikidata is easy, and by adding a distinct statement, "position held" "member of the National Assembly", everything is in place to select them using AutoList. Obvious statements to make are "is a" "human", "nationality" "South Africa" and "occupation" "politician".

The information shown for members of the National Assembly looks like this in the Reasonator. As these members of the National Assembly do not have an article, WD-Search enables access to this information. For all these new items there are only four statements requiring eight labels.
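The count of four statements and eight labels can be checked with a small Python sketch. It uses the property and value names from the post as plain strings; the real Wikidata identifiers are left out on purpose.

```python
# The four statements named in the post, as (property label, value label) pairs.
STATEMENTS = [
    ("position held", "member of the National Assembly"),
    ("is a", "human"),
    ("nationality", "South Africa"),
    ("occupation", "politician"),
]


def labels_needed(statements):
    """Collect every distinct property and value label the statements use."""
    labels = set()
    for prop, value in statements:
        labels.add(prop)
        labels.add(value)
    return labels


needed = labels_needed(STATEMENTS)
print(len(STATEMENTS), len(needed))  # 4 8
```

Because every new member shares the same eight labels, translating them once into a language makes all these items readable in that language.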

When the results of the elections are in, many of the incumbents will be re-elected. When the list of winners includes that fact, it will be easy to complement the list of members of the National Assembly with the latest crop of representatives of the South African people.

Thursday, May 01, 2014

#Wikimedia Foundation has a new #MD

I just heard that Lila Tretikov will be the new MD for the Wikimedia Foundation. Many people will say "wise" words. I do better, I may be first and I just show a picture of her reaching out.

correction.. executive director .. she will manage :)

#Wikidata - #statistics; adding #human

The latest statistics are available. Many new items (445,778 or 3%) were created in Wikidata and the percentage of items without any statement rose. The challenge is to bring their number down from 4,946,696. For these items we only know there is an article about them.

There are two approaches to it:
  • add statements
  • merge items
In order to merge items, we have to know things about them. So adding statements is a great first strategy. Many articles are about humans so what is more obvious than stating that they are human when there is a reason to think so. A similar process is under way on the Italian Wikipedia where basic information about people is added including the fact that they are human.

The statistics are based on a dump from April 20th and since then some 176,729 humans have been identified. As more information gets added, it becomes possible to guess that items may be the same. For instance they are both about humans with the same name that died on the same date. Or had the same occupation. Or are in similar categories. Maybe they have a strikingly similar "concept cloud".

The likelihood of missing interlanguage links is quite high. Obviously with Wikidata this has improved a lot but the number of items with only one link, 9,028,632, is high and it is rising. The question is very much what mechanisms we can come up with to find the items we should merge.