Monday, March 31, 2014

#Wikidata - Expressing its quality

Quality is relative. Take for instance the category about "Thai painters" or the Dutch or Thai categories. The first two know only about one painter who is Thai. The Thai Wikipedia knows about 24 painters.

At this time Wikidata knows about 20 Thai painters. It takes a little effort to add any missing painters known to a Wikipedia.

When all the Thai painters known to Wikipedia are known to Wikidata, it means that its quality to list them is as good as any Wikipedia category. Obviously there are many more notable Thai painters. They all deserve to be known in Wikidata.

Reasonator is able to express the quality of Wikidata. It uses queries that are based on the Wikidata data. By quantifying for example the painters from a given country, it becomes obvious what Wikidata has to offer. Reasonator can do a better job than any category that can be expressed as a list or a query.

A barn star

#Appreciation is a good thing. Today I received a notice that last year it was ten years ago that I made my first edit on the Dutch Wikipedia. Apparently it had to do with the genus Hoodia.

It is wonderful that people take the effort to make others feel appreciated. It is how a community maintains itself; it gives a sense of belonging.

Sunday, March 30, 2014

#Wikidata and #diversity

Frank Lloyd Wright is considered to be one of the best USA architects. When you look at his offspring, you will find only boys. When you read his article in English, you will read that several girls turned out to be architects as well. He had several wives; they too are not covered, either in a Wikipedia article or as a Wikidata item.

As diversity is relevant, there is hardly an excuse to have only articles for the boy architects and not the girl architects. They are related to the same great man.

Addressing issues that have to do with diversity in Wikidata is easy and quick. It translates easily to many languages and cultures. It is an obvious way to address one item at a time.

The sound of #Wikidata

#Quality is a relative thing. When people talk Wikidata and quality, typically they complain that both quality and quantity are not good enough. The reality though is that it is its qualities that make Wikidata function in the first place.

When people ask "When will Wikidata be good enough to be used by Wikipedia?", they fail to appreciate that its first achievement was to bring improved quality to Wikipedia's interlanguage links. Every Wikipedia benefits from and uses Wikidata.

These interlanguage links can be improved on. Many articles and items should be merged. Research supports this as a fact. However, items can only be merged when they are on Wikidata. The Chinese item for the "Quercus robur" was merged because someone had added the name as a label. It is therefore a quality when articles have a corresponding Wikidata item and are enriched with statements and labels.

Many items link to external sources. For many items such links are yet unknown. Often the external sources know about links to Wikipedia and when we collaborate, we can add all of them as items to Wikidata. It increases the quality of Wikidata. When Wikipedia articles have been missed, they can be merged to improve quality even more.

Some statements are made by bot and others by hand. James Schlesinger died. Many new statements were made. Statements were made to other people who were "United States Secretary of Defense". The web of information grew and with it the quality of Wikidata. Dexbot is adding claims for "is in the administrative-territorial entity" for items that Wikipedia knows to be in the USA. It grows the web of information in a spectacular way.

When you wonder why a Reasonator is succeeding, why a WDQ makes a difference, it is because Wikidata is increasingly valuable. Its value is in the sum of what it connects. When you assess the quality of Wikidata, it is obvious that there is more to connect. It may be obvious that many connections are redundant. The relevancy of Wikidata however is in the great amount of data that is added all the time, that gets connected all the time, that gets merged all the time.

The question when Wikidata will be good enough to be used by Wikipedia has an answer; it has been used from the start and as time goes by more subsets of information will be better represented in Wikidata than in any Wikipedia.

Friday, March 28, 2014

#DBpedia & #Wikidata - Gerarda Rueter, a Dutch sculptor

At the Dutch #Wikimedia chapter, we had a talk with the Dutch DBpedia chapter. We want to cooperate and there are multiple things that we can do. DBpedia is excellent in harvesting information from Wikipedia. The Dutch DBpedia harvests from the Dutch Wikipedia.

For one of the first DBpedia projects, we want to set a repeatable example. It should be easy and the benefits need to be obvious. Mrs Gerarda Rueter typifies what we hope to achieve. We only knew that she was a sculptor and all the other information on the page was added by hand later. What we did was identify her as a human and as a female. We added the date and place of both her birth and death. We added her father. For him and Mrs Rueter's siblings, we added spouses when known.

When you present the data using Reasonator, relevant information becomes visible. What can be considered is that names are spelled the same in many languages. That however is probably for a next round when nationalities are included.

Thursday, March 27, 2014

#Monuments of #Switzerland

The problem with #data is: how do you keep it all up to date? For the monuments of Switzerland much of the data is kept on the toolserver. It works just fine.

Wikidata has matured enough to include much if not all of that same data. Missing in the official functionality are the tools to make use of it. Yes, you can store information on the connected Wikipedia articles but that is not the same as using it to administer a project like Wiki Loves Monuments.

Un-official functionality meanwhile does provide much of what is needed. This is a list of all the monuments known to Wikidata in Switzerland. I added 86 pictures to the Wikidata items using the WD-Fist functionality.

As more tools get connected, these tools are increasingly attractive to use for a project. One big advantage of Wikidata is that you do not need to have an article for every monument you know about.

#Wikipedia - Edward Copleston (1776–1849) Bishop of Llandaff

According to the information about the painting of Mr Copleston, this picture can be copied to Commons. This is a lot of work and it involves copying a lot of information about this picture to yet another Wiki page. It has to be done "exactly right" because when you fail to do this, it may mean that the picture gets lost because of the not so obvious bureaucracy at Commons.

There are many pictures like this. It would be more convenient when Commons shares its meta data about media files with all the other projects. Moving an image to Commons would be nothing more than indicating that it is available for general use.

That would make much of the administrative hassle go away.

Wednesday, March 26, 2014

#Wikidata - the best tools

The best tools are the tools that are used. This tool by Sany14 ticks many important boxes. Its intention is to manage Wikidata items that are linked to only one Wikipedia. There are some 8,609,483 such items. Many of these items should be merged with other items. Rahimanuddin Shaik informed me that many of the lonely links to the Telugu Wikipedia should be merged.

The question is if there is an easy way to identify potential matches. The next question will be, what to do when you have identified a match. When an item is in Telugu only, the first thing to do is to add labels in other languages. When you do, it makes sense to compare existing items in that other language. They are the potential duplicate items.
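The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item identifiers and labels are made up, the data is a plain in-memory dictionary rather than a Wikidata dump, and `find_merge_candidates` is a hypothetical helper, not part of the tool discussed here. It groups items that share the same label in one language; every group with more than one item is a potential duplicate that a person still has to check.

```python
from collections import defaultdict


def find_merge_candidates(items, language):
    """Group item ids that share the same label in the given language.

    `items` maps an item id to a dict of {language code: label}.
    Only labels that occur on more than one item are returned.
    """
    by_label = defaultdict(list)
    for item_id, labels in items.items():
        label = labels.get(language)
        if label:
            # Normalise case and surrounding whitespace before comparing.
            by_label[label.strip().casefold()].append(item_id)
    return {
        label: sorted(ids)
        for label, ids in by_label.items()
        if len(ids) > 1
    }


# Hypothetical sample data: the first item was "lonely" until someone
# added an English label to it.
items = {
    "Q900001": {"te": "example-label-te", "en": "Quercus robur"},
    "Q900002": {"en": "quercus robur", "nl": "Zomereik"},
    "Q900003": {"en": "Hoodia"},
}

candidates = find_merge_candidates(items, "en")
print(candidates)
```

A shared label is only a hint. Two people with the same name are different items, so the result is a work list for a person and not an instruction for a bot.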

It would be awesome when people who develop tools share technology. When Sandy14 has a way to query a dump for lonely items, it may be possible to do this on a database that is kept up to date. In this way users of the tool will notice the results of their work. When shared components are used, a hover box can be used that shows known information in a language. Automated descriptions will not only inform our readers but also help our tool users. The tool users, they are us.

I am excited about this tool and I hope it will get more exposure and become part of a tool set that brings out the best of Wikidata.

#Commons - Picture of the year

#Reasonator statistics

The Reasonator statistics are constantly updated. Most attention is given to the overall page views but the break down per language is probably as interesting.

The percentage of visitors in Spanish is practically the same for Reasonator and Wikipedia: 14.4% and 14.5%. Of equal interest is the growth for Persian; it adopted extended Wikidata search and now the numbers are going up.

When you compare Chinese with Tamil, it is obvious that the Chinese are not yet aware of Reasonator and the power of Wikidata. More involvement of the Chinese in Wikidata would be awesome because many of the items exclusive to Chinese can do with statements, with labels in other languages and often with merging.

Equally interesting is that some languages are lacking in interest. I represent probably most of the page views in Dutch. The Germans are also very much underrepresented.

What I would love to know is to what extent these numbers are similar to the numbers of people adding to Wikidata.

Tuesday, March 25, 2014

#Wikidata #statistics

It takes an effort to understand where Wikidata is taking us. It is not clear, it is not obvious and all we have are trends. The latest statistics of Wikidata show that items with statements and the items with labels are growing nicely. It is encouraging to notice that the number of items with multiple statements and to a lesser degree the items with multiple labels are growing more quickly than the number with none or one. What it means is that Wikidata becomes more usable. With more statements there is more information to provide. With more labels more information is available in more languages. Given the multiplier involved because of the use of items in other statements, adding labels has a profound effect on the usability.

When you look at the number of links per item, you will find that this year the number of items linked to only one project has grown. One reason is that bots have run to add missing articles to Wikidata. This makes it easier to add statements about all entries in a category for instance. It makes Wikidata more complete and consequently more valuable. Having to merge items can be easy and obvious with a bit of experience in Wikidata.

The statistics that break down the number of labels and links per project provide a different perspective. They indicate to what extent we are able to share in the sum of the knowledge that is available to us. For English, the language we support best, 41.6% of the Wikidata items have a label. We can provide information about the missing items because of tools like the Reasonator. It does provide fallback to labels in other languages.
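What such a fallback could look like is sketched below in Python. This is not how Reasonator does it; the function name `pick_label`, the default fallback chain and the sample labels are assumptions made for the illustration. The idea is simply: try the language of the reader, then a list of fallback languages, then any label that exists at all.

```python
def pick_label(labels, preferred, fallbacks=("en",)):
    """Return (label, language) for the best available label.

    `labels` maps language codes to labels. The preferred language is
    tried first, then the fallback languages in order, then any other
    language that has a label. Returns (None, None) when nothing exists.
    """
    for lang in (preferred, *fallbacks):
        if labels.get(lang):
            return labels[lang], lang
    # Last resort: any label at all, in sorted order so the result is stable.
    for lang in sorted(labels):
        if labels[lang]:
            return labels[lang], lang
    return None, None


# Hypothetical item that has no English or Tamil label yet.
labels = {"nl": "Zomereik", "de": "Stieleiche"}

print(pick_label(labels, "ta"))
print(pick_label(labels, "nl"))
```

Returning the language together with the label makes it possible to show the reader that the text is not in the language that was asked for.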

It is fun to write about subjects like Reasonator and Wikidata. Opportunities arise, issues arise and it is wonderful to notice that many are addressed. It is wonderful to experience how useful Wikidata is becoming. At some stage these blog posts will be the subject of research because what is happening is not what was foreseen.

Monday, March 24, 2014

#Wikipedia - Piet Zwart

According to his peers, Piet Zwart was the most relevant Dutch designer of the 20th century. Sadly, no picture illustrates his article. He was awarded several awards; they were all unknown at Wikidata.

He received for instance the Quellinus award in 1959. The award was named after Artus Quellinus, and it is still being awarded.

There are several opportunities; you can spend time finding more people who were awarded the award. You can add more statements to Piet Zwart. You can spend time on the Quellinus family. You can even introduce the Redwd template to the Dutch Wikipedia.

No reason to be bored.

#Wikipedia MOS:DABRL#Red_links

If the title is immediately clear to you, it may be that you have immersed yourself in Wikipedia lore to an extent that may not be healthy. For the rest of us, it is about the use of red links. Wikipedia policy has a lot of red tape and it is often not clear and obvious. For instance: "A link to a non-existent article (a "red link") should only be included on a disambiguation page when an article (not just disambiguation pages) also includes that red link." The problem here is that a link to a disambiguation page is not a red link.

The "Redwd" template is a game changer when you are considering the whole notion of red links. Technically it is not a red link. It is a template that includes the functionality of a red link. It refers to both Wikidata and to Reasonator and consequently it demonstrates that information exists on a similar level as a "stub". Like a stub, the information can be expanded by adding statements in Wikidata.

The difference between a stub and a Reasonator page on the same subject is that when statements have been added, new information will show in all Wikipedias that use this functionality. It will show when an item is selected when "expanded Wikidata search" is used.

There is an additional opportunity; disambiguation pages are comparable to the result of a search using the Reasonator. Often Reasonator has more results than what can be found on a disambiguation page. The opportunity exists to add a link to Reasonator from each "disambiguation" page.

Oh and by the way, the policy needs a makeover.

Sunday, March 23, 2014

How about a #Cactus, it is prickly

The #nomenclature of species has its history. Understanding it takes an effort. The family of the Cactaceae is a nice one to show some of the problems involved.

The latest update of the Reasonator copes with one problem that you will find in Wikidata. What is meant by Cactaceae has changed over time. There are four "parent taxa" for this family, and Reasonator will now only show the preferred one. This limits the list of higher order taxa a lot because all these parent taxa have their own parent taxa.

The problem is that lower level taxa end up all being part of the same family of the Cactaceae. Technically all these lower level taxa are "valid". What is understood by the family differs greatly and showing the preferred parent taxon is the best we can do.

Old descriptions like Cactus ficus-indica exist in Wikidata. You will appreciate that the old gets merged with the new. For you and me, it has always been a "prickly pear".

Mark Davis is the co-founder and president of the #Unicode Consortium

Mark is one of those persons who is associated with what is now "basic functionality". He is still working in the field of Unicode and language support and is therefore one of the movers and shakers in the computer industry.

When you search for Mark Davis, he is one of 28 items that show up. They are many people, a house, two computer games and a movie made by a Mark Davis. These 28 items represent the combined knowledge of Wikipedia, Wikisource and Wikivoyage.

When you hover over any of the items in the list, you will get a box with pertinent information. When available, a picture shows up. I asked Mark if I could use a picture of him to be used in Wikidata and Wikipedia. I could and have the chat log to prove it.

When I told Mark that Wikidata is harvested, among others, by DBpedia, he realised that the picture used will have an impact on the semantic web. He was impressed and is now considering whether he has an even better picture to share.

Mark Davis wrote "Shama Lama Ding Dong"


One of the items in Wikidata is the song "Shama Lama Ding Dong". This song was first performed in National Lampoon's Animal House and, in a version by the Band of Oz, won People's Choice Song of the Year at the 1995 Carolina Beach Music Awards.

Mark Davis does not have a Wikipedia article. But the trouble has been taken to provide him with a Wikidata item. Given that "best practice" has the link go to the disambiguation page where Mark is not to be found either, it was a fine moment to introduce the "Redwd" template. It does link to Wikidata and the Reasonator while keeping the link red.

This is a good way to revive the function of the red link. It does provide information that is specific to this Mark Davis, and it could be the start for someone to write the missing article.

I want to thank Magnus for creating the template and Multichill for adding the documentation.

Friday, March 21, 2014

#Reasonator - #Cambridge revisited V

When Reasonator presents Cambridge in Tamil, you want a map to be in Tamil as well.

By making sure that the information you get is as much as possible in *YOUR* language, it becomes increasingly interesting to make sure that *YOUR* language is well served in Wikidata.

It pays to have many labels in *YOUR* language because your reward will be that you will be better serviced in *YOUR* language.

#Wikipedia and the impact factor of #Nature magazine

As Nature has a Wikipedia article, it has a #Wikidata item. It can be presented in Reasonator and as my friend John has an interest for scientific publications, he had a look at it.

He made several comments; one of them was that there were many more statements to make. We added things like the publisher, a few external sources and the impact factor. It was 38.597 for Nature in 2012.

There is a problem with the impact factor. Thomson Scientific, the company that fabricates this number, has the tendency to deny others the use of the impact factor. To quote from the "properties for deletion" page on Wikidata: "Thomson Scientific Inc. remind academics and universities that they do not permit any republication or re-use of their Impact Factor lists".

When Thomson Scientific does not want its impact factor on Wikipedia, it has the opportunity to say so. It has not done this for many years. When it does, the impact factor will be removed and another, more nuanced factor will be found to qualify the relative merit of publications, authors and papers. The proposal for deletion has it that it is "derived from a curated dataset (not all journals are included), it involves subjective decisions (which papers are citable and valid citations), the creators of it do apply their own non-automated adjustments".

I am sure that when Thomson Scientific does decide to send a cease and desist letter to the Wikimedia Foundation, many people will remember this famous American who also went to court.

#Reasonator - #Cambridge revisited IV

Reasonator is very much user-centred; you are in the driving seat. You decide in what language you want to see the information. You decide if you want to look at any of the sources provided. You decide if you want to look at a map.

The latest improvement to Reasonator includes an icon for a map in the "hover boxes". Hover boxes are everywhere in Reasonator and maps are now one click closer.

The logo we use for a map is the best we could find. Our current choice is the only one at Commons that does not have a black background. An alternative would have been the OpenStreetMap logo; we use OSM to project our Wikidata items on. If you have a suggestion, it has to be recognizable as a globe/map at 16px height.

Thursday, March 20, 2014

Friends - Where they live

Lately #maps are more often the topic of conversation. A friend of mine sent me this to indicate where he works. When you compare it with where I live, you get an idea of how much bigger Australia is compared to the Netherlands. Both maps show items that are within a radius of 15km.

Words are expressive but maps and illustrations are really powerful.

Wednesday, March 19, 2014

#Commons and #Wikidata - the architecture of the #Wikimedia storage of the meta data of #media files

Commons was created to centralise the storage of media files. When it started, MediaWiki was not able to use any media files in any Wikipedia; that came later. Only those files that could be used everywhere were included. As a result, strict rules are applied and the copyright laws of the whole world are considered.

Commons is scheduled to be included in the scheme of Wikidata. This is easy and obvious for much of the meta data that is involved. It will include, for instance for the Mona Lisa, who the artist is and what institution takes care of that most famous painting. This information will be available in every language and when the labels have been added, it will be properly readable and searchable in that language.

With Wikidata integration, the data is no longer strongly associated with the "page" of the media file. Effectively it does no longer matter where the media file is located. What matters is that media files are annotated with Wikidata technology. As this strong association will no longer exist, it is possible to change the outlook from a Commons integration project to a media file integration project.

Some use-case scenarios:
  • a file has been marked for inclusion into Commons on the English Wikipedia because it fits the Commons criteria
  • a file has been marked for deletion on Commons as it no longer fits its criteria. It does fit the criteria for "fair use" on the English and other Wikipedias
  • a GLAM is interested to share its meta data and welcomes viewers of its media files that cannot be used in Wikimedia projects because of copyright restrictions
With a Wikidata integration it is just a matter of managing the values in the meta data. Many more media files will become available as a result of one common approach. The management of all this data will become easy and much more effective.

From a storage point of view, things will also become easier; all media files will need to be stored only once. The meta data of those files becomes available in any language, and the meta data will remain available even when the media file no longer is. Files restricted in a Wikimedia context may still be available through external sources.

Effectively a global approach to all media files will make us more effective in sharing them as part of the sum of all knowledge.

#GSOC - #DBpedia and #Wikidata

The most interesting Google Summer of Code proposal I have seen for 2014 is this one.
4.7. Clean DBpedia datasets and import in Wikidata
The student who will take this task will be responsible for two things:
  • Clean up the DBpedia errors based on the output of Databugger or similar. With this the student will generate a more sparse but cleaner dump of DBpedia that will be of general use.
  • Communicate with the Wikidata community in order to coordinate the import of (parts of) the cleaned datasets and re-use the connections of DBpedia to fetch additional data for Wikidata import.
Mentors: Dimitris Kontokostas, Magnus Knuth (co-mentor)
As far as I am aware they are still looking for a candidate to choose this project. This project is totally relevant and it is exactly the kind of repeatable process that Wikidata should have. By having this project done to DBpedia standards, it is ensured that the baseline will be stable and the results will be repeated regularly.

Tuesday, March 18, 2014

#Wikidata - the latest statistics featuring *quantities*

Statistics make a point; they show a trend. Given the latest, overdue statistics, there are reasons to be cheerful. Slowly but surely we are getting to the point where half of the Wikidata items have 0, 1 or 2 statements. What is good about it is that the numbers of items with 0, 1 and 2 statements are all decreasing.

Other statistics are also quite healthy; the number of labels is going up steadily as well. Here too you find that the more popular items grow their reach. They are likely the items people most look for and consequently a tool like Reasonator will happily show its information in *YOUR* language.

In these latest statistics, the quantities make their first appearance. They start with 17,832 statements.

There is a fly in the ointment. We are getting the impression that not that many articles are finding their way into Wikidata. Magnus tweeted a moment ago that "311,479 articles on English Wikipedia have no item". It would not be surprising when the number of all articles without a link to an article in another language is close to ten million items. That is a lot, and it spoils the party to some extent.

#Wikidata - the replication to #Labs works again

The #Reasonator uses Wikidata data and to make sure that its data is always close to "up to date", a replication process updates Reasonator every 10 minutes or so. To do this, it needs a Wikidata dump, and incremental backups are really nice to have. Once the latest available backups are included, replication will pick the later changes up, and the result can be something like all the members of the Lok Sabha.

The Lok Sabha is the lower chamber of parliament in India. Reasonator shows the first 500 members known in Wikidata, and the rest you can see using AutoList.

Saturday there is an edit-a-thon in India. They will write about the female members of the Lok Sabha. We can make a query for this as well. As labels in the languages of India are added to Wikidata, these parliamentarians will be found when you search for them because all the Wikipedias of India make use of Wikidata extended search.

In this way, we make another step in sharing in the sum of all knowledge.

Monday, March 17, 2014

Catalogue of paintings in the #Louvre Museum

#Wikipedia has an article by that name. It is the result of a lot of work based on the catalogue of paintings of the Louvre done by Jane. Her aim was to have a list with all its painters. She published it as an article on the English language Wikipedia. It has its Wikidata item and from the Reasonator page, you can get to its "concept cloud".

This concept cloud shows all the painters on the list that have a Wikipedia article AND have a Wikidata item. The list does contain red links and the painters may exist in Wikidata. For instance a Russian painter may have an article in Russian, a Dutch painter an article in Dutch.

When Jane works on her painters, she knows them by her "JaneID". One random painter is Jules Robert Auguste. He is known by seven identifiers already, one of them is the Wikidata identifier.

When all the painters Jane knows have a Wikidata identifier, she does not need to maintain her own identifier any more. When someone adds data to a painter in Wikidata, the information about him or her will become richer for everybody. All it takes is to add a Wikidata item. When a painter has his or her work in the Louvre, he or she is notable by default. I suppose that any painter Jane considers notable is notable because he or she is part of a Wikimedia project.

Sunday, March 16, 2014

#Reasonator - #Cambridge revisited III

Putting #Wikidata #data on a #map is one way of providing #information. Collecting geo-coordinates and not giving it a use is at best "anticipatory". Putting items on a map has been done before but it has all been incidental. It showed off the promise of "Yes, we can". Reasonator does go beyond the promise; it delivers. When an item has a geo-coordinate, you will have the option to look for all the items that are in a range of 15km of that item.

It works for any place on any continent. You will get a map, and you can alter the radius, you can zoom in and zoom out. You can do this for Rio de Janeiro, Cambridge, Almere and Disneyland.

The magic of maps is brought to Reasonator and consequently, it is for you to make the most of it. You can put more items on the map by adding geo-coordinates where they are missing. That is obvious. What we are interested in is what you will come up with.

As always with a first iteration of functionality, it could do with improvements. Any ideas?

Saturday, March 15, 2014

#Wikidata - #Cambridge revisited II

When you are interested in #maps, projecting the results of a Wikidata query on a map will become increasingly exciting. It seems obvious but only those items that have a geo-coordinate will find their place on a map. When I blogged about Cambridge revisited, the message was very much that we can find all items within a certain radius and that we can process them with AutoList and WD-Fist.

Magnus is now providing us with something new. The results of a WDQ query are projected on a map. When you run a query, it could result in every item with geo-coordinates in a municipality or a county. It could be all the castles of the United Kingdom. Just give your imagination some room.

As always, this is the first iteration of functionality. It makes use of components that we have been using for a long time. What will be interesting for us is to learn if the map functionality is able to cope. We learn by trying things out and at that, this is the perfect environment for you.

Friday, March 14, 2014

#Wikidata - #Cambridge revisited

When you are interested in #maps, it helps that Wikidata includes more and more geo-coordinates. So yes, you can put Cambridge on a map. This is old news, but somewhere hidden in the documentation of WDQ, it says that you can query for all items that are within a specific radius (in km) from a given coordinate.

When you apply this to Cambridge, the API result can be found here. The query definition can be used to produce an AutoList but you may also be interested in adding a missing image.
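To give an idea of what "all items within a radius" means, here is a small Python sketch. It does not use WDQ and it does not show WDQ syntax; it only works on a hand-made dictionary. The item names and their coordinates are made up for the example, and the centre is only roughly where Cambridge is. The distance is computed with the haversine formula on a sphere with a radius of 6371 km, which is an approximation that is good enough at this scale.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def items_within(items, center, radius_km):
    """Return the ids of all items at most radius_km away from center."""
    lat0, lon0 = center
    return sorted(
        item_id
        for item_id, (lat, lon) in items.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )


# Hypothetical items with (latitude, longitude) in degrees.
items = {
    "item-near": (52.25, 0.15),
    "item-far": (52.40, 0.26),
}
center = (52.205, 0.119)  # approximate, for illustration only

print(items_within(items, center, 15))
```

Items without a coordinate never make it into such a dictionary, which is the same reason why they do not show up on a map.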

For me these tools and Wikidata are where we are "getting a job done". The job is being able to share the sum of all knowledge with our public. Wikipedia is great as far as it goes. Wikidata has the potential to inform in any language about any subject whether there is an article or not. It is becoming less of a potential as more data becomes available because Wikidata is increasingly potent.

#Wikidata - The Headless Horseman

As you can see, James Sheridan featured in the movie "The Headless Horseman". Eh, a James Sheridan featured in this movie. On the item for that disambiguation page, there are links to two Wikipedias. Disambiguation to the correct Mr Sheridan happens on the Italian, not on the English Wikipedia.

It is not hard to find information for Mr Sheridan. He was the son of an actress and the brother of an actor and director. It would make sense when an item for Mr Sheridan would feature on a disambiguation page. That could be a precursor to smart red links.

#MediaWiki #Webfonts - real problems and luxury problems

The prime objective of MediaWiki is to serve Wikipedia pages. The prime objective of Wikipedia is to share in the sum of all knowledge. In order to achieve this goal, the content has to be readable and get to its audience. Once this is accomplished, any further improvements are the icing on the cake. They are the nice problems to have.

The Webfonts functionality was introduced to prevent "tofu". Tofu is named after tofu, the soya based ingredient, because it is typically served in rectangular blocks. When you are served tofu, you do not get to see the meat of the matter. You cannot read what is on a page. Consequently, Webfonts were introduced to prevent this epic failure.

As mentioned before, when the same technology can be used for cosmetic reasons, it will be used for a secondary goal. When the requirements of such secondary goals prevent the primary goal of serving data, the priorities are dead wrong.

When the use of Webfonts for cosmetic reasons is considered, the bandwidth it uses is a luxury, the performance it asks of servers is a luxury. This does not make Webfonts a luxury; its primary purpose is still to prevent tofu and consequently serve our prime directive. Consequently, it is not a question for any operations staff to consider if Webfonts should run or not. At most it is for them to decide if Webfonts in its current incarnation is good enough to serve fonts to make Wikipedia look pretty.

This issue is not understood by many people who dabble with the Wikimedia infrastructure. This becomes clear when you read some of the responses to the excellent explanation by Niklas. To put it bluntly, the decision if primary functionality can be removed is not an operational but a business decision. Given the consistent failures to protect this primary functionality, it has become an issue for the highest level of the Wikimedia Foundation.

Thursday, March 13, 2014

#Wikipedia is static ... that is ok-ish

Dobrodole is a village in Montenegro. There is no article in the English language Wikipedia. All that was known at Wikidata was the existence of an article in three languages and its coordinates.

That is enough to present these three maps. Actually, it is the same map but each has a slightly different orientation. When you want to use a map, all you need are coordinates. It is that simple.
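To illustrate how little is needed, here is a minimal sketch that turns a pair of coordinates into a link to a map. It assumes the OpenStreetMap website as the map provider, which the post does not name. The function name `osm_map_url` is made up for this example and the coordinates are placeholders, not the real position of Dobrodole.

```python
from urllib.parse import urlencode


def osm_map_url(lat: float, lon: float, zoom: int = 13) -> str:
    """Build an OpenStreetMap link that centres on one point and marks it."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("coordinates out of range")
    # mlat/mlon place a marker; the fragment sets zoom level and map centre.
    query = urlencode({"mlat": f"{lat:.5f}", "mlon": f"{lon:.5f}"})
    return f"https://www.openstreetmap.org/?{query}#map={zoom}/{lat:.5f}/{lon:.5f}"


# Placeholder coordinates for illustration only.
print(osm_map_url(42.5, 19.3))
```

Changing only the `zoom` argument gives a wider or a closer view of the same place.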

Theoretically it is not much different in a Wikipedia. It starts with ... do I want to show a map. When you do, you can. When you want to do it "right" and scalable, it may be that you want to be able to cache the maps. However, before that becomes a problem, the wish for a map has to be expressed in some way in an article.

Yes, there are templates that can turn this on rather quickly. The question is not can this be done but how long does it take for a community to agree to anything.

#Wikidata - કારવાઇ (તા. કડાણા) a village in #Gujarat

This is just one of the villages in the Indian state of Gujarat. It has the property "is in the administrative-territorial entity" and consequently it is really easy to create a query that gets you everything that is in Gujarat. For your information, it shows that currently there are 15,974 items known in Wikidata.
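As a rough sketch of such a query: the post does not say which query tool was used, so the example below assumes SPARQL as understood by the present-day Wikidata Query Service. It also assumes that P131 is the property meant here and that Q1061 is the item for Gujarat; check both before relying on them. The helper `located_in_query` is made up for this example and only builds the query text, it does not send it anywhere.

```python
import re
from typing import Optional

# Assumption: P131 is "located in the administrative territorial entity".
LOCATED_IN = "P131"


def located_in_query(region_qid: str, transitive: bool = True,
                     limit: Optional[int] = None) -> str:
    """Return a SPARQL query for the items located in the given region."""
    if not re.fullmatch(r"Q[1-9][0-9]*", region_qid):
        raise ValueError(f"not a Wikidata item id: {region_qid!r}")
    # "+" follows the property through districts and sub-districts as well,
    # so a village in a taluka in a district of the state is also found.
    path = "+" if transitive else ""
    query = f"SELECT ?item WHERE {{ ?item wdt:{LOCATED_IN}{path} wd:{region_qid} . }}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query


# Assumption: Q1061 is the Wikidata item for Gujarat.
print(located_in_query("Q1061"))
```

The transitive form matters because a village is usually recorded as being in its sub-district, not directly in the state.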

It is not known for this village what its name is in any other language. Many of the items do have names in English. Many items do not. I expect that many items can be merged. The question is who will be interested in doing this. It must be someone who knows Gujarati, I assume.

Wednesday, March 12, 2014

#Wikidata search and automated descriptions

Reasonator uses #Wikidata wherever possible. There is no point in improving what is already good enough. Wikidata search is good enough so the results are used in Reasonator. It is used and the results are presented with automated descriptions.

To the right is a picture of the gate of Plötzensee prison. It was quite notorious during the Nazi regime; many people were executed by beheading or otherwise. There is a Plötzensee monument, a Plötzensee neighbourhood and a Plötzensee lake, and it helps when you can find the correct Plötzensee item when you are searching for it.

When you look at the lake, you will find that someone died there. Actually he died at the prison. When you look at the neighbourhood, you will find 185 people who probably all died at the prison. For the prison itself only 116 people are known to have died there. Several of these 116 have died at two locations.

The results of a search for Plötzensee need to be different in every language if the results are to be understood. It is not obvious to everyone that a "See" is a lake and a "Denkmal" is a monument in English. With a location it is indicated that we know "it" is somewhere but it is not known to Wikidata what "it" is.

Thanks to automated descriptions it is possible to provide clarity in what the search results are. Obviously, when rich information is available in Wikidata, the results improve.
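The idea can be sketched in a few lines. This is not how Reasonator does it; it is a toy version under stated assumptions: the class names, the small translation table and the function `describe` are all made up for this example. It shows the two points made above: the description follows the language of the reader, and an item that only has coordinates is described as a "location".

```python
from typing import Optional

# Hypothetical, tiny label table; a real one would come from Wikidata labels.
CLASS_LABELS = {
    "lake": {"en": "lake", "de": "See"},
    "monument": {"en": "monument", "de": "Denkmal"},
    "prison": {"en": "prison", "de": "Gefängnis"},
    "location": {"en": "location", "de": "Ort"},
}


def describe(label: str, instance_of: Optional[str],
             has_coordinates: bool, lang: str) -> str:
    """Compose a short description for one search result."""
    if instance_of in CLASS_LABELS:
        kind = instance_of
    elif has_coordinates:
        # We know "it" is somewhere, but not what "it" is.
        kind = "location"
    else:
        return label
    names = CLASS_LABELS[kind]
    # Fall back to English when the class has no label in this language.
    return f"{label} ({names.get(lang, names['en'])})"


print(describe("Plötzensee", "lake", True, "en"))
print(describe("Plötzensee", "lake", True, "de"))
print(describe("Plötzensee", None, True, "en"))
```

The more statements an item has, the more such a description can say; with nothing but a label there is nothing to add.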

Tuesday, March 11, 2014

#Wikidata - the lord of the wikis

Like #Commons, Wikidata is intended to provide a service. All Wikipedias, Wikivoyages, Wikisources are serviced at this time for inter language links. They all have the option to use data from Wikidata as information in their project. In a similar way they can use media files from Commons.

The plan is to make use of Wikidata for media files. So far this is only considered for the media files at Commons. Expanding the scope to consider all media files seems obvious.
  • only one technology to manage all media files
  • improvements in technology and data will be shared everywhere
  • search will become possible in all supported languages
  • media files need to be stored only once
  • it will simplify the administration of media files a lot
  • it will compartmentalise any license and community issues
The existing technology for managing media files is functional for using media files in articles. It is good enough for uploading new images and it is nasty when you actually want to find something. What Wikidata will bring to the management of media files is the ability to use multiple languages, finding images based on multi-lingual tags and potentially a hierarchy of licenses that determines its use in different projects.

When it is the considered opinion at Commons that a file is not freely licensed, that image will become unavailable everywhere. Often there are good arguments why that same file can still be used; fair use is just one valid reason. With Wikidata it will be just a matter of appropriate settings to make a file generally available or not.

When for whatever reason a media file cannot be used anywhere, it is still important to recognise this file whenever someone decides to upload it again. When the data about all files is stored in one place, we can recognise such an upload easily and apply whatever policy to that file.

Technically the current management of media files will be surpassed by what a Wikidata approach will offer. People will expect the same quality of service for media files in any Wikimedia project. Wikidata has the potential to forge a much improved user experience. There is no alternative that will bring an improved, multi lingual user experience that is so badly needed to media files.

Monday, March 10, 2014

#Wikipedia - The Dr. Martin Luther King Jr. Achievement Award

When you search for the Dr. Martin Luther King Jr. Achievement Award in Wikipedia, you will find no article, no category and no list. You may find that it was awarded to Mr Oscar Peterson. When you search the web, there is not much to be found except that the Center for Black Culture and Research currently organises and presents the award.

There are many awards that are presented to people and they are all considered to be notable by their peers. Another award that does not have an article, a category or a list is the Loyola Medal. It is just another award presented to Mr Peterson.

When people want to express the best that a specific demographic has to offer, awards are a wonderful thing. They express notability, gratitude, appreciation. Wikidata is a proper place to register awards and the people who receive awards. It is easy, it is obvious and the relevance of an award grows when the information about the people or organisations who were awarded becomes more complete.

NB the information about awards given to Mr Peterson is still incomplete.

Sunday, March 09, 2014

#Reasonator - bugs and a different kind of awesome

Unlike Pallas Athena, Reasonator did not rise fully armed out of the head of Zeus. It has developed incrementally and every now and then it has its moments where a bug manifests itself. It is software, and Magnus does a splendid job at terminating bugs.

The typical workflow is as follows:
  • build a feature
  • test in test
  • promote to live environment
  • bug Magnus with a bug (reiterate process)
This works well enough. Typically the environment Reasonator operates in is stable and bugs are obvious. Sometimes there are issues. Issues like a major re-write of functionality and an upgrade of the Labs environment. This is when the unexpected happens and bugs have an impact on functionality.

At this moment the update process kills Reasonator. But as the show must go on, it is better to do away with updates for some time until there is a bug fix. At this moment the daily backups are not available in the new Labs environment. The problems with updates cannot be limited to only one day. BUMMER

The good news is that one bug has been found and fixed. It was late so it was time to sleep. Today it will be time to test. When things turn out to be fine, the update process will run again and both Reasonator and WDQ will provide near real time results from Wikidata.

That is when all the small things that happen every day, and that amount to so much, start to show up again.

Saturday, March 08, 2014

#Wikidata - Elcot, Swydd #Berkshire

Elcot is a hamlet in Berkshire. According to the English article, its claim to fame is the four star Elcot Park Hotel.

When you search for Elcot, you find the hotel, the hamlet twice and a disambiguation page. The other item in the list is called "Elcot, Swydd Berkshire". Swydd probably means "county" and, when you search for swydd, the result is interesting. It is not that there are 1721 results; it is the impression you get that so many of them exist because items for Welsh articles co-exist with items for English articles.

PS there is no picture for the Elcot Park Hotel. Thanks to Geograph there is a picture for the Halfway Inn.

#Wikidata - #Shahrud County, a sharestan from #Iran

When attention is given to detail, details like what refers to Shahrud County, you will find that much of it is in either English or in Persian. People who know both languages can assess the quality of the information.

More things that can be located on a map are being placed on the lowest level of an "administrative entity". As a result the number of items associated with that entity is manageable. You can identify what is missing, you can identify what has duplicates.

It may be of interest to see what is on the map for the "administrative entity" you are living in.

Friday, March 07, 2014

#Wikimedia #Deutschland is not an "instance of" "list of Wikimedia chapters"

Wikimedia chapters are notable. Consequently several of them have articles and therefore Wikidata items exist. Above you see how Reasonator presents the data for Wikimedia Deutschland. You may also notice that it is a "list of Wikimedia chapters".

This is due to a peculiarity in Wikipedia; there is no article that describes "Wikimedia chapter", there is only an article that lists them all. It is a pattern that is repeated all over for many topics.

Wikidata prefers to have a "subclass" for a subject. Lists are ... multiple instances of such a subclass. The problem is that many Wikipedias expect that the item they refer to has relevant statements. Statements that can be expected on a subclass.

The solution may be controversial but Wikidata no longer considers that when an article says "List of ..." it should be part of an item that has "instance of" "Wikimedia list article". That is reserved for instances where a Wikipedia has both an article for the subclass and a separate list article.

Thursday, March 06, 2014

#Wikidata - the building of the Nationalbank in #Bern

The building of the Nationalbank in Bern is a monument. It does not have a Wikipedia article and yet, it is known to Wikidata. When you check out the external source, you find that the source is running on the Toolserver. This probably means that this tool needs to be migrated to labs in the near future.

The database is on the Toolserver probably because of Wiki loves monuments. When you look at the data, you find that among other things it refers to a list on the German Wikipedia. You will also find that much of the data can be included in Wikidata. When all the data is hosted on Wikidata, it is not even necessary to migrate the data to labs.

When monuments like this building have their data on Wikidata, it will make it easier to integrate Commons in Wikidata. As you may know, this is scheduled for the second half of 2014.

#Wikimedia #Research - plenty of #opportunity, there is even #money to be had

When numbers are crunched, when analyses are made, the results reflect the questions that are asked. The Wikipedia page views for instance show clearly that the numbers are down in absolute numbers (-3%) and the numbers for the English Wikipedia are down even more (-7%).

When you look at the existing research, it is biased towards the biggest Wikipedias and it is biased towards the countries where the most Wikipedia readers are. Arguably this is where Wikipedia has saturated its markets. Improvements made to the Wikipedia experience will help us at best to retain our existing public.

As the "other" Wikipedias have a potential for growth, it is obvious that research can be done to determine what is hampering growth. Growth does not happen for all kinds of reasons: lack of content, not enough quality, no NPOV, content that is not what people are looking for, poor network performance, competitors that do a better job. Proper research will find more issues and research may rank them differently for particular markets.

We do not know what people are looking for in their Wikipedia and consequently we cannot work on providing the missing information. We do not know what prevents people from contributing in many languages so we cannot make the right improvements. We do not know if and where network problems make using any of the Wikimedia projects unattractive or even impossible.

This is why we need research. There are universities in every country where students can study these issues for their language(s), their country. There are grants available for such projects. What we need is research that aims to highlight existing issues and help us find ways to address these issues.

Wednesday, March 05, 2014

#Wikidata - the seventeen townships of Boone county #Iowa

There are multiple Boone counties and this is about Boone county in Iowa. Recently there were some issues with the reliability of the data used by Reasonator. Data was missing; it did not show. For several days we have been getting the data right that should show in the "Category:Townships in Boone County, Iowa". Seventeen townships should show up. Initially none showed up.

Now several days later, the migration of Wikimedia labs has happened. The dump of Wikidata finished but it did not find its way to where it was expected to be, the IP address for the proxy for the changes to Wikidata had changed, the incremental backups of Wikidata have been suspended during the move and, life goes on regardless.

The big thing in all this is the migration of Wikimedia labs. It is under way, it has been well prepared and a lot of TLC is given by Coren and the other labs staff to the people who want their tools migrated. They are doing an impressive job. The incremental backups are one of those jobs that were created on the fly and gained unexpected relevance. At some point in time they will be running again at their new home.

Some of the missing townships did not even show when everything was supposed to be good. Further analysis indicated that they have to be both a township and be in Boone County, Iowa. Now that they are and do, they show up all right.
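The point that both statements are needed can be shown with a small sketch. The item structure, the field names and the function `in_category` are made up for this example, and the three items are placeholders, not real townships.

```python
def in_category(item: dict, kind: str, county: str) -> bool:
    """An item qualifies only when it has BOTH statements."""
    return (kind in item.get("instance_of", ())
            and county in item.get("located_in", ()))


# Placeholder items: only the first one carries both statements.
items = [
    {"label": "Example A", "instance_of": ["township"],
     "located_in": ["Boone County, Iowa"]},
    {"label": "Example B", "instance_of": ["township"], "located_in": []},
    {"label": "Example C", "instance_of": [],
     "located_in": ["Boone County, Iowa"]},
]

matches = [item["label"] for item in items
           if in_category(item, "township", "Boone County, Iowa")]
print(matches)
```

An item that misses either statement silently drops out of the result, which is exactly how a township can be "missing" while its item exists.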

#Reasonator - Mrs Vimla Varma in #Telugu

In #India an edit-a-thon is under way to write about notable women. One of them is Mrs Varma. An article was written about her in English and from the text many statements were added in Wikidata.

This was given attention on the Indian mailing list and as a result you can observe that most of the information about Mrs Varma can now be properly presented in Telugu. One additional benefit is that wherever the Wikidata search functionality has been added, you will be able to search for విమల్ వర్మా and you will find information about Mrs Varma.

Tuesday, March 04, 2014

#Wikidata - external sources

Mr Yamamura makes his living working hard at playing a game. Good luck to him. Many people are interested in the game of rugby so much so that ESPN has a website dedicated to the heroes of the game. It is where you can find all the fine details you do not find at Wikidata.

The good news is that when a new property like the "ESPN Scrum ID" is added, the relevant information ends up on the talk page of the new property. Typically it includes a link to the external source. It is not hard to use this information and make Reasonator aware of yet another external source.

When you like your Rugby, we make it easy for you to find the relevant information at ESPN as well.

#Wikimedia and growth

Growth is one of the development projects of the Wikimedia Foundation. Its article states: "The purpose of the team is to find ways to attract and retain Wikipedia editors". It focusses on the English language Wikipedia and this is problematic.

It is problematic for several reasons. The most obvious one is that the English Wikipedia has the most developed community and infrastructure and this does not compare easily with most other Wikipedias. When en.wp is compared to a tree, it takes away the light and nutrition from the others. Once this is understood and accepted as an issue, the fact that less than 50% of the traffic is in English and that this percentage is decreasing makes it obvious that growth is happening elsewhere.

When the purpose of growth is to attract and retain Wikipedia editors and when growth is not happening where the focus of the growth project is, it is likely that tools are developing that are not considered as part of the growth project. Take for instance the Reasonator and Concept clouds, both provide accessible information based on information that is largely based on information from any and all Wikipedias. When people spend time adding labels to Wikidata items they will quickly make the information provided of a higher quality in their language. It will show up in search results and it will provide not only basic information on a subject but also information on related subjects.

The classifications of editors that have been validated for the English Wikipedia are not valid for many of the other Wikipedias. When there are few articles on any subject, basic information is key and not so much the featured articles of the en.wp that are all too often too long to read and not that relevant.

This is however not what this growth project is about. It is well intentioned but it puts the needs of the projects where growth is happening in the shade.