Friday, January 31, 2014

#DBpediaAmsterdam - a conference update

Speaking about #Wikidata at the #DBpedia conference in Amsterdam was a joy and a privilege. It was a joy because there was real interest, even practical interest, in what we can do to make a difference.

These are some of the points that they found most relevant:
  • Wikidata is used in templates in several Wikipedias
  • for DBpedia, the specific properties in use in Wikidata are mostly a mapping issue
  • in Wikidata, having more properties in use than are displayed in a Wikipedia is not an issue
  • Wikidata is language agnostic while DBpedia has projects concentrating on specific languages
The process by which bots fill Wikidata with information was discussed at length. Bots harvest information from a Wikipedia and update the relevant items in Wikidata. DBpedia is updated through exactly the same process. The innovation DBpedia can bring is one extra function: reporting the differences it finds. Obviously this will include new statements, but identifying where Wikidata and Wikipedia differ is really relevant when we work on improving the quality of the information we provide.
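The extra function, reporting differences, can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not DBpedia's code: the data shape (a property id mapped to a set of values), the function name and the example values are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical data shape: property id -> set of values for one item.
Values = Dict[str, Set[str]]
Conflicts = Dict[str, Tuple[Set[str], Set[str]]]


def report_differences(harvested: Values, stored: Values) -> Tuple[Values, Conflicts]:
    """Split values harvested from a Wikipedia into new statements and conflicts with Wikidata."""
    new: Values = {}
    conflicts: Conflicts = {}
    for prop, values in harvested.items():
        existing = stored.get(prop, set())
        if not existing:
            # Wikidata has nothing for this property yet: a candidate new statement.
            new[prop] = set(values)
        elif values != existing:
            # Both sides have values but they disagree: worth a human look.
            conflicts[prop] = (set(values), set(existing))
    return new, conflicts


new, conflicts = report_differences(
    {"P569": {"1900-01-01"}, "P19": {"Q1"}},  # made-up harvested values
    {"P569": {"1900-01-02"}},                 # made-up Wikidata values
)
print(new)        # {'P19': {'Q1'}}
print(conflicts)  # {'P569': ({'1900-01-01'}, {'1900-01-02'})}
```

Keeping new statements apart from conflicts matters: new statements can often be added by a bot, while conflicts are where a person should look.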

The relation between GLAM and Wikidata was another topic that received attention. In one presentation, Abraham Jacob van der Aa was mentioned. I added several statements for Mr van der Aa and showed how his item is displayed in multiple languages in the Reasonator. It is not unlikely that, as a result, the Koninklijke Bibliotheek may provide us with information on Dutch authors in addition to what is available in DBpedia. The prospect of the Commons / Wikidata integration was also discussed; the existing links from Wikidata to Commons were mentioned, and the possibility to link to the "Creator" and "Institution" templates was appreciated as relevant to current practice.

The DBpedia conference had multiple simultaneous sessions, so obviously much more was discussed. This is only my update of a conference that dealt with so much more, where a really vibrant community showed off the issues they are dealing with.

#Wikidata - the day Pete Seeger was born

Pete Seeger died recently. The day he was born is random enough to show you the latest improvement in #Reasonator. All kinds of persons and events are on display. Some do not have a label in Russian while others do not have a label in English. You may notice that the Paris Peace Conference was under way.

You may even notice that some events, births or deaths are missing. Just add them in Wikidata and, given the vagaries of replication, you may see them a moment later. Technically, five queries run simultaneously. They may take some time when you experiment with longer periods; to select a period, click on the calendar in the sidebar. To make the waiting a bit more dynamic, a progress bar has been added.
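Running the queries side by side is what keeps the wait bearable: the total time is roughly that of the slowest query instead of the sum of all five. A minimal Python sketch, assuming each query is a plain function; the five query names and the `fake_query` stand-in are hypothetical and do not describe what Reasonator actually asks.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Query = Callable[[], List[str]]


def run_concurrently(queries: Dict[str, Query]) -> Dict[str, List[str]]:
    """Start all queries at once and collect their results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = {name: pool.submit(query) for name, query in queries.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_query(result: str, seconds: float) -> Query:
    """Hypothetical stand-in for a real network request."""
    def run() -> List[str]:
        time.sleep(seconds)
        return [result]
    return run


# Five hypothetical queries for one day; the names are illustrative only.
names = ["births", "deaths", "events_started", "events_ended", "point_in_time"]
start = time.perf_counter()
results = run_concurrently({n: fake_query(n, 0.2) for n in names})
elapsed = time.perf_counter() - start
print(results["births"], round(elapsed, 1))
```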

By showing you events that are deemed notable in any and all of our projects, Wikidata truly shares in the sum of all knowledge. Reasonator is happy to visualise the sum of all knowledge as we have it in Wikidata.

Wednesday, January 29, 2014

The #DBpedia experience

DBpedia has been a regular subject on this blog. It is the culmination of the work of many research projects and its data is based on what can be retrieved from Wikipedia. One of the results is that an impressive number of resources know how to link to DBpedia.

Tomorrow there is a DBpedia conference in Amsterdam. One of the subjects is the future of DBpedia. As far as I am concerned, there is a bright future for the approach that made DBpedia a reality, a future they will share with Wikidata. In the end, it is the Wikidata project that could do with more cooperation.

Reasonator ... reasoning and automating

When you look at "random items" in the "Reasonator", you will get the odd category, template etc. Typically they have a colon in their name. When such an item has no statements, Reasonator gives you the opportunity to identify it as a category, template etc.
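The heuristic can be sketched in Python as follows. This is an assumption about how such a check might work, not Reasonator's code; the prefix table is hypothetical and English-only, and the final choice is left to a person, which is why the one click matters.

```python
from typing import Optional

# Illustrative, English-only namespace prefixes; real wikis use localised names.
PREFIX_KIND = {
    "Category": "category",
    "Template": "template",
    "Portal": "portal",
    "Wikipedia": "project page",
}


def suggest_kind(label: str, statement_count: int) -> Optional[str]:
    """Guess what kind of page an unclassified item stands for.

    Returns None when there is nothing to suggest: the item already has
    statements, or its label contains no colon.
    """
    if statement_count > 0 or ":" not in label:
        return None
    prefix = label.split(":", 1)[0].strip()
    # A colon alone is no proof: "Star Wars: A New Hope" is not a category.
    return PREFIX_KIND.get(prefix, "unknown")


print(suggest_kind("Category:Cacti", 0))  # category
```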

It makes use of OAuth; you have to authorise Widar to make statements on your behalf. As long as you are logged in to a WMF wiki, you can identify such items with just one click.

#Wikidata – in #taxonomy a #publication is not a #source

In the taxonomy of species, several components are required for a “valid” description of a taxon: the “taxon name”, the author and the publication. The publication has its own components, among them the name of the publication and the date of publication. They map neatly onto what Wikidata uses for sources. However, a publication is not a source and it should not be treated as one.

A source asserts that something is true. There can be many sources that make such an assertion. What they all do in taxonomy is refer to the original publication. A publication includes the formal description of a taxon. Only two languages are allowed for a formal description: Latin and English.

When publications are treated as sources, it is hard if not impossible to distinguish them from other sources. Consequently the data loses its relevance, as the information that makes up a valid description becomes incomplete.

Tuesday, January 28, 2014

A bit selective, not random

When you are looking for “random items” in the "Reasonator", you want to see items that are of interest. For all kinds of reasons, Wikidata includes references to templates, categories, disambiguation pages and list articles.

They do not have much to offer when you browse at random.

For this reason, all such items are filtered from the random results. Sadly, many categories, templates, disambiguation pages and list articles are not marked as such. As it does not take much effort to add the relevant statements, we hope that in future you will see fewer of them than you do today.
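A filter like this can be sketched as follows, assuming each random candidate comes with its "instance of" (P31) values. The class ids are my assumption of the relevant Wikidata items and should be checked; the function name is hypothetical.

```python
from typing import Dict, List, Set

# Assumed class ids for the "uninteresting" kinds of items; verify them in
# Wikidata before relying on this list.
EXCLUDED_CLASSES = {
    "Q4167836",   # Wikimedia category
    "Q11266439",  # Wikimedia template
    "Q4167410",   # Wikimedia disambiguation page
    "Q13406463",  # Wikimedia list article
}


def filter_random(candidates: Dict[str, Set[str]]) -> List[str]:
    """Keep items whose "instance of" values include none of the excluded classes.

    candidates maps an item id to the set of its "instance of" (P31) values.
    """
    return [qid for qid, classes in candidates.items() if not classes & EXCLUDED_CLASSES]


print(filter_random({"Q1": {"Q5"}, "Q2": {"Q4167836"}, "Q3": set()}))  # ['Q1', 'Q3']
```

Note that an item without any "instance of" statement (Q3 here) passes the filter; that is exactly why adding the missing statements makes a difference.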

Are you a living #scientist with a #Wikipedia article?

One of the results of including VIAF records in Wikipedia is that VIAF gained more traffic. You know that your Wikipedia article attracts attention to you. What matters is that people can progress from your article to your publications.

In Wikidata we do support ORCID. It only takes a moment to add your ORCID identifier and, if you feel good about it, those of some of your colleagues as well. Several of my friends have their ORCID identifier included, and this is how it looks for one of them in Reasonator.
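Before adding an ORCID identifier, it is worth checking it. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, as described in ORCID's own documentation. A minimal Python sketch; the function names are mine, and 0000-0002-1825-0097 is the sample iD ORCID uses in its examples.

```python
def orcid_check_digit(base: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check character of an iD like 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or any(c not in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```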

#Taxonomy, where there is nothing ...

When you click "Random item", you may land on a species of a genus like Neotrichoporoides. This genus is the "parent taxon" of sixty species. The Swedish Wikipedia has a category with all sixty species.

In "Widar" we have the technology to add a "parent taxon" statement to all sixty items. For each of these species, that statement is the next step up "Reasonator" needs to show its taxonomy.

Monday, January 27, 2014

#Taxonomy? "It is complicated"

Taxonomy in Wikidata is ... "complicated", to be polite. Take this bird, for instance. When you follow all the indicated taxons upwards, it belongs both to the kingdom of the Animalia and to the kingdom of the Metazoa.

There are two simple solutions to this problem. One is that you only follow what the "parent taxon" has to say. The other is to tell the taxonomy nerds that their house is not in order.
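The two options can be made concrete with a small Python sketch. The data is hypothetical and simplified: "other" stands for whatever statement leads to Metazoa, since I do not reproduce the actual statements on the bird's taxons. Reaching more than one top-most taxon is the signal that the house is not in order; following only "parent taxon" gives a single answer.

```python
from typing import Dict, List, Optional, Set, Tuple

# Upward links per item as (property, target) pairs. Hypothetical, simplified data.
Links = Dict[str, List[Tuple[str, str]]]


def top_taxa(item: str, links: Links, only: Optional[Set[str]] = None) -> Set[str]:
    """Follow upward links and return the top-most items reached.

    When `only` is given, links with other properties are ignored.
    """
    tops: Set[str] = set()
    stack, seen = [item], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        ups = [target for prop, target in links.get(node, []) if only is None or prop in only]
        if not ups:
            tops.add(node)
        stack.extend(ups)
    return tops


links: Links = {
    "bird": [("parent taxon", "Aves")],
    "Aves": [("parent taxon", "Chordata"), ("other", "Metazoa")],
    "Chordata": [("parent taxon", "Animalia")],
}
print(sorted(top_taxa("bird", links)))                    # ['Animalia', 'Metazoa']
print(sorted(top_taxa("bird", links, {"parent taxon"})))  # ['Animalia']
```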

It is probably safest for me to leave it to the boys. In the meantime, all these taxons are a reason for a good laugh.

#Wikidata - experience the lack of "statements per item"

Wikidata is a young project and it is growing fast. A high number of statements per item ensures that an item is well connected in this web of data. As you can see in the graph, most items are not well connected; relatively little information can be provided about them.

Typically the items people are most interested in are relatively well served but often incomplete. Mr Rutte, for instance, held more positions than are recorded. Compared to the rest of Wikidata, however, he is well served.

If you are interested in experiencing the lack of statements, you can click on the "Random item" button in the Reasonator. While you are at it, you can make more connections in Wikidata.

#Reasonator - the need for speed

If Reasonator is to work well, and to keep working well, it has to scale and be as efficient as possible. Magnus has recently done a lot of work to improve its responsiveness. As it gets its data from elsewhere, it is all about handling requests for information efficiently.

These are three results of the recent improvement in performance:
Johann Sebastian Bach (Q1339)
  • Wikidata: 18 sec (no images, no related items)
  • Old Reasonator: 12 sec (14 sec with images)
  • New Reasonator: 6 sec (9 sec with images)
Mammillaria (Q311120)
  • Wikidata: 5 sec (no images, no taxon tree, no related items)
  • Old Reasonator: 8 sec (11 sec with images)
  • New Reasonator: 7 sec (10 sec with many more images)
Cambridge (Q350)
  • Wikidata: 7 sec (no images, no location tree, no maps, no related items)
  • Old Reasonator: 8 sec (9 sec with images)
  • New Reasonator: 7 sec (10 sec with many more images)
The data is retrieved from Wikidata using the available APIs. There are two ways to look at that: either you accept what you are given, or you know that WDQ can provide the same data faster. As you can read on Magnus's blog, WDQ is all of Wikidata running completely in memory. At start-up it takes only 868 MB of RAM.

Help is welcome to make Reasonator and WDQ scale and to improve their user experience. Their page view statistics indicate a bright future and therefore an urgent need to consider all of this. Things under consideration are:

  • using Puppet for better management and for multiplexing the services provided
  • fixing memory leaks
  • packaging WDQ and Reasonator
We expect that Wikidata will be able to cope with us as we bring you the best information available.

Sunday, January 26, 2014

#Reasonator - Better #RtL #language support

Several more languages are now identified as "Right to Left" languages: Arabic, Urdu, Divehi, Persian and Hebrew are supported. We have certainly missed a few languages. Let us know.
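Displaying such labels correctly mostly means telling the browser the language and the direction of each piece of text. A minimal Python sketch using the ISO 639-1 codes of the languages named above; the function names are hypothetical and this is not how Reasonator does it internally.

```python
import html

# Arabic, Urdu, Divehi, Persian and Hebrew by their ISO 639-1 codes.
RTL_LANGUAGES = {"ar", "ur", "dv", "fa", "he"}


def text_direction(lang: str) -> str:
    """Return "rtl" or "ltr" for a language code such as "fa" or "he-IL"."""
    return "rtl" if lang.lower().split("-")[0] in RTL_LANGUAGES else "ltr"


def render_label(label: str, lang: str) -> str:
    """Wrap a label in a span that tells the browser its language and direction."""
    return (
        f'<span lang="{html.escape(lang)}" dir="{text_direction(lang)}">'
        f"{html.escape(label)}</span>"
    )


print(text_direction("fa"))  # rtl
```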

When you look at the information for Golda Meïr in your language, you are bound to find a big issue in the information that is provided. It is one of those things where decisions made in a Wikipedia context are problematic in Wikidata.

#Wikidata - the case for including #Scopus references

Scopus is "the largest abstract and citation database of peer-reviewed literature". It is used a lot. A recent publication looked at the coverage of scientists in Wikipedia. The scientists it considered all have a Scopus identifier.

To facilitate follow-up research, it makes sense to include Scopus identifiers in Wikidata. It makes sense to add other identifiers as well. ResearchGate identifiers, for instance, are friendlier to people looking for information, because ResearchGate is not pay-walled.

I am not suggesting that every person identified in Scopus or ResearchGate is notable. The point is that references to external sources are useful and allow for more effective research.

#Reasonator #statistics

I am pushing the use of the "Reasonator" because it is at this moment the tool that provides the best information based on the Wikidata data.

This is the first time that any data resembling "page views" has been published for Wikidata. This data excludes use of the "test" environment; as the Reasonator is frequently updated, the test version regularly breaks.

You can lead a horse to water, but will it drink? It is really fortunate that we have the data to create statistics about the use of the Reasonator, so we can share with you the impact Reasonator has. As you can see, it is growing nicely, but Wikipedia does not have to fear for its existence any time soon.

Saturday, January 25, 2014

#Wikidata weekly summary #94

Every week, without fail, there is the Wikidata weekly summary. In this week's summary, Cactaceae is the "showcase item". Cacti are really fascinating plants and there is a lot to say about them. When you look at their family taxon through the Reasonator, you will find several small improvements:
  • new external resources have been identified
  • the taxon name and its qualifiers are now prominently on display
The one thing that is puzzling is that no taxons known to Wikidata have this family as their "parent taxon". The theoretical implication is that there are no species in the Cactaceae family.
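Answering "which taxons have this family as their parent taxon?" needs a reverse lookup, which is cheap when the data is held in memory, as with WDQ. A minimal Python sketch, not WDQ's code, assuming each item has at most one "parent taxon"; the snapshot is made up to mirror the situation described.

```python
from collections import defaultdict
from typing import Dict, Set


def children_index(parent_taxon: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert item -> parent taxon into parent taxon -> items, held in memory."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in parent_taxon.items():
        index[parent].add(child)
    return dict(index)


# Hypothetical snapshot in which no genus points at the family yet.
index = children_index({"Cactaceae": "Caryophyllales", "Opuntia ficus-indica": "Opuntia"})
print(index.get("Cactaceae", set()))  # set()
```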

#Wikidata - more #data provides more #information

Working on Wikidata is addictive. Particularly when you see the information improve before your eyes. I mentioned to a friend how Sukarno looks in the "Reasonator". It was not bad. It could however be improved upon.
  • all the persons now have gender information
  • the signature was added
  • information was added for several of his wives
  • information was added about husbands and children of his children
  • qualifiers were added about his presidency of Indonesia
There is a place for adding the same statements in an automated fashion. There is also a place for improving the quality of information. It is particularly important for those items that are likely to be seen by many people.

Friday, January 24, 2014

Six million #English labels in #Wikidata

Wikidata has six million labels in English. This gives it almost twice the coverage of German, the number two language. It is neither known nor relevant which label was number 6,000,000.

It is more interesting to compare this number with the number of articles in the English Wikipedia, which currently boasts 4,431,974 articles. It is very likely that most if not all of the items with an English label but no English article are part of the "long tail" of what people are interested in. It would be interesting to learn how often these labels are actually requested.

How #Reasonator can help write a #Wikipedia article

In any well written Wikipedia article there are links to other articles. These links are what makes Wikipedia so powerful; without them it would be a collection of unconnected articles. Obviously, articles about the same subject are likely to connect to the same subjects in any language. In the "concept cloud" you find a list of all the concepts that are linked in any language to the Tasmanian devil.

The concept cloud is now available from the Reasonator. It shows the labels that are available in your language and the number of other languages in which each concept is used. You can treat a concept cloud as a list of suggestions of what to include in an article.
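The idea behind a concept cloud can be sketched as counting, for each linked item, how many language versions of the article link to it. A minimal Python sketch under that assumption; the input shapes, function name and item ids are hypothetical and this is not Reasonator's actual code.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def concept_cloud(
    links_by_language: Dict[str, Set[str]],
    labels: Dict[str, str],
) -> List[Tuple[str, Optional[str], int]]:
    """Rank linked concepts by the number of languages whose article links to them.

    links_by_language maps a wiki language to the item ids its article links to;
    labels holds the labels in the reader's language. A None label marks a
    concept that still needs a label in that language.
    """
    counts: Counter = Counter()
    for linked in links_by_language.values():
        counts.update(linked)
    # Most widely linked first; ties broken by item id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(qid, labels.get(qid), n) for qid, n in ranked]


cloud = concept_cloud({"en": {"Q1", "Q2"}, "nl": {"Q1"}, "de": {"Q1", "Q3"}}, {"Q1": "label"})
print(cloud)  # [('Q1', 'label', 3), ('Q2', None, 1), ('Q3', None, 1)]
```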

Another use is to add the labels that are missing in your language.

Thursday, January 23, 2014

#Wikidata - #language support in action

Plattmakers is a dictionary for the Low Saxon language. It is also an item in Wikidata. It has a label in Low Saxon, and all the other labels used have been added as well.

It is really gratifying to notice work being done in "yet another language" and to see how effective it is. Currently there are 23,404 articles and 33,750 labels in Low Saxon. One neat trick is that they add labels for people's names, because these are written in exactly the same way in many other languages.

#Pinocchio - a #Wikidata #books showcase

The Adventures of Pinocchio was identified as a great example of a book in Wikidata. Many great arguments were made for why this particular work is such a great target:
  • it is a very well known story
  • it has been translated into many languages
  • entries for single characters are already available in Wikidata
  • scanned and proofread text is available in Wikisource, Commons and the Internet Archive (IA)
  • it has inspired several derived works
When you search for "Pinocchio", you will find many references: to movies, to species, and to the many translations and adaptations of the book by Mr Collodi. Many of them could do with some attention. When you zoom in on the character of the "Coach man", you will find a picture waiting to be moved to Commons.

What could also get some attention is the presentation of authors, books and even editions in the Reasonator itself. Not only is there the layout to be considered, there are also the inferences that can be made. For instance, should "derived works", or even "derived characters", be shown?

Wednesday, January 22, 2014

#Wikimedia and site #performance

Every month there is a "WMF metrics meeting". One of the interesting talks, starting at 37:46, is Ori Livneh on the site performance of WMF projects.

The bottom line is that page load time matters and that it can be improved. The optimisation Ori talks about concerns latency, leaving aside the distance from the WMF servers.

A lot of work has already gone into optimisation; one of the reasons for introducing Lua is that it performs so much better.

#Wikidata - it is all in the perspective

Many items in Wikidata do not have a label in English. I hope you will agree that it is more informative to show a label in any language than to show the internal Wikidata number. Q15634732 is a fine example; it is a year in the Telugu calendar. The Telugu calendar knows a cycle of sixty years, and this is one of them.

The year విరోధి is something that recurs and as a consequence, I was told it should be a "subclass of" a "year" and not an "instance of" a "year". When you check this item out, it is made obvious that it is part of the Telugu calendar.

This kind of specificity is needed for any year. Just stating "2014", for instance, is not enough for Wikidata; multiple calendars know events that happened in a year that can be identified as "2014". The year 2014 in the Julian calendar, for instance, is 13 days off from the Gregorian or "common era" calendar.
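The 13-day difference can be checked by converting a Julian date to a Gregorian one through the Julian Day Number. A minimal Python sketch using the standard integer formula for Julian calendar dates; the function name is mine.

```python
from datetime import date


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to the Gregorian calendar via the Julian Day Number."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # date.toordinal() counts 1 January of year 1 (Gregorian) as day 1, which is JDN 1721426.
    return date.fromordinal(jdn - 1721425)


print(julian_to_gregorian(2014, 1, 1))   # 2014-01-14
print(julian_to_gregorian(1582, 10, 5))  # 1582-10-15
```

The offset is not a constant: it was 10 days in 1582 and will become 14 days in 2100, one more reason to record the calendar explicitly.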

I understand the use of "subclass of" in this instance, but I fail to understand why 2014 and all the other instances do not have a qualifier for the calendar they belong to.

The #Wikimedia #language support dance

The "Universal Language Selector" (ULS) was removed as a default option from WMF wikis. It is a victim of its own success: it is (gasp) actually used. If there is a problem, it is that some people consider that they do not need it or think it ugly.

ULS is needed for:
  • people with dyslexia
  • text in scripts "other" than the default on a device
  • multilingual wikis
What this whole fracas shows is a problem with communication. Nobody has said that the use of ULS has changed much recently, and we know the servers are coping. It follows that improvements can be made while ULS remains in use. This would ensure that at least 7% of our public (the people with dyslexia) can access the knowledge that we want to share with them.

ULS has been enabled for Wikidata. A switch that enables ULS by default was created, and communities can request that ULS be switched on for them as well.

People who consider that they do not need ULS should not complain when information does not render properly. A switch that enables or disables ULS gives people the choice to rely on their local resources; it is then up to them to make sure their hardware is actually able to cope with all legitimate content.

We could help them by making it easy to install fonts locally. It would make the use of web fonts redundant. That is actually the optimal solution :)

#Wikidata - Johann Sebastian without clutter

When there is a lot of information available, it is easy to fill a screen with so much of it that it confuses more than it informs. At the same time, information is there to be shared, and not making it available is not an option either.

The latest iteration of the "Reasonator" brings us pop up windows that provide all that extra information. It will direct you to Commons, to Wikisource, to Wikivoyage and Wikipedia when applicable. There is another neat trick; when an item is "red lined", it will urge you to add a label in Wikidata.

Tuesday, January 21, 2014

#Reasonator is red lining your #data

#Wikipedia has its red links and following this great example, Reasonator has its red lines.

A Wikipedia red link suggests that there is room for another article. Wikidata often already knows a lot about the missing subject; to show it, Reasonator falls back to whatever fallback language is indicated and ultimately to English. All this to share in the sum of all knowledge.

What is missing is a label. As you may know, Reasonator operates in any language. By red lining a label, it suggests that you add a label in the selected language. Once this label has been added to Wikidata, it will be used everywhere the item is used.
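The fall back and the red line can be sketched together in Python. This assumes labels come as a mapping from language code to text and that a list of fallback languages is given; the names are hypothetical and this is an illustration, not Reasonator's code.

```python
from typing import Dict, List, Optional, Tuple


def choose_label(
    labels: Dict[str, str], language: str, fallbacks: List[str]
) -> Tuple[Optional[str], bool]:
    """Return (label to show, red_lined).

    red_lined is True when the selected language has no label, so the reader
    can be invited to add one. English is tried last.
    """
    if language in labels:
        return labels[language], False
    for code in fallbacks + ["en"]:
        if code in labels:
            return labels[code], True
    return None, True


print(choose_label({"en": "Tasmanian devil"}, "nl", ["de"]))  # ('Tasmanian devil', True)
```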

Now consider replacing Wikipedia red links with the information served up by Reasonator... Don't you think that is a big step up from serving nothing at all?

Copyright free #books from #India

The West Bengal Public Library Network is a digital repository of copyright-free rare books which are in the custody of various libraries of West Bengal, India. The aim of this digital repository is to provide free service to academics, researchers, and students. It is a digital service that collects, preserves, and distributes rare digital material.

The books provided by this library network are excellent candidates to source facts and to make people familiar with their cultural heritage. It will get us, for instance, a grammar of the Pushto language and Halayudha's Abhidhanaratnamala, a Sanskrit vocabulary.

The fact that these books have been digitised indicates their notability. By including them in Wikidata we promote the use of these resources. It is an obvious win-win situation; the WMF achieves its goal by enabling people to share in the sum of all knowledge.

It is elementary dear #Wikisource

Arthur Conan Doyle is the poster child for the #Wikidata integration of Wikisource. Having such an example is important because it enables us to look at all the issues that may arise.

As the import of information is under way, this is the best of times to look for problems in Wikidata first. As the Reasonator visualises the information, it is an obvious place to start. Moving the BIBSYS reference to the external sources is something I am trusted to do, so that was an easy one.

What is problematic is that Mr Conan Doyle is listed as the "screenwriter" of so many movies. The works he authored were the basis from which the real screenwriter wrote his or her script. This practice is not limited to Mr Conan Doyle; it is suggested that Shakespeare wrote scripts for many movies as well.

#Wikidata - Wendy Hall, a #computer #scientist II

What better subjects for important changes than computer scientists? Mrs Hall has no relatives notable enough to qualify for a Wikipedia article, and consequently Reasonator no longer shows you an empty "Relatives" section as it did in the past.

Magnus and Amir are experimenting with "right to left" support, as in the above example for Persian.

As always, there are many ways to further improve the Reasonator functionality. Some call the current approach to development RAD; Magnus calls it just "programming".

Monday, January 20, 2014

#Reasonator with improved #language support and #search

The amount of data in #Wikidata grows; the challenge is to find the information that is in there. Showing and visualising the information, as Reasonator has been doing, is great, but you first have to get there. Search has now come to Reasonator; in effect it returns the same results as the Wikidata search. The results are an improvement over the standard Wikidata offering because "automated descriptions" are used where possible.

When a subject is of particular interest to you, you can improve the results by adding statements where needed. The best part is that additional statements make a difference whenever there are labels in the language in use. Currently there are no statements identifying the Saudi kings, and it shows.

That language is something you can select. Is it not wonderful how things are progressing?

Sunday, January 19, 2014

More #Wikidata tools

Many articles, particularly in the smaller Wikipedias, are not linked to an article in any other language. Getting these articles linked is often best done by someone who understands the language. Finding these "lonely items" is what the lonely items tool is for.
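What such a tool looks for can be sketched in a few lines of Python: items whose only sitelink is to one wiki. The input shape and function name are assumptions, not the tool's actual code; "tewiki" and "hiwiki" are the site ids of the Telugu and Hindi Wikipedias, and the item ids are made up.

```python
from typing import Dict, List, Set


def lonely_items(sitelinks: Dict[str, Set[str]], wiki: str) -> List[str]:
    """Items whose only sitelink is the given wiki, e.g. "tewiki" for Telugu."""
    return sorted(qid for qid, wikis in sitelinks.items() if wikis == {wiki})


sitelinks = {"Q1": {"tewiki"}, "Q2": {"tewiki", "enwiki"}, "Q3": {"hiwiki"}}
print(lonely_items(sitelinks, "tewiki"))  # ['Q1']
```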

It takes some effort to link from Telugu or Hindi. If you do not know these languages, Google Translate is probably your best friend for making an educated guess.

Saturday, January 18, 2014

#Wikidata - Wendy Hall, a #computer #scientist

Mrs Hall is one of the people who provided a sound bite to a project run with / by the BBC. It is only fitting that her sound bite is available from the Reasonator as well.

Mrs Hall is quite the computer scientist; she has been awarded honorary degrees by three universities. It is fitting that we honour her achievements by making sure that they actually display in the Reasonator. She has also been awarded a "fellowship of the Royal Society". All the fellows known to the English Wikipedia should now be known in Wikidata as well, thanks to Widar.

These are technical improvements, but just as important is the growing interest in adding labels to the items of Wikidata. Just consider this: adding a label for "Fellow of the Royal Society" has an impact on 502 items. It will take you at most a minute.

What #Wikidata has to offer to languages in the #Incubator

One of the objectives of Wikidata is to provide "information box support". These info-boxes use information from Wikidata on articles that are linked to it, and the information is shown in the local language. For Incubator, things are ... complex. For starters, Incubator is a multilingual wiki. To make it worse, Incubator includes some languages that will never have a Wikipedia.

Wikidata is more inclusive in its support for languages than any of the other projects. It can support extinct languages, it can support constructed languages, it can support multiple scripts for the same language and the requirements are minimal: the language must be recognised in ISO 639-3 and its script must actually be in use or have been used.

Let us consider the Maithili language. There is an Incubator project in this language for a Wikipedia and for a Wikibooks. Maithili can be used in Wikidata. Currently there is one label for one item. When labels have been added for relevant items like "India", "Gandhi", "Delhi", "horse" and "rabbit", millions of items will show at least a few labels in Maithili. The information for the info-boxes of these five items is then complete as well and could in theory be used in the Incubator project.

Info-boxes in the Incubator that use Wikidata data seem an achievable goal. Adding Wikidata search support in the Incubator is actually a no-brainer because it just works. Thanks to the Reasonator all that data will be really well presented; it will even be informative. That is a step up from many of the awkward stub articles that are often found in the Incubator.

Wikidata is a great start for providing information in any language. All you need to do is add missing labels. The next step is to start a stub with an informative info-box. This can be followed up by taking the time to write a decent article.

You know what, this does work equally well in any Wikipedia.

Friday, January 17, 2014

#Wikidata - making #information from #data

Wikidata contains lots of data and making this into information is a matter of giving it a place, providing a user interface, bringing some order and even visualising the data. The latest version of the Reasonator is pushing all this.

As you may recognise, the user interface is in Dutch; at this time it could also be French, German, Telugu or Malayalam. We would love to have localisation in a few hundred more languages.

When it is about a person, we can show the family tree as well as the names of the near relatives. We provide even better support for external sources and, how do you like the way we show the Wikis connected to this item?

The best way to support your language? Just start adding the missing labels. The effect of one hour of work can be really amazing.

#Wikidata - #Telugu years

The Telugu calendar has a cycle of sixty years. All of them have their own name. For all of them, there is an article on the Telugu Wikipedia. For some of them an item exists in Wikidata and because the user interface has been translated into Telugu, it is quite informative for the people who know that language.

The minimum amount of data for these years should be:
  • a label for that year
  • the fact that it is a year in the Telugu calendar
  • the preceding and succeeding year.
A label in English is nice to have.
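Because the cycle wraps around, the year preceding the first year is the sixtieth. A minimal Python sketch that derives this minimum data from a year's position in the cycle; the `names` mapping is a hypothetical placeholder, not the real Telugu year names, and the keys only stand in for the Wikidata properties I assume would be used.

```python
from typing import Dict, Tuple

CYCLE_LENGTH = 60


def neighbours(position: int) -> Tuple[int, int]:
    """Preceding and succeeding year in a 1-based sixty-year cycle that wraps around."""
    if not 1 <= position <= CYCLE_LENGTH:
        raise ValueError(f"position must be between 1 and {CYCLE_LENGTH}")
    preceding = (position - 2) % CYCLE_LENGTH + 1
    succeeding = position % CYCLE_LENGTH + 1
    return preceding, succeeding


def minimum_statements(position: int, names: Dict[int, str]) -> Dict[str, str]:
    """The minimum data for the year at `position` in the cycle."""
    preceding, succeeding = neighbours(position)
    return {
        "label": names[position],
        "calendar": "Telugu calendar",
        "follows": names[preceding],       # assumed to map to P155
        "followed by": names[succeeding],  # assumed to map to P156
    }


names = {i: f"year {i}" for i in range(1, CYCLE_LENGTH + 1)}  # placeholder names
print(neighbours(1))   # (60, 2)
print(neighbours(60))  # (59, 1)
```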

Thursday, January 16, 2014

#Wikisource in #Wikidata

Wikisource is finding its way into Wikidata. It is different: a Wikipedia article about a book is considered relevant in any language.

Wikisource provides an integral text in one language. Much of the information about a source is therefore relevant only in that language. The characters in that source are all in that one language. It may even be that the names of characters are different from one edition to another.

It is early days for Wikisource in Wikidata. It will be interesting to see it develop.

#Wikidata - we link where we can

One thing Wikidata is really good at is indicating where information can be found. For Michelangelo you find 20 identifiers for sources outside the Wikimedia Foundation, as well as the links to the 173 Wikipedias that have an article about him.

When we know how to link to these external sources, we enable you to go directly to the page with the information about Mr di Lodovico Buonarroti Simoni. When we do not know how, as for the "BAV" (the Vatican Library), we just provide you with a number.
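The difference between a link and a bare number comes down to knowing a URL pattern for the source. A minimal Python sketch; the formatter table is hypothetical, uses "$1" as the placeholder for the identifier, and the identifiers in the usage are made up.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical formatter table; "$1" marks where the identifier goes.
FORMATTERS: Dict[str, Optional[str]] = {
    "VIAF": "https://viaf.org/viaf/$1",
    "BAV": None,  # no known way to link: show the bare number
}


def external_reference(source: str, identifier: str) -> str:
    """Return a link when a formatter is known, otherwise the identifier itself."""
    template = FORMATTERS.get(source)
    if template is None:
        return identifier
    return template.replace("$1", quote(identifier, safe=""))


print(external_reference("VIAF", "12345"))  # https://viaf.org/viaf/12345
print(external_reference("BAV", "ADV1"))    # ADV1
```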

It is not unlikely that many more resources have information that is related to Michelangelo. If it is not information about the man himself, it is about the large body of work he has left us with.

At Commons you can find a lot of his work. When you drill down to the individual media files, it is not always obvious that the subject of an image is a work by Michelangelo. Sometimes you have to infer it from the title of a picture but it is much better when there is a reference to a template that indicates him as the "creator".

When Commons gets its update with Wikidata technology, these Creator templates will become really valuable; they will make it obvious who the creator of the object of a media file is.

Wednesday, January 15, 2014

#Wikidata supports #Wikisource - and so does WD #Search

The first wiki links from Wikisource have been added to Wikidata. Magnus has already updated WDsearch to support Wikisource.

This is excellent news.

The one thing that really blew my socks off is that the English Wikipedia makes use of WDsearch. If anything, it is proof that Wikidata adds value. The best part is that it will only get better.

#Wikidata a #description is most often redundant

The graph shows how much effort goes into the creation of descriptions. Given the number of changes involved, this can only be done with a bot. Consequently these descriptions are based either on information in Wikidata or on information from elsewhere.

When a description is based on information in Wikidata, more statements on the items involved will lead to an improved description. This is exactly what you will observe when you have automated descriptions enabled. Add a few statements and the description improves in any language, depending on the availability of labels.
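Why more statements and more labels improve an automated description can be shown with a simple Python sketch. It is a hypothetical simplification, not the actual automated descriptions code: the property ids are Wikidata's, but the English-like word order, the data shapes and the item ids are assumptions.

```python
from typing import Any, Dict, List, Optional

# Hypothetical shapes: statements keyed by Wikidata property id; labels keyed by
# item id and then by language. Dates are simplified to year strings.
Labels = Dict[str, Dict[str, str]]


def auto_description(statements: Dict[str, Any], labels: Labels, language: str) -> str:
    """Short description from occupation (P106), country of citizenship (P27),
    date of birth (P569) and date of death (P570).

    Values without a label in the chosen language are skipped, so adding
    labels and statements directly improves the result.
    """

    def label(qid: str) -> Optional[str]:
        return labels.get(qid, {}).get(language)

    occupations = [name for name in map(label, statements.get("P106", [])) if name]
    countries = [name for name in map(label, statements.get("P27", [])) if name]
    parts: List[str] = []
    if occupations:
        parts.append(", ".join(occupations))
    if countries:
        parts.append("/".join(countries))
    text = "; ".join(parts)
    born, died = statements.get("P569"), statements.get("P570")
    if born or died:
        text += f" ({born or '?'}-{died or ''})"
    return text.strip()


statements = {"P106": ["Q1"], "P27": ["Q2"], "P569": "1967"}  # made-up item
labels = {"Q1": {"en": "politician", "nl": "politicus"}, "Q2": {"en": "Netherlands"}}
print(auto_description(statements, labels, "en"))  # politician; Netherlands (1967-)
print(auto_description(statements, labels, "nl"))  # politicus (1967-)
```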

I applaud the effort but I would prefer the automated descriptions functionality to be enabled for everyone. It takes a bit more effort from our servers but it provides a superior experience.

More about calendars

At #Wikidata more and more calendars are identified. Some are based on the moon, some on the sun, others are ... complicated. Many are in active use and some are historic.

Some of the issues:

  • a week is not part of a month; a week can span two months
  • a week is not part of a year; a week can span two years
  • months with the same name in different calendars need separate items when they are not the same month
  • there are multiple divisions of a year; think months and seasons
  • calendars in use in India are complex
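The first two points are easy to demonstrate for the Gregorian calendar. The sketch below assumes ISO 8601 weeks (Monday to Sunday), which is only one of several week conventions, and uses Python's standard `datetime` module.

```python
from datetime import date, timedelta


def week_days(year: int, week: int) -> list[date]:
    """The seven days (Monday to Sunday) of an ISO 8601 week."""
    monday = date.fromisocalendar(year, week, 1)
    return [monday + timedelta(days=offset) for offset in range(7)]


def months_touched(days: list[date]) -> set[tuple[int, int]]:
    """The (year, month) pairs that a run of days falls in."""
    return {(day.year, day.month) for day in days}
```

Week 1 of 2014 starts on Monday 30 December 2013, so `months_touched(week_days(2014, 1))` gives `{(2013, 12), (2014, 1)}`: one week, two months and two years. This is why a week cannot simply be modelled as "part of" a month or a year.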
Adding separate items for months that Wikipedia merged together is the obvious choice; it is the only way to ensure consistency and to add statements specific to a calendar. This may prevent the use of info-boxes for now. Such issues will arise in other places as well, and there will be a solution in time.

It will help when people with expertise get involved in getting the information right. 

Tuesday, January 14, 2014

Happy new year!! It is the year 2964

It is the year 2964 in the Berber calendar. This calendar is used in Morocco. It is based on the sun, and its year count starts well before that of the "common calendar".
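The arithmetic behind "2964" is a fixed offset: 2964 in 2014 means the Berber year count is 950 years ahead. The sketch below converts a Gregorian date under two assumptions: the offset is taken from the figures in this post, and the Berber new year falls on 14 January, the date of this post. Local practice varies, so the new-year day is a parameter.

```python
from datetime import date

# Offset derived from this post: Berber year 2964 corresponds to 2014.
BERBER_OFFSET = 2964 - 2014  # 950


def berber_year(d: date, new_year: tuple[int, int] = (1, 14)) -> int:
    """Approximate Berber year for a Gregorian date.

    Assumption: the Berber new year falls on the given (month, day) of the
    Gregorian calendar; dates before it still belong to the previous year.
    """
    year = d.year + BERBER_OFFSET
    if (d.month, d.day) < new_year:
        year -= 1
    return year
```

Under these assumptions, 13 January 2014 is still in 2963 and 14 January 2014 starts 2964. The exact new-year day is one of the things we would want to register for this calendar in Wikidata.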

When you read about this calendar, it becomes obvious that the article could do with some polishing. The same is true for the Wikidata entry. It is not obvious at all what data we should register for any and all calendars.

Pigeon breeds

When you add content to #Wikidata, you may find that you want to add "another one", thinking "I want to have two more in this picture". What you see to the right are the illustrations that "Reasonator" shows when you search for "pigeon breed".

These 18 pictures show really well how diverse pigeons can look. These pictures become available on the Reasonator page of an individual breed when you add an "image" to its Wikidata item.

Once the integration of Commons with Wikidata starts to happen, even more pictures will become available in all our supported languages. At this time, for instance, you will not find the "English owl" on the English language Wikipedia.

Monday, January 13, 2014

#Cats and #dogs

#Wikipedia has articles about all kinds of cats, dogs, horses, sheep, goats and chickens. Commons has pictures of all kinds of cats, dogs, horses, sheep, goats and chickens.

Many of those articles or pictures are about a particular kind of animal, a breed. There is so much information that can be added about all these breeds. Where did they originate, are they recognised in specific countries, what colours and patterns exist...

Wikidata knows about more breeds than any Wikipedia. Widar was used to add statements based on the content of categories about breeds in several Wikipedias. It is much easier to be inclusive in Wikidata, and it is much easier to complete the sum of all such knowledge there as well.

#Wikidata - the #discrimination between genders

According to Wikidata, the occupation of Mr Douglas and Mrs Zeta-Jones is "actor / actress". When you look at the picture it is obvious; one is an actor and the other is an actress.

Wikidata knows enough about Mr Douglas and Mrs Zeta-Jones to infer how to label them. One is known as a "male", the other as a "female".

To enable Wikidata to infer labels that depend on gender, the gender-specific forms of a label need statements of their own. The MediaWiki software already has functionality to deal with gender; it is just a matter of applying this logic.
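The inference itself is simple once the gendered forms are stored as data. The sketch below assumes a hypothetical table `GENDERED_LABELS` holding the male and female forms of an occupation per language; Wikidata had no such structure at the time of writing. MediaWiki's `{{GENDER:}}` magic word does something comparable for user names.

```python
from typing import Optional

# Hypothetical gendered forms of an occupation label, per language.
GENDERED_LABELS: dict[str, dict[str, dict[str, str]]] = {
    "actor": {
        "en": {"male": "actor", "female": "actress"},
        "nl": {"male": "acteur", "female": "actrice"},
    },
}


def occupation_label(occupation: str, sex: Optional[str], lang: str,
                     fallback: str) -> str:
    """Pick the form that matches the person's sex, else the neutral label."""
    forms = GENDERED_LABELS.get(occupation, {}).get(lang, {})
    if sex is not None and sex in forms:
        return forms[sex]
    return fallback
```

For Mrs Zeta-Jones, known as "female", this gives "actress" in English and "actrice" in Dutch; when the sex is unknown or a language has no gendered forms, the combined "actor / actress" label remains.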

Sunday, January 12, 2014

#Wikidata - #rabbit breeds

My father used to breed rabbits. His favourite breed is the "Fauve de Bourgogne".  He bred them to the Dutch standard and showed them off regularly at local and national competitions.

When you use Reasonator to search for "rabbit breed", you will find many breeds that are known in several Wikipedias. At this time, three images are found in "Related media". When pictures are added to the items for the 109 different breeds, more pictures will show up.

When labels are added to the "rabbit breeds", more children will find pictures to illustrate their projects.

Saturday, January 11, 2014

#Wikidata - interest by companies

Wikidata is not #Wikipedia; it has its own policies, and many of them are evolving as the project matures. Wikidata is "not about facts at all, it is about what sources say".

At this time you, like me, are forgiven for not finding this all too obvious. Not many statements have sources associated with them. Most existing sources are references to Wikipedia, even though Wikipedia proclaims that it is not a source.
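How many statements are really backed by a source can be counted once each statement carries its references. The sketch below assumes a hypothetical, simplified format in which a reference is a string and references that only say "imported from a Wikipedia" start with "wikipedia:". Real Wikidata references are richer structures.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical marker for references that only point back to a Wikipedia.
WIKIPEDIA_PREFIX = "wikipedia:"


@dataclass
class Statement:
    prop: str
    value: str
    references: list[str] = field(default_factory=list)


def classify(statement: Statement) -> str:
    """Sort a statement into one of three buckets by what backs it up."""
    if not statement.references:
        return "unsourced"
    if all(ref.startswith(WIKIPEDIA_PREFIX) for ref in statement.references):
        return "wikipedia only"
    return "external source"


def source_profile(statements: list[Statement]) -> Counter:
    """Count statements per bucket."""
    return Counter(classify(s) for s in statements)
```

A statement with one Wikipedia reference and one external reference counts as externally sourced; only statements backed by nothing but Wikipedia land in the "wikipedia only" bucket.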

In Wikidata we do know sources; many items have identifiers in multiple external sources. These are run by all kinds of organisations, including companies and public organisations. Much of their reputation relies on the accuracy of the facts they provide.

I received a request from India to help with a short presentation for entrepreneurs. The question is how they can use Wikidata, both the data and the technology. For me the answer is simple; the data is freely licensed, so they can use it as they see fit. I am an advocate of people improving the content of Wikidata on the subjects they care most about.

Wikipedia is different from Wikidata. Wikidata is interested in what sources have to say. When companies want to contribute, clearly identify themselves and provide quality statements with source information, they are more than welcome as far as I am concerned. When the powers that be deem this unacceptable, companies and organisations can still provide an identifier for their resource, and our discerning public can gain access to that part of the sum of all knowledge anyway.

#Wikidata - Reasonator; next generation

Magnus did it again:
  • search functionality added to the Reasonator
  • both the generated description and the added description are shown
  • more than 50 results can be shown for every item
Every time Magnus adds to his tools, he provides us with new perspectives. This is the direction in which Wikidata will become a power to be reckoned with.

Friday, January 10, 2014

#Wikidata - #Chemistry data

Many facts are known about a compound like benzene. As a result, benzene is known in many databases. These databases are relevant enough for their identifiers to be included in Wikipedia info-boxes. One of the challenges in the English Wikipedia is to ensure that all these identifiers are verified and not vandalised.

At this time there are sixty-nine articles about benzene. All of them could be served with the same information from Wikidata. It is even easier to make sure that this information is not tampered with; there are no small textual changes. With so many external sources to pick from, it should be possible to write software that compares each of these values with those external sources.

Such software can be used to guard against vandalism as well. At some point Wikidata may have such high-quality data that external sources will verify their data against what is known at Wikidata.
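The core of such comparison software can be sketched in a few lines. This assumes the values have already been fetched from Wikidata and from each external source as plain strings keyed by a property name. The source names are hypothetical, and a real tool would also need to handle units and numeric tolerances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    source: str
    prop: str
    ours: str
    theirs: str


def compare(ours: dict[str, str],
            external: dict[str, dict[str, str]]) -> list[Mismatch]:
    """Report every property where an external source disagrees with us.

    Only surrounding whitespace is ignored; case is kept because it matters
    in chemistry ("CO" is not "Co").
    """
    mismatches: list[Mismatch] = []
    for source, values in sorted(external.items()):
        for prop, theirs in sorted(values.items()):
            if prop in ours and ours[prop].strip() != theirs.strip():
                mismatches.append(Mismatch(source, prop, ours[prop], theirs))
    return mismatches
```

If a vandal changed the formula of benzene from "C6H6" to "C6H5", every external source that lists the formula would report a mismatch. That is also the signal an external source could use to check its own data against Wikidata.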