Thursday, July 16, 2015

#Wikidata - failing the #Freebase community

#Google in its infinite wisdom assumed that because Wikidata has a bigger and growing community, it would go beyond what Freebase could do. It had the power to end Freebase, and it did so in the expectation that the two communities would work together. That assumption is quite reasonable.

When you work together, you share resources and you have respect for each other's achievements. Wikidata may have the bigger community, but Freebase has the most data, and for whatever reason the data from Freebase was not accepted at Wikidata.

The consequence is that the people who rely on this data have now lost their source. It was lost because the service at Freebase was discontinued. It is therefore great news that the last Freebase dump has a new home at :BaseKB.

Some of the Freebase data is waiting for "approval", hidden in Wikidata. It is part of the Primary sources tool experiment, and if you care to "curate" this data, you first have to enable a gadget. If you do not, it is likely that the same data will be added again, because there is no other way to learn that the data is waiting to be approved. To add insult to injury, when data is added in this way, it does not report Freebase as the source.

This would not have happened if Wikidata were true to the Wikimedia mantra of "sharing in the sum of all knowledge". I urge the powers that be at Wikidata to consider the following remedial steps:
  • add Freebase as the source of data imported from Freebase
  • report on the process of "curating" Freebase data
  • when a link between Wikidata and Freebase is known, add missing statements
  • add data from Freebase where there is none at Wikidata; data can be merged later if need be
  • consider tools that allow for the selection of data based on its source for further curation and conversion
  • invest in tools for people working on the inclusion of data from the "Primary sources tool" and from other sources
  • compare data from sources and report on where there are differences. This is where our community's time is well spent
  • consider a tool that makes the statements of Wikidata easily available to Wikipedia editors. It may help them with additional information, and it will help us when they curate this data

Most importantly, Wikidata is of particular importance because it links articles from Wikimedia projects. This is where it shines. The quality of its statements is debatable but improving. By including the lovingly maintained data from Freebase, we may expand our community and we do expand both the quantity and quality of the statements.

Finally, Wikipedians love their sources, and sources are important. When Wikipedians can easily compare the information in statements with the information in the text, we will find them more involved, because there is more to curate. Once Wikipedians get involved in this way, the sources are implicitly known through Wikipedia. This is yet another argument to make haste in including missing data available from Freebase and other sources.

