#Wikidata needs more data to become relevant. The question is where that data should come from. Some argue that it is preferable to retrieve it from the many #Wikipedias. That would be acceptable if the technology to retrieve the data existed.
The current pywikipedia functionality breaks easily and does not pick up all the available data. Consider Mr Andris Bērziņš, the current president of Latvia. As you can read in the infobox, he was a member of the Communist Party before 1990. The bot only identifies him as a member of the Communist Party of the Soviet Union; it does not add the qualifier (before 1990). Mr Bērziņš is currently not affiliated with any political party.
The best way to present this information in Wikipedia is probably to put the most relevant information at the top. That would have helped the bot. It would also help if the bot accepted all the information that follows a label, up to the next label. When the information that cannot be parsed goes into an "error file", the qualifiers can be added later by hand.
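To make the idea concrete, here is a minimal sketch in plain Python. It is not how pywikipedia works; the sample infobox, the function names and the error-line format are all made up for illustration. It reads everything that follows a label up to the next label, takes the wiki links as candidate values, and keeps whatever text is left over (such as a "before 1990" remark) so a person can add the qualifier later.

```python
import re

# Hypothetical infobox fragment, invented for this example.
SAMPLE = """{{Infobox officeholder
| name = Example Person
| party = [[Example Party]] (before 1990)
[[Independent politician|Independent]]
| birth_place = [[Example Town]]
}}"""

# A line such as "| party = value" starts a new label.
LABEL = re.compile(r"^\|\s*([\w ]+?)\s*=\s*(.*)$")
# A wiki link such as [[Target]] or [[Target|shown text]]; we keep the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def read_fields(wikitext):
    """Collect every line of a field's value, up to the next label."""
    fields = {}
    current = None
    for line in wikitext.splitlines():
        match = LABEL.match(line)
        if match:
            current = match.group(1)
            fields[current] = [match.group(2)]
        elif current is not None and not line.startswith(("{{", "}}")):
            # Continuation line: still belongs to the previous label.
            fields[current].append(line)
    return fields


def split_claims(lines):
    """Return (link targets, leftovers that could not be parsed)."""
    claims, leftovers = [], []
    for line in lines:
        links = LINK.findall(line)
        rest = LINK.sub("", line).strip()
        claims.extend(links)
        if rest:
            # Remember which links the unparsed text was attached to.
            leftovers.append((links, rest))
    return claims, leftovers


def error_lines(label, leftovers):
    """Format leftovers as tab-separated lines for an "error file"."""
    return [f"{label}\t{', '.join(links)}\t{rest}" for links, rest in leftovers]


fields = read_fields(SAMPLE)
claims, leftovers = split_claims(fields["party"])
report = error_lines("party", leftovers)
```

In this sketch both party values are picked up, and the text "(before 1990)" ends up in the report next to the value it belongs to instead of being silently dropped. Writing `report` to a file is left out to keep the example short.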
As it is not possible to reliably retrieve data from the Wikipedias, the arguments against using data from DBpedia lose their relevance. This data is available for use, warts and all, and it is a big improvement over the current lack of data and the inability to get the data out of the Wikipedias.
Thanks,
GerardM