Saturday, March 16, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective I

Sharing the sum of all knowledge is what we have always aimed for in our movement. In Commons we have realised a project that illustrates all Wikimedia projects and in Wikidata we have realised a project that links all Wikimedia projects and more.

When we tell the world about the most popular articles in Wikipedia, it is important to realise that we are not telling what the most popular subjects are. We could, but so far we don't. The popularity of a subject is the sum of the traffic of all Wikipedia articles on that subject, in every language. Providing this data is feasible; it is a "big data" question.

We already have accumulated data for the traffic of articles on all Wikipedias, and we can link the articles to their Wikidata items. What follows is simple arithmetic. It is powerful because it will show that English Wikipedia is less than fifty percent of all traffic. That will help make the existing bias for English Wikipedia and its subjects visible, particularly because it will be possible to answer a question like: "What are the most popular subjects that do not have an article in English?" and compare those to popular diversity articles.
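The arithmetic could look something like the sketch below. It is only an illustration: the page view numbers, the article titles and the item ids (`Q_A`, `Q_B`, `Q_C`) are made up, and the function names are my own, not those of any existing tool.

```python
from collections import defaultdict

# Made-up sample data: views per (wiki, article title).
page_views = {
    ("enwiki", "Alpha"): 100,
    ("dewiki", "Alpha"): 40,
    ("nlwiki", "Beta"): 70,
    ("dewiki", "Beta"): 20,
    ("frwiki", "Gamma"): 10,
}

# Made-up sitelinks: which Wikidata item each article belongs to.
item_for_article = {
    ("enwiki", "Alpha"): "Q_A",
    ("dewiki", "Alpha"): "Q_A",
    ("nlwiki", "Beta"): "Q_B",
    ("dewiki", "Beta"): "Q_B",
    ("frwiki", "Gamma"): "Q_C",
}


def traffic_per_subject(views, sitelinks):
    """Sum the views of all articles that share a Wikidata item."""
    totals = defaultdict(int)
    wikis = defaultdict(set)
    for (wiki, title), count in views.items():
        item = sitelinks.get((wiki, title))
        if item is None:
            continue  # article not linked to Wikidata, skipped here
        totals[item] += count
        wikis[item].add(wiki)
    return dict(totals), dict(wikis)


def wiki_share(views, wiki):
    """Fraction of all traffic that goes to one Wikipedia."""
    total = sum(views.values())
    own = sum(c for (w, _), c in views.items() if w == wiki)
    return own / total if total else 0.0


def popular_subjects_missing_in(totals, wikis, wiki):
    """Subjects without an article in `wiki`, most popular first."""
    missing = [(item, n) for item, n in totals.items() if wiki not in wikis[item]]
    return sorted(missing, key=lambda pair: pair[1], reverse=True)


totals, wikis = traffic_per_subject(page_views, item_for_article)
missing_in_english = popular_subjects_missing_in(totals, wikis, "enwiki")
english_share = wiki_share(page_views, "enwiki")
```

With real data the dictionaries would be replaced by the accumulated traffic data and the sitelinks from Wikidata; the sums stay the same.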

In Wikidata we know about the subjects of all Wikipedias, but it too is very much a project based on English. That is a pity when Wikidata is to be the tool that helps us find the subjects people are looking for that are missing in a Wikipedia. For some there is an extension to the search functionality that helps in finding information. It uses Wikidata and it supports automated descriptions.

Now consider that this tool is available on every Wikipedia. We would share more information. With some tinkering, we would know what is missing where. There are other opportunities; we could ask logged-in users to help by adding labels for their language to improve Wikidata. When Wikidata does not include the missing information, we could ask them to add a Wikidata item with additional statements and a description to improve our search results.

This data approach is based on the result of a process, the negative results of our own Search, and on the active cooperation of our users. At the same time, we accumulate the negative search results where there has been no interaction, link them to Wikidata labels and gain an understanding of the relevance of these missing articles. This fits in nicely with the marketing approach to "what it is that people want to read in a Wikipedia".
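Linking negative search results to Wikidata labels could be sketched as follows. Again this is an assumption about how it might be done, not a description of existing software: the failed queries, the labels and the item ids are invented, and matching is a plain case-insensitive comparison.

```python
from collections import Counter

# Invented sample: searches on one Wikipedia that returned nothing.
failed_queries = ["Beta", "beta ", "Béta", "Delta"]

# Invented sample: labels (in any language) per Wikidata item.
labels_per_item = {
    "Q_B": ["Beta", "Béta"],
    "Q_C": ["Gamma"],
}


def build_label_index(labels):
    """Map every normalised label to its Wikidata item."""
    index = {}
    for item, names in labels.items():
        for name in names:
            index[name.strip().casefold()] = item
    return index


def rank_missing_subjects(queries, label_index):
    """Count failed searches per item; keep unmatched queries apart."""
    demand = Counter()
    unmatched = Counter()
    for query in queries:
        key = query.strip().casefold()
        item = label_index.get(key)
        if item is None:
            unmatched[key] += 1  # candidate for a new label or item
        else:
            demand[item] += 1
    return demand.most_common(), unmatched.most_common()


index = build_label_index(labels_per_item)
wanted, unknown = rank_missing_subjects(failed_queries, index)
```

The first list gives the known subjects people searched for in vain, the second the queries where Wikidata itself has nothing yet; those are the ones where we could ask users to help.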
Thanks,
      GerardM
