Saturday, March 16, 2019

#Sharing in the Sum of all #Knowledge from a @Wikimedia perspective I

Sharing the sum of all knowledge is what we have always aimed for in our movement. In Commons we have realised a project that illustrates all Wikimedia projects and in Wikidata we have realised a project that links all Wikimedia projects and more.

When we tell the world about the most popular articles in Wikipedia, it is important to realise that we do not say what the most popular subjects are. We could, but so far we don't. The popularity of a subject is the sum of the traffic of all Wikipedia articles on that subject. Providing this data is feasible; it is a "big data" question.

We do have accumulated data for the traffic of articles on all Wikipedias, and we can link the articles to their Wikidata items. What follows is simple arithmetic. It is powerful because it will show that English Wikipedia is less than fifty percent of all traffic. That will help make the existing bias for English Wikipedia and its subjects visible, particularly because it will be possible to answer a question like: "What are the most popular subjects that do not have an article in English?" and compare those to popular diversity articles.
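The arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: the page view numbers, the wiki codes, the titles and the item ids (`Q_A`, `Q_B`) are invented placeholders, and the function names are hypothetical. It says nothing about real traffic.

```python
from collections import defaultdict

# Invented sample data: (wiki, article title, views in some period).
PAGEVIEWS = [
    ("enwiki", "Subject A", 500),
    ("nlwiki", "Onderwerp A", 300),
    ("dewiki", "Thema A", 200),
    ("nlwiki", "Onderwerp B", 400),
    ("frwiki", "Sujet B", 100),
]

# Invented sitelinks: (wiki, title) -> placeholder Wikidata item id.
SITELINKS = {
    ("enwiki", "Subject A"): "Q_A",
    ("nlwiki", "Onderwerp A"): "Q_A",
    ("dewiki", "Thema A"): "Q_A",
    ("nlwiki", "Onderwerp B"): "Q_B",
    ("frwiki", "Sujet B"): "Q_B",
}


def traffic_per_subject(pageviews, sitelinks):
    """Sum the views of all articles that link to the same item."""
    totals = defaultdict(int)
    for wiki, title, views in pageviews:
        item = sitelinks.get((wiki, title))
        if item is not None:  # articles without an item are skipped
            totals[item] += views
    return dict(totals)


def share_of_wiki(pageviews, wiki):
    """Fraction of all views that went to one Wikipedia."""
    total = sum(views for _, _, views in pageviews)
    if total == 0:
        return 0.0
    return sum(views for w, _, views in pageviews if w == wiki) / total


def popular_without(pageviews, sitelinks, wiki):
    """Subjects ranked by total traffic that have no article in `wiki`."""
    totals = traffic_per_subject(pageviews, sitelinks)
    covered = {item for (w, _), item in sitelinks.items() if w == wiki}
    missing = [(item, views) for item, views in totals.items() if item not in covered]
    return sorted(missing, key=lambda pair: pair[1], reverse=True)
```

With this sample, `popular_without(PAGEVIEWS, SITELINKS, "enwiki")` lists the subjects that are read elsewhere but have no English article, most read first.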

In Wikidata we know about the subjects of all Wikipedias, but it too is very much a project based on English. That is a pity when Wikidata is to be the tool that helps us find what subjects people are looking for that are missing in a Wikipedia. For some Wikipedias there is an extension to the search functionality that helps find information. It uses Wikidata and it supports automated descriptions.

Now consider that this tool is available on every Wikipedia. We would share more information. With some tinkering, we would know what is missing where. There are other opportunities; we could ask logged-in users to help by adding labels for their language to improve Wikidata. When Wikidata does not include the missing information, we could ask them to add a Wikidata item with additional statements and a description to improve our search results.

This data approach is based on the result of a process, the negative results of our own Search, and on the active cooperation of our users. At the same time, we accumulate the negative search results where there has been no interaction, link them to Wikidata labels and gain an understanding of the relevance of these missing articles. This fits in nicely with the marketing approach to "what it is that people want to read in a Wikipedia".
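Linking accumulated negative search results to Wikidata labels could look something like the sketch below. Everything in it is an assumption for the sake of illustration: the list of failed queries, the labels, the placeholder item ids and the function names are made up, and it does not describe how Search logs anything today.

```python
from collections import Counter

# Invented failed search queries, as they might accumulate on one Wikipedia.
FAILED_QUERIES = [
    "elfstedentocht",
    "Elfstedentocht",
    "xyzzy",
    "elfstedentocht ",
    "sinterklaas",
]

# Invented labels per placeholder item: item id -> {language: label}.
LABELS = {
    "Q_A": {"nl": "Elfstedentocht", "fr": "Elfstedentocht"},
    "Q_B": {"nl": "Sinterklaas"},
}


def normalise(text):
    """Case-fold and collapse whitespace so trivial variants match."""
    return " ".join(text.casefold().split())


def rank_missing(failed_queries, labels):
    """Count failed queries per item; keep unmatched queries separately."""
    index = {}
    for item, by_language in labels.items():
        for label in by_language.values():
            index.setdefault(normalise(label), item)

    matched = Counter()
    unmatched = Counter()
    for query in failed_queries:
        key = normalise(query)
        item = index.get(key)
        if item is None:
            unmatched[key] += 1  # candidates for a new item or label
        else:
            matched[item] += 1   # demand for an article on a known subject
    return matched.most_common(), unmatched.most_common()
```

The first list indicates how relevant a missing article is; the second is where users could be asked to add a label or a new item.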
Thanks,
      GerardM

Saturday, March 09, 2019

A #marketing approach to "what it is that people want to read in a @Wikipedia"

All the time people want to read articles in a Wikipedia, articles that are not there. For some Wikipedias that is obvious because there is so little and, based on what people read in other Wikipedias, recommendations have been made suggesting what would generate new readers. This has been the approach so far; a quite reasonable approach.

This approach does not consider cultural differences; it does not consider what is topical in a given "market". To find an answer to the question "what do people want to read?" there are several strategies. One is what researchers do: they ask panels, write papers and once it is done there is a position to act upon. There are drawbacks:
  • you can only research so many Wikipedias
  • for all the other Wikipedias there is no attention
  • the composition of the panels is problematic particularly when they are self selecting
  • there are no results while the research is being done
The objective of a marketing approach is centered around two questions: 
  • what is it that people are looking for now (and cannot find) 
  • what can be done to fulfill that demand now
The data needed for this approach: negative search results. People search for subjects all the time and there are all kinds of reasons why they do not find what they are looking for. Spelling, disambiguation and nothing to find are all perfectly fine reasons for a no-show.

The "nothing to find" scenario is obvious; when something is sought often, we want an article. Exposing a list of missing articles is one motivator for people to write. Once they have written, we do have the data on how often an article was read. When the most popular new articles of the last month are shown, it is vindication for authors to have written popular articles. It is easy, obvious and it should be part of the data the Wikimedia Foundation already collects. In this way the data is put to use. It is also quite FAIR to make this data available.

For the "disambiguation" issue, Wikidata may come to the rescue. It knows what is there, and it is easy enough to add items with the same name for disambiguation purposes. Combine this with automated descriptions and all that is required is a user interface to guide people to what they are looking for. When there is "only" a Wikidata item, it follows that its results feature in the "no article" category.

The "spelling" issue is just a variation on a theme. Wikidata does allow for multiple labels, and the search results may make use of all of them. Common spelling errors are also a big part of the problem, but with a bit of ingenuity they are not much of a problem either.
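The three reasons for a no-show can be told apart with very little code. The sketch below is only that, a sketch: the label index, the placeholder item ids, the `classify` function and the similarity cutoff of 0.8 are all assumptions, and fuzzy matching with Python's standard `difflib` is just one possible bit of ingenuity for spelling errors.

```python
import difflib

# Invented index: normalised label or alias -> placeholder item ids.
LABEL_INDEX = {
    "mercury": {"Q_PLANET", "Q_ELEMENT"},
    "elfstedentocht": {"Q_TOUR"},
}


def classify(query, label_index, cutoff=0.8):
    """Classify a search that found no article in this Wikipedia."""
    key = " ".join(query.casefold().split())

    items = label_index.get(key)
    if items:
        if len(items) > 1:
            # Several items share the name: the reader needs guidance.
            return ("disambiguation", sorted(items))
        # Exactly one item, but no local article was found.
        return ("no article", sorted(items))

    # No exact label: look for a close label to catch spelling errors.
    close = difflib.get_close_matches(key, list(label_index), n=1, cutoff=cutoff)
    if close:
        return ("spelling", sorted(label_index[close[0]]))

    return ("nothing to find", [])
```

For example, `classify("elfstedentoch", LABEL_INDEX)` falls in the "spelling" category and points to the item for the intended subject, while an unknown string ends up as "nothing to find".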

Marketing this marketing approach should not be hard. It just requires people to accept what is staring them in the face. It is easy to implement, it works for all the 280+ languages and it is likely to give a boost not only to all the other Wikipedias but also to Wikidata.
Thanks,
        GerardM