Saturday, September 26, 2009

Wikimedia Staff office hours with Sue Gardner

Last night, the first Q&A session on IRC was held with Wikimedia Foundation office staff; Sue Gardner, the WMF director, was available for questions. Such meetings are interesting because they are as informative about our community through the questions asked as through the answers given.
" is the largest project and the breadwinner. It can go first"

"is it not a goal of the foundation to try to at least survive?"

I asked one question: "How important is growing traffic for our smallest projects". The answer that was given was more about the language versions of Wikipedia than about the smaller projects. When you consider that Wiktionary is the biggest lexical resource on the Internet, it is clear that an answer is of interest.

The answer I got was interesting: "I think our obligation is to focus our energy, for the most part, on the projects that have the greatest potential. And by "potential," I would mean the projects where there is a very large available readership (speakers of the language, internet-connected, literate). I guess I feel we have an obligation to focus our energy where it will make the most difference -- where there is enormous potential. That is easier said than done."

My question was about growing traffic, and it is interesting that the Russian Wikipedia's traffic grew 117% in a year. There was a reaction in the chat that this was due to bot-generated articles, and while this may be true, the number of new editors is also much bigger for the Russian Wikipedia. The Russian localisation is also complete, so as I see it, they are really doing well where it counts.

So when potential is the key consideration: the English Wikipedia's traffic grew 13% on a yearly basis and ranked number 155 in growth. The Swahili Wikipedia grew 150% and is ranked number 10. Obviously growth in percentage is not growth in absolute numbers; this is best understood when you realise that all Wikipedias together grew 15%.
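To make the difference between percentage growth and absolute growth concrete, here is a small sketch. The growth percentages are the ones quoted above; the baseline page view numbers are purely hypothetical and only serve to show how a large project dominates the combined figure.

```python
# Hypothetical baseline page views (NOT real figures); only the growth
# percentages below come from the post.
baseline = {"English": 1_000_000, "Swahili": 1_000}
growth_pct = {"English": 13, "Swahili": 150}


def absolute_gain(views, pct):
    """Extra page views implied by a given percentage growth."""
    return views * pct / 100


gains = {name: absolute_gain(baseline[name], growth_pct[name]) for name in baseline}

# Combined growth is the total gain divided by the total baseline,
# so it sits close to the growth rate of the largest project.
combined_pct = 100 * sum(gains.values()) / sum(baseline.values())

for name, gain in gains.items():
    print(f"{name}: +{growth_pct[name]}% is {gain:,.0f} extra views")
print(f"Combined growth: {combined_pct:.1f}%")
```

With these made-up baselines, the small project grows more than ten times faster in percentage terms, yet the combined growth stays close to that of the large project.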

There is a potential for growth for the Wikipedias in other languages because they do not cover the most wanted articles yet. We do not even know what people want to read. In my view, concentrating on what people want to read, together with improved localisation, is what will realise the growth of our other Wikipedias.


Erik Zachte said...

Gerard, you bring up this point again and again: that we don't know what people want to read that is not already there. (We do know what they want to read through Domas' page request counts; your point is that we don't know which page requests do not lead to an article view.)

It would be interesting to know, no doubt, but then? Most editors write about things that interest them, not about things because other people are interested in them. In fact the lists of most requested articles based on red links serve your purpose in large wikis, don't they?

On small Wikipedias, even more so, I would welcome any editor without telling them all the time what they need to do. So why do you bring this up so often?

GerardM said...

Erik, when you know:
* What most people did not find last month
* What new articles were created last month
* What new articles were most popular in their first month
* The aggregated traffic of new articles, sorted by author

With all this information you can rank people by the traffic of their new articles. Such data allows people to be competitive in a positive way. Positive because it shows who brings the biggest benefit to the project, using a metric that people will recognise as relevant.
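A minimal sketch of such a ranking, assuming the first-month page views per new article are already available. The sample records, the author names and the `rank_authors` helper are hypothetical and only illustrate the aggregation.

```python
from collections import defaultdict

# Hypothetical sample data: (article title, creator, page views in first month).
new_articles = [
    ("Article A", "Author1", 1200),
    ("Article B", "Author2", 300),
    ("Article C", "Author1", 150),
    ("Article D", "Author3", 900),
]


def rank_authors(articles):
    """Sum first-month traffic per author and sort from most to least traffic."""
    totals = defaultdict(int)
    for _title, author, views in articles:
        totals[author] += views
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_authors(new_articles)
for position, (author, views) in enumerate(ranking, start=1):
    print(position, author, views)
```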

Erik Zachte said...

I can see your point a bit better now, although it turns out to be part of a more complex issue than what you had disclosed so far.

I wonder whether people inclined to enter this competition would rush to be the first to create an article after a major world disaster ;-)

I expect people would ask why creating an article should be rewarded more than major expansions, etc., which would necessitate an evaluation scheme to rate edits.

In its entirety this stat would involve a lot of scripting and processing: for each article request, check existence against an offline list of all articles (use of the API would be throttled very soon); for each article, request the page view count (match huge files); for each author, assign points to their edits; mix and match.
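The first of these steps, matching requests against an offline list of article titles, can be sketched as follows. This is only an illustration on a handful of made-up titles: the real request logs are huge files, and the `most_wanted` helper and the sample data are assumptions, not an existing tool.

```python
from collections import Counter

# Hypothetical offline list of existing article titles.
existing_titles = {"Nairobi", "Swahili language"}

# Hypothetical page requests, one entry per request.
requests = [
    "Nairobi",
    "Mount Meru",
    "Mount Meru",
    "Lake Victoria",
    "Swahili language",
    "Mount Meru",
]


def most_wanted(requested, existing, top=10):
    """Count requests for titles that are not in the offline article list."""
    misses = Counter(title for title in requested if title not in existing)
    return misses.most_common(top)


wanted = most_wanted(requests, existing_titles)
print(wanted)
```

Keeping the existing titles in a set makes each lookup cheap, which matters once the request list runs to millions of lines.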

In terms of costs vs benefits I would not advocate putting this high on a priority list for stats work, but that is my private opinion. Cheers, Erik