Sunday, October 17, 2010

Googlers, how about doing a 100% job as a 20% job?

Biographies of Living Persons are one of the hot spots on #Wikipedia. Some of them regularly contain errors, if only because of vandals and trolls. A lot of effort goes into reverting such nonsense. It gave rise to the development of Flagged Revisions, a project whose consensus-based specifications are considered schizoid.

The University of Amsterdam found that changes in Wikipedia do not reliably make it into the Google search engine. This is made worse by the fact that updates can take many days to appear. This is particularly bad when the work of a troll or a vandal remains available on Google or one of the other search engines.

Neil Kandalgaonkar, a contractor who IS on the Wikimedia Foundation staff page, answers that "it is not clear to me that this is where the foundation or the community should spend its (very limited) time and resources".

There are, however, two sides to this story, and Google itself is one of them. They could be even more active in scanning the changes at all our Wikipedias. This may not provide us with an optimal solution; our computing resources come at a price as well. Neil indicates that implementing existing protocols will provide a partial solution.

A proper solution could be a more direct connection between Wikipedia and the search engines. This would lead to more synchronised coverage of Wikipedia. Given that Wikipedia articles are often selected by people searching the web, it is in the interest of Google, Bing et al. to make this happen.

1 comment:

TheDJ said...

Well there is a very easy solution to this, and it has been suggested before.

When a page is edited, a signal is sent to all the Squid servers to invalidate our current cached copies of that page. This is a multicast UDP feed to the Squid servers. With some work, that signal could be bridged to Google and Bing, who could then use it to "more accurately" determine when it would be a good time to update their search engine.
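To make the bridging idea concrete, here is a rough Python sketch of what such a bridge might look like. It is only an illustration under stated assumptions: the multicast group, the port, and the idea that each datagram carries a single URL as UTF-8 text are all made up for the example (the real purge packets may well use a different format that would need its own parser), and `notify` stands in for whatever interface a search engine would actually offer.

```python
import socket
import struct
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical values; the real group, port and packet format are not given here.
MCAST_GROUP = "239.0.0.1"
MCAST_PORT = 4827


def open_multicast_socket(group: str = MCAST_GROUP,
                          port: int = MCAST_PORT) -> socket.socket:
    """Join a multicast group so the same purge signals the caches get are received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: multicast group address followed by the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def receive(sock: socket.socket) -> Iterator[bytes]:
    """Yield raw datagrams from the socket, one at a time, forever."""
    while True:
        data, _addr = sock.recvfrom(65535)
        yield data


def decode_purge(datagram: bytes) -> Optional[str]:
    """Assumed format: one URL as UTF-8 text. Anything else is ignored."""
    try:
        url = datagram.decode("utf-8").strip()
    except UnicodeDecodeError:
        return None
    return url if url.startswith(("http://", "https://")) else None


def bridge(datagrams: Iterable[bytes], notify: Callable[[str], None]) -> int:
    """Forward each distinct URL once to `notify`; return how many were forwarded."""
    seen = set()
    for datagram in datagrams:
        url = decode_purge(datagram)
        if url is None or url in seen:
            continue
        seen.add(url)
        notify(url)
    return len(seen)
```

In a live setup one would call something like `bridge(receive(open_multicast_socket()), notify)`, with `notify` doing the actual delivery to the search engine. A long-running bridge would also need to expire entries from `seen`, otherwise a page edited twice would only be reported once.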

In a similar way, you could create a "priority feed", where wiki admins could send a specific "poke" to the search engines, to let them know that a specific page really needs to be updated. This would require collaboration between multiple partners, but would not be particularly difficult.
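On the receiving end, a "priority feed" could be as simple as a queue in which admin pokes jump ahead of routine edit signals. The sketch below is one possible way to do that with Python's standard `heapq` module; the class name `RecrawlQueue` and the two priority levels are invented for the example and do not describe anything Google or Bing actually run.

```python
import heapq
import itertools

# Hypothetical priority levels: lower numbers are served first.
URGENT = 0   # an admin "poke" for a page that really needs refreshing
ROUTINE = 1  # an ordinary edit signal


class RecrawlQueue:
    """Queue of URLs to re-crawl; urgent entries are popped before routine ones."""

    def __init__(self) -> None:
        self._heap = []
        # The counter keeps first-in, first-out order within one priority level.
        self._counter = itertools.count()

    def push(self, url: str, priority: int = ROUTINE) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> str:
        _priority, _order, url = heapq.heappop(self._heap)
        return url

    def __len__(self) -> int:
        return len(self._heap)


queue = RecrawlQueue()
queue.push("https://en.wikipedia.org/wiki/Some_article")
queue.push("https://en.wikipedia.org/wiki/Another_article")
queue.push("https://en.wikipedia.org/wiki/Vandalised_biography", URGENT)
first = queue.pop()  # the urgent poke comes out first
```

Who is allowed to send an urgent poke, and how often, is exactly the kind of thing the partners would have to agree on.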