Sunday, March 02, 2008

Dump of the English language Wikipedia

After a long wait, a really long wait, it was announced that the dump of the English-language Wikipedia has finally finished successfully. It took 58 days to complete and yielded a 17 gigabyte compressed file that will become some 2 terabytes when unpacked.

This news has led to a flurry of activity: Erik Zachte now has the opportunity to run his fantastic statistics again, and the dbpedia people now have the opportunity to run their software on something new. Many researchers with studies under way to analyse Wikipedia have been asking where they can PLEASE find the latest data ....

It is important that a dump has successfully finished. It is equally important that this process proves to be repeatable. Many people have said that Wikipedia is not really open content when you cannot get its data. It is great to see that all the effort to create a dump has finally succeeded and proves that Wikipedia IS open content.


  1. Is Erik Zachte running the numbers? I'm eager to see all the results from the last year and a half.

  2. I have been careful in my wording; Erik has asked for a server that will enable him to do his thing. Given the importance attached to his work, he is likely to get it. :)
