Friday, November 22, 2013

More of that heady stuff

The blog post about the "heady stuff of Wikidata" proved interesting. Many people read it, and recently Mike Bergman wrote a lengthy reaction. It is worth a read because he indicates that the work he does may be used for Wikipedia and Wikidata as well.

So have a read of what Mike had to say. <grin> Maybe there will be a follow-up </grin> ...

Hi Gerard,
Sorry my comments on this are coming in a bit late; I only recently stumbled upon your thoughtful post. 
For some time, as one of the editors of UMBEL, I have been convinced that much could be done to improve the category structure of Wikipedia. In fact, our most recent formal release of UMBEL has mappings to about half of the content of Wikipedia, most done by hand or using heuristics. Our hope in doing this work is to provide consistent organization, faceting, inference and semantic search over Wikipedia's contents. The extension into Wikidata is only raising the importance of a tractable upper structure. 
Though there are literally hundreds of hours in our effort to date, it is neither easily repeatable nor scalable. Further, there are many longer-tail concepts in Wikipedia for which heuristics and manual approaches simply are not appropriate. 
Our efforts in this area have led us to look at many alternative mapping approaches, many of which we have tried ourselves and tested internally. 
The best that we encountered that moved in the right direction was Aleksander Pohl's work with OpenCyc and Wikipedia. After about a year of discussions, we have just reached a formal agreement with Aleksander and his team to extend his prior efforts, only focusing on UMBEL as the mapping target this time. (Though the same new approaches may again be extended to OpenCyc.) 
The market will likely tell us if small structures (as you cite), medium structures (such as the 28K concepts in UMBEL), larger structures (the 300K or so in OpenCyc), or other structures (such as SUMO or whatever) will best fit the bill of better organizing Wikipedia content and making it tractable. We are committed to placing our approach into the public domain soon. 
We should have some results to share on our UMBEL-Wikipedia mappings shortly (weeks, not months). We will do what we can to announce this availability, and we look forward to your and the community's comments and response to this effort. We think we will have a winning approach, but only the market will tell! 
Good luck on your own efforts, and let me know if we can be of any assistance. 
Best, Mike
