Sunday, April 20, 2014

#Wikidata - its sex ratio

In a perfect world, Wikidata knows the sex of every person for whom any Wikipedia has pertinent information. In a perfect world, you can query Wikidata for the sex ratio of each Wikipedia.

As we know, the world is not perfect. Wikidata currently knows about 1,332,383 "humans"; 760,616 are male and 154,455 are female. This makes for 57% males, 12% females and 31% unknowns. Many more items still need to be identified as human as well.

With a selection like the 12,800 known Harvard alumni, we find 5,359 males and 840 females. This is 42% male, 7% female and 51% unknown. Before we compiled these numbers, missing items were created for every known alumnus, and all of them were marked as human and as Harvard alumni.
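For anyone who wants to redo the arithmetic, here is a minimal sketch in Python. It only uses the counts quoted above; the function name `sex_ratio` and the variable names are made up for illustration and are not part of any Wikidata tool.

```python
def sex_ratio(total, male, female):
    """Return the male, female and unknown shares as percentages of total."""
    # Everything that is neither male nor female counts as unknown here.
    unknown = total - male - female
    return {
        "male": 100 * male / total,
        "female": 100 * female / total,
        "unknown": 100 * unknown / total,
    }


# Counts as quoted in this post (April 2014).
all_humans = sex_ratio(1_332_383, 760_616, 154_455)
harvard = sex_ratio(12_800, 5_359, 840)

for label, shares in (("All humans", all_humans), ("Harvard alumni", harvard)):
    # Print each share rounded to one decimal place.
    print(label, {key: round(value, 1) for key, value in shares.items()})
```

Taken at face value, the Harvard unknown share works out slightly above 51.5%. The 51% in the text presumably reflects that 12,800 is itself a rounded total, or that the three shares were rounded so they add up to 100.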

The problem Wikidata faces is not only the underrepresentation of women; it is also the lack of data about the gender of known humans. The nice thing about statistics is that, now that we have some numbers, we can track how Wikidata evolves in its information about the sexes.
