Thursday, December 29, 2016

#Wikidata - Khagan of the Rouran

A great sign that Wikidata is gaining traction in other languages: much of the data for Yujiulü Shelun, Khagan of the Rouran from 402 to 410, does not have labels in English. When the idea is to include all these Khagans of the Rouran, it becomes a challenge. The English article does have many names, but do they fit what is already there for other languages?

The challenge is to do good and bring things together. It is relevant to have all the right items properly connected. One thing that is missing; the item for Khagan of the Rouran. That is easily fixed.

Tuesday, December 27, 2016

#Wikimedia and the "official point of view"

One of the pillars of #Wikipedia is its Neutral Point of View (NPOV). The point is that we should not take sides in an argument but should present arguments from both ends and thereby remain neutral. The problem is what to do when arguments are manifestly wrong. When science repeatedly shows that there is no merit in a point of view.

What to do when it is even worse, when science is manipulated to show what is of benefit to some? When the Wikimedia Foundation had its collaboration with Cochrane, it was onto something important. Cochrane is big on debunking bad science.

The new government of the USA has a reputation that precedes its actions. It already states that science is bad. It will state its point of view. They will argue that it is good for all, but how will they substantiate this? In the meantime much of what science said so far will remain standing. The snake oil salesmen will try to sell you their product and I wonder how it will find its way into Wikipedia. Will we look at science and will we resist the snake oil?

Monday, December 26, 2016

#Wikidata - #caste and how to include it in Wikidata

With all respect to cultural heritage, forcing people to be included in any caste is a form of discrimination. In an article about Nangeli, a woman of the Nadars, it becomes clear how important it is to understand its history.

The Nadars are a heterogeneous group, comprising people of diverse standing. When in a school curriculum the story of Nangeli was included, it did not do justice to this diversity.

The problem with discrimination is that it has to be simple or it is not understood. It is how I interpret why it was pulled from the curriculum. This whole notion of the impossibility of there being one simple caste system is expressed well in the Wikipedia article on a historic event involving the Nadars, the Sivakasi riots: "This belief, that the Nadars had been the kings of Tamil Nadu, became the dogma of the Nadar community in the 19th century". It casts doubt on schemas where castes are expressed in a simple way.

What we can do is link what we know is related: link historic facts associated with class and castes. But it starts with making the effort.

Sunday, December 25, 2016

#Wikidata - the grandson of King Thibaw

When the BBC writes an article about royalty, it makes sense for both Wikipedia and Wikidata to have correct information available.

For the descendants of King Thibaw Min, it helps when it is known that this king was part of the Konbaung Dynasty and that a dynasty is not a country. This is relevant because any claim to Myanmar is based on being part of that dynasty.

It is simple; a dynasty is a family. It is why Mr Trump and his offspring are factually a business dynasty. When we are to get our facts straight, it makes sense to understand such basics. A dynasty can lose control over its "assets" but it remains a family.

Historically there have been many families with claims to a crown. Understanding such a claim is of interest and it is relevant to know the history of the whole world. History is not only lived in the western world.

Tuesday, December 20, 2016

#Wikidata - a country is not a dynasty

When a "country" comes into being, it is after a struggle. In the same way when a "country" comes to an end, it is after a struggle. The same is true for dynasties; when a royal line comes to a start or an end, it is not without a struggle. However sometimes in a country there is continuation and one dynasty follows a previous one. Several dynasties succeeded each other in the Delhi Sultanate. The "country" finally ended with the last of the Lodi dynasty.

So when a country knows only one dynasty and starts and ends with that dynasty, it does not make the dynasty the country. Making up a name for a country is easy; when these monarchs are called "king" it is a kingdom, when they are a "sultan", it is a sultanate.

If there is one drawback, it is that there might be a name for that country in the languages of the people who were linked to it. For this reason all the countries that I am about to create may be prime suspects for a merger. The item, not the country :)

Saturday, December 10, 2016

#Wikidata - Sembiyan Mahadevi - is it a title or is she a queen?

Queen Sembiyan Mahadevi was the spouse of Gandaraditya; her son was Uttama Chola. Many of the Chola queens who followed her used "Sembiyan Mahadevi" as a title. This is what the English article tells us.

To really accept that it was a title, a source would help. It would be cool to have a list of all the people who used the title and it would be good to separate the person from the title in separate articles. It seems that the Tamil article is more substantial, but I do not read Tamil and Google Translate does not help me sufficiently to understand what it says.

Queen Sembiyan Mahadevi matters not only because she is important in the Chola dynasty but also because of the relevance she has in Tamil culture. Her father was a Mazhavarayar chieftain but Wikipedia does not know about them. 

When Wikidata knows about Indian nobility, its dates and connections, it becomes a resource that is helpful. Once her father has a name and it is clear what is meant by a "Mazhavarayar chieftain", slowly but surely it becomes clear who ruled where and who were contemporaries. It would be cool when Wikidata allows for a query that shows a "monarch" and shows fellow monarchs in neighbouring countries. 
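The query wished for above can be sketched in a few lines. This is only an illustration under stated assumptions: the kingdoms, monarchs, years and neighbour relations are hypothetical placeholders, not Wikidata content, and `Reign`, `NEIGHBOURS` and `fellow_monarchs` are names made up for this example. It finds monarchs of neighbouring countries whose reigns overlap with that of a given monarch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reign:
    monarch: str
    country: str
    start: int  # year the reign began
    end: int    # year the reign ended


# Hypothetical placeholder data; real data would come from Wikidata statements.
NEIGHBOURS = {
    "Kingdom A": {"Kingdom B"},
    "Kingdom B": {"Kingdom A", "Kingdom C"},
    "Kingdom C": {"Kingdom B"},
}

REIGNS = [
    Reign("Monarch A1", "Kingdom A", 900, 920),
    Reign("Monarch B1", "Kingdom B", 890, 905),
    Reign("Monarch B2", "Kingdom B", 925, 940),
    Reign("Monarch C1", "Kingdom C", 900, 930),
]


def fellow_monarchs(name, reigns=REIGNS, neighbours=NEIGHBOURS):
    """Return monarchs of neighbouring countries whose reigns overlap."""
    found = set()
    for mine in (r for r in reigns if r.monarch == name):
        for other in reigns:
            # Only look at countries that border the monarch's own country.
            if other.country not in neighbours.get(mine.country, set()):
                continue
            # Two closed intervals overlap when each starts before the other ends.
            if other.start <= mine.end and mine.start <= other.end:
                found.add(other.monarch)
    return sorted(found)


print(fellow_monarchs("Monarch A1"))
```

Such a query only works once dates and connections are present, which is exactly why the missing father and the undefined "Mazhavarayar chieftain" matter.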

Thursday, December 08, 2016

Was Cezhiyan Cendana a Pandyan king?

There is no way for me to find out if Cezhiyan Cendan was a Pandyan king or not. The only source I can find is a blog saying so. The problem is that texts in Wikipedia make me doubt. The text in the article for Maravarman Avani Culamani states that he is succeeded by his son Jayantavarman.

One fun fact is that templates do not have sources. It is however what I base information on when I add information to Wikidata. The other interesting point is that dates given are overlapping to the extent that they are not reliable.

So this is where we get into a problem. When information is good enough for a Wikipedia, is it good enough for Wikidata? More important is the question: how do we curate information like this in a way that helps us all?

Wednesday, December 07, 2016

A Pandya King did not rule #India

The Pandyan Kingdom existed for some fourteen centuries; for many of the kings not much is known. A template contains much of what is known about them: not much.

Arguably, having this information in Wikidata serves a purpose. The information can be curated by people who know about the Pandyan kings and there are several things that they could do.
  • Some of the names of kings seem to be incorrect, certainly inconsistent.
  • The names of these kings can be added in the original language
  • Dates may be added to the period these kings were king
  • The data can be used in one of the other Wikipedias that are relevant in India.
One funny fact is that for all these kings it is impossible to have been a citizen of India. They were citizens of the Pandyan kingdom. Many such facts were added by bot, and they reflect factoids that exist in Wikipedias. It is just wrong.

Tuesday, December 06, 2016

#Research to help #Wikipedia do better

It is one thing to bemoan everything that is problematic with research, it is another to do better. For research on Wikipedia to be published, it has to be about "English" OR it has to be linked to English OR publication is not the end goal.

At the Dutch Wikimedia Conference Professor de Rijke gave the keynote speech. He spoke about the kind of research he is into and he spoke about "Wikipedia" research performed at the University of Amsterdam. He challenged his audience to cooperate and his challenge resulted in me formulating ten proposals for research. The point of these proposals is that I hope they provide more worthwhile insight and include a link to “English” in order for them to be published.
  1. Previous research studied how long it took for a subject to appear in English Wikipedia after it was first mentioned in the news / social media. The new question would be: how long does it take for the same subject to appear in any Wikipedia, how long does it take and to what extent does it happen for those articles to get corresponding articles in other Wikipedias, and how long does it take for the English Wikipedia to take notice?
  2. In the search engine for Wikidata we use the description to help differentiate between homonyms. There are two approaches to a description. Many existing descriptions are not helpful and hardly any items have descriptions in all of the 280 languages. There are however automatically generated descriptions. The question is: what do people like more, the automated descriptions or the existing descriptions? Is there a real difference for people who use Wikidata in English as well?
  3. Many people know their languages, this is obviously true for readers of Wikipedia. For the regulars there is a “Babel” template that allows them to indicate what languages they know. For the others for some purposes geo-location is used to make a guess. Do people find it useful to have it indicated that articles exist in the languages they know in search requests? Does it make a difference that a quality indicator is set for those other texts on the same subject?
  4. Many people make spelling errors when they search for a subject or when they create a wiki link to another subject. Google famously suggests what people may be looking for. We can expand the search and include items from Wikidata (40% increase in reach) but we can also use Google or any other search engine to help people get to the sum of all knowledge. We can ask people to answer some questions after they are done. Are people willing to do this and how does it expand the range of subjects that we know about? Are people willing to curate this information so that we can expand Wikidata and at least recognise the subjects we have no articles about?
  5. When we show the traffic for the articles people edited in the last month, we gain an insight into what people actually read. We also congratulate people on the work they did and show appreciation. Does this kind of stimulus stimulate more articles? How do you stimulate for subjects that people hardly read (e.g. Indian nobility)? Do you compare with existing articles in the same category?
  6. There have been several Wikipedias that include bot generated texts. It is a famously divisive issue in the Wikipedia community. There has been no research done on this. With Wikidata there is an alternative way to exploit the underlying data. When the data is included in Wikidata, it is possible to generate text on the fly. This text may be cached for performance reasons but there are two main advantages: both the script and the data can be updated. The question is: does it serve a purpose for our readers? Will editors update the data or the script to improve results or will they use the text as a template for new articles? Will it take the heat out of the argument about generated texts? How will it affect projects that were not part of the existing controversy and does it work for them?
  7. Wikidata does not allow for the dating of its labels. It follows that it is not easily understood what the relation is between Jakarta and Batavia. How are such issues generally stored as data and what alternatives exist for Wikidata? How does it improve the usefulness of Wikidata as a general topic resource?
  8. Wikidata now includes data from sources like Swiss-Prot. What are the benefits to both parties? Does it make for people editing this data at Wikidata and what is the quality of such edits? Does it get noticed by Swiss-Prot and is there a cooperation happening? How is this organised and to what extent does “the community” interfere with the notions of academia? Do such communications exist or are these groups doing “their own thing”?
  9. What is the effect on the ultra small Wikipedias when generated texts are available based on available labels? Does it mean more interest in creating the templates for articles and work on labelling? What does it mean when such generated articles are available to search engines?
  10. At this time many articles in the English Wikipedia are written by students, university students. The result is positive on many levels but the question is: is what they write understood by Wikipedia readers? When students write their articles, it is mostly based on literature. It is well known that the bias in scientific papers is huge. Negative results are not published and many results from studies are ignored. The question would be: is sufficient weight given to debunking studies or are they put aside with an argument of a “neutral point of view”? This would make sense when students are graded on what they write given accepted fact at the university.
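
Proposals 2 and 6 both lean on producing text from data at the moment it is shown. What follows is a minimal sketch of that idea, not how any existing tool does it. The item, its labels and its dates are hypothetical placeholders, and `pick` and `describe` are names invented for this example. It shows the two advantages mentioned in proposal 6: improve the data or improve the script, and every generated text improves with it.

```python
# Hypothetical item; real values would come from Wikidata statements and labels.
ITEM = {
    "occupation": {"en": "monarch", "nl": "vorst"},
    "born": 1000,
    "died": 1060,
}


def pick(labels, lang, fallback="en"):
    """Choose the label in the requested language, falling back to English."""
    return labels.get(lang) or labels.get(fallback) or "?"


def describe(item, lang="en"):
    """Build a short description from the data each time it is requested."""
    parts = [pick(item["occupation"], lang)]
    born, died = item.get("born"), item.get("died")
    # Only mention dates when at least one of them is known.
    if born is not None or died is not None:
        born_text = born if born is not None else "?"
        died_text = died if died is not None else "?"
        parts.append(f"({born_text}-{died_text})")
    return " ".join(parts)


print(describe(ITEM, "nl"))
print(describe(ITEM, "fr"))  # no French label, so the English one is used
```

Adding a label in one more language changes the output for that language without touching the script, which is what makes the approach interesting for the ultra small Wikipedias of proposal 9.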