Monday, June 24, 2013

Homo sapiens anyone?

There is a big fuss about bot-generated articles based on the taxonomy of species. My take is that this whole discussion is off the rails. It is off the rails because people do NOT understand the vagaries of taxonomy. It is off the rails because people forget what we aim to do: providing knowledge by sharing information.

I loved what Erik Zachte had to say; he wrote about uploading images to Commons and finding that there were no articles in any Wikipedia about the subject of the pictures. It changed his perspective on bot-generated articles. When enough information is available to a bot, it can generate articles on 280+ Wikipedias. The alternative is not providing information on the subject in any language.

Another loud argument was about taxonomy; article number 1,000,000 on the Swedish Wikipedia is about a species that was recently renamed. As some people would have it, the information was no longer “valid”. One counter-argument: when people know a specimen only by the “old name”, without that article there would be no information to be had at all. Another counter-argument: from a taxonomic point of view, the validity of a name lies only in the quality of the publication, and as a consequence the old name remains valid. To make this point abundantly clear: Homo sapiens is what most people know as the taxonomic name for a human being. I am not completely sure, and I do not care that much, but I seem to remember that “Homo sapiens sapiens” is what has been used more recently in taxonomy for us "thinking men".

Let’s cut the crap and analyse the situation:
  • Many Wikipedians hate stubs, without any consideration for the opinion of others
  • Stubs, particularly well-designed stubs, are an invitation to edit them
  • Our prime objective is to provide information
  • In all the recent hoo-ha there has been little talk about technical possibilities

One solution for machine-generated stubs is to give them their own namespace and move them to the main namespace on the first human edit. This will not shut up all the detractors, but it takes their arguments away from them.

Another solution is to have the bots generate the information only when requested. It does not need to be saved, it only needs to be cached. Given that it is a bot generating the information, the script it uses can be translated for use in other languages as well.
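A minimal sketch of this idea, with made-up item IDs and fields: the article text is rendered only when requested, a cache stands in for "not saved, only cached", and translating the template localises the output for another language.

```python
from functools import lru_cache

# Illustrative species data as a bot might assemble it from a database
# (the item ID and fields here are invented for this sketch).
SPECIES = {
    "Q140": {"label": "lion", "rank": "species", "family": "Felidae"},
}

# One template per language; translating the template is all that is
# needed to generate the same stub in another language.
TEMPLATES = {
    "en": "{label} is a {rank} in the family {family}.",
    "nl": "{label} is een {rank} in de familie {family}.",
}

@lru_cache(maxsize=4096)
def render_stub(item_id: str, lang: str = "en") -> str:
    """Generate the stub text on request; nothing is saved as a page,
    the result is only cached."""
    return TEMPLATES[lang].format(**SPECIES[item_id])
```

Calling `render_stub("Q140")` yields "lion is a species in the family Felidae."; the second call for the same item and language is served from the cache.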

Yet another solution is to have such scripts associated with Wikidata items. The information provided in this way would be truly complementary to what is available in Wikipedia. An added bonus would be that it takes away any room for Wikipedians to complain. Hm… possibly, but probably not.

Saturday, June 15, 2013

Red links in #Wikipedia categories are possible… use #Wikidata

I have been looking for lists on a given subject. My requirements are “simple”:
  • I want a complete list
  • I want to know if there is an article in a language on a Wikipedia for the items on the list
The first requirement excludes categories.

The second requirement is met to some extent by categories. In this case I used information from the English-language Wikipedia. There was an article on the subject as well; it contained a list, and this list contained items that were not in the category. I read many of the articles to add statements on these persons, and based on the articles I had to conclude that some of the category items and list items were wrong.

All this is to be expected; Wikipedia does not claim to be 100% correct. It is for the people working on the content to refine and improve it. The funny thing is that by reading articles I found candidates for other “categories”.

To get a more complete list, you can iterate the process on other Wikipedias. In addition you can add items to Wikidata based on “external” information. It takes some effort, and in a perfect world you would be aware of any items that are already known in Wikidata.

A list that is compiled in this way is superior to what categories offer; you would have red links in many places, but you can provide information in info-boxes based on the Wikidata statements.
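A sketch of how such a list could be rendered from Wikidata sitelinks. The data shape loosely follows what the Wikidata API reports per item; the second item ID and its label are invented for illustration. An item without a sitelink on the chosen wiki becomes a red link, but it is still on the list.

```python
# Illustrative items with their sitelinks; Q42 really is Douglas Adams
# on Wikidata, the second entry is made up.
ITEMS = {
    "Q42": {"label": "Douglas Adams",
            "sitelinks": {"enwiki": "Douglas Adams"}},
    "Q999999999": {"label": "Forgotten Person", "sitelinks": {}},
}

def list_entry(item: dict, wiki: str = "enwiki") -> str:
    """Render one wikitext list line: a link to the local article when
    the wiki has one, otherwise a link to the label, which shows up as
    a red link because the page does not exist yet."""
    title = item["sitelinks"].get(wiki)
    if title:
        return f"* [[{title}]]"
    return f"* [[{item['label']}]]"  # red link on this wiki

def complete_list(items: dict, wiki: str = "enwiki") -> str:
    """Join all entries: the list is complete even where articles are missing."""
    return "\n".join(list_entry(item, wiki) for item in items.values())
```

The same function against another wiki code (say `svwiki`) would show different red links, which is exactly the point: the list itself is language-independent, only the link colour changes.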

Sunday, June 09, 2013

Thanks for all the fish

Milosh is leaving the movement. It takes me aback. The last time I met him was at the Amsterdam hackathon, where we discussed several things that might help our shared dream of more linguistic diversity in the Wikimedia movement.
  • Wikidata is to be opened to any and all recognised languages. This includes artificial languages and dead languages. Some people may find it controversial that we consider Wikidata to be a project in its own right and the connection to Wikipedia incidental.
  • Wikisource should be one project, like Commons and Wikidata. It would be best if all the instances of Wikisource were merged. Wikisource as we know it is very much a workbench. What is important is that people know how to use the tools. It makes no difference what language your user interface is in; what matters is the language of the text, and that can be any language.
  • When the work is "done" on Wikisource, it should be advertised, it should be marketed, it should find a public. That is very much something best done OUTSIDE of Wikisource. 
Ah well, these things make sense, they will improve linguistic diversity and they will create an environment that will even stimulate the creation of more Wikipedias. The question is how to find the energy to make this happen. The Dutch are known for their windmills, just like the Spanish. We will miss Milosh in this fight.

Tuesday, June 04, 2013

#OmegaWiki supports #OpenDyslexic

#OmegaWiki aims to be useful as a platform. Much of its functionality is in the extension that allows us to be special. The core functionality, however, is provided by MediaWiki. The Universal Language Selector is a MediaWiki extension waiting in the wings to become available on all the wikis of the Wikimedia Foundation.

Part of the ULS is the use of webfonts, and OpenDyslexic is a font available for many languages that use the Latin script: languages like English, French, German, Dutch... The font looks odd to most people, but for many people who are dyslexic it makes text actually readable.

Monday, June 03, 2013

#Defending the Wiki in #Wikidata

When I was younger, #Wikipedia was the encyclopaedia everyone could edit. I could because it was a wiki. I could start new articles, make changes to articles and be happily productive in this way. Sure, sometimes I made mistakes, but I was not the only one around, and others were as happily productive as I was, fixing things after me.

With the call for citations things became more formal, but it is still possible to some extent to just write and edit articles.

Wikidata is about assertions. I love the definition of assertion: "A positive statement or declaration, often without support or reason". I love the definition because it is what allows Wikidata to be a wiki. When an assertion can at first be added to Wikidata without support or reason, people can add assertions freely. Certainly when you assume good faith, you will be happy when people do exactly this.

When people add information using bots, they have a source that provides them with the assertions. Quite often these assertions originate in one of the Wikipedias. Alternatively they come from sources that are happy to share their information. While I applaud the sharing of data between sources, I believe quite strongly that importing the structure and limitations of external sources will hamper the development of Wikidata.

I routinely add "main type (GND)" with the value "person" when an item in Wikidata is about a person. However, the other values associated with this "main type (GND)" are absolutely horrible, to the extent of being unusable. Adding a link to the GND database is how Wikidata adds to its usefulness.

As Wikidata is a wiki, I use the attributes available to the extent that they make sense to me. Many attributes are lacking, and the procedures for getting new attributes are not exactly easy or obvious. (I did request an additional attribute called "rada".) The point is that these lengthy procedures make Wikidata less of a wiki.

Wikidata is a wiki, and consequently people are free to add "statements". Adding a requirement of sources to any and all assertions is absolutely counterproductive, because we can only improve assertions once they have been made. External requirements like this will effectively kill a Wikidata community. It will also ensure that the wiki part of Wikidata is a lie.