Words and what not: October 2017

Tuesday, October 24, 2017

#Wikipedia - Student or Athlete or both ?

College football, soccer, basketball, whatever is a USA phenomenon where young people attend a college or a university on a sports scholarship.

Wikipedia has categories for the different sports and for the members of such teams; these categories are typically a subcategory of the alumni category of a college or university.

For Wikidata the alumni are typically harvested from the specific alumni categories only and, as a consequence, it is as if all the athletes did not have an education.

My question: how can we best associate these college/university team categories with the alumni categories?
Thanks,
GerardM

Sunday, October 22, 2017

#Katherine - When the Cebuano issue is no longer about #Wikipedia

Dear Katherine, I loved your presentation at the Berkman Klein Center for Internet and Society. It gives much to think about and <grin> it is great that you answer the question you want to answer </grin>.

You address questions like "will we let external organisations use our data for their own purposes". My suggestion to you, to us all, is: why not use our own data for our own purposes?

The Cebuano Wikipedia is seen as problematic on many levels. It is one of the biggest Wikipedias in number of articles and one of the smallest in the size of its community. Like any Wikipedia, its articles are harvested for use in Wikidata and that brings us to several problems but more importantly in the light of your presentation, opportunities.

Problem: the data the Cebuano articles are based on is problematic
Opportunity: import the data into Wikidata first and do some curation there.

Problem: the data is licensed under a CC-by-sa license and Wikidata is CC-0
Opportunity: collaborate with the copyright holder and ask their permission to include the data in Wikidata

Problem: when text is generated by a bot, the text is fixed once it is saved as an article
Opportunity: do not save it as an article but generate the text and maybe cache the text

Problem: other organisations use our data to generate information
Opportunity: we generate information in all the 300 languages where Wikipedia does not have an article

Problem: we have information that has no article in any language
Opportunity: we generate the text and maybe cache the text

Problem: Wikimedia officials indicated that issues like the Cebuano Wikipedia are not relevant
No opportunity; opportunities for all our projects are missed

Katherine, we already generate texts using bots, we already cache our data, we do it for English, we do it for Swedish, Cebuano. Why leave it for the companies of our world to generate text where there is already so much? We can do better, do the same and do it for all our languages as well.
Thanks,
GerardM

Saturday, October 21, 2017

#Wikidata - just an award winner: Mr Shuming Nie

Mr Shuming Nie is the 2007 winner of the Heinrich Emanuel Merck Prize. As such he was notable for inclusion in Wikidata.

A Wikipedia stub article was created. The article makes it plain that Mr Nie was a serial awardee and when you google Mr Nie, you find for instance the picture you see above. Mr Nie is one of many award winners that are "waiting for the recognition" of a Wikipedia article. By having these award winners in Wikidata, it becomes easier to find people, perhaps someone you care for, who are waiting for an article.
Thanks,
GerardM

Sunday, October 15, 2017

#Wikidata - motivation; thank you #Magnus

I added Baratunde A. Cola to Wikidata because he won the Alan T. Waterman Award. This month a Wikipedia article was written and I wanted to add some data to the item.

I did not because functionality that is key to me was broken. A new property was added and all the work that I had done on categories no longer showed in Reasonator. There was no willingness to consider the consequential loss of functionality and the result was a dip in my motivation.

Wikidata is important to me and I asked Magnus if he would help out and change Reasonator. He did.

Now I have added information to Mr Cola based on his categories. It matters that a category like this one reflects all the people known to have played in the Vanderbilt Commodores football team.

The issue is that at Wikidata, we have lost sight of these collaborative aspects. Everybody does his own thing and we hardly consider why. It is why user stories are so important; they tell you why something is done and what the benefit is. In the end without a benefit there is no reason to do it.
Thanks,
GerardM

Thursday, October 12, 2017

#Wikisource - the proof of the pudding

A user story for Wikisource could be: As Wikisourcerers we transcribe and format books so that our public may read these books electronically.

The proof of the pudding is therefore in the people who actually read the finished books. To future-proof the effort of the Wikisourcerers, it is vital to know all the books that are ready for reading. It is vital to know this for books in any and all languages supported.

There are two issues:

The status of the books is not sufficiently maintained in all the Wikisources
There is no tool that advertises finished books

To come to a solution, existing information could be maintained in Wikidata for all Wikisources in a similar way as is done for badges. With the information in Wikidata, queries can be formulated that show the books in whatever language, by whatever author.

Currently there are Wikisources that do not register this information at all. This does not prevent us from taking the necessary steps towards a queryable solution. After all, adding missing badges at a later date only adds to the size of the pudding, not to the proof of the pudding.
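To make the idea of such a query concrete, here is a minimal sketch in Python. It does not talk to Wikidata; the `Book` record, the badge labels and the sample entries are all invented for illustration, and the real badges and query service would look different. It only shows the shape of the question: give me the finished books, optionally for one language or one author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    language: str          # language code of the Wikisource holding the text
    badge: Optional[str]   # hypothetical status label; None when nothing is registered


def finished_books(books, language=None, author=None, done_badges=("validated",)):
    """Return the books whose badge marks them as ready for reading."""
    return [
        b for b in books
        if b.badge in done_badges
        and (language is None or b.language == language)
        and (author is None or b.author == author)
    ]


# Invented sample data, not taken from any Wikisource.
catalog = [
    Book("Book A", "Author X", "en", "validated"),
    Book("Book B", "Author X", "nl", None),         # status not registered yet
    Book("Book C", "Author Y", "en", "proofread"),  # not finished yet
]

print([b.title for b in finished_books(catalog)])
print([b.title for b in finished_books(catalog, language="nl")])
```

A book without a registered status is simply not found; once its badge is added, it shows up without any change to the query.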

Thanks,

GerardM

Tuesday, October 10, 2017

#Wikipedia discovers #OpenLibrary

On Facebook, Dumisani Ndubane posted his discovery of Open Library:

I just discovered that The Internet Archive has a book loan system, which gives me access up to 5 books for 14 days. So I have a library on my laptop!!! This is awesomest!!!

And it is. Anybody can borrow books from the Open Library (it is part of the Internet Archive).

Dumisani found out by accident; he googled for an ebook called "Heart of Darkness" by Joseph Conrad. What Dumisani did not know at the time is that the Open Library includes books in many languages. His next challenge: find the books in Xitsonga, and tell his fellow Wikipedians about it.
Thanks,
GerardM

Wednesday, October 04, 2017

#Wikimedia - A user story for libraries

The primary user story for libraries is something like: As a library we maintain a collection of publications so that the public may read them in the library or at home.

Whatever else is done, it is to serve this primary purpose. In the English Wikipedia you will find, at the bottom of the article for many authors, a reference to WorldCat. WorldCat is meant to entice people to come to their library.

It does not work for me.

My library is in Almere. I have stated in my WorldCat profile that I live in Almere and I have indicated that my local library is my favourite. WorldCat indicates that the Peace Palace Library is nearby. It isn't.

When it does not work for me, it does not work for other people reading Wikipedia articles and consequently it needs to be fixed. So what does it take to fix WorldCat for the Netherlands, for me? WorldCat is used by a worldwide public and all the libraries of the world may benefit when WorldCat gets some TLC.
Thanks,
GerardM

Monday, October 02, 2017

#Wikipedia - A user story for WikipediaXL: an end to the Cebuano issue

The user story for #Wikimedia is something like: As a Wikimedia community we share the sum of all knowledge so that all people have this available to them.

As an achievable objective it sucks. The sum of all knowledge is not available to us either. To reflect this, the following is more realistic: As a Wikimedia community we share the sum of all knowledge available to us so that all people have this available to them.

When all people are to be served with the sum of all knowledge that is available to us, it is obvious that what we do serve depends very much on the language people are seeking knowledge in. What we offer is whatever a Wikipedia holds and this is often not nearly enough.

To counter the lack of information, bots add articles on subjects like "all the lakes in Finland". This information is not really helpful for people living in the Philippines but it does add to the sum of available information in Cebuano.

The process is as follows: an external database is selected. A script is created to build text and an infobox for each item in the database. This text is saved as an article in the Wikipedia. From the article information is harvested and it is included in Wikidata. One issue is that when the data is not "good enough", subsequent changes in Wikidata are not reflected in the Wikipedia article.

Turning the process around makes a key difference. An external database is selected. Selected data is merged into Wikidata. This data is used to generate only new article texts that are cached in all languages that have an applicable script. As the quality of the data in Wikidata improves, the cached articles improve.
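The reversed process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `data_store` dictionary stands in for Wikidata, the `TEMPLATES` are invented one-sentence "scripts" for two languages, and the item id and values are made up. The point it shows is that text is generated from the data and cached, and that a cached text is regenerated as soon as the underlying data has changed.

```python
import hashlib
import json

# Hypothetical per-language scripts; real ones would be far richer.
TEMPLATES = {
    "en": "{name} is a lake in {country} with an area of {area_km2} km2.",
    "nl": "{name} is een meer in {country} met een oppervlakte van {area_km2} km2.",
}


class ArticleCache:
    def __init__(self, data_store):
        self.data_store = data_store  # stands in for Wikidata
        self._cache = {}              # (item_id, lang) -> (fingerprint, text)
        self.renders = 0              # counts how often text was generated

    @staticmethod
    def _fingerprint(data):
        # A stable hash of the data, used to notice that the data changed.
        encoded = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def article(self, item_id, lang):
        data = self.data_store[item_id]
        fingerprint = self._fingerprint(data)
        cached = self._cache.get((item_id, lang))
        if cached is not None and cached[0] == fingerprint:
            return cached[1]          # data unchanged: serve the cached text
        text = TEMPLATES[lang].format(**data)
        self.renders += 1
        self._cache[(item_id, lang)] = (fingerprint, text)
        return text


# Invented item; the id and the values are placeholders.
store = {"Q_example": {"name": "Example Lake", "country": "Finland", "area_km2": 10}}
cache = ArticleCache(store)

print(cache.article("Q_example", "en"))
store["Q_example"]["area_km2"] = 12   # the data is curated
print(cache.article("Q_example", "en"))
```

Because nothing is saved as a fixed article, a correction to the data reaches every language that has a script the next time the text is requested.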

With Wikipedia extended in this way, WikipediaXL, we become more adept at sharing the sum of our available knowledge. With caching enabled in this way, any language may benefit from all the data in Wikidata. It is important to consider the quality of new data. Data may come from a reputable source or from a source we collaborate with on the maintenance of the data. What is to be preferred is for another blogpost.