According to the #ISBN standard, the "format/means of delivery are irrelevant in deciding whether a product requires an ISBN". However, it is often assumed that a publication that requires an ISBN is a commercial publication. In the USA and the UK, for instance, you have to buy your ISBN or bar code, while in Canada they are free because Canada stimulates Canadian culture.
When a standard is not universally applied, it loses its value as a standard. When not all publications are registered, a national library has to maintain its own system if it is to collect a copy of every publication. As a result the ISBN is dysfunctional as a standard: it does not do the one thing a standard is for.
When the Wikisourcerers finish the transcription of a book, it deserves an ISBN and national libraries should be aware of these publications. This recognises and registers the work done in the Open Content world. Once these books are registered, all the Open Content projects know that they can concentrate on another book or source.
As the ISBN does not register all publications, it does not do what it is expected to do: function as a standard.
Thanks,
GerardM
Saturday, June 30, 2012
Friday, June 29, 2012
Learning another language II
Learning to vocalise #Arabic can be done on the Internet. There are websites like Mount Hira where you find the Arabic to the right, a transliteration to the left and an explanation in English underneath. What really helps is that you can listen to the recitation of a surah line by line.
It is great, but there is room for improvement, improvement that can happen at Mount Hira but also at a Wikisource:
- Show all the text as text and not as graphics
- Allow for the Arabic text to be shown in fonts representing different writing styles
- Allow for the explanations to be shown in a language that can be selected
Learning to vocalise what you read is one use case, learning Arabic is another and learning the Koran is a third. Wikisource has the potential to be a place for all three objectives.
There are always people who are interested in reading the source documents of a religion, any religion. The great thing about Wikisource is that you can include the wikilinks explaining the terms that are ambiguous or obscure.
Typically these source documents are readily available and out of copyright. Including them in Wikisource will gain it a bigger public. Wikisource is an obvious place because it has a reputation to keep up: the reputation that its original documents really are original.
Thanks,
GerardM
The Hebrew #Wikipedia is bigger than all of #Wikisource
When you compare the statistics for Wikipedia and Wikisource, the Wikipedia in Hebrew gets more eyeballs than all Wikisources combined. When you consider the potential public for both projects, the potential public for Wikisource is bigger by orders of magnitude.
On the main page of the English Wikisource it says: "We now have 783,122 texts in the English language library". It is less clear how many of these texts are in a final form, meaning ready for casual readers. The pages linked to the proofread tool indicate 691 completed pages and 3588 pages ready for proofreading. This does not imply that there are 691 texts that are ready.
For a reader, the sources at Wikisource are works in progress. A text featured on the Wikisource main page, like the Celtic Fairy Tales, is very much a wiki page. The presentation of the text and illustrations is very much accidental. It is a bit sad, but the pages were easier to read when they still needed to be proofread.
As you can see, the long lines below are not easy to read and the font used in the original book made the text more legible.
The Hebrew Wikipedia is easy to read. We can and should do better for Wikisource.
Thanks,
GerardM
Thursday, June 28, 2012
#ImpactOCR - A #font using #unicode private use characters
When really old historic texts are digitised and OCR-ed, the images of the letters found are mapped to the correct characters. Characters are defined in Unicode and when a character is NOT defined, it is possible to define it in the "private use" area.
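As a rough sketch of how such a mapping can work (the glyph name and the allocation scheme below are my own invention, not the Impact tooling), unmatched glyphs get code points allocated from the private use area, U+E000 to U+F8FF:

```python
# Sketch: allocate private use area (PUA) code points, U+E000-U+F8FF, to
# glyphs that have no standard Unicode character yet. The glyph name used
# below is invented for illustration.
PUA_START, PUA_END = 0xE000, 0xF8FF

class PrivateUseMapper:
    def __init__(self):
        self.next_code_point = PUA_START
        self.assignments = {}                     # glyph name -> code point

    def assign(self, glyph_name: str) -> str:
        """Return the character for a glyph, allocating a PUA slot if needed."""
        if glyph_name not in self.assignments:
            if self.next_code_point > PUA_END:
                raise RuntimeError("private use area exhausted")
            self.assignments[glyph_name] = self.next_code_point
            self.next_code_point += 1
        return chr(self.assignments[glyph_name])

mapper = PrivateUseMapper()
# An OCR engine that recognises an otherwise unencoded glyph can emit its PUA character:
character = mapper.assign("archaic-ligature-example")
print(hex(ord(character)))                        # -> 0xe000
```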
As part of the Impact project, really old texts have been digitised, texts in many languages. At the recent presentation two speakers mentioned that there are characters used in the Slovenian and Polish languages that are not (yet) defined in Unicode. As part of their project, the missing characters were defined in the Unicode private use area and the scanning software was taught to use them.
With the research completed and the need for all these characters and their shapes defined, it will be great when these characters find their way into Unicode proper. Once the code points for the missing characters are defined and agreed, the OCR software can learn to recognise the characters at the new code points, a conversion program can be written for the existing texts, and it will become more inviting to include these characters in fonts.
Now that the project is at its end, it is the right moment to extend the Latin script in Unicode even further.
Thanks,
GerardM
#ImpactOCR - Citing a #newspaper
#Wikipedia has this rule: "Citation needed". Much of the news is first published in newspapers. When a citation is needed about something that happened, it is in the newspaper where you will find it mentioned and there may be many chronological entries on the same subject describing how something evolves.
A lot of research and development has gone into the optical character recognition of newspapers in the Impact project. As this project has ended and has evolved into a competence centre, its last conference was very much a presentation of what the project achieved.
From my perspective, it produced a lot of software, much of it open sourced, and all of it implemented and embedded in the library, archive and research world. The public for the work done in the Impact project is very much found in that research world. The general public can benefit just as much; what has to become clear is how.
Through Europeana Newspapers some 10 million newspaper pages will be made accessible. These pages are scanned and, to make them really useful, they undergo optical character recognition. This is exactly where the Impact project has its impact: as the OCR technology improves, more words are correctly recognised and consequently more of the content of the newspapers can be discovered.
The results can be improved even further when the public helps train the OCR software to recognise the characters of specific documents. As citing sources for Wikipedia is an obvious use case for historic newspapers, there are many people who will be willing to teach OCR engines to do a better job. For those articles that are found to be particularly useful, proofreading can improve the results even further.
With a public involved in improving the digitised and OCR-ed texts, everybody will be a winner, including the scientific research done on these texts.
Thanks,
GerardM
Monday, June 25, 2012
Learning another language
Learning to read #Arabic, or more correctly Classical Arabic or "al fusha", is fun. I enjoy it for the challenge it provides, and it helps because the many transliteration schemes suck and often confuse more than they help.
When I started on this road to learn Classical Arabic, I was told that there is only one Arabic; I was also told that when you speak Classical Arabic in Arabic-speaking countries, you are thought to be weird and often find yourself not understood. I do not point out as often that the second assertion negates the first and that consequently there are many different Arabic languages. I do not yet understand to what extent Koranic Arabic is different from what is labelled "standard" Arabic.
At this time I am learning to pronounce fully annotated texts. When you are used to the Latin script, one of the lessons is that a space is not necessarily the dividing line between words. This is hard because it clashes with how you have learned to perceive a word at a glance. Al Arabiya or العربية appears to show two spaces, but it is one word. To complicate it further, the vowels are missing and you need to know Arabic grammar and vocabulary to pronounce it with certainty.
Add to this the different ways Arabic is written and printed and you will appreciate learning the Arabic script for the intellectual challenges it presents.
Thanks,
GerardM
Monday, June 18, 2012
#Batak - getting a #script ready for the #Internet
Several languages from Sumatra, #Indonesia were originally written in the Batak script. This script has been encoded in Unicode, and what was still missing was a freely licensed font. Thanks to a grant by the Wikimedia Foundation, a project is under way that will produce a font for Batak and will transcribe sources from Dutch museums like the Tropenmuseum.
For a script to be encoded into Unicode, a lot of research goes into describing the script and how it functions. For Batak you can find this documentation here.
When you read this, you find details like how the combination of a consonant and a vowel is expressed in the Batak script. When smart algorithms are used, only the valid combinations will be expressed.
Given that multiple languages use the Batak script, the rules that are implemented in a font need to allow for every Batak language.
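As a rough illustration of the kind of rule involved (a toy check of my own, not the actual shaping logic, which lives in the font's lookup tables and the Unicode proposal): vowel signs are combining characters that must attach to a base consonant, and a validity check can reject sequences that do not fit.

```python
import unicodedata

# Toy sketch: reject sequences where a combining vowel sign does not follow a
# base letter. The real rules, including the differences between the Batak
# languages, belong in the font's shaping tables; this only shows the principle.
def well_formed(text: str) -> bool:
    can_take_mark = False
    for ch in text:
        category = unicodedata.category(ch)
        if category in ("Mn", "Mc"):          # combining marks, e.g. Batak vowel signs
            if not can_take_mark:
                return False                   # a vowel sign with nothing to attach to
        else:
            can_take_mark = category.startswith("L")   # letters can carry marks
    return True

print(well_formed("\u1BEA"))            # a Batak vowel sign on its own -> False
print(well_formed("\u1BC2\u1BEA"))      # a Batak letter followed by the vowel sign -> True
```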
The first iterations of the Unicode font for Batak are being tested. When this is done, the font will be available and it will be possible to transcribe original Batak sources including a book on sorcery.
Thanks,
GerardM
Saturday, June 16, 2012
Wikis waiting to be renamed
Many a #Wikipedia was created before the language policy came into existence. One of the requirements of the language policy is that any request for a new Wikipedia is for a language that is recognised in the ISO 639-3 standard.
At that time several Wikipedias were created for what was not recognised as a language. In the meantime several of these have been recognised as a language and as a consequence have their own code:
- bat-smg -> sgs (wikipedia)
- fiu-vro -> vro (wikipedia)
- zh-classical -> lzh (wikipedia)
- zh-min-nan -> nan (wikipedia, wiktionary, wikibooks, wikiquote, wikisource)
- zh-yue -> yue (wikipedia)
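As a small illustration of what such a mapping looks like in practice (purely a sketch; the real change is MediaWiki site configuration, not a script like this), the renames listed above can be expressed as a lookup table:

```python
# Legacy Wikimedia subdomain codes and the ISO 639-3 codes they map to,
# taken from the list above.
LEGACY_TO_ISO639_3 = {
    "bat-smg": "sgs",        # Samogitian
    "fiu-vro": "vro",        # Võro
    "zh-classical": "lzh",   # Literary Chinese
    "zh-min-nan": "nan",     # Min Nan
    "zh-yue": "yue",         # Cantonese
}

def content_language(wiki_code: str) -> str:
    """Return the ISO 639-3 code to declare, e.g. in the HTML lang attribute."""
    return LEGACY_TO_ISO639_3.get(wiki_code, wiki_code)

print(content_language("fiu-vro"))   # -> vro
```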
With the deployment of bug 34866, an important improvement has been realised: the content of the wikis involved now correctly indicates what language it is written in. This helps because more content is now correctly identified on the Internet.
It is a relevant step in the direction of giving many wikis the name we would like them to have.
Thanks,
GerardM
Thursday, June 14, 2012
#Wikidata can get more facts out
The big discussions that are raging on mailing lists about the infoboxes that can be populated with data from Wikidata are amusing. Amusing because it has been made plain that this is not something that is planned for in the near future. Amusing because there is more to it.
Consider the following scenario: two Wikipedias do not agree on what source to use for a specific bit of information. The result is different data in the infobox. What passes for a solution is to override things locally. Nice thought, but it is the wrong approach; it is reasonable to assume that multiple Wikipedias will opt for either option. Wikidata as a data repository is agnostic to such issues. It can happily store information from multiple sources and have multiple infoboxes for the same category of subjects.
You may be able to override and maintain data on a local level. Doing so does not make it a best practice. Far from it: Wikipedia has its neutral point of view and Wikidata does change the rules. It has always been recognised that the NPOV is set by a language community and that its prevailing wisdom is not necessarily neutral. The infoboxes will not only bring consistency to the data, they will also bring home the notion that facts have sources and many sources have a point of view.
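To make the point concrete, here is a minimal sketch (my own illustration, not the actual Wikidata data model, and with invented values) of how one property can hold several sourced values, leaving it to every Wikipedia to choose which one its infobox shows:

```python
# Sketch: one property holding several values, each with its own source.
# Values and source names are invented for illustration.
population = [
    {"value": 1_000_000, "source": "national statistics office (example)"},
    {"value": 1_020_000, "source": "city almanac (example)"},
]

def pick(claims, preferred_source):
    """An infobox picks the value from the source its community trusts,
    falling back to the first claim when that source is absent."""
    for claim in claims:
        if claim["source"] == preferred_source:
            return claim["value"]
    return claims[0]["value"]

# Two Wikipedias can render different infoboxes from the same repository:
print(pick(population, "national statistics office (example)"))
print(pick(population, "city almanac (example)"))
```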
Thanks,
GerardM
Wikisourcery
Fairly often when you complain about issues in the Wikimedia world, there is somebody who notices, who cares. This is the sourcery of the crowds, the magic of our communities. Compare the picture above with the picture below...
What a difference a day makes ..
Thanks,
GerardM
Wednesday, June 13, 2012
The eye on the prize
Noodlot, "Fate" in translation, is a book by Louis Couperus. Couperus is one of the giants of Dutch literature and it makes excellent sense to make this book available to a reading public.
To this end, someone took the djvu file from the Gutenberg project and started the proofreading process at the Dutch Wikisource. This process is still ongoing; I did two pages and I regret it.
The regret is because the book has already been proofread at project Gutenberg. As the book is in the public domain, the proofread transcription is also in the public domain.
What is the point? Why not do another book?
The transcription and possibly the editing of the layout of the book serve one aim: making it available to readers. It makes sense to do the proofreading once. There are many other books that are waiting to be digitised for a first time. There are plenty of other sources that are waiting, waiting in a library, a museum, an archive.
Let us not waste our efforts. Let us do things once and let us do them well. When we are done with the proofreading and the formatting, we need to find a public appreciative of the work done. To make it attractive it helps when there is a lot to choose from and when it is available in the format expected by our intended public.
Thanks,
Gerard
Playing the Wikisourcerer
For any #Wikisource, the proofread page extension provides must-have functionality. Given that it is used on every Wikisource, it is reasonable to expect consistent functionality. The screenshot below shows the proofreading page for "Noodlot", a book by Louis Couperus. As you can see, the page numbers are in blue or red and they show the classic MediaWiki behaviour. When you compare this with any index page for proofreading on the English language Wikisource, you will find the numbers in multiple colours indicating their position in the proofreading workflow.
As Wikisource is primarily a workflow environment, it is crucial to have a complete implementation of the tooling. When asked, it was indicated that many of the small fixes happen exclusively on the English Wikisource. This lack of support for the proofreading extension elsewhere is an additional argument for doing away with all the single-language Wikisources. When one Wikisource provides adequate tooling for the workflow, another wiki can be used to publish its finished content.
A wiki publishing content in a final form can publish on behalf of any and all open content projects that create finished products. Such an expanded project, where finished content is marketed to our public, will achieve multiple goals:
- it will make finishing projects more attractive
- with an enlarged catalogue it will attract more public
- it helps us achieve what the Wikimedia Foundation is committed to
Thanks,
GerardM
Tuesday, June 12, 2012
#Gutenberg; #Unicode is the new #ASCII
Project Gutenberg is the original project that makes books that are out of copyright available for you to read. At this time it has made some 39,000 books available. Through affiliated organisations, including Wikibooks and project Runeberg, there are some 100,000 books to read.
When the project started in 1971, ASCII was how text was digitally stored on a computer. It has been a guiding principle of project Gutenberg to store text in ASCII ever since.
When a book was written in Greek, the characters of the Greek alphabet were transliterated into ASCII and, at the time this made perfect sense. It made sense because ASCII represented the standard every computer understood.
Nowadays, modern computers use Unicode for the encoding of text. The first notions of Unicode came into being sixteen years after the start of project Gutenberg. Most scripts are defined in Unicode and modern software expects Unicode. Now, twenty-five years later, there are free fonts for all the important scripts. Transliteration into ASCII makes the resulting product unusable for many people; most people do not know how to read a text that is transliterated.
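A small illustration of the difference (the transliteration table here is an arbitrary example of my own, not a scheme Gutenberg uses): storing a Greek word as UTF-8 keeps the original script intact, while forcing it into ASCII loses it.

```python
# Sketch: the same Greek word stored as Unicode (UTF-8) and as an ASCII
# transliteration. The transliteration table is an arbitrary example.
word = "λόγος"                       # readable for anyone who reads Greek

utf8_bytes = word.encode("utf-8")    # every modern system can store and render this
print(utf8_bytes)                    # b'\xce\xbb\xcf\x8c\xce\xb3\xce\xbf\xcf\x82'

translit = {"λ": "l", "ό": "o", "γ": "g", "ο": "o", "ς": "s"}
ascii_only = "".join(translit[ch] for ch in word)
print(ascii_only)                    # 'logos' - fits in ASCII, but the Greek is gone

# Round-tripping the ASCII version back to Greek is not possible in general:
# the accent on ό is lost and 'o' stands for both ο and ό.
```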
Is there still a point to stick to ASCII or can ASCII be safely replaced by Unicode?
Thanks,
GerardM
Thursday, June 07, 2012
Scanning sources; the conference
Who is going to the Impact event at the Koninklijke Bibliotheek in The Hague? This was a question on one of the GLAM mailing lists. Living close by and having an interest in the digitisation and transcription of source material made it obvious for me to volunteer. From what I understand from its website, the conference will inform me about the latest developments and many of the best practices. Obviously, I will blog about my understanding of them.
At the Alexandria Wikimania, many Wikimedians visited the scanning operation at the Library of Alexandria. They had an "assembly line" where books were scanned in an industrial process. This is quite different from the scanning operations employed by Wikisourcerers. Scanners targeted at home use are used, often combined with the OCR software that came with them. At the Berlin hackathon I learned that it is best to OCR a document at the latest possible moment. OCR software is getting smarter and the improvements are noticeable. I also learned that OCR is available from within many of the Wikisource projects.
Going to a conference is one thing, going well prepared is another. It will help when I have a better understanding of the operation of Wikisource, when I know what we do well and where we can improve. It will help me understand the relevance of what will be presented in The Hague.
To be prepared, I will have to learn more about Wikisource and I will blog about what I learn.
Thanks,
GerardM
Remember #CompuServe?
Some content can be found exclusively on #Facebook. This makes Facebook a "walled garden". You have to have a Facebook profile, or every now and then you find yourself excluded.
For me CompuServe was the original walled garden. It hosted information that I needed for my work. I had to use CompuServe, I had to pay for that privilege and I hated it because it was not that great an experience.
So far, Facebook finances itself by selling stock and by selling adverts. There is a lot of speculation about future revenue streams following its recent IPO. Premium services, that is services you have to pay for, are already popping up, and really, what is more obvious than providing access to restricted content?
At the time, people were afraid that the CompuServe model would make the Internet less open. Relative to the Internet of its day, Facebook is probably as big as CompuServe was then. The best way to deal with walled gardens is to ignore them as much as possible and actively support diversity on the Internet.
Thanks,
GerardM
The use case for #OpenID indicated by the #LinkedIn hack
LinkedIn was hacked; all the passwords in use a few days ago are no longer secret. The advice people get is to change their password everywhere they use the same password.
This blog post is not about LinkedIn. It is about the lack of security provided by passwords, as seen from a user's point of view. Any organisation that thinks it cannot happen to them is delusional. From a user's point of view, any website that wants you to create an account with a password that is maintained on that website is a potential security risk. A risk you are exposed to because any site can be hacked and you do not remember passwords that are unique to each website.
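To illustrate why such a leak is dangerous for anyone who reuses passwords (a sketch of my own; the leaked LinkedIn hashes were reportedly unsalted SHA-1, and the passwords below are invented): the same password always produces the same unsalted hash, so one cracked hash unlocks every site where that password is reused.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """The kind of hash reportedly leaked: no salt, so identical passwords
    always produce identical hashes."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

leaked_hash = unsalted_hash("correct horse battery staple")   # invented example

# A simple dictionary attack: try common passwords until one matches.
for guess in ["123456", "password", "correct horse battery staple"]:
    if unsalted_hash(guess) == leaked_hash:
        print("cracked:", guess)
        # The attacker can now try this password on every other site
        # where the victim reused it.
        break
```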
For a user, it is more secure to rely on one place where all the authentication for any website is done. The advantage becomes clear when a website is hacked: there is no password of yours to abuse. When the authentication server is hacked, all that is required is to change the password at that central server.
LinkedIn was compromised, and as a result many people with a Wikimedia account have a compromised account as well. Many of these people will not change their password because they cannot be bothered or because they are not aware of the risk.
As a consequence disruption by "trusted users" is a potential and realistic scenario. This risk can be mitigated by accepting the use of authentication through an OpenID service.
Thanks,
GerardM
Wednesday, June 06, 2012
#Wikimedia #swag
My key chain is probably the best #Wikipedia related object that I have. It is big, it is sturdy and before I had business cards it was a nice gimmick introducing the organisation I work for.
My key chain was given to me at a conference and now you can add it to your cart and have your own. The impression the Wikimedia shop gives is one of a work in progress. There are already fifteen things you can choose from. The Wikipedia mug, for instance, is not yet available, nor are the computer bags or baseball caps.
The shop started small and it is in English only. The good news is that there is so much room for growth. They could sell more things and there could be items related to the other projects. Internationalisation of the software used is under way and, as the software is open source, translatewiki.net is an option.
Thanks,
GerardM
Publishing final versions
On #Wikipedia, an article never finds its final form. There is no final form, there is always a potential next iteration, never mind the quality of the current version. The same is true for Wiktionary; there is always room for another translation, another attribute.
When you want final forms, you produce something static. There are many reasons why this is done. One reason is to take a collection of articles, fix a final version and call it a book. Such books can be priced and it is quite legitimate to sell them on eBay. This has been done, and when another such publication is found there are usually some people who complain about it.
For Wikisource and Wikibooks, however, there are finished products. In Wikisource a project can be considered finished once the proofreading of what started as a scanned text has been completed. Only when it is finished is it ready for easy public consumption.
What Wikisource does is provide the tools for this process. Essentially Wikisource serves a community that is engaged in a workflow. What it does not do so well is publish the finished products and find an audience for them.
The work at Wikisource is important. Its finished products are relevant and they are worthwhile. They fit in our aim to share in the sum of all knowledge. We can give things away, we can ask for a contribution, but we owe it to our public to make a best effort at making all this knowledge we are the custodians of easily accessible.
We may need a new project, "Wikipublish", where we publish final versions of our "Wikisources", where we publish collections of Wikipedia articles, where we publish our "Wikibooks". We do not need much marketing; we need to make our finished goods available to a public and provide them in a format that it finds easy to digest.
Thanks,
GerardM
Monday, June 04, 2012
#wmdevdays - #accessibility
At the Berlin #hackathon 2012 many people were hacking on many subjects. Kai Nissen worked on the accessibility of MediaWiki for people with a visual impairment. At the end of the hackathon several bugs were squashed and ready for review in Gerrit.
It is wonderful when reports on defects are actionable and when something gets done.
Thanks,
GerardM
How was the Berlin Hackathon 2012 for you?
I participated in the Hackathon for the very first time and was quite amazed by so many people coming together and actually working productively on feature enhancements or bug fixes. I had a lot of talks with people who gave really helpful feedback about the project I'm currently working on.
You have been working on accessibility for the blind ... how did you get into this subject?
I was pointed to an analysis report concerning accessibility in Wikipedia that was carried out by the Swiss initiative "Access for all". While reading this I realized that a lot of the mentioned issues were quite easy to solve. A lot of the content available on the web seems to be designed without considering accessibility aspects, although a little tweak can always have a high impact.
How do you know what to focus on?
The analysis report was quite thorough and included recommendations, so it ended up being something like a task list.
You identified a number of issues to work on this weekend ... how did it go?
The issues I was working on were quite easy to fix; whenever I had problems with something, there was always somebody around to help out.
Did having all these other hackers make a difference?
The gathering of all those experienced MediaWiki developers is a really helpful thing. That applies to having certain questions answered right away as well as just sharing experience in whatever topic might come up.
Brion Vibber helped you with the parser tests ...
Brion figured out what the problem was in no time. There was a language-version-related bug in another patch, which he simply reverted.
Are there many more accessibility issues in MediaWiki people can help with?
The accessibility analysis report mentions more issues that need to be fixed to make Wikipedia and all other projects based on MediaWiki more accessible. It is clearly written and points out the shortcomings in accessibility very precisely.
How do you continually test for good accessibility of our software?
Most of the time one seldom notices a lack of accessibility when not affected by a disability oneself. Whatever issue I fixed to improve accessibility, I have to keep in mind to apply the same fix again in similar cases.
Are there best practices for coding for accessibility?
There are guidelines defined by the WAI, which should be considered when coding for accessibility.
Do you have thoughts on what the Visual Editor will do for accessibility?
Since screen readers read what is written on the screen, they also read the wikitext as is. That might be hard to understand, especially for newcomers. Reading out a headline as a headline instead of "equals-equals-headline-equals-equals" can be very helpful to focus on the subject itself.
--- Kai
Sunday, June 03, 2012
#wmdevdays - finished a book, a source, then what?
Both #Wikibooks and #Wikisource do a terrible job promoting their finished product. The Wikibookers and Wikisourcerers move on to the next book or source. A new source may get a moment of glory as a featured text or as a new text. A new book may be featured.
The problem is not so much with the Wikibooks and Wikisource projects, it is with what these projects actually are. They provide a workflow for the transcription, the proofreading and the final touches for the creation of digitised books and sources. As long as these are works in progress, they are not the finished product the general public is looking for.
The finished products of both projects are beautiful, lovable and deserve attention. They deserve the attention of the public; they should be provided in formats like EPUB and ZIM, and it should be really easy for people to find and use them.
It is important to bring the best of what Open Content has to bring in sources and books to the public. Finding a big public will motivate many of our volunteers and it may bring us more volunteers. As we publish more titles, our projects gain relevance. As the final product becomes a more polished product people will love what we do. We already do.
Thanks,
GerardM
#WMDEVDAYS - #OmegaWiki vs #Wikidata
It has been wonderful to see the enthusiasm for Wikidata. It is wonderful to notice how Wikidata is developing rapidly. It is interesting to look at what it does and how it compares to the first, the original Wikidata project, OmegaWiki.
The short answer is that there are three main differences:
- OmegaWiki uses a lexical perspective where Wikidata uses a concept-based perspective
- OmegaWiki is more mature
- Wikidata has the support of the Wikimedia Foundation
Wikidata starts with the existence of Wikipedia article(s). Each one of those represents a concept. Berlin (Germany) is an example and Berlin (Connecticut) is another. In OmegaWiki they would both be known as Berlin while in Wikidata they may both have an additional label for the English language with "Berlin" as its content.
This difference of perspective gives Wikidata an immediate application: it will replace the "interwiki links" for Wikipedia. One important benefit is that there will be only one place where these links are maintained. In the case of Berlin (Germany), a new article about the city will no longer trigger an update in 205 other Wikipedias.
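A minimal sketch of the idea (my own illustration, not the Wikidata schema): the links between language versions live in one central record instead of being copied into every article.

```python
# Sketch: one central record holds the sitelinks for a concept. Adding a new
# language edition means one update here instead of edits on 200+ wikis.
sitelinks = {
    "berlin_germany": {
        "en": "Berlin",
        "de": "Berlin",
        "nl": "Berlijn",
    }
}

def add_sitelink(concept: str, language: str, title: str) -> None:
    """Register a newly written article for a concept in one place."""
    sitelinks.setdefault(concept, {})[language] = title

def interwiki_links(concept: str, current_language: str) -> dict:
    """The links to show on an article: every other language version."""
    return {lang: title
            for lang, title in sitelinks.get(concept, {}).items()
            if lang != current_language}

add_sitelink("berlin_germany", "fr", "Berlin")   # one new article, one update
print(interwiki_links("berlin_germany", "en"))
```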
Both Wikidata and OmegaWiki are able to present details on Berlin. All the details at OmegaWiki have been added by hand. Most of the details Wikidata will know are likely to be harvested from info boxes or from DBpedia. What will also make a deciding difference is that the community interested in the Wikidata content will be huge.
So OmegaWiki is more mature... in practical terms this means that it started a long time ago and is now a bit dated. The Wikidata folks know OmegaWiki; they benefited from the experience and, while they do not include all the data OmegaWiki has, there is nothing stopping them from doing so at a later date.
Many people at the Berlin hackathon are keenly aware of the disruptive potential of Wikidata: no more need for the pywikipedia interwiki bot, and information in infoboxes changing without a trace in the local project. But they also see the possibility of providing information in a language whose Wikipedia does not have an article... information with an infobox.
It is the support of the Wikimedia Foundation and more importantly its communities that will make Wikidata a success. This is a good thing because the inclusion of structured data in Wikipedia will prove a big improvement in data quality and accuracy.
Thanks,
GerardM
Friday, June 01, 2012
#WMDEVDAYS - Redlinks in any language
In #Wikipedia, when you consider Berlin, the capital of Germany is not the only Berlin in existence. Currently there are 205 Wikipedia articles about the German capital with an interwiki link, and as there are 285 Wikipedias, there are 80 articles that are either missing or do not have a link.
For Berlin, Connecticut, there are 10 Wikipedia articles. It is likely that the "Berlin" part is written exactly the same as for the German Berlin. It follows that a label for both Berlins is exactly the same. There are many more cities called Berlin in the United States of America...
When a label with a description has been added to Wikidata, it means that the existence of a subject has been registered for that language. As a consequence, Wikidata knows it has to disambiguate Berlin. It knows that it can show attributes stored for the subject in Wikidata, like the seal shown here to the right. This may even be refined further by showing the attributes using the same template as used on the English Wikipedia.
What we do know for Berlin, Connecticut is in which 10 languages we have an article. When one of those is a language you understand, you may even want to read it...
Thanks,
GerardM
#WMDEVDAYS - Gadgets
Many of the gadgets people use in #Wikipedia are awesome. They make things possible. Once a good one is developed, it is copied to other wikis to do its job.
There are a few problems with this model.
Once a gadget finds its use, it becomes clearer how to use it best. Some changes alter the text used in the software, others change the functionality itself. Because of this every copy of a gadget is different and consequently, when the MediaWiki software is updated, a gadget may or may not fail. This is not knowable in advance.
The next iteration of gadgets is going to bring three major improvements:
- A gadget can be enabled on any wiki but exists only in one place
- A gadget can be localised to support your language
- A gadget can be made more future proof by implementing the latest technology
Currently, Wikimedia developers visit wikis ahead of an upgrade of the MediaWiki software. They check many gadgets and other local functionality to ensure that everything will still work after the upgrade. Even though they make a darn good best effort, things still fail. Because of the numbers involved, there is no chance for them to succeed. With "Gadgets 2.0", gadgets will need to be checked only once; that gives them a fighting chance and gives our communities improved continuity.
Thanks,
GerardM
#WMDEVDAYS - disambiguating and redirecting
Once #Wikidata is used for real, all the articles in Wikipedia are known to it. People will add "additional names" to the existing label of an article and these will be used when searching for the right article.
These additional names are essentially what redirect pages are. Wikidata will know all the redirect pages, so why have them?
When all these aliases, these additional names, are known, there will be instances where disambiguation is needed. As Wikidata is about data, it will know when a disambiguation page is needed. When Wikidata has the information to provide such disambiguation, why have disambiguation pages?
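A minimal sketch of that idea (my own illustration, not the Wikidata data model, with invented items and identifiers): when labels and aliases are stored as data, generating a disambiguation list is a simple query.

```python
# Sketch: items with labels and aliases; a search term that matches more than
# one item yields an automatic disambiguation list. All data here is invented.
items = {
    "item_berlin_de": {"label": "Berlin", "description": "capital of Germany",
                       "aliases": ["Berlin, Germany"]},
    "item_berlin_ct": {"label": "Berlin", "description": "town in Connecticut",
                       "aliases": ["Berlin, Connecticut"]},
    "item_nyc":       {"label": "New York City", "description": "city in the USA",
                       "aliases": ["NYC", "The Big Apple"]},
}

def lookup(term: str):
    """Return every item whose label or alias matches the search term."""
    term = term.lower()
    return [(item_id, data["description"])
            for item_id, data in items.items()
            if term == data["label"].lower()
            or term in (alias.lower() for alias in data["aliases"])]

matches = lookup("Berlin")
if len(matches) > 1:
    print("Did you mean:")             # an automatically generated disambiguation list
    for item_id, description in matches:
        print(f" - {description} ({item_id})")
```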
Getting rid of redirect pages should be uncontroversial. Getting rid of existing disambiguation pages may prove controversial. Providing disambiguation where there was none before should be fine.
It is fun to theorise how Wikidata will change our practices.
Thanks,
GerardM
#WMDEVDAYS - Using #Commons
Commons works best when you use pictures you already know. As it says on Commons, it is "a database of 12,959,488 freely usable media files to which anyone can contribute". Anyone can, and it is becoming ever easier to contribute even more media files. The problem is not so much contributing files; it is finding them and using them. Many people I know prefer to find pictures elsewhere. I tried to find pictures of a bunch of keys on Commons for a recent blog post, but after too many pictures of Alicia Keys I gave up.
At the Berlin Hackathon 2012, this is the kind of thing that merits some shared consideration. The existing categories on Commons just do not help. When looking for keys, you want to choose from all the keys and not navigate a structure. It makes more sense to have tags and possibly disambiguate those tags. Combine them with other tags and you have a result that will be more satisfying.
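A small sketch of the difference (file names and tags are invented, just to illustrate): with tags, a search is the intersection of a few tags rather than a walk through a category tree.

```python
# Sketch: tag-based retrieval of media files. File names and tags are invented.
tagged_files = {
    "House_keys_on_a_ring.jpg": {"key", "keyring", "metal"},
    "Old_door_key.jpg":         {"key", "door", "metal"},
    "Alicia_Keys_concert.jpg":  {"Alicia Keys", "concert", "person"},
}

def search(*tags: str) -> list:
    """Return the files carrying all of the given tags."""
    wanted = set(tags)
    return [name for name, file_tags in tagged_files.items()
            if wanted <= file_tags]

print(search("key"))            # both pictures of actual keys
print(search("key", "door"))    # narrowed down by combining tags
```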
The value of a collection is in its use.
Thanks,
GerardM
#WMDEVDAYS - Single signon
One of the #Wikidata presentations posed a question: "We need a much improved single sign-on". For Wikidata it is important to be able to edit Wikidata itself while working at the same time on a Wikidata client. The presenter wanted to understand whether this is an issue that needs to be addressed soon.
When you use Wikidata on one of the Wikimedia projects, you will typically use one user account on all projects. This user is authenticated once when signing on with the global user account, and this authentication serves them well when working on other projects.
This will work well for Wikidata. The only issue left is that Wikidata also wants to be usable and editable for users who work with it from their own server. This, however, is an issue that does not have to be solved immediately.
There is existing functionality like OpenID that may provide a solution for this. It is great that the Wikidata people consider how the data can be used from outside the Wikimedia Foundation. It is wonderful that they provide a use case that makes a case for implementing OpenID.
Thanks,
GerardM