Wednesday, October 23, 2013

#Pywikipediabot - 20:00 Thursday, October 24, 2013 to 22:00 Sunday, October 27, 2013 (UTC)

The one tool used by most bots has not only stood the test of time, it is getting ready to complete its rejuvenation. It has moved closer to the tools used at the Wikimedia Foundation, and according to the road map a bug triage is the next step.

This interview happened largely on the pywikipedia mailing list, another great resource for bot runners. What you read is a compilation; more takes can be found in the list archive. :)
Enjoy,
      GerardM

What is pywikipedia?
  • A Python-based framework to manipulate MediaWiki installations. Any installation, not only those run by the Wikimedia Foundation, can be worked on. Everything you can change in the wiki as an editor you can also change per API. Thus, pywikipediabot can be used to create so-called bot tasks: making the same change to a lot of pages, and doing it fast (a minimal sketch follows below this list).
  • A huge bunch of scripts which use the framework above for all tasks you can think of: for example a script for mass uploading pictures (like scans of a book), a script for cleaning up page source code (like removing <br>, multiple empty lines, reordering interwiki links, ...), scripts for fixing common errors, and so on.
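For readers who have never seen it in action, here is a minimal sketch of what such a bot task can look like with the core framework; the site, page title, replacement and edit summary are only placeholders for illustration.

    import pywikibot

    # Connect to a MediaWiki installation; English Wikipedia is just an example,
    # any wiki the framework knows about can be used.
    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Wikipedia:Sandbox')

    # Read the wikitext, change it, and save it back through the API.
    page.text = page.text.replace('colour', 'color')  # an arbitrary example edit
    page.save(summary='Bot: example replacement')

Running the same few lines over a whole list of pages is what turns it into a bot task.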
As I understand it there are two versions, "core" and "compat". What is pywikipedia core and what is compat?
Compat is old. Core is the redesign with completely new and cleaner data structures. Most new API functions (like those to modify Wikidata) are only supported by core, or supported much better there.
Why keep them both? It must be a lot of work to have to maintain them both...
There are so many working scripts which have been doing their job for years now - thus, there is not much pressure to move them to core. Nonetheless, if you plan to do something new, use core; it's much more appealing.
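As an illustration of the kind of newer API support mentioned above, reading a Wikidata item with core goes roughly like this; Q42 is used purely as an example item ID.

    import pywikibot

    # The Wikidata repository is reached through the data repository of its site.
    site = pywikibot.Site('wikidata', 'wikidata')
    repo = site.data_repository()

    # Q42 is only an example; any item ID works.
    item = pywikibot.ItemPage(repo, 'Q42')
    data = item.get()  # fetches labels, descriptions, claims and sitelinks
    print(data['labels'].get('en'))
    print(list(data['claims'].keys()))  # property IDs of the item's statements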
Recently all the bugs have been moved to Bugzilla ... What is it that you hope to achieve by this?
Handling bugs will be a lot easier, there will be more eyes on the debugging process, tasks will be centralized, and it will be more wikified by merging into the WMF infrastructure; you can see more on this suggestion page.
At SourceForge you have your code repository and - unlinked - your bug and feature request system, which over the years got filled with a lot of dormant, outdated bugs that were fixed a long time ago. We hope to get a clean list of bugs.
Recently all the code has been moved to Git ... What is it that you hope to achieve by this?
Git allows you to work much more naturally and doesn't break everything when you merge your different ideas again. There are so many new features compared to SVN that I do not want to outline them all here. In short: in SVN you can branch your code if you do something different, but merging hardly ever works, so nobody uses this important feature. With Git, merging works (as it stores much more information than SVN). Thus, whenever you start a new task or have a new idea, you branch.
What is the biggest challenge running the pywikipedia bot?
On the pywikibot side, learning its limitations and working around them. On the larger, bot side of things, performance. You always have to balance the API/server performance with the local computer's performance (mostly disk access for me, might be RAM limitations when running on a VPS) and the network performance. While not related to pwb, this is something you have to consider for each new bot you write. pwb could help by providing more advanced support for threads, such as thread-safe generators, and better support for sections (such as retrieving the whole page but submitting only a section, or vice versa).
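One concrete way pywikibot already helps with the API/server side of that balance is by batching page retrievals; a rough sketch, with the category and the text check chosen only for illustration:

    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site('en', 'wikipedia')

    # Any page generator will do; a category is just one possible source.
    pages = pywikibot.Category(site, 'Category:Living people').articles()

    # PreloadingGenerator fetches page texts in batched API requests instead
    # of one request per page, trading many small round trips for fewer,
    # larger ones.
    for page in pagegenerators.PreloadingGenerator(pages):
        if 'some phrase' in page.text:  # the check itself is only illustrative
            print(page.title())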
How many people are using the pywikipedia bot and how many people are developing code for it?
It has been widely used since 2003 and has more than 100 authors, but there are just five active developers now. After the launch of Wikidata many bots lost their work :-) e.g. one bot was active on all Wikipedias; now it works only on 2-4 of them and on the Wiktionaries.
It is possible to use the pywikipedia bot in so many ways... Is it easy to learn what it can and cannot do?
Yes: just join the IRC channel and ask. For nearly all tasks, some "Swiss army knife" exists, and the developers are extremely helpful. Just ask.
When a script has documentation inside, it is easier. Some scripts have documentation on mediawiki.org, but nobody really knows all of it.
How long does it take before pywikipedia bot supports a new Wikidata data type?
Usually it takes a week or two, depending on how much the developers are willing to do. The following data types are already supported: item, string, URL, coordinates. Time is nearly done and should follow in roughly two months.
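For a sense of what using those data types looks like, adding a claim with core goes roughly like this; the item (Q42) and the property ID (P123) are hypothetical placeholders, and a coordinate target would be built with pywikibot's Coordinate class rather than a plain string.

    import pywikibot

    site = pywikibot.Site('wikidata', 'wikidata')
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, 'Q42')  # an example item

    # A string-valued claim; P123 is a hypothetical property ID.
    claim = pywikibot.Claim(repo, 'P123')
    claim.setTarget('an example string value')
    item.addClaim(claim, summary='Bot: adding an example claim')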
Tomorrow there will be the pywikibot triage... What is it and who can participate?
The bug day will focus mainly on categorizing, prioritizing, and closing non-reproducible bugs, and on fixing them when the fix is not a big deal. Because we just migrated there are ~700 open bugs (some of them are really old and were fixed a long time ago), so we need to clean them up. It's really necessary for us to have this "bug triage".
The good news is that anybody can participate.
