Purging whole namespaces of pages in MediaWiki

So, I was asked to purge all the pages in several namespaces. The smaller namespaces are relatively easy to do using the API sandbox.

  1. Visit the Special:ApiSandbox page on your wiki.
  2. Select the action purge.
  3. Select action=purge from the sidebar.
  4. Look for the generator option and then select allpages from the drop down.
  5. Return to the top of the page and select generator=allpages from the sidebar.
  6. Look for the gapnamespace option and select the namespace you want to purge.
  7. Execute the request using the “Make request” button at the top of the page.
  8. When the request is complete, there may be the opportunity to repeat the request with the next batch of pages. You’ll see a button at the bottom of the JSON output that says “Continue”. Click it until the entire namespace has been purged.
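For reference, the sandbox steps above boil down to a handful of request parameters. Here is a minimal Python sketch of the same request, assuming you want JSON output; the helper name build_purge_params is made up for illustration, and the limit of 3 just mirrors the small batch I used in the sandbox.

```python
from urllib.parse import urlencode


def build_purge_params(namespace, limit=3):
    """Return the parameters that the sandbox steps assemble.

    build_purge_params is a hypothetical helper, not part of any library.
    """
    return {
        "action": "purge",               # steps 2 and 3
        "generator": "allpages",         # steps 4 and 5
        "gapnamespace": str(namespace),  # step 6
        "gaplimit": str(limit),          # pages per batch
        "format": "json",                # assumption: JSON output
    }


# The form-encoded body that would be POSTed to the wiki's api.php.
body = urlencode(build_purge_params(namespace=0))
print(body)
```

The sandbox's “Make request” button sends these to api.php as a POST, which action=purge requires.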

The API sandbox will let you play around with different parameters. For example, in the last screenshot, I set gaplimit (under generator=allpages) to 3 but I could have set it as high as 500 if I wanted.

So for namespaces that don’t have too many pages (say, fewer than 1,000), this is doable. But on your average-sized wiki, a namespace is likely to hold tens of thousands of pages. Something more is needed.
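As a rough idea of what "something more" could look like, here is a Python sketch that automates clicking “Continue”: it repeats the purge request, feeding the "continue" values from each reply into the next request until none are returned. This is only an illustration under assumptions, not a finished tool. The function names and the example URL are hypothetical, and it ignores login, errors, and rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def make_poster(api_url):
    """Return a function that POSTs parameters to api_url and decodes the JSON reply.

    api_url is whatever your wiki's api.php address is (an assumption here).
    """
    def post(params):
        data = urlencode(params).encode("utf-8")
        with urlopen(Request(api_url, data=data)) as response:
            return json.load(response)
    return post


def purge_namespace(post, namespace, limit=500):
    """Purge every page in a namespace, batch by batch.

    `post` is any callable that takes a dict of parameters and returns the
    decoded JSON reply. Returns the number of purge results reported.
    """
    base = {
        "action": "purge",
        "generator": "allpages",
        "gapnamespace": str(namespace),
        "gaplimit": str(limit),
        "format": "json",
    }
    continuation = {}
    purged = 0
    while True:
        reply = post({**base, **continuation})
        purged += len(reply.get("purge", []))
        # The reply carries a "continue" object while more batches remain.
        continuation = reply.get("continue")
        if not continuation:
            return purged


# Usage (hypothetical URL, not executed here):
# post = make_poster("https://wiki.example.org/w/api.php")
# print(purge_namespace(post, namespace=0))
```

Passing the request function in as an argument keeps the loop separate from the HTTP details, so it can be tried against canned replies before pointing it at a real wiki.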

Next, purging namespaces programmatically.

MABS status report: Updating MediaWiki::API

For the past couple of weeks, I’ve had a significant amount of time to spend on Multilateral, Asynchronous, Bidirectional Synchronisation of wikis, or MABS for short.

This is all built on the git remote for MediaWiki work that was started almost a decade ago by some students. Since the initial effort there have been some significant changes in the MediaWiki API and, in the meantime, the MediaWiki::API Perl module that is doing a lot of heavy lifting in this project hasn’t seen a lot of work. For example, the last commit on the GitHub repository was to fix a typo in 2015.

So, I’ve been working this past week on updating the Perl module. This has been a lot of fun since I used to be quite the Perl snob—and by that I mean I looked down on people who didn’t love Perl, not that I looked down on Perl. Times have changed for me in the past ten or eleven years, so I’ve acquired some humility and begun doing a lot of work in what I would have considered to be the bottom-of-the-barrel language: PHP. Coming back to Perl is a lot of fun.

That said, Perl has continued to grow while I’ve been gone and I need some advice. I’ve become a huge fan of linters, so one change has been adhering pretty closely to almost every criticism Perl::Critic throws at me. I’ve gone as far as adding “smx” after almost every regular expression and incompatibly changing use constant to Readonly. You might say I’m getting a little carried away.

This, along with fixing the tests to use a Docker instance (if available) rather than just sending every tester to the testwiki, and fixing some bugs I found along the way, has helped me understand this vital piece of the MABS project.

Still, coming back to Perl has made me realize just how ad hoc Perl’s object system is. I’ve heard of Moose and Mus (which I’m leaning towards), but I was wondering what best practices the Perl community has for updating an existing code base.

Update 1: I asked which Perl object system to use and got some great feedback.

Update 2: I contacted the original author (Jools Wills) of the MediaWiki::API module and talked to him about what direction to take with it. I’ll have to do some more work on it to make it work well for my purposes, but I may end up sending him a bunch of pull requests.

Photo by Roger McLassus [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons