Purging whole namespaces of pages in MediaWiki

So, I was asked to purge all the pages in several categories. The smaller categories are relatively easy to do using the API sandbox.

  1. Visit the Special:ApiSandbox page on your wiki.
  2. Select the action purge.
  3. Select action=purge from the sidebar.
  4. Look for the generator option and then select allpages from the drop-down.
  5. Return to the top of the page and select generator=allpages from the sidebar.
  6. Look for the gapnamespace option and select the namespace you want to purge.
  7. Execute the request using the “Make request” button at the top of the page.
  8. When the request is complete, there may be the opportunity to repeat the request with the next batch of pages. You’ll see a button at the bottom of the JSON output that says “Continue”. Click it until the entire namespace has been purged.

The API sandbox will let you play around with different parameters. For example, in the last request above, I set gaplimit (under generator=allpages) to 3, but I could have set it as high as 500 if I wanted.
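
As a rough sketch of what the sandbox is doing, the same request can be built in Python. This is an illustration rather than a finished tool: the function name purge_namespace and the post callable are hypothetical stand-ins for whatever HTTP client you use. The parameter names (action, generator, gapnamespace, gaplimit) are the ones the sandbox shows, and the loop assumes the API returns a "continue" object whose keys get merged into the next request.

```python
from typing import Callable, Dict

Params = Dict[str, str]


def purge_namespace(post: Callable[[Params], dict], namespace: int, limit: int = 500) -> int:
    """Purge every page in `namespace`, one batch per request.

    `post` is a hypothetical callable that POSTs the parameters to the
    wiki's api.php and returns the decoded JSON reply.
    Returns the number of batches requested.
    """
    params: Params = {
        "action": "purge",
        "format": "json",
        "generator": "allpages",
        "gapnamespace": str(namespace),
        "gaplimit": str(limit),
    }
    batches = 0
    while True:
        # Send a copy so the caller never sees later mutations.
        reply = post(dict(params))
        batches += 1
        cont = reply.get("continue")
        if not cont:
            # No "continue" block: the namespace is exhausted.
            return batches
        # Same effect as clicking "Continue" in the sandbox.
        params.update(cont)
```

In real use, post could wrap something like requests.post(api_url, data=params).json(), with api_url pointing at your own wiki. That wrapper is left out here because the URL and any login depend on your setup.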

So for namespaces that don’t have too many pages (say, fewer than 1000), this is doable. But for your average-sized wiki, a namespace is likely to hold tens of thousands of pages. Something more is needed.

Next, purging namespaces programmatically.

MABS status report: Updating MediaWiki::API

For the past couple of weeks, I’ve had a significant amount of time to spend on Multilateral, Asynchronous, Bidirectional Synchronisation of wikis, or MABS for short.

This is all built on the git remote for MediaWiki work that was started almost a decade ago by some students. Since the initial effort there have been some significant changes in the MediaWiki API and, in the meantime, the MediaWiki::API Perl module that is doing a lot of heavy lifting in this project hasn’t seen a lot of work. For example, the last commit on the GitHub repository was to fix a typo in 2015.

So, I’ve been working this past week on updating the Perl module. This has been a lot of fun since I used to be quite the Perl snob—and by that I mean I looked down on people who didn’t love Perl, not that I looked down on Perl. Times have changed for me in the past ten or eleven years, so I’ve acquired some humility and begun doing a lot of work in what I would have considered to be the bottom-of-the-barrel language: PHP. Coming back to Perl is a lot of fun.

That said, Perl has continued to grow while I’ve been gone and I need some advice. I’ve become a huge fan of linters, so one change has been adhering pretty closely to almost every criticism Perl::Critic throws at me. I’ve gone as far as adding “smx” after almost every regular expression and incompatibly changing use constant to Readonly. You might say I’m getting a little carried away.

This and fixing the tests to use a docker instance (if available) rather than just sending every tester to the testwiki, as well as fixing some bugs I found along the way, has helped me understand this vital piece of the MABS project.

Still, coming back to Perl has made me realize just how ad hoc Perl’s object system is. I’ve heard of Moose and Mus (which I’m leaning towards), but I was wondering what best practices the Perl community has for updating an existing code base.

Update 1: I asked for some feedback on the Perl object system to use and got some great feedback.

Update 2: I contacted the original author (Jools Wills) of the MediaWiki::API module and talked to him about what direction to take with it. I’ll have to do some more work on it to make it work well for my purposes, but I may end up sending him a bunch of pull requests.

Photo by Roger McLassus [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons

Rule #0 of any checklist

A while back I mentioned Atul Gawande’s book The Checklist Manifesto. Today, I got another example of how to improve my checklists.

The book talks about how checklists reduce major errors in surgery. Hospitals that use checklists are drastically less likely to amputate the wrong leg.

So, the takeaway for me is this: any checklist should start off verifying that what you “know” to be true is true. (Thankfully, my errors can be backed out with very few long-term consequences, but I shouldn’t use this as an excuse to forgo checklists.)

Before starting, ask the “Is it plugged in?” question first. What happened today was an example of when asking “Is it plugged in?” would have helped.

Today I was testing the thumbnailing of some MediaWiki code and trying to understand the $wgLocalFileRepo variable. I copied part of an /images/ directory over from another wiki to my test wiki. I verified that it thumbnailed correctly.

So far so good.

Then I changed the directory parameter and tested. No thumbnail. Later, I realized this is to be expected because I didn’t copy over the original images. So that is one issue.

I erased (what I thought was) the thumbnail image and tried again on the main repo. It worked again–I got a thumbnail.

I tried copying over the images directory to the new directory, but the new thumbnailing directory structure didn’t produce a thumbnail.

I tried over and over with the same thumbnail and was confused because it kept telling me the same thing.

I added debugging statements and still got nowhere.

Finally, I just did an ls on the directory to verify it was there. It was. And it had files in it.

But not the file I was trying to produce a thumbnail of.

The system that “worked” had the thumbnail, but not the original file.

So, moral of the story: Make sure that your understanding of the current state is correct. If you’re a developer trying to fix a problem, make sure that you are actually able to understand the problem first.

Maybe your perception of reality is wrong. Mine was. I was sure that the thumbnails were being generated each time until I discovered that I hadn’t deleted the thumbnails; I had deleted the original.
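
A checklist item like this can even be automated. The sketch below is hypothetical: the function name preflight and the idea of passing two explicit paths are illustrative assumptions, and it knows nothing about how MediaWiki lays out its image directories. It only reports which of the two files actually exists before you start testing.

```python
from pathlib import Path


def preflight(original: Path, thumbnail: Path) -> str:
    """Describe which of the two files is really on disk."""
    has_original = original.is_file()
    has_thumbnail = thumbnail.is_file()
    if has_original and has_thumbnail:
        return "both exist: delete the thumbnail to force a fresh one"
    if has_original:
        return "original only: a new thumbnail can be generated"
    if has_thumbnail:
        # The situation described above: the "working" system had
        # only the old thumbnail left.
        return "thumbnail only: the original is missing"
    return "neither file exists"
```

Running something like this first would have pointed straight at the “thumbnail only” case, which is the one I spent the afternoon rediscovering by hand.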

(Photo CC-BY 2.0 David Pursehouse: Earthquake survival kit checklist from Japan.

Translation:
– 1.5 liter bottle of water
– Canned bread
– Rice
– Pack of disposable toilet bags
– Foil sheet (to keep body dry/warm))

Go and do likewise

While many people are becoming more comfortable with single payer healthcare–thanks to Bernie Sanders–many of my Christian compatriots live in a socially conservative milieu that has so totally embraced the myth of the bootstraps that it has turned the call for personal responsibility (an inarguable good) into an excuse to escape caring for other people when we have the means.

This was made clear to me when I shared Jessica Kantrowitz’s post on Twitter:

Understandably, some people objected.  For example, my mother, a careful reader of scripture, commented: “???? Never read that.”  In the discussion that followed, she said Christians are to be personally involved: “A real neighbor sees a need and gets personally involved.”

And I totally agree with that.

However, it ends up being an excuse not to use taxes for social welfare since there is no “personal involvement.” But, the story of the Good Samaritan does not say the only way we are to help others is through personal involvement.

So, let me return to the original statement that provoked this discussion. It is a hyperbolic statement.  Jesus did not literally say “Pay for other people’s healthcare.”

But it would be a valid conclusion to draw from the story of the Good Samaritan.

Jesus was asked “who is my neighbor?” by a man trying to make sure he met all the legal requirements of the command to “love your neighbor as yourself.” He was trying to make sure he would merit eternal life.

In response, Jesus told a story that ended with a Samaritan paying for the care of the man he rescued (after two other “holy” men before him had passed by) and then promising to pay for any further costs when he was able to return.  After this story, Jesus asked, in the Socratic style of teaching, “Which of these three do you think was a neighbor to the man who fell into the hands of robbers?”

So, yes, Jesus didn’t say “Pay for other people’s health care” but he also did not say “Go be personally involved.” In fact, the story clearly shows the opposite: the Samaritan was personally involved, but when he couldn’t stay and personally take care of the man, he left him with someone else and left money to care for him.

And in the end, Jesus didn’t give the man asking him for spiritual advice an easy answer. He didn’t give any explicit direction. He said “Go and do likewise.” What that means differs from situation to situation.

Sure, like the Good Samaritan, Christians are called to get dirty helping others.

But, also like the Good Samaritan, we have to continue with our own business.

This doesn’t excuse us from caring for others when we cannot be personally involved. When we have other pressing matters we can give others the resources to care in our place, just as the Good Samaritan left the man with the innkeeper.

(Photograph by jean-louis Zimmermann from Moulins, FRANCE [CC BY 2.0], via Wikimedia Commons.)

Creating an external auto-completion provider for Page Forms

(The picture on this post is from Pilgrim’s Progress, as Christian struggles in the Slough of Despond.  I feel his pain here.)

I have a couple of use cases that require pulling from an external data source into MediaWiki. Specifically, they need to pull information such as employee data in Active Directory or a company-wide taxonomy that is maintained outside of MediaWiki. Lucky for me, there is the Page Forms extension.

Page Forms provides a couple of ways to do this: directly from an outside source or using the methods provided by the External Data extension.

Since I was going to have to write some sort of PHP shim, anyway, I decided to go with the first method.

Writing the PHP script to provide possible completions when it was given a string was the easy part. As a proof of concept, I took the list of words in /usr/share/dict/words on my laptop, trimmed it to 1/10th its size using

sed -n '0~20p' /usr/share/dict/words > short.txt

and used a simple PHP script (hosted on winkyfrown.com) to provide the data.

That script is the result of a bit of a struggle. Despite the fact that the documentation pointed to a working example (after I updated it, natch), that wasn’t clear enough for me. I had to spend a few hours poking through the source and instrumenting the code to find the answer.

And that is the reason for this weblog post. I posted the same thing earlier today to the Semantic MediaWiki Users mailing list after an earlier plea for help. What resulted is the following stream-of-consciousness short story:

I must be doing something wrong because I keep seeing this error in the
js console (in addition to not seeing any results):

    TypeError: text is undefined 1 ext.sf.select2.base.js:251:4
        removeDiacritics https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:251:4
        textHighlight https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:258:23
        formatResult https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:100:15
        populate https://example.dom/w/extensions/SemanticForms/libs/select2.js:920:39
        populateResults https://example.dom/w/extensions/SemanticForms/libs/select2.js:942:21
        updateResults/<.callback< https://example.dom/w/extensions/SemanticForms/libs/select2.js:1732:17
        bind/< https://example.dom/w/extensions/SemanticForms/libs/select2.js:672:17
        success https://example.dom/w/extensions/SemanticForms/libs/select2.js:460:25
        fire https://example.dom/w/load.php:3148:10
        fireWith https://example.dom/w/load.php:3260:7
        done https://example.dom/w/load.php:9314:5
        callback https://example.dom/w/load.php:9718:8

The URL http://example.dom/custom-autocomplete.php?l=lines&f=words
shows all the lines from the source (in this case, every 10th line from
/usr/share/dict/words) that match “lines”. This example results in:

        {"sfautocomplete":
            {"2435":{"id":"borderlines",
                            "value":"borderlines",
                            "label":"borderlines",
                            "text":"borderlines"},
                            …

In my PHP script, I blatted the value over the keys “id”, “value”, “label” and “text”
because I saw each of them being used, but not why.

Anyway, PF is configured to read this correctly, so I can see that when
the user types “lines” an XHR request is made for
https://example.dom/w/api.php?action=sfautocomplete&format=json&external_url=tempURL&substr=lines&_=1494345628246
and it returns

    {"warnings": {
        "main": {
              "*": "Unrecognized parameter: '_'"
        }
    },
     "sfautocomplete": [
        {
          "id": "borderlines",
          "value": "borderlines",
          "label": "borderlines",
          "text": "borderlines"
        }, ....

So far, so good.

I’m instrumenting the code for select2.js (console.log() is your friend!) and I can see that by the time we get to its populate() method we have a list of objects that look like this:

Object { id: 0, value: "borderlines", label: "borderlines", text: undefined }

Ok, I can see it substituting its own id so I’ll take that out of my
results.

There is no difference. (Well, the ordering is different — id now comes
at the end — but that is to be expected.)

Now, what happens if I take out text?

Same thing. Ordering is different, but still shows up as undefined.

Output from my custom autocompleter now looks like this:

        {"sfautocomplete":
            {"2435":{"value":"borderlines",
                     "label":"borderlines"},
                     …

and the SMWApi is now giving

    {"warnings": {
        "main": {
              "*": "Unrecognized parameter: '_'"
        }
    },
     "sfautocomplete": [
        {
          "value": "borderlines",
          "label": "borderlines"
        }, ....

Still the same problem. Let me try Hermann’s suggestion and make my
output look like:

        {"sfautocomplete":
            [
                {"borderlines":"borderlines"},
                ....

Still, no results. The resulting object does look like this, though:

Object { id: 0, borderline: "borderlines", label: "borderlines", text: undefined }

Looking at my instrumented code and the traceback, I have found that the
transformation takes place in the call

options.results(data, query.page);

at the success callback around line 460 in select2.js. This leads us back to ajaxOpts.results() at line 251 in ext.sf.select2.tokens.js (since this is the token input method I’m looking at) and, yep, it looks like I should be putting something in the title attribute.

And, yay!, after changing the output of my custom autocomplete script to:

        {"sfautocomplete":
            [
                {"title":"borderlines",
                 "value": "borderlines"},
                ....

the autocompletes start working. In fact, putting

        {"sfautocomplete":
            [
                {"title":"borderlines"}
                ....

is enough.
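
To make the working format concrete, here is a minimal Python sketch of a response builder. It is a stand-in for the PHP script, which isn’t shown in this post. The function name autocomplete_response and the case-insensitive substring match are assumptions made for illustration; only the "sfautocomplete" list of objects with a "title" key comes from the experiment above.

```python
import json
from typing import Iterable


def autocomplete_response(words: Iterable[str], substr: str) -> str:
    """Return the JSON body for every word containing `substr`.

    Matching is a case-insensitive substring test (an assumption;
    the real script may match differently).
    """
    needle = substr.lower()
    matches = [w for w in words if needle in w.lower()]
    # Each result only needs a "title" key for the completions to show up.
    return json.dumps({"sfautocomplete": [{"title": w} for w in matches]})
```

Feeding it the trimmed word list and the string the user typed should give the same shape of output that finally made the completions appear.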

If you made it this far, you’ll know that I should have just copied the example I found when I updated the page on MW.o, but then I wouldn’t have understood this as well as I do now. Instead, I used what I learned to provide an example in the documentation that even I wouldn’t miss.

(Image is public domain from the Henry Altemus edition of John Bunyan’s Pilgrim’s Progress, Philadelphia, PA, 1890. Illustrations by Frederick Barnard, J.D. Linton, W. Small, etc. Engraved by Dalziel Brothers. Enlarged detail from Wikimedia Commons uploaded by User:Dr_Jorgen.)

 

More tragic middle aged white men

Just as I hit middle age (my 44th birthday is this year), stories start coming out about how tragic white, middle-aged men’s lives are becoming. And, unlike many other people who have lived their whole lives fighting, this is a new experience for many middle-aged white men in the States.

It started shortly after Bill Clinton helped Republicans in Congress enact a bunch of welfare reforms in the mid-90s.  Of course, those reforms targeted people that white men with jobs would see as moochers.

We started seeing the effects a few years later as disability claims more than doubled in the 10 years after 2000 and the mortality rate for middle-aged white folks went up.

Tragedy begets tragedy and, into this environment, a divorced, isolated, middle-aged, church-going white man falls on desperate times. His criminal background probably didn’t help, but he saw an opportunity in targeting romantic men in their 50s who had been divorced and become isolated and desperate.

The Atlantic story with the click-bait title “Murder by Craigslist” does a great job of telling the story of these middle-aged guys in a compassionate way. It manages to use the story of a serial murderer in a depressed area of Ohio to help us see the tragedy in his life and that of his victims.

When I read stories like this, they hit close to home. I’ve been very, very blessed, but I still see that I am but a step or two away from being one of the romantic white guys described in this story.

(Image of Craigslist World Headquarters in San Francisco’s Sunset District from Wikimedia Commons by User:Calton. CC-BY-SA-3.0. Used by permission.)

 

How I chased down a PHP mode indention bug in Emacs

(When I posted this to reddit, someone pointed out that I could have gotten the same information from c-show-syntactic-information by hitting C-c C-s. Awesome!)

I was getting irritated by Emacs refusing to indent PHP code properly when I was working on a MediaWiki extension.  I’ve run into this before, but today, in order to procrastinate a little, I decided to dig in when I ran into it again with the following code:

try {
    $object->method( ARG );
}

It kept trying to put $object at the same level as try, so it would end up with:

try {
$object->method( ARG );
}

So I chased it down. I used C-h k to find the function being called. After a bit of edebug, I found that the code in the function c-indent-line being called was essentially:

(c-get-syntactic-indentation (c-guess-basic-syntax))

In fact, doing M-: (c-get-syntactic-indentation (c-guess-basic-syntax)) RET when point was sitting on $ gave the result 4 when it tried to indent and 0 when it didn’t.

(Ok, the code had two more levels of indention than the above, so it was giving 12 and 8, but let’s not get carried away with details.)

Now, running M-x php-mode RET (i.e. none of the added configuration in .dir-locals.el) gave the proper indention. In my .dir-locals.el, though, I had set up a c-offsets-alist that mostly worked with bits and pieces copied from all over.

Running just M-: (c-guess-basic-syntax) RET returned ((statement-block-intro 1379)) so I figured I needed to add (statement-block-intro . +) to my c-offsets-alist.

I added that, and it worked. And now I know how to chase down indention bugs.

(Header image by Ivo Kruusamägi [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons.)

Yet another take on the 2016 election

A friend of mine who is an ordained minister, someone I first met in real life, is now primarily a Facebook friend and has posted a link to If You Don’t Like Either Candidate, Then Vote for Trump’s Policies.

As you can probably predict, it caused some discussion, but just as I was getting ready to comment, it looks like he took it down.  That’s fine.  This works as a blog post and I’ve been meaning to write one.

Last week my friend posted a link to A Wretched Choice. I said then that the anti-Clinton prejudice was palpable. I’m struck, again, by the built-in prejudice against Hillary. She has her faults, but so much of what is said assumes you agree that she is Satan incarnate.

In making the argument for voting for Trump’s policies, Wayne Grudem seems eager to take Trump at his word but ready to doubt Clinton no matter what she says. I think he’s forgotten Jesus’ admonition to be “wise as serpents and innocent as doves.”

In the meantime, he is ignoring Trump’s words — he contradicted his wife and said he didn’t apologize to her for his comments in the 2005 tape — and hearing only what he wants to hear. If you look at Trump looking for contrition and humility, you’re looking in the wrong place.

Mr. Grudem points to the Republican and Democratic parties’ respective platforms, which he says guide the actions of elected officials. I would point out that Trump has shown a profound antipathy towards the Republican leaders. Why would you expect that he wouldn’t show that same antipathy towards the “shackles” of the Republican platform?

Look, I think it is fine to say “both candidates are awful and I just feel better about Trump”. At least, then, you’re honest about your prejudice. But if you want to present a rigorous case for Trump and against Hillary don’t assume that I agree with all your preconceived notions about Hillary.

For example, give me a reason to think that Trump’s past support for abortion is past, or, better yet, convince me that he has real compassion for those who are less fortunate than he is. I’ve at least seen hints of that from Clinton.

Finally, I’ll quote Jesus again: “You will know them by their fruit.” The best examples of fruit that I see are their eponymous foundations. Both have had problems, but, from what I can see, the Clinton Foundation is focused on spending money on what the Clintons see as real problems, while the Trump Foundation seems to have more room for self-dealing and bribes.

Image from Wikimedia Commons by DonkeyHotey (CC BY-SA 2.0).

Late night hacking and Italian Hospitality

Today is the second day at Wikimania in Esino Lario, a small town several kilometers up a narrow road full of switchbacks in northern Italy.

During the first day, I met some Kiwix developers.  CScott, especially, was talking about a use case that Kiwix doesn’t address well right now — offline editing.  We talked a bit about that, the International Space Station, WikiEM and I mentioned the mediawiki remote for git.

I had never actually tried it out, though.  I found several bugs and got caught up in fixing those rather than the other work I had planned.  More about this later in a separate blog post.

I worked on it too late.

When I finally decided to go to bed at 1:00 in the morning, I couldn’t find the house where all my luggage was and where I was supposed to sleep.

I walked up and down the roads around the hills here, but finally had to give up at 2:00.  I walked back to the polizia station and asked for help.  I was hoping they had been given a list of the places where the registered attendees were staying.

They had not shared the list.

It was 2:00 in the morning and a man I later found out was the vice mayor of Esino Lario was driving me around while I talked to his niece on his cell phone, the only person he could find at that time who could translate.

Because of the absurd comedy involved — a stupid American didn’t know where he was supposed to sleep at 2:00 in the morning — his niece, Amelia, and his 80 year old mother quickly made up a bed for me and gave me her apartment while she went to stay with her uncle.

As I told them, it was a great privilege to experience Italian hospitality.  I could tell that Amelia’s grandmother (who reminded me of my own grandmother) was really pleased by this — even through translation.  She jokingly told me I should let everyone know about her bed and breakfast.

Improving watchlists for corporate MediaWiki use

I’ve learned, from listening to corporate users of MediaWiki, that watchlists are very important for maintaining the quality of the wiki.

The guys running the EVA wiki at NASA, for example, have done a lot of work on the Watchlist Analytics extension to ensure that the articles on their wiki have watchers spread out over the entire wiki.

Installing this extension for a client increased their awareness of the usefulness of watchlists and, since they had been using WhoIsWatching in the past, they asked me to fix a problem they had encountered.

The client takes a little more proactive approach to managing their wiki. It might not seem like “the wiki way” to people who have only used MediaWiki on Wikipedia, but they wanted to use the ability of WhoIsWatching to put pages on editors’ watchlists.

In the past, when they used the page, it showed a list of users and allowed those with permission to add the page to anyone’s watchlist. It limited the list to those people who had provided an email address, which made that more manageable.

Since then, I’ve implemented Single Sign On for them and auto-populated their email address from Active Directory. As a result, the number of users with an email address has jumped from a handful to over 10,000.

So, now WhoIsWatching was trying to load thousands of rows and display them all at once on a single page.

It was trying, but the requests were timing out and the page was un-usable.

The extension had other problems. It practiced security-through-obscurity. While you could disable the ability to add pages to other people’s watchlists, the only thing keeping anyone from adding pages was the fact that its administrative page (or “Special Page” in MediaWiki parlance) was not on the list of special pages. If you knew about the page, you could visit it and add an article you were hard at work on to everyone’s watchlists, thus spamming them with notifications from the wiki of all your changes.

That, and if you visited the special page without providing any arguments, you’d get a cryptic “usage” message.

To address the first problem, I decided to put an auto-complete form on the page so that a user could start typing a username and then MediaWiki would provide a list of matching usernames. I wondered how I would do this until I noticed that the Special:UserRights page was now providing this sort of auto-completion. Adding that functionality was as easy as providing a text field with the class mw-autocomplete-user.

I addressed the security issue by adding a couple of rights that could be given to users through the user interface (instead of by updating the value of global variables in a PHP file).

Finally, the frosting on the cake was to make WhoIsWatching’s special page useful if you visited it all by itself.

I already knew that the search bar provided auto-completion for article names and, based on what I discovered with mw-autocomplete-user, I thought I might be able to do something similar with page completion.

I was right. Thanks to a bug in a minor skin discovered back in 2012, you can add the class mw-search-input to a text field and it works.

I haven’t been aware of all the great auto-completion work that MediaWiki developers like krinkle and MetaMax have been doing, but I’m pleased with what I see. And the improvements that they implemented made adding the features I needed to WhoIsWatching about a thousand percent easier.

Oh, and I did miscellaneous code cleanup and i18n wrangling (with Siebrand’s guidance, naturally). Now many changes sit ready for review.

There are still things I’d like to fix, but those will have to wait.

Image credit: Livrustkammaren (The Royal Armoury) / Erik Lernestål / CC BY-SA [CC BY-SA 3.0 or Public domain], via Wikimedia Commons