Creating an external auto-completion provider for Page Forms

(The picture on this post is from Pilgrim’s Progress as Christian struggles in the Slough of Despond. I feel his pain here.)

I have a couple of use cases that require pulling from an external data source into MediaWiki. Specifically, they need to pull information such as employee data in Active Directory or a company-wide taxonomy that is maintained outside of MediaWiki. Lucky for me, there is the Page Forms extension.

Page Forms provides a couple of ways to do this: directly from an outside source or using the methods provided by the External Data extension.

Since I was going to have to write some sort of PHP shim, anyway, I decided to go with the first method.

Writing the PHP script to provide possible completions when it was given a string was the easy part. As a proof of concept, I took the list of words in /usr/share/dict/words on my laptop, trimmed it to 1/10th its size using

sed -n '0~20p' /usr/share/dict/words > short.txt

and used a simple PHP script (hosted on to provide the data.

That script is the result of a bit of a struggle. Despite the fact that the documentation pointed to a working example (after I updated it, natch), that wasn’t clear enough for me. I had to spend a few hours poking through the source and instrumenting the code to find the answer.

And that is the reason for this weblog post. I posted the same thing earlier today to the Semantic MediaWiki Users mailing list after an earlier plea for help. What resulted is the following stream-of-consciousness short story:

I must be doing something wrong because I keep seeing this error in the
js console (in addition to not seeing any results):

    TypeError: text is undefined 1 ext.sf.select2.base.js:251:4
        removeDiacritics https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:251:4
        textHighlight https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:258:23
        formatResult https://example.dom/w/extensions/SemanticForms/libs/ext.sf.select2.base.js:100:15
        populate https://example.dom/w/extensions/SemanticForms/libs/select2.js:920:39
        populateResults https://example.dom/w/extensions/SemanticForms/libs/select2.js:942:21
        updateResults/<.callback< https://example.dom/w/extensions/SemanticForms/libs/select2.js:1732:17
        bind/< https://example.dom/w/extensions/SemanticForms/libs/select2.js:672:17
        success https://example.dom/w/extensions/SemanticForms/libs/select2.js:460:25
        fire https://example.dom/w/load.php:3148:10
        fireWith https://example.dom/w/load.php:3260:7
        done https://example.dom/w/load.php:9314:5
        callback https://example.dom/w/load.php:9718:8

The URL http://example.dom/custom-autocomplete.php?l=lines&f=words
shows all the lines from the source (in this case, every 10th line from
/usr/share/dict/words) that match “lines”. This example results in:


In my PHP script, I blatted the value over the keys “id”, “value”, “label” and “text”
because I saw each of them being used, but not why.

Anyway, PF is configured to read this correctly, so I can see that when
the user types “lines” an XHR request is made for
and it returns

    {"warnings": {
        "main": {
              "*": "Unrecognized parameter: '_'"
        }
     },
     "sfautocomplete": [
        {
          "id": "borderlines",
          "value": "borderlines",
          "label": "borderlines",
          "text": "borderlines"
        }, ....

So far, so good.

I’m instrumenting the code for select2.js (console.log() is your friend!) and I can see that by the time we get to its populate() method we have a list of objects that look like this:

Object { id: 0, value: "borderlines", label: "borderlines", text: undefined }

Ok, I can see it substituting its own id so I’ll take that out of my

There is no difference. (Well, the ordering is different — id now comes
at the end — but that is to be expected.)

Now, what happens if I take out text?

Same thing. Ordering is different, but still shows up as undefined.

Output from my custom autocompleter now looks like this:


and the SMWApi is now giving

    {"warnings": {
        "main": {
              "*": "Unrecognized parameter: '_'"
        }
     },
     "sfautocomplete": [
        {
          "value": "borderlines",
          "label": "borderlines"
        }, ....

Still the same problem. Let me try Hermann’s suggestion and make my
output look like:


Still, no results. The resulting object does look like this, though:

Object { id: 0, borderline: "borderlines", label: "borderlines", text: undefined }

Looking at my instrumented code and the traceback, I have found that the
transformation takes place in the call


at the success callback around line 460 in select2.js. This leads us back to ajaxOpts.results() at line 251 in ext.sf.select2.tokens.js (since this is the token input method I’m looking at) and, yep, it looks like I should be putting something in the title attribute.
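To make the symptom concrete, here is a rough JavaScript model of what the instrumented output suggests is happening. It is a sketch reconstructed from the objects logged above, not the extension's actual source: the function name toSelect2Results is hypothetical, and the assumption is that each entry's id is replaced with its position in the list and its text is taken from a title attribute.

```javascript
// Hypothetical model of the mapping step; NOT copied from the extension.
// Assumption: "id" becomes the list index and "text" is read from "title".
function toSelect2Results(response) {
  const items = response.sfautocomplete || [];
  return items.map((item, index) => ({
    ...item,
    id: index,        // whatever id the provider sent is overwritten
    text: item.title, // undefined when the provider sends no "title"
  }));
}

// A provider response without "title", like the early attempts above.
const broken = toSelect2Results({
  sfautocomplete: [
    { id: "borderlines", value: "borderlines", label: "borderlines", text: "borderlines" },
  ],
});

// A provider response that carries "title".
const working = toSelect2Results({
  sfautocomplete: [{ title: "borderlines" }],
});

console.log(broken[0].text);  // undefined
console.log(working[0].text); // borderlines
```

Under that assumption, any provider that omits title ends up with text: undefined, which matches the "text is undefined" error that removeDiacritics raised in the traceback.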

And, yay!, after changing the output of my custom autocomplete script to:

                 "value": "borderlines"},

the autocompletes start working. In fact, putting


is enough.

If you made it this far, you’ll know that I should have just copied the example I found when I updated the page on MW.o, but then I wouldn’t have understood this as well as I do now. Instead, I used what I learned to provide an example in the documentation that even I wouldn’t miss.
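For anyone who wants the short version, the provider's job boils down to filtering a word list and wrapping each match in an object with a title attribute. The sketch below shows the idea in JavaScript rather than the PHP of the original script. The function name buildAutocompleteResponse and the sample words are made up for illustration, and the top-level sfautocomplete key is an assumption based on the API output shown earlier.

```javascript
// Sketch only: "sfautocomplete" and "title" come from the findings in this
// post; the function name and the sample data are hypothetical.
function buildAutocompleteResponse(words, needle) {
  const wanted = needle.toLowerCase();
  // Keep every word that contains the typed string anywhere in it.
  const matches = words.filter((word) => word.toLowerCase().includes(wanted));
  // Each completion is an object whose "title" holds the text to show.
  return { sfautocomplete: matches.map((word) => ({ title: word })) };
}

const sampleWords = ["borderlines", "airlines", "lineage", "zebra"];
const payload = buildAutocompleteResponse(sampleWords, "lines");

console.log(JSON.stringify(payload));
// {"sfautocomplete":[{"title":"borderlines"},{"title":"airlines"}]}
```

The matching here is a plain case-insensitive substring test, which is enough for a proof of concept against a word list. A real data source such as Active Directory would need its own lookup in place of the filter.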

(Image is public domain from the Henry Altemus edition of John Bunyan’s Pilgrim’s Progress, Philadelphia, PA, 1890. Illustrations by Frederick Barnard, J.D. Linton, W. Small, etc. Engraved by Dalziel Brothers. Enlarged detail from Wikimedia Commons uploaded by User:Dr_Jorgen.)


How I chased down a PHP mode indention bug in Emacs

(When I posted this to reddit, someone pointed out that I could have gotten the same information from c-show-syntactic-information by hitting C-c C-s. Awesome!)

I was getting irritated by Emacs refusing to indent PHP code properly when I was working on a MediaWiki extension. I’ve run into this before, but, today, in order to procrastinate a little, I ran into it again with the following code:

try {
    $object->method( ARG );

It kept trying to put $object at the same level as try, so it would end up with:

try {
$object->method( ARG );

So I chased it down. I used C-h k to find the function being called. After a bit of edebug, I found that the code in the function c-indent-line being called was essentially:

(c-get-syntactic-indentation (c-guess-basic-syntax))

In fact, doing M-: (c-get-syntactic-indentation (c-guess-basic-syntax)) RET when point was sitting on $ gave the result 4 when it tried to indent and 0 when it didn’t.

(Ok, the code had two more levels of indention than the above, so it was giving 12 and 8, but let’s not get carried away with details.)

Now, running M-x php-mode RET (i.e. none of the added configuration in .dir-locals.el) gave the proper indention. In my .dir-locals.el, though, I had set up a c-offsets-alist that mostly worked with bits and pieces copied from all over.

Running just M-: (c-guess-basic-syntax) RET returned ((statement-block-intro 1379)) so I figured I needed to add (statement-block-intro . +) to my c-offsets-alist.

I added that, and it worked. And now I know how to chase down indention bugs.

(Header image by Ivo Kruusamägi [CC BY-SA 4.0], via Wikimedia Commons.)

Late night hacking and Italian Hospitality

Today is the second day at Wikimania in Esino Lario, a small town several kilometers up a narrow road full of switchbacks in northern Italy.

During the first day, I met some Kiwix developers.  CScott, especially, was talking about a use case that Kiwix doesn’t address well right now — offline editing.  We talked a bit about that, the International Space Station, WikiEM and I mentioned the mediawiki remote for git.

I had never actually tried it out, though.  I found several bugs and got caught up in fixing those rather than the other work I had planned.  More about this later in a separate blog post.

I worked on it too late.

When I finally decided to go to bed at 1:00 in the morning, I couldn’t find the house where all my luggage was and where I was supposed to sleep.

I walked up and down the roads around the hills here, but finally had to give up at 2:00. I walked back to the polizia station and asked for help. I was hoping the organizers had shared a list of the places where the registered attendees were staying.

They had not shared the list.

It was 2:00 in the morning and a man I later found out was the vice mayor of Esino Lario was driving me around while I talked to his niece on his cell phone, the only person he could find at that time who could translate.

Because of the absurd comedy involved — a stupid American didn’t know where he was supposed to sleep at 2:00 in the morning — his niece, Amelia, and his 80 year old mother quickly made up a bed for me and gave me her apartment while she went to stay with her uncle.

As I told them, it was a great privilege to experience Italian hospitality.  I could tell that Amelia’s grandmother (who reminded me of my own grandmother) was really pleased by this — even through translation.  She jokingly told me I should let everyone know about her bed and breakfast.

Improving watchlists for corporate MediaWiki use

I’ve learned, from listening to corporate users of MediaWiki, that watchlists are very important for maintaining the quality of the wiki.

The guys running the EVA wiki at NASA, for example, have done a lot of work on the Watchlist Analytics extension to ensure that the articles on their wiki have watchers spread out over the entire wiki.

Installing this extension for a client increased their awareness of the usefulness of watchlists and, since they had been using WhoIsWatching in the past, they asked me to fix a problem they had encountered.

The client takes a little more proactive approach to managing their wiki. It might not seem like “the wiki way” to people who have only used MediaWiki on Wikipedia, but they wanted to use the ability of WhoIsWatching to put pages on editors’ watchlists.

In the past, when they used the page, it showed a list of users and allowed those with permission to add the page to anyone’s watchlist. It limited the list to those people who had provided an email address, which made that more manageable.

Since then, I’ve implemented Single Sign On for them and auto-populated their email address from Active Directory. As a result, the number of users with an email address has jumped from a handful to over 10,000.

So, now WhoIsWatching was trying to load thousands of rows and display them all at once on a single page.

It was trying, but the requests were timing out and the page was un-usable.

The extension had other problems. It practiced security-through-obscurity. While you could disable the ability to add pages to other people’s watchlists, the only thing to keep anyone from adding pages was the fact that its administrative page (or “Special Page” in MediaWiki parlance) was not on the list of special pages. If you knew about the page, you could visit it and add an article you were hard at work on to everyone’s watchlists, thus spamming them with notifications from the wiki of all your changes.

That, and if you visited the special page without providing any arguments, you’d get a cryptic “usage” message.

To address the first problem, I decided to put an auto-complete form on the page so that a user could start typing a username and then MediaWiki would provide a list of matching usernames. I wondered how I would do this until I noticed that the Special:UserRights page was now providing this sort of auto-completion. Adding that functionality was as easy as providing a text field with the class mw-autocomplete-user.

I addressed the security issue by adding a couple of rights that could be given to users through the user interface (instead of by updating the value of global variables in a PHP file).

Finally, the frosting on the cake was to make WhoIsWatching’s special page useful if you visited it all by itself.

I already knew that the search bar provided auto-completion for article names and, based on what I discovered with mw-autocomplete-user, I thought I might be able to do something similar with page completion.

I was right. Thanks to a bug in a minor skin discovered back in 2012, you can add the class mw-search-input to a text field and it works.

I haven’t been aware of all the great auto-completion work that MediaWiki developers like krinkle and MetaMax have been doing, but I’m pleased with what I see. And the improvements that they implemented made adding the features I needed to WhoIsWatching about a thousand percent easier.

Oh, and I did miscellaneous code cleanup and i18n wrangling (with Siebrand’s guidance, naturally). Now many changes sit ready for review.

There are still things I’d like to fix, but those will have to wait.

Image credit: Livrustkammaren (The Royal Armoury) / Erik Lernestål / CC BY-SA [CC BY-SA 3.0 or Public domain], via Wikimedia Commons

MediaWiki as a community resource

As is only to be expected, Brion asked:

What sort of outcomes are you looking for in such a meeting? Are you looking to meet with engineers about technical issues, or managers to ask about formally committing WMF resources?

I copy-pasted Chris Koerner’s response:

  • People use MediaWiki!
  • How can we bring them into the fold?
  • What is the WMF stance on MediaWiki? Is it part of the mission or a by-product of it?
  • Roadmap roadmap roadmap

But I couldn’t let it stop there, so I went into rant mode.


Emacs for MediaWiki

Tyler Romeo wrote:

If I had the time, I would definitely put together some sort of .dir-locals.el for MediaWiki, that way we could make better use of Emacs, since it has a bunch of IDE-like functionality, even if it’s not IDEA-level powerful.

A client wanted me to help train someone to take over the work of maintaining their MediaWiki installation. As part of that work, they asked for an IDE and, knowing that other MW devs used PHPStorm, I recommended it and they bought a copy for me and the person I was to train.

PHPStorm has “emacs keybindings” but these are just replacements for the CUA keybindings. Some of the things I expected the keybindings to invoke, they didn’t. (It’s been a while since I’ve used PHPStorm, so I’ve forgotten the details.)

In any case, I’ve found that a lot of what I wanted from PHPStorm could be implemented in Emacs using the following .dir-locals.el (which I put above my core and extensions checkouts):

((nil . ((flycheck-phpcs-standard .
     (flycheck-phpmd-rulesets .
     (mode . flycheck)
     (magit-gerrit-ssh-creds . ""))))

The above is in addition to the code-sniffing I already had set up to put Emacs’ php-mode into the MW style.

The one thing that PHPStorm lacked (and where Emacs’ magit excels) is dealing with git submodules. Since I make extensive use of submodules for my MediaWiki work, this setup makes Emacs a much better tool for working with MediaWiki.

Naturally, I won’t claim that what works for me will work for anyone else. I’ve spent 15 years in Emacs every day. I was first exposed to Emacs in the late 80s(!!) so the virus has had a long time to work its way into my psyche and, by now, I’m incurable.

Oh, I see: you are fighting the services war.

This comment in MediaWiki’s task tracker really upset me. Instead of polluting the tracker with my rant, I decided to use my blog for its original intent: a nice rant.

Are insults like this really good to put in phabricator? I’m not fighting a war. If I were, I would be greatly outnumbered.

There are legitimate reasons for Wikimedia’s engineering and operations teams to move bits of functionality out of the PHP core.

However, MediaWiki is free software and it has other users with other needs besides the Foundation.

For what it’s worth, the majority of my clients use services outside of PHP. For instance, I’ve set up and packaged Parsoid into RPMs for them and I’m currently working on creating RPMs for PediaPress’s PDF renderer (before I started, I looked at Offline content generator but wasn’t happy with where it was).

But, as a piece of free software, the ability for an individual to deploy MediaWiki by themselves without the infrastructure provided by an IT department is vital for its adoption in organisations outside the WMF.

The WMF vision statement reads: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.”

Making MW free software means that more people can share their bit of the “sum of all knowledge” without relying on the WMF or Wikipedia’s editors to be the gatekeepers.

What is happening with MediaWiki?

There has been some recent active discussion in the MediaWiki (MW) (see “The end of shared hosting” on phabricator) and SemanticMediaWiki (SMW) communities (see this discussion on github) about Service Oriented Architecture (SOA) and the future of MediaWiki.

Part of that discussion revolves around shared hosting which is where many people have deployed their wikis.  I posted some of my thoughts on this for other people in the SMW community, but I don’t want to deprive anyone who loves to read my writing, so I’m copying it here.

I’m not sure I’m the best person to talk about the SOA approach that MW is taking. One thing is clear, though: A SOA approach does not work in shared hosting.

The best place to have this discussion is with Rob Lanphier and the MW architecture committee at the developer summit in January. I encourage anyone interested in this to find their way to San Francisco for that. That said, I do have some thoughts…

While in the simpler, nostalgic past installing a wiki just required access to a single service (typically MySQL), the growing dependence on auxiliary services has made this “download and go” approach harder.

There are several objections to the SOA approach. Here are a few, but please be sure to add to the list:

  • Cost — shared hosting is relatively cheap.
  • Training — shared hosting means someone else is managing the server and you don’t need to have or maintain that skill.
  • Effort — shared hosting allows you to focus on the site, not on running a server.
  • Complexity — related to the previous two, shared hosting means you only need to be concerned with managing one aspect of your site.

These points are addressable, though:

  • Cost should be a relative non-issue. Amazon’s EC2 can be had for free or for as little as $10 a month. Linode has a similar service with a slightly friendlier UI, and other hosting providers like M5 and Rackspace are providing even cheaper alternatives.
  • I’ve been working on Ansible scripts to set up a server from scratch and James Montalvo and Daren Welsh have been working on Meza to help set up a MW server.
  • During a meeting that Markus Glaser and I had last year with Wikimedia’s Executive Director and leaders of the engineering department, it was clear they were looking for someone to take a leadership role in developing new forms of distribution for MediaWiki.
  • Software like the Ansible scripts or Meza, combined with new forms of distribution, should address the problems of Training, Effort and Complexity.

SOA architecture was first pursued in core MediaWiki with Parsoid’s node.js implementation. PHP 7 is removing the primary argument that Gabriel Wicke used for writing Parsoid on node.js: speed. Parsoid has a ton of tests and it would make sense to use those tests to rewrite Parsoid in PHP to make deployment easier.

There was a really good presentation at Wikimania by Ed Sanders about adapting VisualEditor (VE) for other uses. He pointed to a couple of bits of example code. I haven’t had the chance to look at them yet, but these were apparently clear examples of how to use VE for things in MW besides editing the complete page.

I have already been agitating for fewer services. The counterargument is that MW has grown into a monolithic piece of software and using services allows developers to isolate their work and control their interfaces better.

That is a good thing, but there is no reason this sort of isolation and control couldn’t be accomplished using the same platform and easy deployment that PHP has offered to many users in the past.

So, what does this all mean for SMW?

I think it is obvious that the status quo isn’t going to work. For one thing, there is already poor communication between the MW and SMW developer communities. As the survey we recently completed clearly shows, many users of MW do not see SMW as a separate or even extra piece of software. I think it is a given that many end users of wiki sites see the functionality that SMW provides as just a normal part of their wiki.

That means we need to get developers who are familiar with SMW to interact with the developers at the Wikimedia Foundation. The developer’s summit would be a good place to start.

Working with the architecture committee — helping them understand the needs of users and developers outside of Wikimedia projects — would help them use their role as leaders of the MW developer community in ways that would help steer MW development so that projects like SMW could continue to depend on the MW platform.

Instead of pining for the past, we need to shape the future by making sure our voices are heard.

MediaWiki Hackathon 2015

I am back from the MediaWiki hackathon this past weekend.

(Image caption: Richard Heigl leads a discussion after the #MWStake meeting about actually implementing our ideas.)

This is the first time we had some really good participation from non-WMF parties.

A couple of active developers from MITRE, a government-focused NGO, were there. I was also able to get the WMF to pay for a couple of engineers from NASA to go. The organiser of SMWCon Spring 2015 (Chris Koerner) was also there because I encouraged him to apply to get his attendance paid for by the WMF’s scholarship program.

I had planned to spend the hackathon finishing up the HitCounters extension so that it would be ready when MediaWiki 1.25 was released. Unfortunately, the conversations with the non-WMF MediaWiki users ended up being too productive. As a result MediaWiki 1.25 was released on Monday without the page view counter functionality. I should have this extension finished by the end of this week.

As an added bonus, I introduced Daren Welsh, one of the engineers from NASA, to the VP of Engineering at the WMF. Our friends at NASA have been doing some really great things to improve the usability and usefulness of a user’s watchlist. I hope that some of their work shows up on Wikipedia because of this introduction.

Overall, it was a wonderful way for those of us who use MW outside of the Foundation to coordinate our work. I hope to see a lot of good things coming from these sorts of meetings in the future.

2014 Summer of Code

Google Summer of Code has ended and, with it, my first chance to mentor a student with Markus Glaser in the process of implementing a new service for MediaWiki users.

At the beginning of the summer, Markus and I worked with Quim Gil to outline the project and find a student to work on it.

Aditya Chaturvedi, a student from the Indian Institute of Technology (“India’s MIT”) saw the project, applied for our mentorship, and, soon after, we began working with him.

We all worked to outline a goal of creating a rating system on WikiApiary with the intention of using a bot to copy the ratings over to

I’m very happy to say that Aditya’s work can now be seen on WikiApiary. We don’t have the ratings showing up on MediaWiki yet (more on that in a bit) but since that wasn’t a part of the deliverables listed as a success factor for this project, this GSOC project is a success.

As a result of his hard work, the ball is now in our court — Markus and I have to evangelize his ratings and, hopefully, get them displayed on

Unlike some other projects, this project’s intent is to help provide feedback for MediaWiki extensions instead of creating a change in how MediaWiki itself behaves. To do this, Aditya and I worked with Jamie Thingelstad to create a way for users to rate the extensions that they used.

We worked with Jamie for a few reasons. First, Jamie has already created an infrastructure on WikiApiary for surveying MediaWiki sites. He is actively maintaining it and improving the site. Pairing user ratings with his current usage statistics makes a lot of sense.

Another reason we worked with Jamie instead of trying to deploy any code on a Wikimedia site is that the process of deploying code on WikiApiary only requires Jamie’s approval.

The wisdom of this decision really became apparent at the end when Aditya requested help getting his ratings to show up using the MediaWiki Extension template.

Thank you, Aditya. It was a pleasure working with you. Your hard work this summer will help to invigorate the ecosystem for MediaWiki extensions. Good luck on your future endeavors. I hope we can work together again on MediaWiki.