Improving watchlists for corporate MediaWiki use

I’ve learned, from listening to corporate users of MediaWiki, that watchlists are very important for maintaining the quality of the wiki.

The guys running the EVA wiki at NASA, for example, have done a lot of work on Watchlist Analytics extension to ensure that the articles on their wiki have watchers spread out over the entire wiki.

Installing this extension for a client increased their awareness of the usefulness of watchlists and, since they had been using WhoIsWatching in the past, they asked me to fix a problem they had encountered.

The client takes a little more proactive approach to managing their wiki. It might not seem like “the wiki way” to people who have only used MediaWiki on Wikipedia, but they wanted to use the ability of WhoIsWatching to put pages on editor’s watchlists.

In the past, when they used the page, it showed a list of users and allows those with permission to add the page to anyone’s watchlist. It limited the list to those people who had provided an email address, which made that more manageable.

Since then, I’ve implemented Single Sign On for them and auto-populated their email address from Active Directory. As a result, the number of users with an email address has jumped from a handful to over 10,000.

So, now WhoIsWatching was trying to load thousands of rows and display them all at once on a single page.

It was trying, but the requests were timing out and the page was un-usable.

The extension had other problems. It practiced security-through-obscurity. While you could disable the ability to add pages to other people’s watchlists, the only thing to keep the anyone from adding pages was the fact that its administrative page (or “Special Page” in MediaWiki parlance) was not on the list of special pages. If you knew about the page, you could visit it and add an article you were hard at work on to everyone’s watchlists, thus spamming them with notifications from the wiki of all your changes.

That, and if you visited the special page without providing any arguments, you’d get a cryptic “usage” message.

To address the first problem, I decided to put an auto-complete form on the page so that a user could start typing a username and then MediaWiki would provide a list of matching usernames. I wondered how I would do this until I noticed that the Special:UserRights page was now providing this sort of auto-completion. Adding that functionality was as easy as providing an text field with the class mw-autocomplete-user.

I addressed the security issue by adding a couple of rights that could be given to users through the user interface (instead of by updating the value of global variables in a php file).

Finally, the frosting on the cake was to make WhoIsWatching’s special page useful if you visited it all by itself.

I already knew that the search bar provided auto-completion for article names and, based on what I discovered with mw-autocomplete-user, I thought I might be able to do something similar with page completion.

I was right. Thanks to a bug in a minor skin discovered back in 2012, you can add the class mw-search-input to a text field and works.

I haven’t been aware of all the great auto-completion work that MediaWiki developers like krinkle and MetaMax have been doing, but I’m pleased with what I see. And the improvments that they implemented made adding the features I needed to WhoIsWatching about a thousand percent easier.

Oh, and I did miscellaneous code cleanup and i18n wrangling (with Siebrand’s guidance, naturally). Now many changes sit ready for review.

There are still things I’d like to fix, but those will have to wait.

Image credit: Livrustkammaren (The Royal Armoury) / Erik Lernestål / CC BY-SA [CC BY-SA 3.0 or Public domain], via Wikimedia Commons

2014 Summer of Code

Google Summer of Code has ended and, with it, my first chance to mentor a student with Markus Glaser in the process of implementing a new service for MediaWiki users.

At the beginning of the summer, Markus and I worked with Quim Gil to outline the project and find a student to work on it.

Aditya Chaturvedi, a student from the Indian Institute of Technology (“India’s MIT”) saw the project, applied for our mentorship, and, soon after, we began working with him.

We all worked to outline a goal of creating a rating system on WikiApiary with the intention of using a bot to copy the ratings over to MediaWiki.org.

I’m very happy to say that Adiyta’s work can now be seen on WikiApiary. We don’t have the ratings showing up on MediaWiki yet (more on that in a bit) but since that wasn’t a part of the deliverables listed as a success factor for this project, this GSOC project is a success.

As a result of his hard work, the ball is now in our court — Markus and I have to evangelize his ratings and, hopefully, get them displayed on MediaWiki.org.

Unlike some other projects, this project’s intent is to help provide feedback for MediaWiki extensions instead of create a change in how MediaWiki itself behaves. To do this, Aditya and I worked with Jamie Thinglestaad to create a way for users to rate the extensions that they used.

We worked with Jamie for a few reasons. First, Jamie has already created an infrastructure on WikiApiary for surveying MediaWiki sites. He is actively maintaining it and improving the site. Pairing user ratings with current his current usage statistics makes a lot of sense.

Another reason we worked with Jamie instead of trying to deploy any code on a Wikimedia site is that the process of deploying code on WikiApiary only requires Jamie’s approval.

The wisdom of this decision really became apparent at the end when Adiyta requested help getting his ratings to show up using the MediaWiki Extension template.

Thank you, Aditya. It was a pleasure working with you. Your hard work this summer will help to invigorate the ecosystem for MediaWiki extensions.  Good luck on your future endevors.  I hope we can work together again on MediaWiki.

Why your javascript on Wikipedia will break

This week we did our first roll out of MediaWiki 1.19 on some of the smaller project sites. This staged roll out is a great way to find out how you are using the software in ways we didn’t expect and to give you a warning: “Beware! This thing you are doing is going to break!” Of course, I would prefer to avoid that wherever possible, but there are things I can’t control.

So now, I get to say “Beware!”:

Beware!

If are using document.write() in some javascript, whether in a gadget, in your common.js, vector.js, monobook.js or even global.js, you need to change it. In the cases that I saw, people had used a code fragment like the following:

function importAnyScript(lang,family,script) {
document.write('<script type="text/javascript" src="' + 'http://'
        + lang + '.'
        + family + '.org/w/index.php?title='
        + script + '&action=raw&ctype=text/javascript"></script>');

This has to be changed to something like the following:

function importAnyScript(lang,family,script) {
mw.loader.load('//' + lang + '.' + family
        + '.org/w/index.php?title='
        + script + '&action=raw&ctype=text/javascript');
}

Oops, I did it again!

Working on free software projects isn’t easy. Just because you’re giving away your work for anyone to use doesn’t mean that anyone is going to take it, no questions asked. Take my MediaWiki work as an example. I am being paid for the work, but it is freely licensed and I’m learning about the standards of quality that the community has formed around the code. Frankly, before becoming involved in such a serious PHP-based project, I didn’t have a very high opinion of PHP. Even Rasmus (creator of PHP) doesn’t seem to live in a pure php world and, as a result, thinks of systems where PHP is merely the web frontend instead of almost the entire system. So working with others who have been neck-deep in PHP for years, building one of the top-10 sites on the net entirely in PHP, and gaining intimate familiarity with the quirks of PHP, has been a wonderful experience. But MediaWiki isn’t the only free software project I’m involved in. I also contribute to Emacs occasionally. (For those not so familar with Emacs vs Vi, let’s just say this is like the social situation between Republicans and the Democrats or the Roman Catholics and Southern Baptists: You live next door to them, but you know they’re going to hell.) And it is my most recent commits to Emacs that have gained me noteriety. Yesterday, I was catching up on some blog reading (Planet Emacs, thankyouverymuch) and came across a nifty use of loccur.el. But it used defadvice instead of a hook (and hooks are better — no this is different than emacs vs vi, I swear). I looked at the code and thought, “Hey, I can make a tiny little contribution to Emacs here!” So I made a couple of small changes. Little did I know what a problem that was going to be. Óscar Fuentes used my commit message as an example of how not to write a commit message. This was not the first time I’ve been so honored. Three weeks ago, I made a mistake committing to the bzr repository for emacs and was again used as an example for the Emacs-devel community of how not to make a commit. There are two reasons I’m such a stellar example for the other Emacs developers. First, I’ve been using bzr for a couple of years while working on the iHRIS Suite. This experience (2 years more than most Emacs developers) naturally made me think I had things under control. So I didn’t bother reading over Bzr for Emacs Devs. Second, Emacs recently switched its source-control system (after much debate and some effort on speed the bzr side) from the ancient, worn, CVS to bzr. So people are still adapting their work flow. I just happened to make some commits that were particularly egregious and ended up being great examples of what people should avoid. So, yes, Free Software is a great thing, but that doesn’t mean the developers don’t take it seriously. And being reprimanded in public isn’t the most pleasent experience. But at least I can blog about it!

Notes at the Halfway Point

(This is copied from the first part of last week’s weekly report. I’ve copied it here since more people may be interested in what it feels like to work in a highly visible open source project.)

I’m halfway through a 3 month contract (a relationship with the WikiMedia Foundation that I hope to continue) and it seems like a good place to write up some of my lessons learned. Too many times I haven’t spent enough time going over my code. Or, when I’m refactoring someone else’s code, I don’t look over it enough. Sometimes this is due to my own inexperience with the MW code base: If I had more experience, I would have a better idea from just looking at the code that dieUsage() was being used incorrectly. Still, experience can be created, and when it is experience you lack, the effort to create experience must be made. This is where testing comes in. More than once while refactoring the UploadChunks api, Tim has just looked at the code and told me I wasn’t testing it properly. The same could be said for more trivial things like whitespace. Mixing whitespace commits with more substantive changes as well as just the way Emacs formats the code by default have irritated other developers. From this experience, I think I need to borrow an idea from Atul_Gawande’s Checklist Manifesto and make — gasp — a checklist. The checklist is the first step, but the second would be automating as much of the checklist as possible and creating a pre-commit hook that I could use to check my own code. From my experience, here are some ideas for a checklist:

  • Whitespace use.
    • Tabs, not spaces
    • avoiding spaces for indention and lining up code
    • spaces around parens
    • etc.
  • Code correctness.
    • Ensure each exit point has a test. (Since I prefer to write unit tests for my code, noting a test for each exit point in a code comment would take care of this. An automated check would just verify that the exit point has a notation.)
    • Make sure no E_STRICT warnings are thrown during tests

Note that any automation here is going to fail to really enforce anything: it is impossible to verify code-correctness completely without actually running the code. Instead the idea is not to verify that the code is bug-free, but just to make sure that I’ve caught the things that more experienced MW developers aren’t going to roll their eyes at. That said, the number of developers actively working on MW code as well as the CodeReview extension on mediawiki.org make developing MW code an overwhelmingly positive experience. These things mean that — combined with the depth of PHP and MediaWiki knowledge and care that people take — problems see the light of day really quickly. When your code is likely to end up on one of the top-10 sites on the Internet, this amount of care is crucial.