
mwparserfromhell 0.6


Ok I’m about four months late on this, but I’m not sure anyone else blogs about MediaWiki Python utility library releases, so I’m still gonna tag this as a news post. In December last year, mwparserfromhell released version 0.6! There are two super exciting changes. Check the full changelog for everything else, since I’m only going to go over these two:

  • Underscores and spaces are now equivalent in the Wikicode.matches() (as in Template.name.matches()) method!
  • Template.get() now takes a fallback parameter (and also supports dict syntax for accessing params)!

(Code examples were found in my leaguepedia_archive repository or written for this post.)

Underscores and spaces thing

Previously, we would write:

for template in wikitext.filter_templates():
    if template.name.matches('Infobox Team') or template.name.matches('Infobox_Team'):
        i = 1
        otherwikis = []

And now we can write:

for template in wikitext.filter_templates():
    if template.name.matches('Infobox Team'):
        i = 1
        otherwikis = []

Yay!!!

This may just seem like a minor convenience, but this is a pretty huge improvement for a few reasons:

  1. The fact that this didn’t “just work” before was a source of “accidental complexity” - especially for beginners just starting to learn the ins and outs of this library (also I’m pretty sure I forgot this wasn’t already supported and messed up at least a few times, oops)
  2. If you forget to support underscores, the resulting bugs are close to nondeterministic: they only show up when a wiki user happens to have “messed up” in a sense (written the name with underscores), so they’re hard to notice
  3. While Infobox Team and Infobox_Team are just two variations, what about Template:This template name has many different words? You get exponential growth, yikes

So this is actually something to be really excited about!!
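To put numbers on point 3: if every space can independently be written as an underscore, a name with n spaces has 2^n spellings. The plain-Python sketch below counts them and shows that they all collapse to one form once underscores are treated as spaces. `name_variants` and `normalize` are our own hypothetical helpers; `normalize` is only a rough model of the comparison `matches()` now does, not the library’s code.

```python
from itertools import product
from typing import List


def name_variants(name: str) -> List[str]:
    """All spellings of `name` where each space may independently be an underscore."""
    words = name.split(" ")
    variants = []
    # Pick one separator (" " or "_") for every gap between words.
    for seps in product(" _", repeat=len(words) - 1):
        spelled = words[0]
        for sep, word in zip(seps, words[1:]):
            spelled += sep + word
        variants.append(spelled)
    return variants


def normalize(name: str) -> str:
    """Hypothetical helper: compare names with underscores treated as spaces."""
    return name.replace("_", " ").strip()


short_variants = name_variants("Infobox Team")
long_variants = name_variants("This template name has many different words")
print(len(short_variants), len(long_variants))  # 2 64
# Every variant collapses to the same canonical form after normalizing.
print({normalize(v) for v in long_variants} == {"This template name has many different words"})  # True
```

Chaining 64 `or` clauses by hand is clearly not an option, which is why normalizing inside `matches()` is the right place to fix this.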

template.get() thing

Here’s a direct link to the PR.

Fallback

I am SO EXCITED!! about this one!!!

Previously, we would write:

if template.has(match_id + 'win'):
    if template.get(match_id + 'win').value.strip() != '':
        winner = True

And this can now be written as:

from mwparserfromhell.nodes.extras import Parameter

if template.get(match_id + 'win', Parameter('', '')).value.strip() != '':
    winner = True

Woohoo!
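If you check the same “missing or blank” condition in a lot of places, it can be wrapped in a small helper. This is a sketch of our own, not part of mwparserfromhell: `param_text` and `StubTemplate` are hypothetical names, and the helper assumes `template` behaves like a 0.6+ `Template`, whose `get(name, fallback)` returns the fallback instead of raising. The stub only exists so the example runs without a wiki.

```python
from types import SimpleNamespace


def param_text(template, name, default=""):
    """Return the stripped text of a template parameter, or `default` if it's missing.

    Hypothetical helper: assumes `template.get(name, fallback)` returns the
    fallback for a missing parameter (mwparserfromhell 0.6+ behaviour).
    """
    param = template.get(name, None)
    if param is None:
        return default
    return str(param.value).strip()


class StubTemplate:
    """Minimal stand-in for a Template so the sketch runs without a wiki."""

    def __init__(self, **params):
        # Each parameter is an object with a `.value`, like a real Parameter.
        self._params = {k: SimpleNamespace(value=v) for k, v in params.items()}

    def get(self, name, default):
        # Mirrors only the fallback behaviour described above.
        return self._params.get(name, default)


template = StubTemplate(team1win=" 1 ", team2win="")
print(repr(param_text(template, "team1win")))  # '1'
print(repr(param_text(template, "team2win")))  # ''
print(repr(param_text(template, "team3win")))  # '' (missing parameter)
```

With a helper like this, the check above becomes `winner = param_text(template, match_id + 'win') != ''`, and “absent” and “present but blank” are handled the same way.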

Dict access

If we’re certain that a param exists, then we can now also just access parameters as if the template were a dict. I’m a bit mixed about this syntax: it only saves a couple of characters, and in my opinion it removes a bit of clarity.

Previously:

title = template.get('R' + r).value.strip()

And now:

title = template['R' + r].value.strip()

This is going to be slightly weird, though, because, remember, you get a Parameter object, not the value of the key:

import mwparserfromhell

page = site.client.pages['Sona']  # site is an mwcleric site object (not an mwclient site); site.client is the mwclient Site it wraps
text = page.text()
for template in mwparserfromhell.parse(text).filter_templates():
    if template.name.matches('Infobox Champion'):
        print(template['name'])
        print(type(template['name']))

If this were actually a dictionary with key-value pairs of Sona data (rather than a wiki template), we’d expect to get Sona, and <class 'str'>. But instead, what we get is:

name=Sona
<class 'mwparserfromhell.nodes.extras.parameter.Parameter'>

Of course, we knew that; that’s how template.get() has always worked. But when accessing via the dict syntax, this definitely could feel just a bit unexpected - so be careful! And maybe stick to the .get() method for clarity.

An argument in favor of the dict syntax

There’s an argument in favor of the dict syntax, though: it makes it obvious whether or not we expect a parameter to be in the template, just like when working with dicts.

  • template['name'] - we know the template has a name param (and we’ll get an error if it doesn’t)
  • template.get('name', None) - the template may not have a name param, in which case we fall back to None

So, things to balance.

By the way, this code:

import mwparserfromhell

template = mwparserfromhell.parse('{{Infobox Champion|name=Sona}}').filter_templates()[0]
try:
    print(template.get("this template param doesn't exist"))
except ValueError:
    print("Not QUITE this dict-like")
print({}.get('kittens'))

will print:

Not QUITE this dict-like
None

We DO still need the fallback None that I wrote above, unlike when working with normal dicts. (And to be clear, this is NOT a criticism of the implementation; it would be a breaking change to have it any other way, as there could be a lot of code that depends on catching a ValueError with try/except when template.get() fails. A library as low-level as mwparserfromhell needs to be really, really, really stable, so breaking changes are to be avoided at all costs, especially for something that is, at the end of the day, really just syntactic sugar.)
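For the curious, here’s one common Python pattern that gets both behaviors at once: a private sentinel object as the default, so that “no fallback given” (raise ValueError, as before) can be told apart from “the fallback is None” (return None). This is a toy sketch with hypothetical names (`ToyTemplate`, `_UNSET`), not mwparserfromhell’s actual source.

```python
_UNSET = object()  # sentinel meaning "no fallback was passed"; distinct from None


class ToyTemplate:
    """Toy model of template parameter lookup (hypothetical, not mwparserfromhell's code)."""

    def __init__(self, params):
        self._params = dict(params)

    def get(self, name, default=_UNSET):
        if name in self._params:
            return self._params[name]
        if default is _UNSET:
            # No fallback given: keep the old behaviour and raise.
            raise ValueError(name)
        # A fallback was given, so return it, even if it is None.
        return default

    def __getitem__(self, name):
        # Dict-style access delegates to get() with no fallback, so it also raises.
        return self.get(name)


template = ToyTemplate({"name": "Sona"})
print(template["name"])               # Sona
print(template.get("kittens", None))  # None
try:
    template["kittens"]
except ValueError:
    print("missing params still raise ValueError")
```

Because `_UNSET` is a fresh `object()`, no caller can pass it by accident, so every existing `try`/`except ValueError` keeps working while new code gets an opt-in fallback.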

Conclusion

I love love love love love love this library, and I’m so happy to see it continuing to be developed! mwparserfromhell is crazy impressively good at what it does, and just a joy to develop with, and these two patches are making it even more so!

I do recommend against using the dict-access syntax - I think it can be a convenient nice-to-have, but it varies just a bit too much in behavior from real dicts to make it a net positive. Stick to template.get('name', fallback).value and don’t forget to .strip() the result!


WRITTEN BY
River
River is a developer most at home in MediaWiki and known for building Leaguepedia. She likes cats.

