Ok I’m about four months late on this, but I’m not sure anyone else blogs about MediaWiki Python utility library releases so I’m still gonna tag this as a news post. In December last year, mwparserfromhell released version 0.6! There are two super exciting changes, and you should follow that link for the full changelog, since I’m only going to go over these two changes:
- Underscores and spaces are now equivalent in the `Wikicode.matches()` method (as in `Template.name.matches()`)!
- `Template.get()` now takes a fallback parameter (and also supports `dict` syntax for accessing params)!
(Code examples were found in my leaguepedia_archive repository or written for this post.)
Underscores and spaces thing
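Previously, we had to check a template name against both its space spelling and its underscore spelling ourselves.
And now a single `matches()` call with either spelling covers both.
Yay!!!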
This may just seem like a minor convenience, but this is a pretty huge improvement for a few reasons:
- The fact that this didn’t “just work” before was a source of “accidental complexity” - especially for beginners just starting to learn the ins and outs of this library (also I’m pretty sure I forgot this wasn’t already supported and messed up at least a few times, oops)
- If you forget to support underscores, the resulting bugs only show up when a wiki user happens to have “messed up” and typed the underscore form, so they look intermittent and are hard to notice
- While `Infobox Team` and `Infobox_Team` are just two variations, what about `Template:This template name has many different words`? Every gap between words can be either a space or an underscore, so you get exponential growth, yikes
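To put a number on that exponential growth, here's a small standard-library sketch. `separator_variants` is a hypothetical helper written for this post, not part of mwparserfromhell; it just enumerates every space/underscore spelling of a name:

```python
from itertools import product


def separator_variants(name):
    """Return every spelling of `name` with each gap as a space or an underscore."""
    words = name.split(" ")
    variants = []
    # One separator choice per gap between words: 2 ** (len(words) - 1) combinations.
    for seps in product(" _", repeat=len(words) - 1):
        parts = [words[0]]
        for sep, word in zip(seps, words[1:]):
            parts.append(sep + word)
        variants.append("".join(parts))
    return variants


two = separator_variants("Infobox Team")
many = separator_variants("This template name has many different words")

print(two)        # ['Infobox Team', 'Infobox_Team']
print(len(many))  # 64
```

Seven words means six gaps, so 2 ** 6 = 64 spellings that old code would have needed to list by hand.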
So this is actually something to be really excited about!!
template.get() thing
Here’s a direct link to the PR.
Fallback
I am SO EXCITED!! about this one!!!
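Previously, we had to guard the lookup ourselves, since `template.get()` raises `ValueError` for a missing param: either check `template.has()` first, or wrap the call in `try`/`except`.
And now the fallback can be passed straight to `template.get()` as a second argument.
Woohoo!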
Dict access
If we’re certain that a param exists, then we can also now access parameters as if the template were a dict. I’m a bit mixed about this syntax: it only saves a couple of characters, and in my opinion it costs a bit of clarity.
Previously, we would call `template.get('name')`.
And now we can also write `template['name']`.
This is going to be slightly weird, though, because, remember, you get a `Parameter` object, not the value of the key. Say we print `template['name']` and then its type, for a template with `name=Sona`.
If this were actually a dictionary with key-value pairs of Sona data (rather than a wiki template), we’d expect to get `Sona` and `<class 'str'>`. But instead, what we get is:
name=Sona
<class 'mwparserfromhell.nodes.extras.parameter.Parameter'>
Of course, we knew that; that’s how `template.get()` has always worked. But when accessing via the dict syntax, this definitely could feel just a bit unexpected - so be careful! And maybe stick to the `.get()` method for clarity.
An argument in favor of the dict syntax
There’s an argument in favor of the dict syntax, though: it makes it obvious at a glance whether we expect a parameter to be in the template or not - just like when working with dicts.
- `template['name']` - we know the template has a `name` param (and we’ll get an error if it doesn’t)
- `template.get('name', None)` - the template may not have a `name` param, and we fall back to `None`
So, things to balance.
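By the way, code that calls `template.get()` for a missing param with no fallback, prints a message when that raises `ValueError`, and then prints the same lookup with a `None` fallback will print: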
Not QUITE this dict-like
None
We DO still need the fallback `None` that I wrote above, unlike when working with normal dicts. (And to be clear, this is NOT a criticism of the implementation; it would be a breaking change to have it any other way, as there could be a lot of code depending on try/catching `ValueError` if `template.get()` fails. A library as low-level as `mwparserfromhell` needs to be really, really, really stable, so breaking changes are to be avoided at all costs, especially for something that is, at the end of the day, really just syntactic sugar.)
Conclusion
I love love love love love love this library, and I’m so happy to see it continuing to be developed! `mwparserfromhell` is crazy impressively good at what it does, and just a joy to develop with, and these two patches are making it even more so!
I do recommend against using the dict-access syntax - I think it can be a convenient nice-to-have, but it varies just a bit too much in behavior from real dicts to make it a net positive. Stick to `template.get('name', fallback).value` and don’t forget to `.strip()` the result!