I wrote a new MediaWiki extension this past weekend, LiveI18n! It’s a parser function (also available as a built-in Lua function) that’s an easy-to-use way to provide internationalization (i18n) to your wiki’s UI & navigational elements, such as infobox labels. There were a couple ways you could already do this, but both were kinda bad, for reasons I’ll explain below.
Usage examples
Parser function
With your user language set to pt-br and $wgLiveI18nDefaultLanguageCode set to es:
This should be falling back to pt: {{#live_i18n:en=hello|es=not hello|pt=this is a fallback}}<br>
This should be displaying in the default, pt-br: {{#live_i18n:en=hello|es=not hello|pt=this is a fallback|pt-br=this is the real thing}}<br>
This should say hola because nothing good is available: {{#live_i18n:en=hello|es=hola}}<br>
Nothing should display here: {{#live_i18n:nothing is here!}}<br>
Or here: {{#live_i18n:de=nothing is here!}}
will give the following outputs:
This should be falling back to pt: this is a fallback
This should be displaying in the default, pt-br: this is the real thing
This should say hola because nothing good is available: hola
Nothing should display here:
Or here:
Lua
Again, with your user language set to pt-br and $wgLiveI18nDefaultLanguageCode set to es, the equivalent Lua call will give the same outputs, namely:
This should be falling back to pt: this is a fallback
This should be displaying in the default, pt-br: this is the real thing
This should say hola because nothing good is available: hola
Nothing should display here:
Or here:
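To make the selection order in these examples concrete, here’s a rough Python sketch of the lookup they imply: the user language first, then that language’s fallbacks, then the configured default, and otherwise an empty string. This is not the extension’s actual code (that’s PHP, and it gets its fallbacks from MediaWiki). The FALLBACKS table and the names parse_args and pick_translation are made up for illustration, and the sketch assumes the default language is tried as-is, without fallbacks of its own.

```python
# Hypothetical fallback table, for illustration only. The real extension would
# use MediaWiki's own language fallback data rather than a hand-written dict.
FALLBACKS = {
    "pt-br": ["pt"],
    "pt": ["pt-br"],
}


def parse_args(args):
    """Turn ["en=hello", "es=hola"] into {"en": "hello", "es": "hola"}.

    Arguments without an "=" carry no language code, so they are skipped.
    """
    translations = {}
    for arg in args:
        if "=" not in arg:
            continue
        code, text = arg.split("=", 1)
        translations[code.strip()] = text.strip()
    return translations


def pick_translation(args, user_lang, default_lang):
    """Return the best available text, or "" if nothing suitable exists."""
    translations = parse_args(args)
    # Order of preference: user language, its fallbacks, then the default.
    candidates = [user_lang, *FALLBACKS.get(user_lang, []), default_lang]
    for code in candidates:
        if code in translations:
            return translations[code]
    return ""


if __name__ == "__main__":
    args = ["en=hello", "es=not hello", "pt=this is a fallback"]
    print(pick_translation(args, "pt-br", "es"))  # prints "this is a fallback"
```

Note that en is never tried implicitly here, which matches the third example above: with only en and es available, a pt-br user gets the es text.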
Why is there a config variable?
Why do I provide $wgLiveI18nDefaultLanguageCode at all, instead of always falling back to $wgLanguageCode?
- You could choose to use qqx for the former, and give your i18n labels names when they’re untranslated, e.g. cat-color-label. This would be recommended only if you have a private wiki which you know is used by speakers of only 2 or 3 languages, and you know everything will be fully translated.
- I can’t really think of any other reasonable use case, but I’m generally in favor of extra customization options rather than locking people into could-be-inconvenient defaults.
Alternatives (and why this is better)
MyVariables
You can use MyVariables’s variable {{USERLANGUAGECODE}} to get the current user’s preferred language code. Then, you can use ParserFunctions’s switch parser function to pick from the available languages and display the correct output to the user. Indeed, this is the approach that I’m currently using on Leaguepedia as of writing this article.
There are some disadvantages to this approach:
- The syntax is a bit unwieldy: instead of using a purpose-built parser function, you’re switching on a variable.
- There is no Lua support, so if you’re in Lua, you’re using either frame:callParserFunction() or frame:preprocess(), both of which are not great.
- You can’t take advantage of fallback languages (e.g. pt and pt-br should fall back to each other preferentially over en) unless you write your own support for them.
int magic word
You can also use the int magic word. This approach would have you create one system message per i18n message per language; for example, if the “Cats” infobox has a label called “fur color,” you might create the articles MediaWiki:customi18n-infobox-cats-fur-color, MediaWiki:customi18n-infobox-cats-fur-color/de, etc. The customi18n prefix is there to ensure you don’t have any naming conflicts with other system messages; after the prefix, you’d categorize by the name of the template, and then by the message name.
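As a small illustration of that naming scheme, here’s a Python sketch that builds the page titles from a template name, a label, and an optional language code. The helper message_title and the way it turns spaces into hyphens are assumptions made up for this example, not part of MediaWiki or any extension.

```python
PREFIX = "customi18n"  # the prefix used in the example titles above


def message_title(template, label, lang=None):
    """Build a system message title such as
    MediaWiki:customi18n-infobox-cats-fur-color/de (hypothetical helper)."""
    slug = "-".join([PREFIX, "infobox", template, label])
    slug = slug.lower().replace(" ", "-")
    title = f"MediaWiki:{slug}"
    # Per-language messages live on a subpage named after the language code.
    return f"{title}/{lang}" if lang else title


if __name__ == "__main__":
    print(message_title("cats", "fur color"))        # MediaWiki:customi18n-infobox-cats-fur-color
    print(message_title("cats", "fur color", "de"))  # MediaWiki:customi18n-infobox-cats-fur-color/de
```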
Because only sysops can edit the MediaWiki namespace, and also because editing this many separate pages is a bit annoying, you might want to create an infrastructure that can read from Lua modules ending in /i18n and then update the corresponding system messages via a cron job that runs every minute, from an account with the correct permissions. Because you’re restricting it to messages prefixed by customi18n, this shouldn’t be any more “dangerous” than using the MyVariables approach in the first place, even though you’re effectively allowing anyone to edit a restricted namespace.
Of course there are also disadvantages to this approach:
- Anything that requires a crontab is a huge amount of infrastructure to set up unless you already have that in place (and lacking that setup, it’s a giant pain to edit the individual pages).
- Crontabs can crash.
- The up-to-a-minute delay can be really annoying when trying to see changes; also people may not be aware that the delay exists, and think that their edits aren’t working.
I considered using this approach for my i18n; the first bullet isn’t an issue for me, but the third certainly is, and so I decided to go for the MyVariables approach.
Other?
I’m not aware of any other pre-existing way to do i18n on a single wiki (short of the Translate extension, which is overkill if all you want to translate is navigational/interface elements, and not your full content). But maybe there is another way to do it. In any case, I’m pretty sure having a dedicated parser function with Lua support is ideal, and that’s why I wrote LiveI18n!
Development
It took me probably 7-8 hours in total to write this extension. All I knew going in was that I was going to use $parser->getOptions()->getUserLang() to get the user language code. This is not an exaggeration by any means; this is literally the only thing I knew about what had to be done, but I figured the rest of it would be “easy enough to figure out.” And I guess I was more or less right, considering that the extension seems to work.
(Incidentally, this is why I knew even that much - I had assumed that MyVariables and the int magic word used the same way of getting the user language when I first implemented my i18n. Little did I know, MyVariables actually used a different approach and disabled parser cache every time you called it! Uh oh. That was a fun surprise the first time I turned that module on. So I read a bunch of source code, figured out what was going on, and then we patched MyVariables & sent the patch upstream as well.)
My general approach was to look at extensions that I knew were doing similar things to what I needed to do and then copy the general approach of what they were doing. For example, I started by copying the extension.json file from my own first extension, CustomLogs (at least the parts of it I knew I’d need, like, uh, name and author and having a config variable).
Extensions that I used (this is probably not a complete list):
- Arrays (it has parser functions!)
- ParserFunctions (it also has parser functions!)
- Cargo (it has a Lua function!)
- Scribunto (it has…lots of Lua functions)
- I18nTags (I went on a super long tangent trying to set up CLDR for my language fallbacks before, umm, reading the docs and realizing that I wanted a built-in LanguageFallback…but this was only after I literally could not get CLDR to work, so…good thing I could not get CLDR to work I guess. Incidentally, this is why this extension requires MediaWiki 1.35+)
- ExtJSBase (I have no idea what this is, but I searched GitHub in the mediawiki org (I did a lot of that) for uses of LanguageFallback to figure out how I was supposed to get one of these things, and that was what had an example!)
I also did a lot of print-debugging by returning things that were not my actual output but rather some construction based on intermediate objects when my code wasn’t working. Now that it’s done, I kinda sorta maybe know what I’m doing. A bit. Haha. In any case, it was a super fun learning experience, and I think this extension is legitimately super useful, so definitely some of the most productive 7-8 hours I’ve spent this year.
Next, I want to write an extension to generate documentation pages for JSON contentmodel pages….