Introduce some JavaScript i18n functions

Description

There's JavaScript code dotted around core that handles i18n in JavaScript (for example, localised thousands separators in the pending comment count). We should bring this together into a wpL10n JavaScript library that can be reused by plugins.

I've done some work on this and I'll get a patch up in the next day or so.

I think a good set of i18n JavaScript functions improves JavaScript localisation, especially for plural sentences. So here is a first "sketch" of those functions, primarily focused on gettext for JavaScript. See it all at GitHub. It is a small plugin that will show some translated messages from core if you activate it.

How it works:

It uses the PO/MO files that are parsed in PHP to generate a structure that the JavaScript functions will use.

When enqueueing a script that needs some of the JS gettext functions, you have to call wp_js_gettext_enable with the textdomains you want.
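For illustration, the structure generated from the PO/MO files could look like Jed's locale data format. A minimal sketch (the shape and names below are hypothetical, not the actual plugin code):

```javascript
// Hypothetical structure generated in PHP from the PO/MO files:
// each msgid maps to an array of its plural forms.
var localeData = {
	'default': {
		'': { domain: 'default', lang: 'de_DE', plural_forms: 'nplurals=2; plural=(n != 1);' },
		'Settings': [ 'Einstellungen' ],
		'%s comment': [ '%s Kommentar', '%s Kommentare' ]
	}
};

// Minimal lookup, analogous to PHP's __(): fall back to the
// original string when no translation exists.
function __( text, domain ) {
	var entry = localeData[ domain || 'default' ];
	entry = entry && entry[ text ];
	return entry ? entry[ 0 ] : text;
}
```

A real implementation would also handle plural selection via the plural_forms header.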

Objectives:

All translations (for PHP and JS) in one PO/MO file for each textdomain.

As much consistency as possible between PHP and JS.

Possible drawbacks:

All the translated strings are always loaded: for the current core translations (textdomain 'default') that is about a 600 KB script.

There has to be a JS function like sprintf to parse %s, %d, %1$s, etc. A possible starting point: sprintf for js.

It uses the standard plural expression from gettext (for example n != 1), but this code is executed in JS, which may be a security issue. The plural expression is currently filtered by a regular expression that removes all characters that shouldn't be in a plural expression, but it is still a bit tricky.
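To illustrate the filtering idea (this is only a sketch; the actual regular expression in the plugin may differ): the expression from the MO header is checked against a character whitelist before being compiled.

```javascript
// Compile a gettext plural expression such as "n != 1" into a function.
// Reject anything outside the characters a C plural expression can contain.
function makePluralFunc( expression ) {
	if ( /[^n0-9 \t+\-*\/%<>=!&|?:()]/.test( expression ) ) {
		throw new Error( 'Invalid plural expression: ' + expression );
	}
	// Number() maps true/false to 1/0, matching gettext's plural index.
	return new Function( 'n', 'return Number( ' + expression + ' );' );
}

var pluralIndex = makePluralFunc( 'n != 1' );
```

Note that this still executes translator-supplied code, which is exactly the trickiness described above.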

These are my thoughts about it; I'd appreciate it if you share yours. If you have any solutions for the drawbacks or just have some fancy idea: share it and/or create a pull request!

We've been discussing this today with @ocean90 and @omarreiss, @atimmer and @jipmoors from over at Yoast and after thinking intensively about it, we reached the following conclusion:

Developers should be able to continue using wp_localize_script().

We should not duplicate efforts. What wp_i18n_add_default_js_translations() does in 20491.diff can be automated.

We agreed that it makes the most sense to use Jed's gettext functions and parse translatable strings inside the JavaScript files, like we already do with PHP. We could then have .json files containing these strings and pass these over to the JS.

Since this needs a lot of work on the WordPress.org side of things, we can't tackle this for 4.7.

Jed hasn't been updated in a very long time. We need to ensure it stays alive. @omarreiss will ask its developer to create a dedicated Jed team on GitHub where we all could contribute to (instead of us/someone forking it).

Related:

i18n-calypso - Calypso's wrapper around Jed, similar to what we're aiming for.

Jed hasn't been updated in a very long time. We need to ensure it stays alive. @omarreiss will ask its developer to create a dedicated Jed team on GitHub where we all could contribute to (instead of us/someone forking it).

Looks like it's a jQuery Foundation project now and relicensed as MIT, both of which bode well for its survival. Its author also now explicitly states that he believes the library to be feature complete, so I wouldn't necessarily expect constant updates.

Not sure where efforts to reach out ended up - if more is needed, happy to have a chat with somebody at jQuery/Alex/even Jed himself (funnily enough, a local friend). :)

@helen I've sent Alex Sexton an email asking for some context on the project and if he would be willing to add a few maintainers from the WP community and got a very nice answer:

Jed is a part of the jQuery foundation after the Dojo/jQuery Foundation merge, so they'd have to sign CLAs and that type of thing, but I'd be happy to let people work on it, assuming I agreed on the general direction (I may be against, say, entirely changing the APIs and breaking backwards compat significantly, but wouldn't necessarily be picky about coding styles or internal architecture). Very happy to discuss.

I'd say that means we don't really have to worry about whether Jed stays alive, and we can even fix bugs in it if we need to.

Yeah the Jed part is the smallest issue. It being a jQuery Foundation project is surely helpful for us. There probably will be occasional pull requests, but Alex sounds very cooperative in that regard.

The biggest task (at least from my perspective) will be parsing translatable strings inside the JavaScript files, adding them to GlotPress, and having the WordPress.org API deliver JSON files containing the translations. I will chat with @ocean90 about this, as he's the meta, i18n and GlotPress guru.

Leverages wp.i18n almost everywhere, except for some customizer parts, as I had no time.

This will be helpful for testing the string extraction. I had a quick chat with @ocean90 today and he generously volunteered to extend makepot.php to make this work.

JS strings will be added to the resulting POT files as usual. After that, JS strings need to be exported as JSON files. GlotPress already supports exporting strings based on the source (using the filters GET param, for example), and I have a GlotPress plugin in the works for the JSON export. I'll further test this plugin and will work on adding it to GlotPress core once it's ready.
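For illustration, such a JSON export could follow Jed's locale data format; the exact field names and strings below are guesses, not the plugin's actual output:

```json
{
  "domain": "default",
  "locale_data": {
    "default": {
      "": {
        "domain": "default",
        "lang": "de_DE",
        "plural-forms": "nplurals=2; plural=(n != 1);"
      },
      "Dismiss": [ "Verwerfen" ],
      "%s comment": [ "%s Kommentar", "%s Kommentare" ]
    }
  }
}
```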

Some remaining tasks / questions:

In PHP there's a gettext filter for people to override translations. Do we need a way for developers to filter translations in JS?

Add wp.i18n.esc_attr_x and probably esc_html_* equivalents as well.

Figure out the best way for devs to load the JSON files when enqueuing scripts and adjust wp.i18n.addTranslations() accordingly.

Make sure wp-i18n.js and jed.js get properly minified when building (nothing to worry about).

I think I can handle (most of) these core parts while @ocean90 works on makepot.php. Of course any help is appreciated.

The GlotPress plugin for JSON export I mentioned is in much better shape now, with unit tests and everything. It's like 95% done. There's a chance it might get into GlotPress 2.3 as well.

Besides that, I'm still tinkering with the patch to find the best way to load the translation files. We could load the JSON files in PHP and use wp_add_inline_script to initialize them, or we could also fetch them via Ajax (think wp.i18n.load_textdomain()). The latter is interesting when we want to do locale switching in JS in the future. Maybe someone has an idea here.
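A rough sketch of what the Ajax route could look like. wp.i18n.load_textdomain() does not exist yet; the function below and its injected fetcher are hypothetical:

```javascript
// Hypothetical Ajax-based textdomain loader: fetch the JSON file and
// register its locale data. The fetcher is injected so the transport
// (jQuery.ajax, window.fetch, ...) stays pluggable and testable.
function loadTextdomain( domain, url, fetchJson, registry ) {
	return fetchJson( url ).then( function ( data ) {
		registry[ domain ] = data.locale_data[ domain ];
		return registry[ domain ];
	} );
}
```

Loading like this would also open the door to switching locales in JS by fetching another file at runtime.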

Besides that, I'm still tinkering with the patch to find the best way to load the translation files. We could load the JSON files in PHP and use wp_add_inline_script to initialize them or we could also fetch them via Ajax (think wp.i18n.load_textdomain()).

That sounds interesting but for v1 we should load them via wp_add_inline_script(). Loading strings asynchronously sounds complicated. :)

It looks like the JS minify task ignores the @preserve tag in the file header.

Is numberFormat() compatible with WordPress' license? There is a @license See CREDITS.md tag but the file doesn't exist.

I wonder if we should make wp.i18n.sprintf() public or if there should be a wrapper instead. Something like wp.i18n.translate( 'Foo %1$s, %2$s', arg1, arg2 ) as an alternative for wp.i18n.sprintf( wp.i18n.__( 'Foo %1$s, %2$s', 'domain' ), arg1, arg2 ).

It looks like the JS minify task ignores the @preserve tag in the file header.

That line's exactly like that in the original source. How does core preserve file headers in other JS files?

Is numberFormat() compatible with WordPress' license? There is a @license See CREDITS.md tag but the file doesn't exist.

Good question. I think in the last few patches I used different code than in the original patch, so that might not even be up to date anymore.

I wonder if we should make wp.i18n.sprintf() public or if there should be a wrapper instead. Something like wp.i18n.translate( 'Foo %1$s, %2$s', arg1, arg2 ) as an alternative for wp.i18n.sprintf( wp.i18n.__( 'Foo %1$s, %2$s', 'domain' ), arg1, arg2 ).

That sounds impossible with the optional domain param before the args, no?

Also, string.sprintf( arg1, arg2 ) might be more JavaScript-like, but ideally there should be a wp.sprintf utility method.

I wonder if we should make wp.i18n.sprintf() public or if there should be a wrapper instead. Something like wp.i18n.translate( 'Foo %1$s, %2$s', arg1, arg2 ) as an alternative for wp.i18n.sprintf( wp.i18n.__( 'Foo %1$s, %2$s', 'domain' ), arg1, arg2 ).

That sounds impossible with the optional domain param before the args, no?

You're right…

Apparently Jed's sprintf implementation is simply a copy of the one from https://github.com/alexei/sprintf.js. We should rather use that directly (in wp-util, in a new ticket). In the meantime I'll remove the wp.i18n.sprintf line from the patch.
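For reference, a minimal sprintf sketch covering only the %s, %d and positional %1$s placeholders discussed in this thread; sprintf.js itself supports far more (padding, floats, and so on):

```javascript
// Tiny sprintf: sequential (%s, %d) and positional (%1$s) placeholders.
function sprintf( format ) {
	var args = Array.prototype.slice.call( arguments, 1 );
	var next = 0;
	return format.replace( /%(?:(\d+)\$)?([sd])/g, function ( match, position, type ) {
		var arg = position ? args[ position - 1 ] : args[ next++ ];
		return type === 'd' ? String( parseInt( arg, 10 ) ) : String( arg );
	} );
}
```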

I don't know if this has been mentioned, but in regard to sprintf(): in JS that is already using Underscore anyway, it is possible to use Underscore templates to do this. I have done this before.

$text = sprintf( __( 'String with %s.' ), '{{placeholder}}' );

var text = wp.template( l10nTemplateFromPHP, value );

(Just a POC from off the top of my head.)
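The same idea works without loading Underscore at all. A self-contained sketch (fillTemplate and the variable names are made up here; Underscore's _.template behaves differently):

```javascript
// Replace {{placeholder}} tokens in a translated string passed from PHP.
function fillTemplate( template, data ) {
	return template.replace( /\{\{(\w+)\}\}/g, function ( match, key ) {
		return key in data ? String( data[ key ] ) : match;
	} );
}

// l10nTemplateFromPHP would arrive via wp_localize_script() (hypothetical name).
var l10nTemplateFromPHP = 'String with {{placeholder}}.';
var text = fillTemplate( l10nTemplateFromPHP, { placeholder: 'value' } );
```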

Interesting! Thanks for sharing. Loading 16KB of JS (the size of underscore.min.js) for using sprintf() is not a viable solution for core though. That's why I've opened #38147 to suggest adding the lightweight sprintf.js library instead.

Introduce WP_Scripts::load_translation_file() to pass that JSON to wp.i18n.loadLocaleData().

Add some more QUnit tests.

If get_js_i18n_data() and WP_Scripts::load_translation_file() seem legit, I'll happily continue working on it.

Nice.

I think the naming can be improved. Maybe WP_Scripts::add_json_localization() or WP_Scripts::localize_with_json() for WP_Scripts::load_translation_file()?
I'm not a fan of the loop in get_js_i18n_data(). WP_LANG_DIR should only be checked if $domain is 'default'. Can we require that a path needs to be set: get_js_i18n_data( $domain, $path )? Or a $context which can be [core|theme|plugin]?

I noticed that escape() still uses _.keys() which is an Underscore function.

Regarding sprintf(), this patch also removes wp.i18n.sprintf as Jed's implementation is really out of date. Jed even encourages removing it from its source code:

I think the naming can be improved. Maybe WP_Scripts::add_json_localization() or WP_Scripts::localize_with_json() for WP_Scripts::load_translation_file()?

I had a hard time thinking of a good name, so +1. I'll move forward with WP_Scripts::add_json_localization() for the time being, unless we come up with something better.

I'm not a fan of the loop in get_js_i18n_data(). WP_LANG_DIR should only be checked if $domain is 'default'. Can we require that a path needs to be set: get_js_i18n_data( $domain, $path )? Or a $context which can be [core|theme|plugin]?

$context sounds interesting. The alternative is to make it more like load_*_textdomain() with a different function per context.

Idea that came to mind right now: Assuming the MO files are already loaded, we could probably get the path to the MO file from $l10n[ $domain ] and derive the JSON file from there.
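That derivation could be as simple as the sketch below, assuming the JSON file sits next to the MO file (an assumption, not something the patch guarantees):

```javascript
// Derive the JSON translation file path from a loaded MO file path.
function jsonPathFromMo( moPath ) {
	return moPath.replace( /\.mo$/, '.json' );
}
```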

I noticed that escape() still uses _.keys() which is an Underscore function.

Good catch. I thought I had removed all underscore leftovers in the latest patch. Will fix in the next one.

Regarding sprintf(), this patch also removes wp.i18n.sprintf as Jed's implementation is really out of date. Jed even encourages removing it from its source code:

Besides that, I'm still tinkering with the patch to find the best way to load the translation files. We could load the JSON files in PHP and use wp_add_inline_script to initialize them or we could also fetch them via Ajax (think wp.i18n.load_textdomain()).

That sounds interesting but for v1 we should load them via wp_add_inline_script(). Loading strings asynchronously sounds complicated. :)

I have a site where I need to implement CSP headers. It's already quite some fiddling. Please help getting rid of inline js. See #32067. Thank you!

20491-applied.2.diff extends the efforts to most if not all parts of the customizer. Not yet addressed are Underscore templates.

It probably makes most sense to pass strings used in them to the template instead of printing them directly. Reduces redundancy and makes code more clear. Otherwise templates will get huge, see wp_print_admin_notice_templates() / #38112 as an example.

I have a site where I need to implement CSP headers. It's already quite some fiddling. Please help getting rid of inline js. See #32067. Thank you!

This patch already reduces plenty of inline JavaScript. While 20491.7.diff currently uses inline scripts to pass the translations, the Ajax route can always be considered later on. See it as a first step in the right direction.

I'm not sure that the escaping functions really make sense in JS. They don't really have a practical use, as far as I can see. Unless you are building HTML from strings via concatenation, which is generally considered a bad practice, I think, due to the greater potential for security issues.

In other words, I'd never do this:

$el.html( '<div title="' + wp.i18n.esc_attr__( 'Hello World' ) + '"></div>' );

Instead I'd do this:

$el.html( $( '<div></div>' ).attr( 'title', wp.i18n.__( 'Hello World' ) ) );

No need to use esc_attr__(). Of course, the other way is just as safe if the escaping function is used, but it encourages developers to follow a pattern that in other circumstances could be very bad. (Especially considering that there isn't a generic esc_html() or esc_attr().)
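For comparison, this is roughly what an esc_attr-style helper would have to do if it existed in JS (a sketch only; core's PHP esc_attr() covers additional cases) and which jQuery's .attr() makes unnecessary:

```javascript
// Sketch of attribute escaping: encode the five HTML-special characters.
function escAttr( text ) {
	return String( text )
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' )
		.replace( /'/g, '&#039;' );
}
```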

It is really great to see how this has been coming along. As i18n-calypso has been mentioned earlier as a comparable project, I'd like to give a little insight into what we have done at Automattic with Calypso (ongoing development in the open). What is now i18n-calypso was an integral part of Calypso before it was refactored into that new project.

i18n-calypso largely consists of two components: a CLI utility that extracts the strings, and a library that handles loading and output of translations (with some hooks to make things like the Community Translator possible).

I was wondering why you chose the current approach (i.e. several functions "ported" from PHP), and I assume it’s mostly because of this:

We agreed that it makes the most sense to use Jed's gettext functions and parse translatable strings inside the JavaScript files, like we already do with PHP.

@ocean90 has proposed a solution that parses and tokenizes JavaScript in PHP in the end, so I am not sure if that is still a deciding factor? (Could you give some details on why you chose not to, or were not able to, use xgettext?)

What to make of all of this? Deciding whether to revamp the approach and use a single translate() function as i18n-calypso does, depends on factors that you are likely more familiar with than I am. I think a common (compatible) format between WordPress (related) projects is favorable, though.

Either way, I see the opportunity to make use of the tools that exist in i18n-calypso for extracting the strings, and maybe even use i18n-calypso for handling the translation altogether. Possibly there is a need to adapt it for WordPress Core (such as adding support for text domains and removing the dependency on React), but it's something that I think is worth discussing.

There is a point in having makepot.php be able to do both PHP and JavaScript (as in the POC implementation), but I see problems regarding newer developments in JavaScript (you'll want to be able to extract translatable strings from plugins' JavaScript even if they use ES6 or newer), so I think it should be considered to extract JavaScript strings in a separate step, in JavaScript.

I'm not opposed to rethinking the current approach. One benefit of the current implementation is that it makes things easier for developers, more maintainable, and makes string extraction easier. This includes extracting translator comments, which I don't think i18n-calypso supports at the moment.

Newer ECMAScript standards and JSX support are definitely good reasons to explore doing the extraction using other tools than the one used in the POC.

One benefit of the current implementation is that it makes things easier for developers, more maintainable, and makes string extraction easier.

I've been looking into Drupal recently and they expose the same t() and formatPlural() functions from PHP in JavaScript. See https://www.drupal.org/docs/8/api/translation-api/overview. Example: Drupal.t('May', {}, {context: "Long month name"}); (the second argument is for placeholders to replace).

This includes extracting translator comments, which I don't think i18n-calypso supports at the moment.

Regarding the translation function, I can see the benefits of both a JS implementation of __() et al., as well as having "one function to rule them all". The only difference is basically translator comments support, so we could theoretically implement both if /* translators: */ parsing is easily possible.

Forgive my joining late in this discussion. In anticipation of the upcoming dev chat about consolidating future JavaScript efforts, I wanted to review this ticket and share a few of my own notes and experiences.

Being oblivious to the existence of this ticket, I'd separately put together an i18n solution for the Gutenberg project, but I'm happy to see it aligns almost identically with what's currently proposed here. Using Jed is an obvious choice; workflow, integration with GlotPress, and making strings available in the browser are admittedly harder problems.

Despite having contributed to Calypso's i18n solution, I still find that the naming consistency here between PHP and JS functions will make for a nicer transition for developers familiar with the former (I think this is the "one function to rule them all" argument for __() et al.?). So I don't personally see a reason to change the APIs as they're proposed currently.

Reinforcing a few points made earlier:

I agree with @jdgrimes (comment:53) that making esc_attr__ functions available will only encourage poor security practices and I'd suggest they be excluded.

I agree with @akirk (comment:55) that supporting the latest language features of JavaScript in string extraction will be easiest using JavaScript parsers like Babel's Babylon (or Acorn, or Espree, or Esprima), which tend to be well-maintained and follow closest to the bleeding edge.

Unlike xgettext-js, which uses a separate CLI process to call Babylon's API directly for extracting strings, in Gutenberg I'd opted instead to explore a custom Babel plugin which extracts strings during the same build step as the application itself. Nothing about this necessitates opting in to ES2015+ or JSX syntax, but it was a convenient option for us to guarantee parsing is consistent between string extraction and the build.

I'm curious about the decision to use a separate JSON-formatted export of strings, versus preparing an object for Jed on the fly. So far it's worked well enough for us to generate the object from the return value of get_translations_for_domain (see source). That said, we've been fortunate to disregard handling multiple domains or distinguishing between strings sourced from PHP vs. JS, so there may well be other considerations we're overlooking.

In consideration of some of the more recent discussions about browser support, it might be worth discussing what role the browser-native i18n APIs could have on what should be implemented here. For example, do we need wp.i18n.numberFormat if Intl.NumberFormat is already in scope?

I'm curious about the decision to use a separate JSON-formatted export of strings, versus preparing an object for Jed on the fly. So far it's worked well enough for us to generate the object from the return value of get_translations_for_domain (see source). That said, we've been fortunate to disregard handling multiple domains or distinguishing between strings sourced from PHP vs. JS, so there may well be other considerations we're overlooking.

I agree; doing this on the fly I think has better compatibility, and ensures that PHP and JS both stay in sync correctly. The only potential concern is that we're going to pass a lot of strings doing it this way, whereas building separately can allow us to just pass those that are actually required. We could potentially use a different domain for JS to get around this.

(cc @ocean90 for input on whether that's a good idea)

In consideration of some of the more recent discussions about browser support, it might be worth discussing what role the browser-native i18n APIs could have on what should be implemented here. For example, do we need wp.i18n.numberFormat if Intl.NumberFormat is already in scope?

The problem with Intl is that it's ICU-based, and requires browser support for the language. This becomes an issue if we have localisation for a language that the browser doesn't (or vice versa, but that's unlikely to be possible to actually hit), plus it doesn't integrate with our existing gettext-based system, so there may be mismatches. I think consistency is super important here: our JS needs to always be from the mindset of progressive enhancement (even if that's not always the case).
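To make the contrast concrete: a gettext-driven formatter needs the separator shipped with WordPress' translation files, while Intl derives it from the environment's ICU data. numberFormat below is a simplified sketch for non-negative integers only, not core's implementation:

```javascript
// Gettext-driven: the separator comes from translation data, e.g. '.' for de_DE.
function numberFormat( number, thousandsSep ) {
	return String( number ).replace( /\B(?=(\d{3})+(?!\d))/g, thousandsSep );
}

var viaGettext = numberFormat( 1234567, ',' );

// ICU-backed: correct only if the environment ships data for the locale.
var viaIntl = new Intl.NumberFormat( 'en-US' ).format( 1234567 );
```

Both produce the same output for en-US, but only the first stays in sync with the installed WordPress translation.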

I'm curious about the decision to use a separate JSON-formatted export of strings, versus preparing an object for Jed on the fly. So far it's worked well enough for us to generate the object from the return value of get_translations_for_domain (see source). That said, we've been fortunate to disregard handling multiple domains or distinguishing between strings sourced from PHP vs. JS, so there may well be other considerations we're overlooking.

I agree; doing this on the fly I think has better compatibility, and ensures that PHP and JS both stay in sync correctly. The only potential concern is that we're going to pass a lot of strings doing it this way, whereas building separately can allow us to just pass those that are actually required. We could potentially use a different domain for JS to get around this.

Plugins in the Plugin Directory are required to have exactly one text domain because of various reasons and limitations in both core and WordPress.org. I don't think this will change anytime soon. Plus, if we can build something in a way developers don't have to worry about it, we should pursue it.

Projects like Jetpack easily have hundreds of translatable strings. Passing all the strings will be too much for them, and separating JS strings on the fly would have a bigger performance impact than just providing a JSON file right from the beginning. The language pack system with its update system has been reliable for so long now that I don't see any harm in providing separate files.

Projects like Jetpack easily have hundreds of translatable strings. Passing all the strings will be too much for them, and separating JS strings on the fly would have a bigger performance impact than just providing a JSON file right from the beginning. The language pack system with its update system has been reliable for so long now that I don't see any harm in providing separate files.

How would this work with local development? (And, for that matter, non-repository plugins.) Would we still need a dynamic piece for this?

Projects like Jetpack easily have hundreds of translatable strings. Passing all the strings will be too much for them, and separating JS strings on the fly would have a bigger performance impact than just providing a JSON file right from the beginning. The language pack system with its update system has been reliable for so long now that I don't see any harm in providing separate files.

How would this work with local development? (And, for that matter, non-repository plugins.) Would we still need a dynamic piece for this?

The data from the JSON file is handled by PHP right now, so it'd be easy to add and maintain a dynamic piece for this. Also, developers could use the same string extraction / JSON file generation script(s) locally and bundle the files in their plugins. Just like makepot.php.