Locales in different Java versions

3 Jun, 2018

I’ve spent a lot of time wrestling with i18n
and l10n in Java, for various projects.
On the back of that, I’ve put together an awful lot of little code snippets to
demonstrate and clarify various “interesting” things about Java’s handling of Locales.

This article simply lists some of the differences between the available
Locales in Java 8, 9 and 10. I’ll update this with info for more recent Java versions
once I’ve had a need to use them.

Most of the differences mentioned here are between Java 8 and Java 9 - that’s
where the big effort occurred. Between Java 9 and Java 10, there was only a tiny
change, mentioned at the end.

The code used as the source of this article is here, but basically it does the following:

get a Unicode-capable output stream
get a sorted list of all the available locales
get a sorted list of all the available countries
print all the locale.getDisplayName()
print all the country.getCountry() and country.getDisplayCountry()
print the count of locales and countries
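
Those steps can be sketched roughly as below. This is not the linked code, just a minimal illustration that compiles on Java 8. The class name LocaleLister and the output labels are made up here, the "countries" are assumed to be country-only Locales built from Locale.getISOCountries(), and display names are requested in English explicitly so the output does not depend on the machine's default locale.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class LocaleLister {

    // All locales the JVM knows about, sorted by their English display name.
    static List<Locale> sortedLocales() {
        List<Locale> locales = new ArrayList<>(Arrays.asList(Locale.getAvailableLocales()));
        locales.sort(Comparator.comparing((Locale l) -> l.getDisplayName(Locale.ENGLISH)));
        return locales;
    }

    // Assumption: a "country" is a Locale with only a region, one per ISO country code.
    static List<Locale> sortedCountries() {
        List<Locale> countries = new ArrayList<>();
        for (String code : Locale.getISOCountries()) {
            countries.add(new Locale.Builder().setRegion(code).build());
        }
        countries.sort(Comparator.comparing(Locale::getCountry));
        return countries;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // System.out may not default to UTF-8, so wrap it explicitly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        List<Locale> locales = sortedLocales();
        List<Locale> countries = sortedCountries();
        for (Locale locale : locales) {
            out.println("Language: " + locale.getDisplayName(Locale.ENGLISH));
        }
        for (Locale country : countries) {
            out.println("Country Code: " + country.getCountry()
                    + ", Country Name: " + country.getDisplayCountry(Locale.ENGLISH));
        }
        out.println("Locale count: " + locales.size());
        out.println("Country count: " + countries.size());
    }
}
```

Running the same class file under each JVM and redirecting the output to a file per version gives something that can be compared with diff.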

I compiled that with OpenJDK’s Java 8 version, then ran it in OpenJDK’s JVMs for
Java 8, 9 and 10. I took the outputs from each and diffed them to see what has
changed between Java versions.

Note that these results are from the OpenJDK implementations of the JVM, running
on a Debian PC. Different JVMs for different OSs on different hardware may have
different locale information. Do your own tests if you need to know for sure!

Here are the main interesting bits I saw in the results:

1. My word, Java 9 added a lot of Locales!

Java 8 Locale count: 160

Java 9 Locale count: 736

That’s 4.6 times as many Locales in Java 9 as in Java 8. They did
a lot of work on i18n for Java 9 so this isn’t a massive surprise; but still,
that’s a lot of Locales.

2. Many more language variants

Although a lot of that huge pile of new locales in Java 9 is new languages, a portion
of the new stuff is actually new language variants, added either to support less
widely spoken/written variants or to be more precise about the language naming.

Similarly, Portuguese, French and English gained large numbers of extra variants.

In particular, English went from 12 variants to 106 variants. My word, that’s
a lot of different Englishes!
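
One way to get per-language figures like these on your own JVM is to group the available locales by language code. The sketch below is only an assumption about how such counts could be produced (the class name VariantCounter is made up, and counting every locale that shares a language code may not match exactly how the numbers above were tallied).

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class VariantCounter {

    // Maps each language code (e.g. "en") to the number of available locales that use it.
    static Map<String, Long> localesPerLanguage() {
        return Arrays.stream(Locale.getAvailableLocales())
                .filter(l -> !l.getLanguage().isEmpty()) // skip locales with no language, e.g. Locale.ROOT
                .collect(Collectors.groupingBy(Locale::getLanguage, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = localesPerLanguage();
        System.out.println("Distinct languages: " + counts.size());
        for (String code : new String[] {"en", "fr", "pt"}) {
            System.out.println(code + ": " + counts.getOrDefault(code, 0L));
        }
    }
}
```

The "Distinct languages" line is also a handy reminder that the Locale count and the language count are different things.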

3. Every(?) language has a country variant

Another cause of the locale explosion in Java 9 is that every new language Locale
added now also has a country-specific Locale. So adding a new language actually
adds 2 new Locales. For example, Basque was added as a new language in Java 9:

Language: Basque
Language: Basque (Spain)
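
To check a single language like this on a particular JVM, something along these lines should do. It is a small sketch with a made-up class name (BasqueCheck); the language tags eu and eu-ES are the standard codes for Basque and Basque as used in Spain.

```java
import java.util.Arrays;
import java.util.Locale;

class BasqueCheck {

    // True if the JVM lists this exact locale among its available locales.
    static boolean isAvailable(Locale locale) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale basque = Locale.forLanguageTag("eu");
        Locale basqueSpain = Locale.forLanguageTag("eu-ES");
        for (Locale locale : new Locale[] {basque, basqueSpain}) {
            System.out.println("Language: " + locale.getDisplayName(Locale.ENGLISH)
                    + " (available: " + isAvailable(locale) + ")");
        }
    }
}
```

Note that a Locale object can always be constructed and given a display name; what changes between Java versions is whether it shows up in Locale.getAvailableLocales().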

3. Unicode in the English names

In Java 8 and earlier, the display names of languages used for a display locale
of English didn’t include any “weird” characters. In Java 9 some of the newly
added languages, especially the country-specific variants, include Unicode
characters (i.e. outside the ASCII range) in their display name, even with the display
Locale set to English. For example:

Yes, that French variant apostrophe is not a single quote character, it’s a true
apostrophe; and those circly dotty things in the Scandinavian languages are different.

Bizarrely, in Java 8 some of the country Locales did use Unicode characters in the
English display name, but others didn’t. Then in Java 9, they appear to have fixed
the odd cases where overly anglicised naming was used. For example:

Yes, in Java 8 that country apostrophe is a single quote, but the circumflex is
indeed a circumflex, even when the name is displayed in English. But an acute on
Reunion was apparently taking things too far. For Java 9 they did a lot of work
to improve the sanity of this stuff.
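
To hunt for these names on your own JVM, you can filter the English display names for anything outside the ASCII range. The following is a minimal sketch, not the code behind this article: the class name NonAsciiNames is made up, and country names are again assumed to come from Locale.getISOCountries().

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class NonAsciiNames {

    // True if the text contains any character outside the 7-bit ASCII range.
    static boolean hasNonAscii(String text) {
        return text.chars().anyMatch(c -> c > 127);
    }

    // English display names of locales and countries that contain non-ASCII characters.
    static Set<String> nonAsciiEnglishNames() {
        Set<String> names = new TreeSet<>();
        for (Locale locale : Locale.getAvailableLocales()) {
            names.add(locale.getDisplayName(Locale.ENGLISH));
        }
        for (String code : Locale.getISOCountries()) {
            Locale country = new Locale.Builder().setRegion(code).build();
            names.add(country.getDisplayCountry(Locale.ENGLISH));
        }
        names.removeIf(name -> !hasNonAscii(name));
        return names;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Print through an explicit UTF-8 stream so the characters survive.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String name : nonAsciiEnglishNames()) {
            out.println(name);
        }
    }
}
```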

4. Java 9 abbreviates some Country names

In Java 8, countries with “and” or “saint” in their names include those words in full;
in Java 9 the abbreviations “&” and “St.” are used instead. A good example of that is
a country Locale which uses both:
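
To list every country name affected on a given JVM, a filter like the one below can be run under each Java version and the outputs compared. It is a sketch under the same assumptions as before (made-up class name AbbreviatedCountries, countries taken from Locale.getISOCountries(), English display names).

```java
import java.util.Locale;

class AbbreviatedCountries {

    // True if the name uses either the long forms or the abbreviated forms.
    static boolean isCandidate(String name) {
        return name.contains(" and ") || name.contains("Saint")
                || name.contains("&") || name.contains("St.");
    }

    public static void main(String[] args) {
        for (String code : Locale.getISOCountries()) {
            Locale country = new Locale.Builder().setRegion(code).build();
            String name = country.getDisplayCountry(Locale.ENGLISH);
            if (isCandidate(name)) {
                System.out.println("Country Code: " + code + ", Country Name: " + name);
            }
        }
    }
}
```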

5. Java 9 lost a country!

Country Code: AN, Country Name:Netherlands Antilles

No such place in Java 9. Wikipedia has an article about what happened.
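
Whether a given JVM still knows about a country code can be checked directly. This is a small sketch with a made-up class name (CountryCodeCheck), assuming the country list comes from Locale.getISOCountries(); I have not listed expected output because it depends on the Java version.

```java
import java.util.Arrays;
import java.util.Locale;

class CountryCodeCheck {

    // True if the JVM lists this two-letter code among its ISO countries.
    static boolean isIsoCountry(String code) {
        return Arrays.asList(Locale.getISOCountries()).contains(code);
    }

    public static void main(String[] args) {
        String code = "AN";
        Locale country = new Locale.Builder().setRegion(code).build();
        System.out.println("Java " + System.getProperty("java.version"));
        System.out.println("Listed as ISO country: " + isIsoCountry(code));
        // getDisplayCountry falls back to the raw code if the JVM has no name for it.
        System.out.println("Country Name: " + country.getDisplayCountry(Locale.ENGLISH));
    }
}
```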

6. Java 9 - Minor name tidying

Java 9 also saw a lot of little changes to namings. I assume some of this was due
to politics happening and some people changing decisions on how their country
or language name should be displayed. But clearly a number of the changes were
basically fixing typos. A few of the tweaks I’ve seen that interested me are: