I’ve been a little harsh on Google in some previous posts, so I’m happy to have some good – no, make that great – Google news from the recent IUC34 conference. Albeit a little late compared to the tweeting others have done 🙂

Even though the internationalization community has made great progress towards more, and more uniform, locale data with the wide acceptance of the CLDR in recent years, we have been left with 2 big gaping holes: phone numbers and postal addresses. Up until now it has been practically impossible to implement correctly internationalized phone number and address formatting, parsing and validation, since the data and APIs have been unavailable.

Depending on the level of globalization awareness of the companies involved, this has resulted in implementations falling into one of these 3 broad categories:

Address and phone number fields are hard-coded to work for only one locale and reject everything else as invalid. This usually takes the form of making every single field required, and validating every single field, even on web sites where the address and/or phone number of the user is not actually important (such as purchases of non-restricted software delivered electronically, or web sites with required registration to enable targeted ads). This of course also results in such companies collecting an amazing amount of absolute garbage. For instance, if you make “Zip code” a required field and validate it against a list of US zip codes, then you end up with a surprising percentage of your users living in the 90210 area – simply because that is the one US zip code people living outside the US have had drilled into them via exposure to the TV show.

Support for a limited number of countries/regions (limited by the number of regions you have the bandwidth to gather data and implement support for – with each company reinventing the wheel every time, for every country)

No validation (provide the user with one single address field, and assume that if the user wants you to be able to reach him/her at that address, he/she will fill it in with good data)

As described in the IUC34 session (by Shaopeng Jia and Lara Rennie), collecting reliable and complete data to fill these holes was a major task (it’s no coincidence that nobody has done it before…):

There is no single source of data

Supposedly reliable data (ITU and individual country postal/telephone unions) turns out to be unreliable (data not updated, or new schemes not implemented on time)

Formats differ widely between countries/regions

Some countries even lack clear structure

Some countries (e.g., UK) use many different formats

Some countries use different formats depending on language/script being used

Chinese/Japanese/Korean addresses – start with biggest unit (country) if using ideographic script, but with smallest unit (street) if using Latin script
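To make that last point concrete, here is a minimal sketch of script-dependent ordering. The function name, the component list and the separator choices are all my own assumptions for illustration; they are not taken from the Google data or APIs, and real address formats are far richer than this.

```python
def format_address(parts, latin_script):
    """Join address components in the order suited to the script.

    `parts` is ordered from the biggest unit (country) to the
    smallest (street). Hypothetical helper, for illustration only.
    """
    if latin_script:
        # Latin script: start with the smallest unit, end with the country.
        return ", ".join(reversed(parts))
    # Ideographic script: start with the biggest unit.
    # Joining without separators is a simplification.
    return "".join(parts)


latin = format_address(["Japan", "Tokyo", "Chiyoda-ku", "1-1 Marunouchi"], True)
ideographic = format_address(["日本", "東京都", "千代田区", "丸の内1-1"], False)
print(latin)        # 1-1 Marunouchi, Chiyoda-ku, Tokyo, Japan
print(ideographic)  # 日本東京都千代田区丸の内1-1
```

Even this toy version shows why one hard-coded field order cannot serve the same country in both scripts.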

I have looked at these issues a few times in the past, and each time the team decided that we didn’t really need this information (translation: there was no way in hell we were going to be able to get the manpower to gather the information and implement a way to process it). Since Google does in fact have a business model that makes it very important to be able to parse these elements and format them correctly for display (targeted ads and Android, to name a couple of cases), it makes sense that they bit the bullet.

They do, however, deserve a lot of kudos for also going ahead and open-sourcing both the data and the APIs that are the result of that major undertaking.

Check it out:

According to the IUC34 presentation, the phone number APIs will allow you to format and validate 184 regions, while they will parse all regions. And the address APIs provide detailed validation for 38 regions, with layout+basic validation for all regions.

I really wish the Android developer guide would not (at least implicitly) recommend using localizable strings directly as array items.

In my opinion, any examples showing translatable strings as array items should in fact be accompanied by flashing red warning signs. Here’s why:

If the default version of a string array is updated with the addition of an array item, but the localized version has not been updated yet (very common scenario), then the menu item in question would simply be missing from the localized version (no fallback to the default version, since the string array represents the whole array). This can potentially result in serious functional issues in the localized versions of the software.
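The difference can be sketched with a small simulation. This is not Android code: the dictionaries stand in for the default and localized resource files, and every name in it (the resource keys, `get_string`, `get_array`, the German sample strings) is made up for the example. The only behavior it models is the one described above: an array is resolved as a whole, while individually referenced strings fall back one by one.

```python
# Default resources: a new "Share" item has just been added.
DEFAULT = {
    "strings": {"menu_open": "Open", "menu_save": "Save", "menu_share": "Share"},
    "arrays": {"menu": ["Open", "Save", "Share"]},
}
# Localized resources lag behind: "Share" has not been translated yet.
LOCALIZED = {
    "strings": {"menu_open": "Öffnen", "menu_save": "Speichern"},
    "arrays": {"menu": ["Öffnen", "Speichern"]},
}


def get_string(key, localized, default):
    """Resolve one string, falling back to the default per item."""
    if key in localized["strings"]:
        return localized["strings"][key]
    return default["strings"][key]


def get_array(key, localized, default):
    """Resolve an array as a single unit: no per-item fallback."""
    if key in localized["arrays"]:
        return localized["arrays"][key]
    return default["arrays"][key]


# Translatable text stored directly in the array: the new item vanishes.
print(get_array("menu", LOCALIZED, DEFAULT))

# Array built from references to individual strings: the new item
# shows up untranslated instead of disappearing.
keys = ["menu_open", "menu_save", "menu_share"]
print([get_string(k, LOCALIZED, DEFAULT) for k in keys])
```

An untranslated menu item is ugly, but a missing one can break the feature, which is why I would rather see arrays hold references to individual strings than the translatable text itself.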

Since it is common for localization to lag behind updates to the base version, and since hopefully developers will clean out unused strings periodically, it is very common to have strings in the localized files that do not exist in the default version. That should not cause compiler errors or warnings!

It would be incredibly useful to get an error if a string that is actually referenced in the code does not exist in the default strings.xml file – but the current setting is just an annoyance.
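The check I have in mind could look something like the sketch below. It is purely hypothetical: `check_strings` and its arguments are names I invented, and it does not describe how the actual Android tools work. It only illustrates which of the two situations deserves an error.

```python
def check_strings(referenced, default_keys, localized_keys):
    """Return (errors, ignored) for a set of string keys.

    errors:  keys referenced in code but missing from the default file.
    ignored: keys present only in a localized file; these are expected
             when localization lags behind and should not be flagged.
    """
    errors = sorted(k for k in referenced if k not in default_keys)
    ignored = sorted(set(localized_keys) - set(default_keys))
    return errors, ignored


errors, ignored = check_strings(
    referenced={"title", "missing_label"},
    default_keys={"title", "subtitle"},
    localized_keys={"title", "old_label"},
)
print(errors)   # ['missing_label']
print(ignored)  # ['old_label']
```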

Phew! Finally no need to stick my head in my bag when I answer my phone in public; no need to be insanely vague about exactly what I work with when talking to people outside of Motorola. Now I can just say CLIQ – or DEXT – or MOTOBLUR.

That name in itself is one of the few surprises in the announcement. We have been using the code name internally, and had never heard the official name until a few days ago. So now I suddenly know what I have been working on…

It seems there’s a reason I’m not in marketing, though – I would have thought the concept of a “clique” would have primarily negative connotations (and yes, I get the word play on both “clique” and “click”), but I guess not.