PHPMaster.com has posted the third part of its "Localizing PHP Applications 'The Right Way'" series. In this part you'll learn more about locales and message domain switching.

In Part 2 you gained more insight into using the gettext library by learning the most important functions of the extension. In this part you'll learn how to best use a fallback locale, switch between locales, and override the currently selected message domain.

The article shows how to set up the directory structure for a fallback locale, the one used when the system can't determine which locale to apply. Using a default also avoids having the system translate from the default language to...the default language (like "English" to "English"). Also included is the code you'll need to switch between locales and to use the dgettext function to look up a message in a domain other than the currently selected one.
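As a rough illustration of those ideas (not the article's own code), here is a minimal sketch. The `resolveLocale` helper, the `Locale` directory, the `messages` and `errors` domain names, and the message strings are all assumptions made for this example; only `setlocale`, `bindtextdomain`, `textdomain`, `gettext` and `dgettext` are real gettext-related functions.

```php
<?php
// Hypothetical helper: use the requested locale only if translations exist
// for it, otherwise fall back to the default locale.
function resolveLocale(string $requested, array $available, string $fallback): string
{
    return in_array($requested, $available, true) ? $requested : $fallback;
}

// Assumed list of locales that have translation directories.
$locale = resolveLocale('fr_FR', ['en_US', 'fr_FR'], 'en_US');

if (extension_loaded('gettext')) {
    // Switch locale; setlocale() returns false if the locale is not installed.
    putenv("LC_ALL=$locale");
    setlocale(LC_ALL, $locale);

    // Assumed layout: ./Locale/<locale>/LC_MESSAGES/<domain>.mo
    bindtextdomain('messages', __DIR__ . '/Locale');
    bindtextdomain('errors', __DIR__ . '/Locale');
    textdomain('messages');

    // Looked up in the currently selected domain ("messages").
    echo gettext('Welcome'), PHP_EOL;

    // Looked up in "errors" without changing the selected domain.
    echo dgettext('errors', 'File not found'), PHP_EOL;
}
```

When no catalog is found for a message, gettext returns the original string unchanged, so the sketch still runs without any translation files in place.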

Evert Pot found out something interesting about the basename function in PHP: besides being a handy shortcut for paths, it is also locale-aware.

It turns out basename does a bit more than just splicing the string at the last slash, because it's locale aware. In my case I was dealing with a multi-byte UTF-8 string. It took me quite some time figuring out what was going on, because I was testing from the console which had the en_US.UTF-8 locale, and the bug was appearing on Apache, which defaults to the C locale.

He includes an example snippet showing how the same urldecoded string returns different results under the "C" locale (the default for Apache, at least) and under a UTF-8 locale.
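The snippet below is not Evert's; it is a small sketch of the same kind of comparison, plus one possible locale-independent alternative. The `localeIndependentBasename` helper and the sample path are assumptions for illustration. What `basename` prints depends on the PHP version and on which locales are installed, so no output is claimed for those calls.

```php
<?php
// Hypothetical helper: take everything after the last slash, working on
// bytes only, so the result does not depend on the current locale.
function localeIndependentBasename(string $path): string
{
    $trimmed = rtrim($path, '/');
    $pos = strrpos($trimmed, '/');
    return $pos === false ? $trimmed : substr($trimmed, $pos + 1);
}

// A urlencoded path whose last segment is a multi-byte UTF-8 name.
$path = urldecode('/files/%E6%96%87%E4%BB%B6.txt');

// basename() under the "C" locale.
setlocale(LC_CTYPE, 'C');
echo bin2hex(basename($path)), PHP_EOL;

// basename() under a UTF-8 locale, if one is installed on this system.
if (setlocale(LC_CTYPE, 'en_US.UTF-8') !== false) {
    echo bin2hex(basename($path)), PHP_EOL;
}

// The byte-based helper gives the same answer regardless of locale.
echo bin2hex(localeIndependentBasename($path)), PHP_EOL;
```

The byte-based approach is safe for UTF-8 because the slash byte never appears inside a multi-byte UTF-8 sequence.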

Padraic Brady has posted about an issue he noticed when working with regular expressions and the "word" character type (\w), which is commonly assumed to match only alphanumeric characters and the underscore:

You can find the "word" generic character type used in a lot of PHP code including the Zend Framework. The problem is that the assumption above is incorrect. Now, most of the time these act identically because PHP is compiled using its own packaged PCRE library. However, I've seen more than once systems where this is not the case. Usually in some non-English capacity where additional locale support was considered necessary or standard practice.

The problem appears when PHP is compiled against a custom PCRE library that makes it more locale-aware. He gives instructions for reproducing this in your own environment (using an updated PCRE library) so that the "word" type also matches French characters like the accented "a" or "e".
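One way to sidestep the ambiguity, sketched here as an assumption rather than as Padraic's recommendation, is to spell out the character class instead of relying on \w. The `isAsciiWord` helper is hypothetical. The last line shows the \w check whose result can vary with how PCRE was built, so no output is claimed for it.

```php
<?php
// Hypothetical helper: accept only ASCII letters, digits and underscore.
// An explicit class is not affected by locale-aware PCRE character tables.
function isAsciiWord(string $value): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9_]+$/D', $value) === 1;
}

var_dump(isAsciiWord('cafe_1'));   // bool(true)
var_dump(isAsciiWord("caf\xE9"));  // bool(false): "\xE9" is an ISO-8859-1 accented e

// For comparison: \w may or may not accept the accented byte, depending
// on the PCRE library and locale in use.
var_dump(preg_match('/^\w+$/D', "caf\xE9"));
```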

On WebReference.com there's a recent article looking at the PEAR internationalization (i18n) packages and how they can be used to internationalize your application.

For many of us, the realization of the extent of countries' interdependence was driven home by the recent global economic meltdown. So what does all this have to do with us Web developers? It's a resounding wake up call that we have to think of other nationalities when we develop our websites and applications. In most cases, developing a web app in English alienates much of the world's population and greatly reduces potential profits! With that in mind, this article is the kickoff for a series that discusses the ramifications of globalization on our websites and applications.

They look at some of the locale identifiers (like LC_ALL, LC_TIME, LC_ADDRESS and LANG), how to access their values on the different OSes, and how to use the I18N_Country and I18N_Language packages from the PEAR I18N package to handle some simple multi-language support.
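For reading those values from within PHP itself, a minimal sketch might look like the following. It uses only core functions (`setlocale` with "0" to query the current setting, and `getenv`); the `currentLocaleSettings` helper is an assumption for this example, and the PEAR classes are deliberately left out because their exact API is not shown in this summary.

```php
<?php
// Hypothetical helper: collect the current locale settings.
// Passing "0" to setlocale() returns the current value without changing it.
function currentLocaleSettings(): array
{
    return [
        'LC_ALL'  => setlocale(LC_ALL, '0'),
        'LC_TIME' => setlocale(LC_TIME, '0'),
        'LANG'    => getenv('LANG'), // false when the variable is not set
    ];
}

foreach (currentLocaleSettings() as $name => $value) {
    echo $name, ': ', var_export($value, true), PHP_EOL;
}
```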

The Zend Developer Zone has taken a look at the first release of the Zend_Locale_UTF8 component for the Zend Framework and some comments from its lead developer, Andre Hoffmann.

Andre Hoffmann posted a blog entry today to talk about his work on Zend_Locale_UTF8: "I just released the first version of Zend_Locale_UTF8. It doesn't come with all functions nor with the best performance, but it shows how the current state of development is."

He also lists some of the things still missing from the component, including: unit tests, substr, strstr, PHP6 support, and mbstring support.

Following up on some of his previous posts to the SitePoint PHP Blog, Harry Fuecks has posted this quick guide with some "hot UTF-8 tips" to share with the community.

As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned - may be boring, depending on your perspective).

He talks mainly about using native PHP functionality, restricting locale behavior, and using a fast case conversion function so that strings are handled correctly without the issues mbstring could introduce. The other tip covers delivering content to clients that can't receive UTF-8: check their character set and respond accordingly.
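The post's own code is not reproduced here. As one possible reading of the second tip, the sketch below checks a request's Accept-Charset header to decide whether UTF-8 can be sent. The `clientAcceptsUtf8` helper is hypothetical, and treating a missing header as "anything is acceptable" is an assumption of this example.

```php
<?php
// Hypothetical helper: decide from an Accept-Charset header value whether
// the client accepts UTF-8. An explicit utf-8 entry wins over "*".
function clientAcceptsUtf8(?string $acceptCharset): bool
{
    if ($acceptCharset === null || trim($acceptCharset) === '') {
        return true; // assumption: no header means any charset is fine
    }

    $utf8 = null;     // q-value given for utf-8, if listed
    $wildcard = null; // q-value given for "*", if listed

    foreach (explode(',', $acceptCharset) as $part) {
        $pieces = explode(';', $part);
        $charset = strtolower(trim($pieces[0]));
        $q = 1.0; // default quality when no q parameter is present
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($charset === 'utf-8') {
            $utf8 = $q;
        } elseif ($charset === '*') {
            $wildcard = $q;
        }
    }

    if ($utf8 !== null) {
        return $utf8 > 0;
    }
    return $wildcard !== null && $wildcard > 0;
}

$header = $_SERVER['HTTP_ACCEPT_CHARSET'] ?? null;
echo clientAcceptsUtf8($header) ? 'send UTF-8' : 'convert first', PHP_EOL;
```

When the check fails, the content could be converted before sending, for example with `iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $text)`, along with a matching charset in the Content-Type header.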