One of my colleagues is the "Pseudo Man" (a rich source of puns in conversation!)

by Michael S. Kaplan, published on 2011/04/11 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/11/10152035.aspx

Pseudolocalization.

That interesting noun has its own Wikipedia article, first written in October of 2005 (link right here; since June 2009 it has carried warnings about not citing references and sources), which includes the following definition:

Pseudolocalization is a software testing method that is used to test internationalization aspects of software. Specifically, it brings to light potential difficulties with localization by replacing localizable text (particularly in a graphical user interface) with text that imitates the most problematic characteristics of text from a wide variety of languages, and by forcing the application to deal with similar input text.

If used properly, it provides a cheap but effective sanity test for localizability that can be helpful in the early stages of a software project.

Pseudo-Localization

To prevent common globalization bugs, pseudo-localized builds were created. Pseudo-localization is a process that creates a localized product in an artificial language. That language is identical to English except that each character is written with a different character that visually resembles the English character. Except for being entirely machine generated, we create the pseudo-localized builds exactly the same way as we create the localized builds. Because even monolingual US software developers can read pseudo-localized text, it has proven to be an excellent way to find globalization problems early in the development cycle. In the Windows 7 beta, some UI elements were still in their pseudo-localized form, causing some interesting theories about what the meaning might be. We hope we have solved the mystery with this blog post. :-)

Control Panel Dialog in Pseudo-localized Windows 7
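The character substitution described in that excerpt is easy to picture with a small sketch. The look-alike table below is purely hypothetical (the actual mapping Windows uses is not given here), and the function name is made up for illustration:

```python
# Hypothetical look-alike table: each English letter maps to an accented
# character that still reads as the original. This is an assumption for
# illustration, not the mapping used by the Windows pseudo builds.
LOOKALIKES = str.maketrans({
    "a": "å", "e": "é", "i": "ï", "o": "ö", "u": "ü",
    "A": "Å", "E": "É", "I": "Ï", "O": "Ö", "U": "Ü",
    "c": "ç", "n": "ñ", "y": "ý",
})


def pseudo_localize(text: str) -> str:
    """Return text with each mapped letter swapped for its look-alike."""
    return text.translate(LOOKALIKES)


print(pseudo_localize("Control Panel"))  # prints "Cöñtröl Påñél"
```

The result is still readable by someone who only knows English, which is the whole point: the string has visibly been through localization, yet nobody needs a translator to test with it.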

The words here might imply that pseudo started with Windows 7, but that isn't true.

Now it did not exist when XP or Server 2003 or Server 2003 R2 were being developed.

In truth, it started during the early design phases of Vista. Way back in the before time, when it was called Longhorn, actually.

Pseudo builds were first available in Beta 1 of Vista.

The project was hatched largely by the localizability engineers, who organically designed it to catch problems they knew about but that usually were not caught until localization was formally underway, since finding them required localized builds.

When presenting about pseudo to core teams, I usually have a description of it that is something like this:

Think of pseudo as an eager and hardworking yet naive intern localizer, who is eager to prove himself [or herself, I tended to alternate the gender] and who is going to translate every single string that you don't say shouldn't get translated. So that when you install the build you know:

if there are any places that aren't available to localization due to being hard-coded, since the intern localizer couldn't reach it, and

if there are places that, when localized, break code because things were put into resources that perhaps shouldn't be (think e.g. HTML tags)

Because while localizers can be smart and make the right choices much (and in many cases most) of the time, when you look at the millions of words across over 100 languages, someone can obviously make a mistake -- and that mistake can be easy or hard to find. Therefore, this pseudo, this intern localizer, by acting as the busiest canary in the coal mine ever, can find the kinds of issues that we could never have hoped to have the same coverage of in every single language -- and faster than we ever could have done, anyway.
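To make the two checks the "intern" performs a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the vowel-only substitution table, the helper names, and the simple tag regex are hypothetical and are not how the Windows tooling works.

```python
import re

# Hypothetical substitution: only lowercase vowels are replaced.
LOOKALIKES = str.maketrans("aeiou", "åéïöü")

# Rough pattern for an HTML-style tag with an ASCII tag name.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def pseudo(text: str) -> str:
    """The 'intern': blindly translate every character it is given."""
    return text.translate(LOOKALIKES)


def looks_hard_coded(displayed: str) -> bool:
    """A displayed string that still has plain vowels never reached the intern."""
    return any(ch in "aeiou" for ch in displayed)


def broken_markup(original: str, localized: str) -> bool:
    """True if localizing the resource changed or destroyed its tags."""
    return TAG.findall(original) != TAG.findall(localized)


# A string shown in the UI that did not come from resources:
print(looks_hard_coded("Cancel"))             # True
print(looks_hard_coded(pseudo("Save file")))  # False

# Markup placed in a resource gets "translated" along with the text:
link = "<a href='help.htm'>Help</a>"
print(broken_markup(link, pseudo(link)))      # True
```

The first check corresponds to places the intern couldn't reach; the second corresponds to resources that break when everything in them is dutifully translated.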

Little known fact that I'll tell you all now: we actually stage lots of our Windows builds on pseudo and "localize" them into English!

This was originally a bug while pseudo was first being set up in the official build lab, but became something of a de facto standard procedure due to the bugs we found in the process of enduring this original bug.

And now it is a great and easy way to verify the build process.

Now since it was long before the NLS team built their own pseudo locales (the ones I first mentioned in Walking off the end of the eighth bit, pointing to Shawn's Pseudo Locales in Windows Vista Beta 2), they had to build on top of an existing locale (and they needed an existing locale for architectural reasons). They decided to choose a locale that "would never be a localization target language in Windows or Office".

They built it atop Turkmen (Turkmenistan), aka tk-TM.

This led to some interesting consequences with the collation of strings due to the way Turkmen sorting works -- a fact that was reported as a bug several times during both Vista and Windows 7!

And they were almost right about it never being a localization target language.

If you define never as being "until the next version" that is -- since we did create a Turkmen LIP for Windows 7....

Note that the Turkmen LIP almost could have been scrubbed due to the complications of all the work that assumed it would never be needed for localization, and we had to overcome some interesting engineering challenges because of all of the places in build scripts and test suites that would mistake it for pseudo.

Personally, I'm glad those hurdles were able to be overcome.

The whole question of Why one LIP and not another? is hard enough without adding "and has never been added [even unintentionally] to the Naughty List"....