
Lately, I've been attempting to add support for languages other than English in my game (e.g. Spanish, Danish, Hungarian, Chinese, Japanese, Korean). I'm more familiar and experienced with the first three; I never finished my Japanese course at university, and I've never touched Chinese or Korean.

My assumption was that I could just use a .ttf font for East Asian languages and the standard Segoe_UI.ttf for European ones. An example of the problem I have with European languages is characters outside the standard English set. Examples: teljes képernyő (Hungarian), fuld skærm (Danish); the é, ő and æ letters do not show up, which breaks the translation. With Japanese, I use a Japanese .ttf font, and what happens is that the majority of the characters do not show up, and the ones that do aren't the ones I need.

My question is: how would I implement these languages using .ttf fonts? For Asian languages, should I stop subtracting 32 from the value of each char? Any help is greatly appreciated!

Where does your input text come from? You need to know which encoding it is stored in, such as UTF8, UTF16, etc...

stb_truetype seems to operate on 32-bit Unicode code points, so I assume that if your text is UTF-8, you need to convert each UTF-8 multi-byte sequence into a single UTF-32 code point before passing it to stb_truetype.

I believe UTF-8 is an encoding format for storing text that may take multiple bytes per character, yet it's meant to be backwards-compatible with ASCII. With that said, the high bits in each byte aren't used in calculating the character's Unicode value. Instead, they're used as flags to determine whether the following bytes are continuation bytes whose low 6 bits help describe the localized character's unique value. Decoding those sequences gives you normalized 32-bit (unsigned?) Unicode code points. Then, you'd have a bitmapped font that represents each glyph in the image by a 32-bit (again, unsigned?) value matching your localized text's normalized Unicode values.
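A minimal sketch of that decoding step, hand-rolled for illustration. It assumes well-formed input and does no validation; a real implementation (or a library like utf8proc) should reject malformed sequences:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 string into 32-bit Unicode code points.
// Assumes well-formed input; no error handling.
std::vector<uint32_t> Utf8ToCodepoints(const std::string& utf8)
{
    std::vector<uint32_t> out;
    size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b = utf8[i];
        uint32_t cp;
        size_t extra;
        if      (b < 0x80) { cp = b;        extra = 0; } // 0xxxxxxx: plain ASCII
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; } // 110xxxxx: 2-byte sequence
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; } // 1110xxxx: 3-byte sequence
        else               { cp = b & 0x07; extra = 3; } // 11110xxx: 4-byte sequence
        for (size_t j = 1; j <= extra; ++j)
            cp = (cp << 6) | (utf8[i + j] & 0x3F);       // 10xxxxxx continuation bytes
        out.push_back(cp);
        i += 1 + extra;
    }
    return out;
}
```

For example, "é" (stored as the two bytes 0xC3 0xA9) decodes to the single code point U+00E9, which is the value you'd hand to stb_truetype.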

I think that's in line with what Hodgman was saying above. With that being said, you'd want to store the text you display onscreen in UTF-8 encoding, and use a library that reads UTF-8 strings, such as utf8proc, to convert each one into a string of normalized Unicode code points that you'd use as look-up values into your localized bitmapped font when rendering text to the screen.

Since we're storing text in UTF-8, and XML parsers typically expect their text to be stored as UTF-8, I store my strings in an XML schema like so:

Hello! Hola! New Game Nuevo Juego

I'd like to point out that my XML code doesn't appear here^ Looks like it was edited out :/
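For illustration only, since the original markup was lost: based on the strings that did survive above and the 2-character language codes mentioned below, a schema along these lines would fit (all element and attribute names here are my guesses, not the poster's actual schema):

```xml
<strings>
  <string id="greeting_string">
    <text lang="en">Hello!</text>
    <text lang="es">Hola!</text>
  </string>
  <string id="new_game_string">
    <text lang="en">New Game</text>
    <text lang="es">Nuevo Juego</text>
  </string>
</strings>
```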

Then, in my code, I'd have a LocalizedPackage that loads an XML file of localized text, typically for an entire menu or for cut-scene dialog, containing a collection of localized strings described by my LocalizedString class. LocalizedPackage reads the XML file and creates a LocalizedString instance for each element in the XML. LocalizedString would then create a LocalizedText instance for each element it finds, using the 2-character code to determine which language the text falls under and labeling that LocalizedText with the language. The XML parser reads in the text as a UTF-8 string, which is then converted to a normalized array/vector/list of unsigned longs.

Then, you'd do something like this in your code:

fontString->SetText(localizedPackage->GetString("greeting_string"));

LocalizedPackage would keep track of the game's current language with a member variable (that snippet was edited out as well).

LocalizedPackage would return the correct language's normalized Unicode string, which my FontString class knows how to interpret. Of course, if you wanted to provide localized text in a specific language regardless of the engine's current language, you could always pass that language to the lookup explicitly (again, the example didn't survive the edit).


Subtracting 32 from character codes is not something you would normally do, not even in western scripts (not for TTF, anyway -- you might do that if you use your own bitmap font where glyphs start at zero and the space character is your first defined glyph).
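For context, the subtract-32 trick comes from bitmap atlases laid out in ASCII order starting at the space character (code 32), so it only ever covers printable ASCII. A minimal sketch of that convention:

```cpp
#include <cassert>

// Index into an ASCII-ordered bitmap font atlas whose first glyph
// is the space character (code 32). Only valid for codes 32..126;
// anything outside that range (i.e. all non-ASCII text) has no glyph.
inline int GlyphIndex(char c) { return c - 32; }
```

`GlyphIndex(' ')` is 0 and `GlyphIndex('A')` is 33, which is exactly why this scheme cannot extend to Unicode: there is no contiguous sheet of glyphs to index into.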

For Asian fonts, you obviously must support Unicode in some way, since you'll be using considerably more than 255 different characters. Whatever you use is your decision, as long as you convert them to UTF-32 at the end (before passing it to stb_truetype). I'd go with UTF-8 for storage because it's straightforward and more efficient than UTF-32. Conversion routines are freely available too, so there's not much you need to think about.

UTF-16 is larger than UTF-8 for most languages and has no advantages over it, yet shares all of its disadvantages (e.g. the non-obvious mapping from storage length to character count, thanks to surrogate pairs); plus it does not work with legacy string routines and isn't as "intuitive".

Note that using the "standard" Segoe_UI.ttf font will require you to buy a license from Monotype, unless you rely on it being installed with the operating system (not the case on MacOS X, that'd be Lucida instead).

Also, you may need to redesign the UI, since different languages can have grossly different text lengths (up to 30-40% difference) and directions -- though I think you can legitimately write Japanese and Chinese left-to-right instead of top-down-right-to-left, as an alternative style.

Edited May 16, 2013 by samoth


You would want to write a localisation system that works off keystrings that look up the localised string to display, which means your input text comes from a text file, xls file, or whatever you decide to store it in. After that it all boils down to your text renderer and the bitmap fonts it loads, and which characters are present in that map.
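A minimal keystring lookup along those lines. The file format and names here are my invention (a simple key=value text file, one language per file), just to show the shape of the system:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Parse "key=value" lines from a localisation file into a lookup table.
// Lines without '=' (blanks, comments) are skipped.
std::unordered_map<std::string, std::string>
LoadStrings(std::istream& in)
{
    std::unordered_map<std::string, std::string> table;
    std::string line;
    while (std::getline(in, line)) {
        size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        table[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return table;
}
```

The renderer then only ever sees `table["menu_new_game"]`; swapping languages means loading a different file, not touching any UI code.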

You shouldn't have to redesign the UI too much for different languages; design it so that it runs with a longest-string setting from your localisation file, and if that fits OK-ish, you are good. Framing is still important as well.

Unicode works with code points to find out which letter is which; a bitmapped font will just link these code points to the correct glyph and glyph information for you. You will have to bake the .ttf font into the bitmap with the right settings, though.


Okay, thanks for all the replies. This whole UTF-8/16/32 thing is kinda new to me, and I never knew what made it useful. Do you mean use something like wchar_t instead of char? I'm still in the process of finding out how to do this in Xcode. I'm at work right now, so I'll have to wait until I get back today to read everything that was said in detail.

Shogun.


wchar_t isn't portably UTF-16 or UTF-32; it's an implementation-dependent type that is typically 16 bits on Windows and 32 bits on other platforms. My recommendation is to find a Unicode library and use its typedefs for Unicode character sizes. I like ICU, but there are others available.
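If you'd rather not pull in a full library for this, note that C++11 also added fixed-width character types that sidestep the wchar_t portability problem; a small sketch:

```cpp
#include <cassert>
#include <cstddef>

// wchar_t is implementation-defined: 2 bytes on Windows, usually 4 elsewhere.
const std::size_t kWcharSize = sizeof(wchar_t);

// char16_t and char32_t (C++11) are the same width on every platform.
static_assert(sizeof(char32_t) >= 4,
              "char32_t can hold any Unicode code point");

// A single UTF-32 code point, stored portably: U+00E9 is 'é'.
const char32_t kCodepoint = U'\u00E9';
```

These give you a portable UTF-32 code-point type, but they don't give you any of the conversion, normalization, or collation machinery, which is what a library like ICU is actually for.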

Also, it's not going to be as simple as finding a single font for all East Asian languages. Due to CJK unification, certain characters that are represented by the same code point are rendered differently in different languages (or between traditional and simplified Chinese). So you'll end up wanting a different font for each East Asian language you want to support.