Language support

My game has a good amount of text in it. I want to support most other languages, such as Spanish, French, German and Italian.
Also, my code is straight 'C' so portability is easy (in theory).
My question is: as I have many text strings, what's the best way to store them (i.e. file format, layout)?
I assume I literally just want a text.english file, for example, that can be passed to a translator. My concern is also about those 'funny' characters :-) ; how do I store them in a format that permits all the accents etc.? And if I really felt like punishing myself, even Japanese!!!?!??!!
Any advice welcome. I guess it's important to get this correct at the start rather than alter the text format later!

Well, in TextEdit you'll find the options in the Preferences window. In TextWrangler it is in the document options. Other programs will tend to work along those sorts of lines.

On the original question, Apple's approach is similar to Andy's, except that they use
"key" = "string";
You can do anything you like really. Just find a format that works for you, and don't make parsing it unnecessarily difficult.

I see.
Are you suggesting that I could use Character Viewer to insert my chars into a standard TextEdit doc (for testing purposes)? The parsing could be tricky. I guess the simplest way is to look for an LF and then assume that's the end of the text string. Then in game I can just access the string regardless of language with (for example) text[0], text[1] etc. for the relevant string.
Does that sound reasonable? :-)

<Edit: I just now saw you're using "plain C". What frameworks/libraries are you using? If you're going to display multilingual text you must be using some kind of library that draws multilingual text, so what do you have? >

It's not as tricky as you think it is.

As mentioned, you'll have a separate file for each language. The file needs to have some known text encoding, and although there are gajillions for specific languages and regions, they're all dead. You're going to just use a Unicode encoding.

The Unicode character set is huge. It has room for over a million code points, and well over a hundred thousand characters have been assigned so far. That's what you're looking at in the Special Characters palette. It covers pretty much every writing system in use, plus thousands of symbols and signs and other useless things like this snowman: ☃.

UTF8 is a superset of plain ASCII. It's handy because all of the standard English characters and control characters are the same single bytes as they are in ASCII. But, since any Unicode character needs to be representable, everything else takes 2 to 4 bytes per character. (The original design allowed sequences of up to 6 bytes, but the current standard stops at 4.)
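To make the byte layout concrete, here is a minimal C sketch of how the first byte of a UTF8 sequence tells you how long the sequence is, assuming the current standard's limit of 4 bytes per character. The function names utf8_seq_len and utf8_count are made up for illustration, and the code does not validate the continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes in the UTF-8 sequence
   that starts with lead byte c, or 0 if c cannot start a sequence
   (a continuation byte or an invalid lead byte). */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: counts characters (code points) in a
   NUL-terminated UTF-8 string by counting every byte that is not a
   continuation byte (10xxxxxx). Assumes the input is well formed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

The practical upshot is that strlen gives you bytes, not characters, so anything that measures or truncates text for display needs something like utf8_count rather than a raw byte count.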

There's also UTF16 and UTF32 (both with big and little endian variations). UTF16 uses 2 and 4 byte characters while in UTF32 all characters are 4 bytes.

So for your file, using UTF8 is a fine choice. Now, as for creating a text file that uses UTF, when you make a plain text file in TextEdit and then save it, you'll see a popup in the save panel. Pick UTF8 and there you go. You can type anything you want into that file. You can use the special characters palette or you can just type it out if you know how.

<Edit: But if you *were* using Cocoa...>

So let's say your file format will just be: "key:value" on a single line. When you read the file, just suck it in as an encoded string, say with NSString's [initWithContentsOfFile:encoding:error:], passing NSUTF8StringEncoding as the encoding. Then (because a line ending in UTF8 is just a single byte and doesn't conflict with any of the multi-byte Unicode characters) just split the string into lines. You can do that easily using the componentsSeparatedByString method. Now you have an array of lines. Next just loop through the lines, and (if a line isn't empty) split it into key and value by finding the first ':' character (rangeOfString), and then grab the string to the left and right of it (substringToIndex, substringFromIndex).

Now that you have a key and value, just shove it in a dictionary and lookup the key in the dictionary whenever you want to use the string.

Now, a few things, though... I used Cocoa above to explain how you'd read and chop up the string, but if you're using Cocoa, you can just use the actual localized strings files and just use NSLocalizedString to get it back. But if you're using some other language & framework, you can just use this same idea and translate from Cocoa into whatever code you're using. The principle is the same.
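Since the original poster is on plain C, here is a rough sketch of the same "key:value" idea without Cocoa. It assumes the whole UTF8 file is already in memory as a writable, NUL-terminated buffer. All the names (split_key_value, parse_table, lookup_string, MAX_STRINGS) are invented for this example, and a simple linear search stands in for the dictionary.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 256  /* arbitrary limit for this sketch */

struct entry { const char *key; const char *value; };
struct string_table { struct entry entries[MAX_STRINGS]; size_t count; };

/* Splits "key:value" in place at the first ':'. Returns 1 on success,
   0 if the line has no ':' (for example an empty line). */
int split_key_value(char *line, char **key, char **value)
{
    char *colon = strchr(line, ':');
    if (colon == NULL) return 0;
    *colon = '\0';
    *key = line;
    *value = colon + 1;
    return 1;
}

/* Walks the buffer line by line (splitting on '\n', dropping a trailing
   '\r') and records every key:value pair. The table points into buf,
   so buf must outlive the table. Returns the number of pairs stored. */
size_t parse_table(char *buf, struct string_table *t)
{
    char *line = buf;
    t->count = 0;
    while (line != NULL && *line != '\0' && t->count < MAX_STRINGS) {
        char *next = strchr(line, '\n');
        if (next != NULL) *next++ = '\0';
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\r') line[len - 1] = '\0';
        char *k, *v;
        if (split_key_value(line, &k, &v)) {
            t->entries[t->count].key = k;
            t->entries[t->count].value = v;
            t->count++;
        }
        line = next;
    }
    return t->count;
}

/* Linear lookup; falls back to the key itself so that a missing
   translation shows up on screen instead of crashing. */
const char *lookup_string(const struct string_table *t, const char *key)
{
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->entries[i].key, key) == 0)
            return t->entries[i].value;
    }
    return key;
}
```

Splitting on the byte ':' and the byte '\n' is safe in UTF8 because every byte of a multi-byte character is 0x80 or above, so neither can appear inside an accented or Japanese character. For a few hundred strings the linear search is probably fine; a hash table or a sorted array would be the obvious upgrade.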

<I started this before Seth posted, but he got a placeholder post in before I finished>

It shouldn't be any problem to assume that a new line is the end of a string. If you need to support multiline strings then it might be better to put them into a binary file with null-terminated strings. You might want to consider XML if you need something more sophisticated, but that does make parsing more fiddly. Loading the data into an array of strings seems reasonable, although that really depends on what makes sense to you. I like the approach of having one file for each language, loading the one you need, and allowing the majority of your code to not pay any attention to the whole procedure (other than allowing for variable string lengths and so on).
The character palette is probably a reasonable way to create a test document. I'd imagine that your translator will have a better way to type Japanese.
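For what it's worth, the "one string per line, indexed by number" approach could look something like the sketch below in plain C. The names load_file and split_lines are made up for this example. The file is read in binary mode so the UTF8 bytes arrive untouched, and the buffer is split in place.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads a whole file into a malloc'd,
   NUL-terminated buffer. Returns NULL on failure; the caller frees it. */
char *load_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    char *buf = NULL;
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, f);
                buf[got] = '\0';
            }
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: splits buf in place on '\n' (dropping a trailing
   '\r') so that text[i] points at line i. Empty lines are kept, so the
   same index means the same string in every language file.
   Returns the number of lines stored. */
size_t split_lines(char *buf, char **text, size_t max_lines)
{
    size_t n = 0;
    char *line = buf;
    while (line != NULL && *line != '\0' && n < max_lines) {
        char *next = strchr(line, '\n');
        if (next != NULL) *next++ = '\0';
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\r') line[len - 1] = '\0';
        text[n++] = line;
        line = next;
    }
    return n;
}
```

In the game you would call load_file on, say, text.english once at startup, pass the buffer to split_lines, and then use text[0], text[1] and so on everywhere else. The catch with plain indices is that every language file has to keep its lines in exactly the same order.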

Don't assemble sentences from sentence fragments in a rigid order, e.g. creating "Hello World!" by doing "Hello" + " " + "World" + "!". This is problematic as different languages may need to place the words in different orders. Small words like "the" or "not" may not even be separate words, but may instead be prefixes or suffixes on other words, which may make it impossible to translate them individually.

If you use some kind of format string (as used by printf in C), be aware that the order of the arguments may need to change for them to make sense in another language. So use a print function where the format arguments can be named or numbered. Use something like:

"Hello $FirstName$ $FamilyName$"

where $...$ is a named argument. The translated format string may need to rearrange the arguments, e.g.:

"こんにちは $FamilyName$ $FirstName$"

or may want to drop, add or repeat certain arguments:

"こんにちは $FamilyName$さん"

Tip: using named arguments rather than numbered ones will be helpful for any translators that will work with the text.
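Standard C printf has no named arguments, so you would have to write the substitution yourself. Below is a minimal sketch of one way to do it. The struct arg type and the expand_named function are invented for this example, and the $Name$ syntax is just the one used above.

```c
#include <stddef.h>
#include <string.h>

/* One named argument, e.g. { "FirstName", "Ada" }. */
struct arg { const char *name; const char *value; };

/* Copies fmt into out, replacing each $Name$ with the matching value
   from args. A '$' that does not start a known name is copied through
   as literal text. Output is truncated to fit cap bytes and is always
   NUL-terminated when cap > 0. Returns the number of bytes written,
   not counting the terminator. */
size_t expand_named(char *out, size_t cap, const char *fmt,
                    const struct arg *args, size_t nargs)
{
    size_t n = 0;
    if (cap == 0) return 0;
    while (*fmt != '\0') {
        const char *src = fmt;  /* what to copy on this pass */
        size_t len = 1;         /* default: one literal byte */
        if (*fmt == '$') {
            const char *end = strchr(fmt + 1, '$');
            if (end != NULL) {
                size_t name_len = (size_t)(end - fmt - 1);
                for (size_t i = 0; i < nargs; ++i) {
                    if (strlen(args[i].name) == name_len &&
                        strncmp(args[i].name, fmt + 1, name_len) == 0) {
                        src = args[i].value;
                        len = strlen(src);
                        fmt = end;  /* jump to the closing '$' */
                        break;
                    }
                }
            }
        }
        for (size_t i = 0; i < len && n + 1 < cap; ++i)
            out[n++] = src[i];
        ++fmt;  /* step past the literal byte or the closing '$' */
    }
    out[n] = '\0';
    return n;
}
```

Because the translator controls the format string, the same call works whether the translated text reorders, drops or repeats an argument. Searching for the byte '$' is safe in UTF8 text since no multi-byte character contains it.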