Localisation of mods for BGEE

People interested in technical explanations and legacy tricks may read the rest of this text. People interested in an up-to-date practical solution should instead read about the HANDLE_CHARSETS function added in WeiDU 237. It implements an easy, well-integrated (i.e. easy to uninstall), multi-platform way of converting the texts as described below.

BGEE uses a new encoding for the special characters used in international languages such as French (à é ...). It is based on UTF-8. That encoding stores these accented characters on two bytes instead of one as in the past. BG and BG II were based on ISO-8859 encodings (each special character on a single byte). It seems ISO-8859-1 was used for Western Europe, at least one other for Poland, and probably another one for Russian.

To adjust to each language's charset, as far as I know, the games used specific BAM files containing character shapes matching the encoding.
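To make the size difference concrete, here is a small illustrative sketch. It is written in Python, which is not used anywhere in the mods discussed here, and the sample characters are just examples:

```python
# Compare how accented characters are stored in a legacy single-byte
# code page (CP1252 here) and in UTF-8.
SAMPLE = "àéû"

sizes = {}
for ch in SAMPLE:
    legacy = ch.encode("cp1252")   # one byte per character
    utf8 = ch.encode("utf-8")      # two bytes for these characters
    sizes[ch] = (len(legacy), len(utf8))
    # Print code point and raw bytes in hexadecimal (ASCII-only output).
    print(f"U+{ord(ch):04X}: cp1252={legacy.hex()} utf-8={utf8.hex()}")
```

Each of these characters takes a single byte in CP1252 and two bytes in UTF-8, which is why a file saved in one encoding cannot simply be read as the other.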

With BGEE, mods such as BG2 Tweak Pack don't install properly in languages using special characters. In the game, a string is displayed only up to the first special character, so it is cut short, possibly by a lot, and becomes unreadable. I assume this is because a byte with the most significant bit set (as is the case for special characters in ISO-8859 encodings) is invalid in UTF-8 unless it is part of a well-formed multi-byte sequence.

In the case of BPSeries, the mod installs properly and the descriptions are properly added to scrpdesc.2da, with StrRefs matching the tlk file, as can be checked with Near Infinity. Nevertheless, the BP script descriptions don't display properly. It seems each line of a multiline text ends as soon as a special character is found.
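The symptom can be imitated with a few lines of Python. This is only a sketch of the assumption made above (text cut at the first byte that is not valid UTF-8); it says nothing about how the game engine is really implemented, and the helper name visible_prefix is made up for the example:

```python
def visible_prefix(raw: bytes) -> str:
    """Return the part of raw that decodes as UTF-8 before the first invalid byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return raw[:err.start].decode("utf-8")


# A French string stored with the legacy single-byte code page.
legacy_bytes = "Une épée magique".encode("cp1252")

# The same string stored as UTF-8.
utf8_bytes = "Une épée magique".encode("utf-8")

print(len(visible_prefix(legacy_bytes)), len(visible_prefix(utf8_bytes)))
```

With the legacy bytes, only the four characters before the first "é" survive, while the UTF-8 bytes decode completely.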

The solution I found is to convert the tra files to UTF-8 before installing the mod. Since the original games don't handle UTF-8, this means you have to choose either BG2 or BGEE if you still target international players.

Ideally, WeiDU would convert the tra files during installation. However, this is not possible right now, and it would require some additions in any case: the encoding of texts for BG2 is not the same for all languages, so the proper encoding for each language would have to be either hardcoded into WeiDU or given to it through an extended LANGUAGE instruction.

In the meantime, the solution is to convert the tra files during installation, depending on the language chosen by the user. I made a preliminary attempt based on BPSeries v1010, as it is very easy to check in game whether the encoding is working. Here are my results so far.

First of all, here is how it looks with proper encoding for BGEE:

I used a version of iconv compiled for Windows to perform the conversion. In principle Linux and MacOSX should already have iconv, so this solution can be used there as well.

Here is the batch file I used for the conversion:

convertbpseriestra.bat

:: The files to convert are the .tra files
:: Adding .tpa files is also necessary for BG2 Tweak Pack

:: ~nx is used to keep only the filename (n) with extension (x), without the full path
:: http://technet.microsoft.com/en-us/library/cc755694(WS.10).aspx
:: See iconv manual
:: -f to give the original file encoding (here CP1252 / WINDOWS-1252 / ISO8859-1 for French)
:: -t to give the final encoding (UTF-8)
for %%i in (BPSeries\LANGUAGE\FRENCH\*.tra) do BPSeries\winutils\iconv -f CP1252 -t UTF-8 "BPSeries\LANGUAGE\FRENCH\%%~nxi" > "BPSeries\LANGUAGE\FRENCH\%%~nxi_utf8"

:: Note: copying converted files back upon the original .tra files is performed in the tp2 file
:: in order to take benefit of the restore capabilities from WeiDU during uninstall
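For platforms where a .bat file cannot run, the same conversion can be sketched in Python. This is only an illustration under assumptions: the function name convert_tra_dir and the demonstration file are made up, and none of the mods discussed here ship anything like it. It mirrors the batch file by writing a .tra_utf8 copy next to each .tra file:

```python
import tempfile
from pathlib import Path


def convert_tra_dir(directory, source_encoding="cp1252"):
    """Write a UTF-8 copy (<name>.tra_utf8) next to each .tra file in directory."""
    converted = []
    for tra in sorted(Path(directory).glob("*.tra")):
        # Work on raw bytes so that line endings are left untouched.
        text = tra.read_bytes().decode(source_encoding)
        target = tra.with_name(tra.name + "_utf8")
        target.write_bytes(text.encode("utf-8"))
        converted.append(target.name)
    return converted


# Small demonstration in a throwaway directory (hypothetical file content).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "setup.tra").write_bytes("@1 = ~Épée~\n".encode("cp1252"))
    names = convert_tra_dir(tmp)
    result = Path(tmp, "setup.tra_utf8").read_bytes()

print(names, result)
```

As in the batch file, the original .tra files are left untouched, so copying the converted files over them can still be done from the tp2 file.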

In order to integrate that conversion in the installation process, the only idea that came to me was to add a component at the beginning of the tp2 file for BPSeries. The following block was inserted just after the LANGUAGE instructions and the BEGIN @5001 for the first component:

// Convert
// This could easily be done with Linux and Mac, since they must have built-in iconv
// But I don't know how to write a .sh script
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND ("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN //Windows
  AT_NOW ~bpseries/convertbpseriestra.bat~
END // ELSE BEGIN
//   AT_NOW ~bpseries/convertbpseriestra.sh~
//END

// Replace the original tra files (Weidu should restore the original at uninstall)
// Note: unfortunately, MOVE does not remove the .tra_utf8 file after overwriting the tra file, it seems
// After conversion, we need to reload the tra file
// For french only, as example
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND ("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN
  MOVE ~bpseries/language/%LANGUAGE%/setup.tra_utf8~ ~bpseries/language/%LANGUAGE%/setup.tra~
  LOAD_TRA ~bpseries/language/%LANGUAGE%/setup.tra~
END

A few notes:

I couldn't come up with a conversion script for Linux or MacOSX, so the code only checks for Windows.

I only dealt with the French language, since it's mine and I could properly check the result. However, I assume that the exact same script would work as well for other Western European languages such as Italian, Spanish and German, provided they are all based on the Windows-1252 code page (or ISO8859-1).

Additional batch/script files would be required for languages using different encodings, unless parameters are passed to the scripts to tell them the encoding of the original file (I assume that UTF-8 applies to all languages in BGEE, why wouldn't it?).

I have an issue with the MOVE instruction, which doesn't remove the .tra_utf8 file after overwriting the .tra file. Maybe it's a bug in WeiDU 231.

A huge drawback of this solution is that it only works if the user cooperates and installs the conversion component, which is also useless in English but will be offered nonetheless.

Maybe WeiDU gurus can find a much better solution for this issue. I do hope so. Otherwise I'm afraid international players will be left out of the BGEE community as far as mods are concerned. I'm posting these findings in the hope that more knowledgeable people find a much better way.

Comments

Thanks for sharing these findings, Isaya. Until solved this could be a huge setback for multilingual modding. At the least, another hurdle to climb over.

If there's any way I can help out international players to fully enjoy my work, I'm more than willing to offer it. If you don't mind, I'll include this as an optional component in the next edition of the BP series. Naturally, I won't be able to verify much of it installed linguistically, but at least I can set up all the batch files and monkey-see-monkey-do the supporting WeiDU code together from what you posted. If this is solved properly in either a future update of BGEE or WeiDU itself, I can just as easily remove the code.

my my my... sometimes I'm so happy to have learned Shakespeare's language... I was thinking about translating some mods to help the French community but reading Isaya's post, I'm thinking it's not worth all the troubles... I could just do some translation and send them to someone that could put them together to work at least...

I'll keep an eye on how things are moving.

Great to see you there Isaya... I should go back to "La Couronne de Cuivre" some times ^^

@neoesprit, as I wrote in another thread about new kits, there is no difficulty with translating mods for BGEE. The only issue is when you want to make mods compatible with both BGEE and BG II/BGT/etc.

So feel free to work on translation. There will always be a way to integrate them, don't worry. ;-)

And you're welcome back at "La Couronne", especially if you feel like translating!

@horredtheplague, feel free to use that trick. However I believe you don't need to, at least for BPSeries, since the new release seems specific to BGEE, if I understand correctly. Correct me if I'm wrong.

I checked the new release, and the changes to the descriptions compared to the original French translation show that you removed HLA and such stuff, so that release of BPSeries wouldn't fit BG II/BGT very well. That's why I assume this is only for BGEE.

Currently I'm updating the translation to add descriptions of the new scripts.

I propose to send you the file in both encodings. You could use the UTF8 version directly if you indeed target only BGEE with this release.

As for the other languages, I believe that Spanish and German should use the same original encoding as French (Windows-1252), if this Wikipedia page is to be trusted. The other code pages of interest would be 1250 for Polish and 1251 for Russian, among the languages officially supported by BGEE.

Thanks, Isaya. I appreciate the assistance in this. If you PM me over at Spellhold, I can give you an email to send to. You're absolutely correct; this release here is 'only' for BGEE. I'll be doing a separate version for BG2 either w/ or possibly soon after the next Big Picture (v181) release in a week or 2 (if time allows, it's a big job). Reason for being separate: the BG2 version has over a dozen mods' content to account for--and this version was heavily streamlined with just BGEE in mind, inc. the difference in TLK strings. Eventually, I'll have the BG2 mod set up as a "mod-detector"--to install only the extra content that an install needs---and then maybe I can work out a universal release of the mod for both formats.

After completing the French translation for BPSeries 2016, I tried it in the game by replacing the current file with my new one converted to UTF8, without using the on-the-fly conversion. As expected, it worked fine.

So I decided to make an experiment with the Russian translation Prowler posted on SHS. The original file posted was using the standard 8-bit character set, so Notepad++ was only displaying a series of special characters from my Western European character set, not Russian ones.

After conversion to UTF8 (Notepad++ recognizes it as UTF8 without BOM), it looked far better:

I reinstalled BPSeries and selected Russian. Unfortunately, it doesn't look right in the game:

I uninstalled BPSeries, changed my game to English, linked to en_US/dialog.tlk and installed again in Russian. I can't say it's any better, unfortunately:

I don't know what to think. Since the Cyrillic characters display properly (or seem to) in Notepad++, I assume that Windows can display them well enough. So either BGEE uses a system font that doesn't include the Cyrillic characters, or it uses an internal file for the font that doesn't handle them (BG II was using a BAM file for font) or whatever. But since the game is not available with variations as BG II was (English and International versions, officially, plus specific ones for Eastern Europe, I think) I'm afraid the Cyrillic characters may not work either for Russian players.

In any case, I would suggest you include the Russian tra file converted to UTF8 in BPSeries so that you can get some feedback. We already know the previous encoding will not work anyway.

One last word on conversion. To convert files to UTF8 without having to learn the command line parameters of iconv, I recommend Cp Converter, available on SourceForge, if you're using Windows. It requires .Net, but should work out of the box on Windows 7 at least. It's very easy to use. One just needs to select the original encoding and the final encoding (UTF8 for our purpose).

The names are localised by .Net, so they appeared in French for me. I suggest relying on the code page at the end of the name to select the right ones among the huge list.

I think the typical ones are:

1252 for French, Spanish, German, Italian and also English (in case of special characters that word processors may use, as mentioned on the WeiDU forum)

1251 for Russian (Cyrillic characters)

1250 for Polish and Czech

Its output is the same as iconv when I convert the tra file from French.
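The code page list above can be written down as a small lookup table. The following Python sketch is only an illustration: the dictionary, the language keys and the function name to_utf8 are invented for the example and are not part of Cp Converter, iconv or WeiDU.

```python
# Code pages listed above, keyed by language name (keys are illustrative).
LEGACY_CODE_PAGES = {
    "french": "cp1252",
    "spanish": "cp1252",
    "german": "cp1252",
    "italian": "cp1252",
    "english": "cp1252",
    "russian": "cp1251",
    "polish": "cp1250",
    "czech": "cp1250",
}


def to_utf8(raw: bytes, language: str) -> bytes:
    """Re-encode legacy single-byte text as UTF-8, based on the language."""
    return raw.decode(LEGACY_CODE_PAGES[language.lower()]).encode("utf-8")


russian_bytes = "Привет".encode("cp1251")
converted = to_utf8(russian_bytes, "russian")

# Reading the same bytes with the wrong code page gives unrelated
# Western European characters instead of Cyrillic ones.
wrong = russian_bytes.decode("cp1252")
print(ascii(converted.decode("utf-8")), ascii(wrong))
```

Choosing the wrong source code page does not fail loudly: the bytes still decode, only to the wrong characters, which matches what a Western European editor shows for a Russian file.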

With the release of WeiDU v232, there is no more need to convert the characters into UTF8, it seems. When the first mod is installed, WeiDU asks the player to choose the game language he/she is playing. This will tell WeiDU where to look for the dialog.tlk/dialogF.tlk file from all the lang\xx_XX directories and, it seems, how to convert the characters into the proper encoding.

Edited: a wrong assumption about Rogue Rebalancing made me draw a false conclusion.

This is not only a problem of foreign language mods. English knows e.g. fiancé, and the world BG is happening in is properly called Faerûn.

@Isaya, thank you very much for pointing out a solution. (I'll have to look at it again closely to understand it, though.)

But do I see it right that it won't be possible to just install tra lines with special characters in without preparing them in some way?

@jastey That's true indeed. I forgot that English sometimes uses words of foreign origin, with their specific spelling. Also some mods such as BG1 NPC make use of typical word processor characters like special forms of ' or ", which are also encoded as special characters.

With the solution I proposed, you didn't have to prepare the files as long as they were consistent in terms of encoding (all cp something and not UTF8). However I wouldn't recommend this way nowadays.

Horred and Wisp have devised another way of doing it, which is much simpler, especially since you don't have to worry about the operating system for the script to convert texts. Instead of converting at installation time, they provide in BP Series and Rogue Rebalancing the two sets of files, for old games and for new ones. This also saves time testing the conversion script for each language.

I think cpXXXX is the default encoding and they use the same kind of conditional code to load the tra file with the different encoding if required for BGEE. I suggest you take these mods as reference instead.

You can still use iconv rather than the GUI tool to make the conversions if you have many files but that becomes a packaging task instead.

Yes, I experienced the crashing (freezing) of the game, too.

@Isaya, would you be so kind and explain to me how the Cp Converter is started? I have Windows 7 and installed .NET v4.5, but I have no clue as to how to start the program.

@jastey Actually the default download link on the main page I linked to is the source code. Probably that's why you can't find how to start it. So head over to the Files menu and browse to the most recent version to get a link for the executable. Here is the link to the latest version for now.

Unfortunately you can't simply drag and drop files and have to use the File menu. You can select several files for conversion at the same time (that's probably why it's written "Sources files", why didn't I realise it before :facepalm:).

Select the source file encoding, typically "Western Europe (Windows) - 1252" in many cases (I expect the actual name to be in German if that's the language you use in Windows). For Destination, I used "Unicode (UTF8) - 65001" for BGEE.

I didn't use either of the two options; actually I don't know what they mean.

@Isaya thank you for the other link, I was there but I didn't realize it's a different download.

But still, if I start the exe, all I get is a menu with an empty source file box (no possibility to navigate to any files), and the source and destination drop-down menus for choosing the encodings. How do I actually get to the files I want to encode?

@jastey There should be a menu called "File" above the "Source files" list, with only the "Open" choice. You should be able to select one or several files to add to the list. You can also use the menu several times to add other files from potentially different directories.

Otherwise it means there is something wrong with the interaction with the .Net framework. All the computers I tried the tool with had a development tool installed (hence a development version of .Net too). I hope the problem you face is not specific to the runtime environment of .Net or to a specific version of .Net.

@Isaya, I was either too stupid to realize I can actually press the word "File" to open the menu or the reinstall of .Net did the trick. Either way, it works now. Thank you for the link and your patience!

your recommendation for mods that are written exclusively for BG:EE or BGII:EE is to use UTF-8 (no BOM).

your recommendation for mods that are written to install on BG/BGII/TUTU/BGT/BG:EE/BGII:EE is to do the following:
1. provide two pre-converted copies of the .tra, one in ASCII (ISO-8859) for the original games, and one in UTF-8 (no BOM) for the new games.
2. in .tp2, detect the game and use the appropriate .tra

And we hope eventually to come up with a way that does not require two complete copies of the .tra for each language.

(I am most of all trying to confirm that you do not recommend the "convert at install time using batch file and iconv" in your original post - your BG1NPC fix is elegant, but if it doesn't work then I need to do some serious file editing.)
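Whether a given file already satisfies the "UTF-8 (no BOM)" requirement can be checked mechanically. Here is a minimal Python sketch; the function name is_utf8_without_bom is made up for the example and the check is not part of WeiDU or of any mod mentioned in this thread.

```python
import codecs


def is_utf8_without_bom(raw: bytes) -> bool:
    """True if raw is valid UTF-8 and does not start with a byte order mark."""
    if raw.startswith(codecs.BOM_UTF8):
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "fiancé".encode("utf-8")
with_bom = codecs.BOM_UTF8 + good
legacy = "fiancé".encode("cp1252")

print(is_utf8_without_bom(good), is_utf8_without_bom(with_bom), is_utf8_without_bom(legacy))
```

Note that a file containing only plain ASCII passes this check and is also valid in the legacy code pages, so it needs no conversion either way.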

@cmorgan You want CP1252 (aka Windows-1252) or others, not ISO-8859-1 (aka Latin-1) or others. They are subtly different. (Also, none of these are ASCII; they are all (mutually incompatible) extensions of ASCII.) Edit: languages east of ~Germany use other encodings, for example CP1251 for Russian and CP1250 for Polish.
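The subtle difference shows up with the typographic characters that word processors insert. A short Python sketch, given purely as an illustration (the variable names are arbitrary):

```python
# Right single quotation mark, typical of text pasted from a word processor.
quote = "\u2019"

# CP1252 has a slot for it: byte 0x92.
cp1252_bytes = quote.encode("cp1252")

# ISO-8859-1 has no such character at all.
try:
    quote.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Read as ISO-8859-1, byte 0x92 is an invisible control character instead.
as_latin1 = cp1252_bytes.decode("iso-8859-1")

# Ordinary accented letters are identical in both encodings.
same_for_accents = "é".encode("cp1252") == "é".encode("iso-8859-1")

print(cp1252_bytes.hex(), latin1_ok, ascii(as_latin1), same_for_accents)
```

The two encodings only disagree in the byte range 0x80 to 0x9F, so files with nothing but accented letters convert identically, while curly quotes and similar characters need CP1252 as the source encoding.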

@Wisp, did you do any magic for AUTO_TRA and USING yet? I guess providing two sets of tra files is the current way to go, and although @DavidW's idea of defining e.g. ~German (BGII:EE)~ and ~German (original BGII)~ as two different languages is really great (because so simple), having an automated solution that would get the right files depending on the game without the need for confusing players would be great.

@cmorgan My impression is that including both UTF8 and CP-xxxx encoding in a mod, instead of on-the-fly conversion is more portable. Conversion requires an external script and we all know that Windows, Mac and Linux have their own way of doing it.

I'm not saying that conversion at install time is not a viable solution. The solution I provided, which is currently included in the BG1 NPC beta, is only for Windows so far. I'm convinced it wouldn't be hard to make a sh script compatible with Mac and Linux, given that they most probably have iconv as a built-in command. I will look into this.

The current script only works for Western European languages. I remember a Polish version of BG1 NPC was mentioned, and it would need a modification of the conversion script. The fact that it requires a different encoding could easily be handled by adding a second parameter to the script to give the encoding to apply (CP1252 for French and Spanish, CP1250 for Polish).

:: Use a parameter to the script (%1) to specify the directory that must be converted
:: Use a second parameter (%2) to specify the initial encoding
@echo off
for %%i in (bg1npc\tra\%1\*.tra) do bg1npc\iconv -f %2 -t UTF-8 "bg1npc\tra\%1\%%~ni.tra" > "bg1npc\tra\%1\%%~ni.tra_utf8"
with
AT_NOW ~bg1npc/conv_tra.bat %LANGUAGE% CP1252~
in the TP2 file (or better, with a variable for encoding set according to language).

To avoid dependency on the operating system, on-the-fly conversion would benefit from a command included in WeiDU to convert a file. I'm wondering whether the iconv library is already in the build process, maybe not explicitly but through another dependency. Maybe @Wisp could tell.

That wouldn't change the fact that the mod would still have to detect BGEE/BG2EE, convert the files and load them explicitly. It could look like this, instead of AT_NOW and all the LOAD_TRA as of now:

ACTION_DEFINE_ASSOCIATIVE_ARRAY trafiles BEGIN
~p#brlt.tra~ => ~p#brlt.tra_utf8~
... (to list all the files to handle)
END
... if BGEE is detected
ACTION_PHP_EACH trafiles AS original => bgee BEGIN
// CONVERT_ENCODING input_encoding output_encoding input_file output_file
CONVERT_ENCODING ~CP1252~ ~UTF-8~ ~bg1npc\tra\%LANGUAGE%\%original%~ ~bg1npc\tra\%LANGUAGE%\%bgee%~
MOVE ~bg1npc\tra\%LANGUAGE%\%bgee%~ ~bg1npc\tra\%LANGUAGE%\%original%~
END

I think that including two sets of files, like Rogue Rebalancing and BP Series, or converting on the fly, are both working solutions (we need a framework for Linux and Mac in the second case though).

Using preconverted files moves the complexity of conversion out of the TP2 and is probably easier.

Moreover, conversion on the fly requires a specific test to avoid doing it several times in case you put it in an ALWAYS block when you have multiple components and none that is mandatory. In BG1 NPC, I used the fact that the bg1npc_tmp.tra file is processed to create bg1npc.tra to determine that a component has already been installed and that conversion is therefore done.

Other mods don't necessarily have such a thing to track an initial preparation, and a specific check is then required, such as creating an additional empty file with a weird extension as some mods do.
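The "marker file" idea can be sketched outside of WeiDU as follows. This Python snippet is only an illustration of the check; the marker name utf8_done.mrk and the function convert_once are hypothetical and do not come from BG1 NPC or any other mod.

```python
import tempfile
from pathlib import Path


def convert_once(mod_dir, convert):
    """Run convert() only if the marker file is absent, then create the marker."""
    marker = Path(mod_dir) / "utf8_done.mrk"  # hypothetical marker file name
    if marker.exists():
        return False  # conversion already done by an earlier component
    convert()
    marker.touch()
    return True


# Demonstration: the second call is skipped because the marker now exists.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    first = convert_once(tmp, lambda: calls.append("converted"))
    second = convert_once(tmp, lambda: calls.append("converted"))

print(first, second, calls)
```

The point of the marker is only to make the conversion step safe to trigger from every component, whichever one the player installs first.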

@Wisp, @Isaya, thank you - that clears some things up. My analysis is as follows, then, at least temporarily -

[1] Modders creating mods that are written exclusively for BG:EE or BGII:EE should use code editors like NotePad++ or JEdit and save in encoding UTF-8 (no BOM).

[2] Modders writing for multiple game variants on BG/BG2 content should (at least for now) provide two separate .tra files, one encoded in CP1252 for the older game variants, one encoded in UTF-8 (no BOM) for the :EE series. Several models for this exist, including using things in your .tp2 like

[3] Modders porting existing mods that have large numbers of .tra files and/or HUGE .tra files for multiple game variants on BG/BG2 content should continue to explore how to use and extend @Isaya's "convert using shell or bat scripting at install". The caveats seem to be that currently using AUTO_TRA and language declarations seem to bork, and if a user changes the language in BG[II]:EE they can bork the .tra loads.

[4] We need to take up a collection for Wisp so that he can dedicate 40+ hours per day to recreating the Rosetta Stone in WeiDU so that all this encoding stuff will magically be fixed by pre- and post-processing .tra files to and from various encodings, languages, and platforms.

@cmorgan Being forced to use USING won't be suitable in all cases, as @jastey found out (here).

I agree that a solution able to maintain the use of AUTO_TRA would be better.

I made some small changes to the TP2 for BG1 NPC in my copy of the repository, in order to handle Linux and Mac conversions. They will need to be tested with the game on these OS as all I could do was test the conversion script itself on Linux. I ensured the command line argument for iconv was compatible with the Mac OS documentation found online. However the script on Linux and the code checking the installation is made on Mac remain to be done.

I also introduced a change to allow the script to use any encoding as input so that a Polish translation would be easy to add, for instance.

I checked installation on BGEE V1.2 on Windows. I checked only with Imoen interactions (including Xzar and Montaron) in the starting area.

I'll upload the changes to github so that you can review them.

I stumbled upon an issue which I cannot solve myself. I'm testing my Polish translation for a mod and it seems to work correctly on both BG:EE and BG2:EE, but only WHEN the language in game is set to English. Obviously the case is valid only with BG:EE atm (because BG2:EE doesn't have official translations yet). Now, when I install the mod while the language is set to Polish, item descriptions show "Invalid xxxx" (where xxxx seems to be a random string number). When I switch the language to English all item descriptions show correctly in Polish (but obviously the rest of the dialogues are in English). When I switch the language to Polish again, it is once again a mess.

On both BG:EE and BG2:EE installation of the mod using Polish translation went fine, the mess is shown only when the language in game is set to Polish.

@Isaya, @Wisp you're the main experts here. Any clues what might be the culprit? Probably something obvious...

@Cahir WeiDU is installing the mods' strings in Polish over the English TLK. The language in which WeiDU applies the new strings is defined the first time you install a mod. Next time you install the mods, check that you first set the language to Polish.

Also, @Isaya, wouldn't it be easier to make this an action function and include it with WeiDU (like HANDLE_TILESETS and the patches ADD_SPELL_EFFECT, etc.)?

@CrevsDaak thanks for the tip, man. Turns out I should have set language in weidu.conf to lang_dir = pl_pl first. Now it's working like a charm:)

It's not that obvious there, that one should do it first. Shouldn't WeiDU automatically detect in which language the game is installed and modify correct dialog.tlk accordingly? Or maybe I should set it up somehow permanently, not only by changing line in weidu.conf, but somewhere else?

@Cahir weidu.conf is a file that appeared in WeiDU 232 or higher in order to keep track of the language selected in game. It was a decision made by the WeiDU author not to read the language from a game configuration file. One reason may be that WeiDU then doesn't have to know, for each game and operating system, where to look for that file.

As the question asked by WeiDU and the weidu.conf file didn't exist at the time this topic was started, I couldn't mention them. In principle, you never have to edit the file. It is created the first time you install a mod, when WeiDU asks you which language you intend to play the game with. If you remove it, WeiDU will ask you again the next time you install a mod.

Unless you reinstall all your mods, you should never change the game language in weidu.conf. WeiDU only adds texts to the dialog.tlk file of the game language specified and saved in weidu.conf. If you change language at some point during the installation of mods, the texts will be added partly in the first language and partly in the second. In that case, you will get Invalid xxx whatever language you select in game.

@CrevsDaak Wisp created a new function called HANDLE_CHARSETS that integrates the ability to do install-time conversion in a way compatible with Windows, Mac OS X and Linux. It requires that a Windows version of iconv is included in the mod (Mac OS X and Linux have it as part of the OS). This functionality couldn't be added directly in WeiDU because of open source license conflicts.

This capability is still being worked on, as you can see in the topic. However the initial version of the function is used in Edwin Romance V2.06 and is integrated in a beta version of WeiDU (236.01).