i18n status

I'm mentoring Bernat for the GSoC i18n project.
We've just completed the concept phase and are starting to implement a basic translation system for sapphire. If you've got access to the GSoC wiki, have a look at:

In principle I think you are making the whole thing too complicated, given that at the moment nothing is internationalized.

Why do you want to create a parser if you have to search all strings manually before you can use your parser? I think that's just a big unneeded overhead.

My idea would be much easier: :-)

- In code and templates just use _($namespace, $entity, $arg1, $arg2, ...)

- When introducing a new string to localize, just write it in an XML file (one per locale) with all the needed information: namespace, entity, context, priority, string, and maybe additionally a boolean to indicate whether the string is already escaped (HTML) or not
In that way you can then create something like a compiler that generates the PHP array file, which contains only namespace, entity, string (and maybe the boolean). If someone needs to translate to another language, he has all the needed data in one XML file per module, and that also makes it quick and easy to write desktop programs to ease the translation process.
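To make the idea above a bit more concrete, here is a minimal sketch of the "compiler" plus the lookup. It is written in Python only to keep it short (the real thing would be PHP), and the XML element and attribute names, the sample strings, and the function names are all my own assumptions, not an agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for one locale; all names here are assumptions.
SAMPLE_XML = """
<translations locale="it_IT">
  <string namespace="Member" entity="WELCOME" context="Greeting on login page"
          priority="10" escaped="false">Benvenuto, %s!</string>
  <string namespace="Member" entity="LOGOUT" context="Logout link"
          priority="20" escaped="false">Esci</string>
</translations>
"""

def compile_table(xml_text):
    """Keep only what is needed at runtime:
    (namespace, entity) -> (string, escaped flag)."""
    root = ET.fromstring(xml_text)
    table = {}
    for node in root.iter("string"):
        key = (node.get("namespace"), node.get("entity"))
        table[key] = (node.text or "", node.get("escaped") == "true")
    return table

def translate(table, namespace, entity, *args):
    """Rough equivalent of _($namespace, $entity, $arg1, $arg2, ...)."""
    text, _escaped = table[(namespace, entity)]
    return text % args if args else text

table = compile_table(SAMPLE_XML)
print(translate(table, "Member", "WELCOME", "Bernat"))
```

Context and priority stay in the XML for the translators and are dropped from the compiled table, which is the whole point of the compile step.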

The thing that in my opinion causes the biggest problems with your underscore function is that if you use it in that way, everyone needs to write his modules in en_AU or whatever the default is for Silverstripe. But let's say someone wants to create a module for an Italian site; obviously he doesn't care much about the English translation and just puts the Italian strings everywhere.
Translation is then a big pain.
But if he just uses the entities in the code/templates and creates the XML file with all the information in Italian, someone can simply take that XML file, translate it, and everything would work.

Maybe you should create a simple tool (in PHP or anything else that is cross-platform) to make it easy to create those XML files. Or you can simply use a table in the database for that and create an exporter.
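The database variant could look roughly like this. Again this is only a sketch in Python with an in-memory SQLite database; the table name, the columns, and the XML layout are assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_locale(conn, locale):
    """Dump all strings of one locale from the (hypothetical) table to XML."""
    root = ET.Element("translations", locale=locale)
    rows = conn.execute(
        "SELECT namespace, entity, context, text FROM translation "
        "WHERE locale = ? ORDER BY namespace, entity",
        (locale,),
    )
    for namespace, entity, context, text in rows:
        node = ET.SubElement(
            root, "string", namespace=namespace, entity=entity, context=context
        )
        node.text = text
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation "
    "(locale TEXT, namespace TEXT, entity TEXT, context TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO translation VALUES (?, ?, ?, ?, ?)",
    [
        ("it_IT", "Member", "WELCOME", "Greeting on login page", "Benvenuto, %s!"),
        ("it_IT", "Member", "LOGOUT", "Logout link", "Esci"),
        ("en", "Member", "LOGOUT", "Logout link", "Log out"),
    ],
)
xml_text = export_locale(conn, "it_IT")
print(xml_text)
```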

I think in that way the implementation is much easier to program and to use also for non-native English speakers.

Tell me what you think about that - hope to have helped you and not confused you :-)

It's true that now it will be necessary to manually search all strings. But when writing new modules/code you will avoid editing multiple files to add a single string. And readability improves as well, since entities could be meaningless. The initial adaptation will be done just one time, but these benefits will remain. And it's a bit quicker to adapt the existing strings if you don't have to move them to another file anyway.

> The thing that in my opinion causes the biggest problems with your underscore function is that if you use it in that way, everyone needs to write his modules in en_AU or whatever the default is for Silverstripe.

We can define an "original" locale/file to solve this problem (right now the hardcoded one is the generic "en"). But anyway if they are writing a module that aims to be translated it's generally better if they write it in English, isn't it?

We could even allow to define in some place in the module the language that original strings are written in - but i'm more with the original-file solution, what do you think? (the language of this original file can be stated when uploading it to the translator UI).
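Just to illustrate the "original locale" idea, here is a small sketch (in Python for brevity, not actual sapphire code). It assumes each module may declare the language its original strings are written in, with "en" as the default, and that a missing translation falls back to that original. The module names, the data layout, and the fallback behaviour are all assumptions.

```python
DEFAULT_ORIGINAL = "en"

# Hypothetical per-module declaration of the original language.
MODULE_ORIGINAL = {"ristorante": "it_IT"}

# Hypothetical string tables, keyed by (module, locale).
STRINGS = {
    ("ristorante", "it_IT"): {"Menu.TITLE": "Piatto del giorno"},
    ("ristorante", "de_DE"): {},
    ("cms", "en"): {"Page.SAVE": "Save"},
    ("cms", "it_IT"): {"Page.SAVE": "Salva"},
}

def original_locale(module):
    """Language the module's original strings are written in."""
    return MODULE_ORIGINAL.get(module, DEFAULT_ORIGINAL)

def lookup(module, locale, entity):
    """Use the requested locale if it has the string, else the original."""
    translated = STRINGS.get((module, locale), {}).get(entity)
    if translated is not None:
        return translated
    return STRINGS[(module, original_locale(module))][entity]

print(lookup("ristorante", "de_DE", "Menu.TITLE"))
```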

Well, but I don't think that meaningless entities should be used... that doesn't really make sense at all. What are your ideas if something changes in the original translation? Then you need to change that in the code files, and for every other language in the translation files, right?
I think from the viewpoint of a translator (who often has no idea of PHP) the easiest way to fix translation errors (even in the original language) is always to work with the same system, namely a translation file.

> The initial adaptation will be done just one time, but these benefits will
> remain. And it's a bit quicker to adapt the existing strings if you don't have
> to move them to another file anyway.

You just automate the process of moving it to another file :-)

> We can define an "original" locale/file to solve this problem (right now the
> hardcoded one is the generic "en").

That would be an option...

> We could even allow to define in some place in the module the language
> that original strings are written in - but i'm more with the original-file
> solution, what do you think? (the language of this original file can be
> stated when uploading it to the translator UI).

I think if we use that architecture the language should be specified somewhere directly in the file, maybe something like a custom docBlock. In that way the original language is always in the right place bundled with the strings.

Don't get me wrong, I have nothing against your concepts, but I just think you are going to implement much more than is needed, and writing a parser that works with all the special constructs (different escaping, line breaks and so on) is not at all a trivial task.
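To show what I mean about the special constructs, here is a deliberately naive extractor, sketched in Python. It assumes calls of the form _('entity', 'string') with single-quoted PHP literals; that call shape is only a guess for the example, not the agreed signature. It copes with escaped quotes and line breaks inside a literal, but double-quoted strings, concatenation, and heredocs are not handled at all.

```python
import re

# One single-quoted PHP string literal, allowing backslash escapes and newlines.
SQ = r"'((?:\\.|[^'\\])*)'"
# Hypothetical call shape: _('entity', 'original string' ...
CALL = re.compile(r"\b_\(\s*" + SQ + r"\s*,\s*" + SQ, re.DOTALL)

def unescape_single_quoted(literal):
    """In PHP single-quoted strings only \\' and \\\\ are escape sequences."""
    return re.sub(r"\\(['\\])", r"\1", literal)

def extract(php_source):
    """Return (entity, string) pairs for every matching call."""
    return [
        (unescape_single_quoted(entity), unescape_single_quoted(text))
        for entity, text in CALL.findall(php_source)
    ]

PHP_SOURCE = r"""
<?php
echo _('Member.WELCOME', 'Welcome, %s!');
echo _('Member.NOTFOUND',
       'We couldn\'t find
that member');
echo my_('Not.THIS', 'ignored');
"""

for entity, text in extract(PHP_SOURCE):
    print(entity, repr(text))
```

Even this small version needs care with escapes and word boundaries, and it still silently skips every string that is not written as a plain single-quoted literal.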

> Well, but I don't think that meaningless entities should be used... that
> doesn't really make sense at all. What are your ideas if something changes in
> the original translation? Then you need to change that in the code files, and
> for every other language in the translation files, right?
> I think from the viewpoint of a translator (who often has no idea of PHP) the
> easiest way to fix translation errors (even in the original language) is
> always to work with the same system, namely a translation file.

Entities are used as identifiers. The usual way for a translator to work will be the translator UI module, which will track changes in the original PHP file (using entities to identify the strings), so if a module developer has some kind of special entity naming system, he can use it :). Of course, meaningful entities can help translators and are good practice; it's just that they're not required.
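A rough sketch of how entity-based change tracking could work (Python for brevity; the function name and the data layout are assumptions, not the translator UI's actual code): compare the previous and the current set of original strings, keyed by entity, and flag what a translator has to look at again.

```python
def diff_originals(old, new):
    """Compare two {entity: original string} snapshots.

    Returns three sorted lists of entities: added, removed, and those whose
    original text changed (so existing translations need a review).
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, removed, changed

# Hypothetical snapshots extracted from two versions of a module.
old = {"Member.WELCOME": "Welcome, %s!", "Member.LOGOUT": "Logout"}
new = {"Member.WELCOME": "Welcome back, %s!", "Member.PROFILE": "Your profile"}

added, removed, changed = diff_originals(old, new)
print(added, removed, changed)
```

Because only the entity is used as the key, it does not matter for the tracking whether the entity name itself is meaningful.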