Bi-Lingual site issues

in this forum but rather than replying to it, I have opened this thread to ask some specific questions about some of the solutions proposed there - though it's no requirement to read that thread in order to reply.

1. When we talk about keeping language data in PHP files, we are assuming that our encoding will be ASCII, right? I don't think Unicode/UTF-8 PHP files would compile (i.e. work). So does this approach not work if I want to use UTF-8, or am I missing something? What is the recommended encoding for a multi-lingual website: UTF-8, or is it simpler overall to use ASCII (if we are not mixing languages, that is)?

2. If I have everything in the database, even every little menu/form word, such as login, register, about us, services, products, name, username, password etc, how do I query for these?
a. One big query to get all words/phrases into a PHP structure and then use that as I am rendering the entire page?
b. Make a query per item? This seems a bit excessive to me. I know that making a couple of queries per page is not that bad, but how about 20, 30 or more? Or is it OK as long as it's just a single connection?
c. Some sort of compromise, where I query once for the subset of translated text that fills most of the page, and query separately only when I need to print a larger body of text?

So I am thinking the best "compromise" might be to put the little words/phrases that make up the site (menus, forms etc.) into some PHP files, and the bigger items such as descriptions and news articles into the database. I would not hesitate to throw everything in the database, except that I have not convinced myself that I have a tried, tested, practical and efficient scheme for rendering the pages.
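For the "small strings in PHP files" half of that compromise, a minimal sketch might look like the following. The language codes, keys, translations and the `t()` helper are all hypothetical, and the arrays are kept in one file here only to keep the example self-contained.

```php
<?php
// Hypothetical string tables, one array per language.
// On a real site each array would more likely live in its own file
// (e.g. lang/en.php returning the array) and be loaded with require.
$strings = [
    'en' => ['login' => 'Login', 'register' => 'Register', 'about_us' => 'About us'],
    'fr' => ['login' => 'Connexion', 'register' => 'Inscription'],
];

/**
 * Look up a UI string, falling back first to a default language
 * and then to the key itself, so a missing translation never breaks the page.
 */
function t(array $strings, string $lang, string $key, string $fallback = 'en'): string
{
    return $strings[$lang][$key] ?? $strings[$fallback][$key] ?? $key;
}

echo t($strings, 'fr', 'login'), "\n";     // French string
echo t($strings, 'fr', 'about_us'), "\n";  // not translated, falls back to English
```

Larger bodies of text (articles, descriptions) would still come from the database; only the short interface strings go through a lookup like this.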

To question 1:
UTF-8-encoded PHP files do indeed run just fine - in fact one of my sites is done entirely in UTF-8, from the static HTML headers to the PHP scripts themselves, even on down to the database. Everything is UTF-8, and I've not had any problems. Just be aware that with any multi-byte encoding scheme (like UTF-8), you'll need to use multi-byte-aware string functions (not an issue in my particular implementation, as everything I'm doing could be done just as easily in ASCII). See http://www.php.net/manual/en/ref.mbstring.php
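To illustrate the difference between the byte-oriented and the multi-byte-aware functions, here is a small sketch. It assumes the mbstring extension is loaded; the sample word is just an example.

```php
<?php
// "café": 4 characters, but 5 bytes in UTF-8 (é is encoded as C3 A9).
// The \u{...} escape (PHP 7+) keeps the example independent of the editor's encoding.
$word = "caf\u{E9}";

echo strlen($word), "\n";                        // 5: strlen counts bytes
echo mb_strlen($word, 'UTF-8'), "\n";            // 4: mb_strlen counts characters

echo bin2hex(substr($word, 3, 1)), "\n";         // c3: substr cut the character in half
echo mb_substr($word, 3, 1, 'UTF-8'), "\n";      // the whole character

echo mb_strtoupper($word, 'UTF-8'), "\n";        // upper-cases the accented letter too
```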

To question 2:
Here's how my particular multi-lingual site is beginning to come together (still in the planning stages, though I've started some of the core code already):
As I process a page, I build a list of the string identifiers that are used on that page. When I'm ready to generate the output, I turn that list into a comma-separated list of strings and run one single MySQL query using the IN syntax. This gives me all the strings I need with only a single query.

An alternative approach I'm considering is to store separate copies of my templates, one set for each language. This would make managing the common strings easier (no need to query the database for them), but it complicates site maintenance (multiple copies of otherwise identical files need to be maintained).
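A rough sketch of the single-query idea, not the poster's actual code: the table and column names (`ui_strings`, `string_id`, `lang`, `text`) and the `buildStringQuery()` helper are hypothetical, and it uses bound placeholders instead of pasting the identifiers straight into the SQL.

```php
<?php
/**
 * Build one parameterised SELECT covering every string identifier used on a page.
 * Table and column names are hypothetical.
 *
 * @return array Two elements: the SQL text and the values to bind, in order.
 */
function buildStringQuery(array $ids, string $lang): array
{
    // Drop duplicates collected while processing the page.
    $ids = array_values(array_unique($ids));
    if ($ids === []) {
        throw new InvalidArgumentException('No string identifiers collected');
    }
    // One "?" placeholder per identifier, e.g. "?, ?, ?".
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT string_id, text FROM ui_strings"
         . " WHERE lang = ? AND string_id IN ($placeholders)";
    return [$sql, array_merge([$lang], $ids)];
}

// Identifiers gathered during page processing (note the duplicate).
[$sql, $params] = buildStringQuery(['login', 'register', 'login', 'about_us'], 'en');
echo $sql, "\n";

// With an existing PDO connection in $pdo (not created here), the lookup would be:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $strings = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // string_id => text
```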

Thanks for your response. How do you save PHP files as UTF-8? I tried Eclipse (and even NetBeans and jEdit), but I could find no option in any of them for choosing the encoding when saving a file. I could only do it in Notepad, by choosing Save as... Unicode, but that did not work: Apache served the file as text and did not invoke PHP on it.

> An alternative approach I'm considering is to store separate copies of my
> templates, one set for each language.

That is how I do it as well. There are a number of advantages to this approach, one being that you can more easily vary the layout on a per-locale basis.

> How do you save php files as utf-8?

In your favourite editor (I use jEdit) you can select which encoding to use, so read up on your editor's documentation. As for using UTF-8, you need to declare this encoding in your templates' meta data.

Notepad, from what I believe, is basic ASCII only and doesn't support Unicode.

You are better off declaring the encoding in all of your forms as well and, just to be sure, sending it in your HTTP headers too. In the majority of browsers, the header that is sent overrides the encoding specified in the template. That is what you want anyway, since the encoding specified in the template is a fail-safe, a fallback if you like.
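Put together, the three declarations (HTTP header, meta tag, form) might look something like this. It is only a sketch: `contentTypeHeader()` is a hypothetical helper and the page markup is a placeholder.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a given charset.
function contentTypeHeader(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// The header must be sent before any output.
if (!headers_sent()) {
    header(contentTypeHeader());
}

// The meta tag repeats the charset as a fallback for when no header is available
// (e.g. a page saved to disk); accept-charset asks the browser to submit
// the form data as UTF-8.
echo <<<HTML
<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Example</title>
</head>
<body>
  <form method="post" action="" accept-charset="UTF-8">
    <input type="text" name="username">
    <input type="submit" value="Send">
  </form>
</body>
</html>
HTML;
```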

Then there is your database: it's better to specify UTF-8 there as well when you create your database schema, ie
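As an illustration only, assuming MySQL and PDO: the table, column, host and database names below are hypothetical, and `utf8mb4` is used because it is MySQL's full UTF-8 character set (the older `utf8` stores at most three bytes per character).

```php
<?php
// Hypothetical table for translated strings, declared as UTF-8 at creation time.
$createTable = <<<SQL
CREATE TABLE ui_strings (
    string_id VARCHAR(64) NOT NULL,
    lang      CHAR(2)     NOT NULL,
    text      TEXT        NOT NULL,
    PRIMARY KEY (string_id, lang)
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
SQL;

// The connection should use the same character set; with PDO it goes in the DSN.
// Host, database name and credentials are placeholders.
$dsn = 'mysql:host=localhost;dbname=example;charset=utf8mb4';

// Against a real server this would be run as:
// $pdo = new PDO($dsn, 'user', 'password');
// $pdo->exec($createTable);
echo $createTable, "\n";
```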

Indeed, UTF-8 does work (using jEdit). I am now trying to figure it out in Eclipse as well. Notepad does allow you to "Save as" Unicode (without mentioning which encoding exactly), but the result does not appear to be UTF-8; maybe it is UTF-16 or some kind of Microsoft Unicode, which does not work (this is what threw me off).
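For what it's worth, classic Notepad's "Unicode" option writes UTF-16 little-endian with a byte order mark, so the bytes `<?php` never appear contiguously in the file, which is most likely why PHP did not pick it up. A small sketch for checking what an editor actually wrote (`detectBom()` is a hypothetical helper, and it only recognises files that start with a BOM):

```php
<?php
/**
 * Guess a file's Unicode encoding from its byte order mark (BOM).
 * Returns null when there is no BOM; plain ASCII and BOM-less UTF-8
 * cannot be told apart this way. UTF-32 is not handled.
 */
function detectBom(string $bytes): ?string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    return null;
}

// Simulated file contents: the PHP open tag saved as UTF-16LE with a BOM.
// Every ASCII character is followed by a NUL byte.
$utf16 = "\xFF\xFE" . "<\0?\0p\0h\0p\0";
var_dump(detectBom($utf16));

// For a real file, only the first few bytes are needed:
// var_dump(detectBom(file_get_contents('page.php', false, null, 0, 3)));
```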

I am still wondering whether to implement this as a files+DB mix or as pure DB. I guess there is no one "correct" answer, so I will just make a decision and move on.