XML encoding type to use

Hi Guys,

I have written a PHP application that parses an XML file and uploads the values to a database. However, when I try to upload text in a different language, French for example, my PHP script outputs errors as shown below.

Yes, but you have to make sure you are using UTF-8 all the way through: in your text editor, in PHP, in MySQL, and in the charset declarations of the HTML pages you output. If you are serious about i18n, it seems the best way to go.


Thank you for your help. The MySQL database I am using uses the "latin1_general_ci" collation. Obviously, I will need to change this if I am going to use UTF-8. Is there an alternative encoding that works with "latin1_general_ci"?

A collation is an algorithm for comparing strings. It's mostly used to get sorting right. For example, in German ö is sorted along with o, in the middle of the alphabet, while in Swedish the same character is the last letter of the alphabet. The collation controls this. A collation only works for the encoding that it was intended for, which is why collation names start with the name of the charset. The collation doesn't change how data goes in and out of the database.
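To make the idea concrete, here is a rough sketch in Python (used only for illustration, it is not MySQL's own code). The two key functions, `german_key` and `swedish_key`, are made-up stand-ins for two collations; real collations have many more rules.

```python
# Hypothetical sort keys that imitate two collations. They are simplified
# stand-ins, not the actual rules used by any database.

def german_key(word):
    # German dictionary order treats umlauts like their base vowels.
    table = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
    return word.lower().translate(table)

def swedish_key(word):
    # Swedish places å, ä and ö after z, in that order.
    alphabet = "abcdefghijklmnopqrstuvwxyzåäö"
    return [alphabet.index(ch) for ch in word.lower() if ch in alphabet]

words = ["zebra", "öl", "orm", "apa"]

german_sorted = sorted(words, key=german_key)
swedish_sorted = sorted(words, key=swedish_key)

print(german_sorted)   # "öl" lands next to "orm"
print(swedish_sorted)  # "öl" lands after "zebra"
```

The strings themselves are identical in both cases. Only the comparison rule differs, which is all a collation is.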

MySQL is charset-aware, so you can set the charset on a per-connection basis. If you set the connection to UTF-8, MySQL will assume that the data you pass it is UTF-8 encoded. MySQL also knows which charset the data is stored in internally. You can set this globally or per table. If the connection charset differs from the storage charset, MySQL converts the data on the way in and out. This means that if you pick a storage charset that doesn't support all the characters you use, you're in trouble. Therefore, if you use UTF-8 for the connection, it's a good idea to use UTF-8 for storage as well.
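Here is a small Python sketch of what goes wrong when the storage charset can't hold a character. It only imitates the conversion step; `store_and_fetch` is a made-up helper, and Python's `cp1252` codec is used as a stand-in for MySQL's latin1 on the assumption that the two cover the same characters.

```python
def store_and_fetch(text, storage_charset):
    """Imitate a round trip through a column stored in storage_charset.

    Characters the storage charset cannot represent are replaced
    with "?", which is roughly what a lossy conversion leaves behind.
    """
    stored = text.encode(storage_charset, errors="replace")
    return stored.decode(storage_charset)

french = "Crème brûlée"
polish = "Łódź"

# French accents exist in cp1252, so nothing is lost.
print(store_and_fetch(french, "cp1252"))

# Ł and ź do not exist in cp1252, so they come back as "?".
print(store_and_fetch(polish, "cp1252"))

# With UTF-8 storage every character survives the round trip.
print(store_and_fetch(polish, "utf-8"))
```

French happens to fit in latin1, which is why a latin1 table can appear to work until text in another language arrives.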

Note that PHP isn't charset-aware. It is therefore your responsibility to make sure that the data you pass to MySQL is in the proper encoding. You can generally assume that input from browsers (e.g. $_GET and $_POST) will be encoded in the same charset as the page that presented the form. In Firefox, you can go to the menu View -> Character Encoding and see what is selected. See Character Sets / Character Encoding Issues for more details on this.
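To show what "not charset-aware" means in practice, here is a Python sketch (again only an illustration, not PHP). A string is just a sequence of bytes until something decides which charset to read it in, and guessing wrong either garbles the text or fails outright. `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data):
    """Return True if the byte string can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "français"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# A byte-oriented length counts ç twice in UTF-8, because it takes two bytes.
char_count = len(text)
byte_count = len(utf8_bytes)
print(char_count, byte_count)

# Reading UTF-8 bytes as latin1 gives garbled but "valid" text.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Reading latin1 bytes as UTF-8 fails, which is the kind of error
# a strict UTF-8 parser reports on accented input.
print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))
```

If the XML file is saved as latin1 but parsed as UTF-8, accented French characters would trip the parser in the same way as the last line.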