When the 'charset' option is available to the Zend_Db::Factory(), do the following in the adapters _connect method:
||Adapter||Proposed Solution||
|Mysqli|use mysqli_set_charset|
|Oracle|use the charset option for oci_connect|
|Db2|no solution|
|Pdo_Ibm_Db2|no solution|
|Pdo_Ibm_Ids|no solution|
|Pdo_Mssql|no solution,depends on server configuration|
|Pdo_Mysql|use PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES "utf8''|
|Pdo_Oci|include charset in DSN|
|Pdo_Pgsql|use SET NAMES|
|Pdo_Sqlite|not needed|

Comments

Posted by Bill Karwin (bkarwin) on 2007-06-14T18:34:20.000+0000

This is also relevant for OCI adapter and others, but the implementation varies.

mikespook writes:

I used utf8 for my charset with Oracle.

But, there are no item can set the charset in the param of the Zend_Db::factory().
And you konw, MS Windows use GB18030 for the charset in China.
So when I send a SQL to the Oracle with charset utf8, I got a error.

Since the PDO database adapters use base (abstract) method for creating a db connection I've added a check ('mysql' == $this->_pdoType) so the 'set names' query is issued only for PDO_Mysql adapter (I don't know how other databases handle this issue).

I really hope this issue will be fixed in the next minor release, it has been long overdue.

If there is something else I can do to help, please let me know.

Posted by Goran Juric (gog) on 2008-05-25T16:43:18.000+0000

Please disregard the patch for Pdo_Mysql, haven't really worked with Pdo till now. "SET NAMES" do work for Pdo_Mysql adapter!!! You just have to pass it as a driver option...

On Pdo_Mysql I've always just used $db->query('SET NAMES UTF8'); without any problems.
I tested it to make sure it works correctly.

Posted by Goran Juric (gog) on 2008-05-26T03:05:04.000+0000

Ofcourse it does. But if you send a query you are forcing you app to connect to the database. There are times that you don't want to connect to the database if it isn't really needed, that's what lazy loading is for.

This way, you can use Pdo_Mysql, utf8, and still have lazy loading of the db.

But this issue is still unresolved because in Mysqli you do have to send "SET NAMES 'utf8'" after connecting to the database.

Posted by Pádraic Brady (padraic) on 2008-05-26T13:36:21.000+0000

I would love to see this implemented. At the moment managing lazy loading is either something you get creative about, or with a light application simply ignore. Getting it internalised would make life that bit easier.

any way .. could this at least be fixed in the Oracle Adapter since oci_connect supports charset ? :D

Posted by Wil Sinclair (wil) on 2008-06-09T17:34:38.000+0000

All right, this item officially has twice as many votes as the next popular item. If we don't implement it for 1.6 we risk user revolt. :0 Simon, is this something that you'll be able to do. If not, I'll troll for help from the rest of the community.

That's great to provide those patches, but I think the interface to supply character set to each respective brand of database is only half the problem.

The other part of the problem is that a given character set may be known by a different name in each database. For example, MySQL uses 'utf8' where Oracle uses 'AL32UTF8'. Oracle even has a charset called 'utf8' but it's different.
See http://decipherinfosys.wordpress.com/2007/01/…

This makes it hard to create a set of config parameters that can be used orthogonally for several brands of database. You'd have to know the different names in each database for any given charset you want to use.

How would I recommend solving this? The {{Zend_Db_Adapter}} interface currently supports a config parameter 'driver_options', which is an associative array used for options that are specific to each brand of database. I would suggest that {{Zend_Db_Adapter}} use 'driver_options'->'charset' for the name of the charset as it is known by the brand of database.

Then you can implement a charset abstraction layer. Decide on a set of core charsets that will be supported by {{Zend_Db}}, and define these as {{const}} strings.

class Zend_Db
{
const CHARSET_UTF8 = utf8;
. . .
}

Then in each {{Zend_Db_Adapter}} class, create a protected associative array mapping the abstract charset names to native charset names. If the database doesn't support one, or if you simply don't know what it it's called, leave it out of the mapping array.

Then if I pass a config parameter 'charset' => 'utf8' to my Db Adapter factory, it'll perform the mapping and add the entry to the driver_options, where it will get applied using code similar to your patch.

Developers can still use uncommon charsets that are not in the set of core abstract charsets supported by Zend_Db's mapping table. They just need to specify it in the 'driver_options' array by its native name, instead of in the 'charset' parameter.

You don't necessarily have to fill out the full set of core charsets in every adapter class. But you do need to throw an exception if I pass an abstract charset name that isn't supported in the adapter I chose.

Since this has been sitting open for so long, I went ahead and took a stab at making a patch. So here it is, against current SVN r11422. Adds charset support to Mysqli, PDO_Mysql, and PDO_Pgsql adapters. A new optional parameter 'charset' is passed to the adapter.

Posted by C Snover (snover) on 2008-09-17T07:22:53.000+0000

Whoops. And then there was JIRA not showing me the comment discussion. I feel extremely embarrassed and am going to crawl into a hole now. :)

Posted by David Berlioz (quazardous) on 2008-12-03T00:33:50.000+0000

Hi,

do not forget Zend_Db_Adapter_Oracle : need charset support too !!!

it's ok in PDO_OCI, but PDO_OCI is still unstable and chrashes very often (try use clob :( u'll see ).

Usage of SET NAMES for setting the charset of the connection, the result and the client for MySQL is WRONG, WRONG, WRONG! Opens the door for SQL Injections.
SET NAMES exists only for the sake of the mysql command line client where there is no way to set the connection's character set when you connect, thus SET NAMES is used. mysql's API, libmysql, has means of setting the character set, for good reasons. Using SET NAMES can lead to security problems with specific character sets...Think - SQL Injection.
Either use:
http://bg.php.net/manual/en/…
or
mysqli_options() on a handle returned from mysqli_init() and this should be done before mysql_real_connect()

$mysql = mysqli_init();
mysqli_options($mysql, 7, "utf8"); //sorry for the magic number but ext/mysqli doesn't register the value of MYSQL_SET_CHARSET_NAME
// coming from mysql.h as constant. In mysql.h it is 7 and won't change for the time being.
mysqli_real_connect($mysql, ...);

Andrey

Posted by Tobias Petry (ice-breaker) on 2009-02-25T08:00:01.000+0000

Andrey did not provide any information about the problem witht this type of sql injections and while it is hard to find anything on that i provide a source: http://ilia.ws/archives/…

Posted by Yanick Rochon (yanick) on 2009-03-03T11:24:55.000+0000

I needed to have UTF-8 encoding on MySQLi driver, here's a quick patch that I'm using to solve this problem. I would hope every implementation would support this option.

Bill: I like the idea of supporting charsets by using the constants. However, I feel like that's a different issue than this one. The focus of this issue should be on supporting setting the charset in the first place. Adding a layer of abstraction on top of that is pretty much a standalone task, even though its implementation would probably rework a part of the code that's being updated for the charset support. Seeing as a lot of people need this feature it might not be worth waiting for a full implementation of the charset abstraction proposal.

Seeing as I need support for this in the Oci8 adapter, I'm going to write a patch for that today. If there's no solution for Db2 (Db2 experts, please comment!) I would like to suggest implementing the suggestions from the table to get this issue resolved.

Posted by Sorin Alin Stoiana (sorin) on 2009-04-05T16:29:24.000+0000

@Jurrien FYI, the patch I submitted here [#action_23500 ] also provides support for Oci8

This issue has been resolved for the Pdo_Mysql, Pdo_Pgsql, Mysqli and Oracle drivers in revision 14691 and 14692. At the moment I can only verify that this works on Oracle. Before closing this issue I would like to confirm there are no issues with the patch. Would everybody be so kind to verify that the patch works for Pdo_Mysql, Pdo_Pgsql and Mysqli? You can specify the charset in the adapter config using the 'charset' key.

Posted by Sorin Alin Stoiana (sorin) on 2009-04-06T01:34:21.000+0000

A more complete patch is [here|#action23500] . It implements charset support for Mysql, Oracle, Pdo_Mysql, Pdo_Oracle and Pdo_Pgsql, as well as unit tests for these. It has been tested with Oracle XE, Oracle Real Application Cluster, Mysql, and Pgsql.

Why was this patch disregarded? From what I read in the Contributor's Guide, all changes must come with unit tests.

While I approve the kind of swift action Jurrien took in solving the issue at hand, I believe we shouldn't compromise when it comes to providing fixes.

A test was available already, but was marked as skipped. It's enabled in revision 14952 and successfully passes on Mysqli, Pdo_Mysql, Pdo_Oci and Oracle.
Still looking for someone to test on Pdo_Pgsql.

I've installed Postgres for myself and identified and fixed one last issue there. As of revision 14953 I can successfully run the test on Mysql, Pdo_Mysql, Oci8 and Pdo_Pgsql. Pdo_Oci already supported the charset option. Closing issue as "fixed". :)

Posted by Paul Voss (chmuul) on 2010-01-06T06:48:49.000+0000

I'm working with Zend Framework Version 1.9.6 here and cannot confirm that this issue has been resolved (using pdo_mysql rev 16942).
My html output is set to utf-8, the database runs on utf-8 as well (utf8_general_ci) but all special chars are messed up.