PDO_MYSQLND: Prepared Statements, again

2008/06/25by admin

Server-side Prepared Statements are an outdated technology from ancient times, are they? Brian gives a long list of arguments in his blog posting “Prepared Statements, Musings” why one should think twice before using server-side prepared statements. PDO does enforce the use of prepared statements for all statements which return a result set. Good or bad?

Server-side Prepared Statement are SQL statements which get prepared once and can be executed multiple times after the preparation. During the prepare state the database server will validate the SQL syntax, create a prepared statement object at the server and return a prepared statement handle to the client. Different database servers will perform different actions as part of the prepared statement object creation. Typically the prepared statement object at the server contains at least the parse tree of the SQL statement. More things can happen at this stage. Your database server might reserve and/or allocate internal server resources and buffers or run SQL optimizations.

The client can execute the prepared statement as many times as he wants. Upon subsequent executions, there is no need to parse the SQL statement again. The parse tree is already associated with the prepared statement object stored at the server. As a side-effect there is no need to transfer the entire SQL statement over the wire upon each execution. This can result in CPU and network bandwidth savings.

Prepared Statement are often considered as secure because of bound parameters. Application programmers can bind a variable (or a value) to placeholders inside of the prepared statement. For example you can bind the PHP variable $name to the prepared statement SELECT id FROM test WHERE name = ? during prepare. Whenever you execute the statement, the placeholder will be replaced with the current value of $name. Only the value of $name will be transferred to the server. The server reads the value and checks if it is a valid input. Note that I did not add quotes around the questionmark. The server knows from the parse tree what to expect and how to handle the input in a secure manner. You cannot fool the server with $name = '"' or similar.

The above sounds great such as the PDO introduction on prepared statements at http://php.net/manual/en/pdo.prepared-statements.php does. But before you go out and wipe the prepared statement syntax onto the back of each of your developers, try to understand pro’s and con’s of the architecture.

Multiply the number of concurrent connections by the number of prepared statements per connection to know why your server eats hardware – read Brian’s post

To to scale a “cache” move it from the server to the client. Scaling servers is tricky

For statements executed only once prepare() causes an unnecessary round-trip and “caching” work

MySQL: no “caching” of the execution plan, you won’t save the SQL optimization step

Secure bind parameters thanks to parse tree access

Preventing SQL injections should be an application task not a task of the (interchargable) storage layer

mysqli_real_escape_string() is available to build secure SQL statements

Input validation requires application logic not available in the database

SQL parsers could be made available at the client as well for the emulation of parameters

Bind parameters et. al. mean less network traffic

For statements which are run only once its not true due to the unnecessary prepare() round-trip

You could achieve the same with a clever emulated, client-side prepared statements friendly protocol

No portable standard syntax (ordinal vs. named parameter) exists

Prepared Statements and cursors go hand in hand: scrolling possible

Unbuffered result sets require and block server resources

Risk to forget to close as early as possible

For small result sets buffering and client-side scrolling is a good alternative

MySQL: cursors are materialized, they always go to disk

AFAIK, server-side prepared statements have been designed with clients in mind which remain connected to the server for a relatively long time. Think of good old centralized host applications. You power up 100 thin-clients in the morning hours which remain connected to the server all day. The clients run one application with very little dynamic SQL. “Caching” SQL statements, using server-side prepared statements makes perfectly sense in such an environment.

Compare this type of application with your PHP web application. Your PHP web application is request based and the life time of your scripts is very short. Prepared Statement handles do not survive a request. Go through your application and check how many statements you run more than once. If you rarely run the same query twice, don’t be surprised to see a performance penalty when using prepared statements.

I do not understand why PDO does enforce the use of prepared statements for all queries returning result sets (luckily you can work around this with MySQL). I do understand why JDBC is build around prepared statements. But PHP is not Java: no desktop application, no application server. PHP users should be given the choice to select the proper tool for the task.

If you hit an unsupported statement its not a bit deal: catch the 1295, ER_UNSUPPORTED_PS error and fall back to the non-prepared statement API. PDO does that for you – its transparent for the PHP user. You don’t need to worry about it.

The only limitation you might hit is on stored procedures returning result sets. For example, CREATE PROCEDURE p() BEGIN SELECT id FROM test ORDER BY id ASC LIMIT 3; SELECT id, label FROM test WHERE id < 4 ORDER BY id DESC LIMIT 3; END; creates a procedure which returns more than one result set. When using PDO_MYSQL and the MySQL client library (C API, formerly known as libmysql) you will not be able to use bool PDOStatement::nextRowset (void ) to fetch all result sets. However, PDO_MYSQLND and mysqlnd will lift this limitation. The following code from the new unpublished test pdo_mysql_stmt_nextrowset.phpt does pass our internal testing of PDO_MYSQLND and mysqlnd.

What you should be aware of when using server-side prepared statements with MySQL is the syntax for placeholders and the different communication protocol used by MySQL.

The syntax used for placeholders varies among different database vendors. MySQL has chosen to support the ordinal parameter syntax (SELECT id FROM test WHERE id = ?, bind(1, $myvar)). The MySQL server does not support named parameters (SELECT id FROM test WHERE id = :myid, bind(“myid”, $myvar)). Named parameters are emulated by PDO. I do not know of any standard on prepared statement parameters. Other databases do not support ordinal parameters but do support named parameters. In case of those other systems, PDO will emulate the ordinal parameters.

MySQL will always use the “binary protocol” with prepared statements. Traditionally the MySQL server has converted all data into strings before sending them to the client. When not using prepared statements, the client will retrieve all results as strings and has to convert them into the “correct” datatype, if need be. The binary protocol removes this conversion overhead.

The ext/mysqli PHP extension supports the binary protocol idea and tries to return native PHP types whenever possible. For example, when fetching an INT column using ext/mysqli and prepared statements, ext/mysqli will return an PHP integer variable to the caller. PDO_MYSQL, on the other hand, does always convert a value from an INT column into a PHP string no matter whether you are using server-side prepared statements or the prepared statement emulation. PDO_MYSQLND does fix this shortcoming and tries to return a proper native type whenever possible. If you do not want PDO_MYSQLND to return “correct” types, you can use PDO::setAttribute(PDO::ATTR_STRINGIFY_FETCHES, true) to enforce the PDO_MYSQL behaviour.

Not every database system supports prepared statemetns. Therefore PDO includes a prepared statement emulation.

PDO consists of a core and database specific drivers. The drivers provide an interface which the core uses to execute your PHP function calls. The prepared statement emulation is part of the core. Consequently, every driver does use the same emulation.

The prepared statement emulation contains a simple SQL parser as it must be able to handle placeholder substitution. But there is a problem with this approach. PDO does not aim to provide any SQL abstraction and it has to support all the different SQL dialects of the database vendors. Search bugs.php.net/ and pecl.php.net/bugs/ to learn how tricky it is to support each and every SQL dialect:

The PDO prepared statement emulation is still not free of bugs after many years as it seems. I believe that it will never be and drivers should be allowed to use their own specialized and optimized SQL parsers. Here is a simple example of what I mean. The PDO documentation of PDO::prepare() is correct when it describes the return value but it does not mention a little pitfall of the PDO design:

What the documentation leaves out that the PDO prepared statement emulation can fool you. When using the emulation the prepare() call will not invoke the database server. The database server will not validate your SQL statement. The PDO SQL parser will do the validation and accept invalid SQL such as SELECT ice FROM arctica AND antarctica, see also bug #44169. Both PDO_MYSQL and PDO_PGSQL can fool you, if you use their default settings. The only solution for MySQL is to explicitly disable the PDO prepared statement emulation approach by setting the PDO attribute PDO::ATTR_EMULATE_PREPARES to false.

For a complete semantical validation of a SQL statement you will always have to do a round-trip to the server. Even specialized SQL parser inside the PDO drivers would not be able to detect missing tables or misspelled columns. However, they could at least do a proper static syntactical analysis of the statement and detect “gotcha’s” like SELECT ice FROM arctica AND antarctica.

One last example on the fun you can have with the PDO prepared statement emulation is this SQL statement:

To demonstrate that it is not always desired to use prepared statements I have done a little benchmark with PDO_MYSQLND on my desktop system. I did on purpose use PDO in a sub-optimal way to show my point of “give the user the choice”. An unexperienced PDO user might not even be aware of the fact that all of the below will be executed as prepared statements. The way I use the PDO API does not make you think of prepared statements in the first line.

The test table is CREATE TABLE test(id INT, label VARCHAR(255)) ENGINE=MyISAM. The micro-benchmark is run on an AMD Athlon 64 Dual Core 5000+ with 4GB of memory. The MySQL Server 5.1.25-rc runs on a Core2Duo E6750 @ 2.66GHz with 4GB of memory. The computers are connected via 100/1000 MBit – not very exiting hardware. Both computers run on OpenSuse 10.3.

Without Prepared Statements

Server-side Prepared Statements

PDO::ATTR_EMULATE_PREPARES = true

PDO::ATTR_EMULATE_PREPARES = false

rows = 50, runs = 5000

INSERT

0.0111s

0.0098s

SELECT

2.9959s

3.2884s

Total

3.0007s

3.2982s

rows = 500, runs = 5000

INSERT

0.0975s

0.0976s

SELECT

13.4670s

15.4043s

Total

13.5645s

15.5019s

If you turn on the emulation of prepared statements by setting the PDO attribute PDO::ATTR_EMULATE_PREPARES to true, PDO will parse the SQL itself for parameter substitution and use the non-prepared statement API of the underlying database system. In case of MySQL, PDO will fall back to the C-call mysql_real_query(). PDO will not use mysql_stmt_prepare(), mysql_stmt_execute() and other calls from the prepared statement API. So to say, this is a backdoor which allows you to disable the use of prepared statements and ignore the PDO principle of enforcing the use of server-side prepared statements. However, even if you disable the use of server-side prepared statements you end up using a client-side prepared statement emulation. In any case you will be using a prepared statement style API for fetching results.

How do the two ways of running the INSERT and SELECT compare? Server-side prepared statements are some 10% slower – however, note that I’m running the SELECT 5.000 times to make the effect visible.

Let’s have a look at the INSERT statement. Its seems to qualify for a prepared statement as you can use bind parameters. Upon each execution you will not send the entire INSERT over the line but just the parameters. And you can prepare and parse it once and execute it many times. Maybe this is faster?

Server-side prepared statements and using bindParam() has the edge – its some 10% faster. However, do you want to know how to make it really fast? Try the MySQL-specific multi-insert syntax: INSERT INTO test(id, label) VALUES (1, "a"), (2, "b"). Its 12x faster for this micro-benchmark.

Note that I’m using MyISAM which is a non-transactional engine. With transactions and transaction logs you’ll have to use try-and-error to find the optimal size of a transaction. In the worst case yout multi-insert transaction does exceed internal buffers for transaction handling and waits will occur. Don’t forget the time required for a ROLLBACK of a large transaction either in case the INSERT operation fails.

The SELECT statement from my example does not profit from being run as a prepared statement. It contains no parameters and its gets executed only once.: 10% performance loss, see above. The likely reason is the additional round-trip caused by preparing the statement. The performance-loss will vary with the time required for the round trip and its relation to the fetch time as well as the number of executions.

If the number of executions grows but only one row will be fetched and the fetch time remains very small, the picture shows that server-side prepared statements and the PDO emulation are about equal fast. Preparing SELECT label FROM test WHERE id = ? once and fetching 256 rows of n-bytes each from the table test gives the below results. Differences below 3%, like those for 256 bytes per row, are within the metering precision and are not significant.

Without Prepared Statements

Server-side Prepared Statements

PDO::ATTR_EMULATE_PREPARES = true

PDO::ATTR_EMULATE_PREPARES = false

rows = 50, runs = 256

20 bytes

3.0748s (100%)

3.2495s (106%)

255 bytes

3.3829s (100%)

3.4970s (103%)

2048 bytes

4.8837s (100%)

4.7942s (98%)

4096 bytes

6.0851s (100%)

5.7563s (94%)

The results seem to proof that the use of server-side prepared statements makes most sense for large result sets. Therefore, I have run another micro-benchmark which creates a table with many columns, prepares SELECT * FROM test and executes it many times to stabilize the results. All rows will be fetched after each execution.

Without Prepared Statements

Server-side Prepared Statements

PDO::ATTR_EMULATE_PREPARES = true

PDO::ATTR_EMULATE_PREPARES = false

rows = 2000, runs = 10

5 columns, 226 bytes

0.1119s (100%)

0.1005s (89%)

10 columns, 442 bytes

0.2661s (100%)

0.1773s (66%)

40 columns, 2493 bytes

1.0757s (100%)

0.9173s (85%)

100 columns, 7952 bytes

2.9114s (100%)

2.3648s (82%)

500 columns, 40813 bytes

15.9159s (100%)

11.7043s (73%)

It seems that server-side prepared statements clearly win this micro-benchmark. However, whenever you read benchmark results check how your system performs for the same benchmark. And try to understand what the benchmark does. For example, the above micro benchmark runs the same query 10 times. Will your application run the same SELECT statement 10 times? Does your application deal with result sets as large as 2000 rows of 500 columns with 40813 bytes per row (2000 x 40813 = 77MB result set)?

Looking at the above micro-benchmark results it needs further testing to say if server-side prepared statements will result in a performance win. For example, I did not extensively test BLOB columns for which – in case of MySQL – you can expect a performance benefit when using prepared statements and their “new” client-server protocol.

If performance is not the primary reason for using server-side prepared statements why do so many people advocate their use for PHP applications? I can only agree with Lukas, who has blogged about prepared statements when I already had started drafting this posting, that its about PHP users looking for a convenient way to secure their applications against SQL injections. However, I don’t think that prepared statements have been designed for this.

There seems to be a strong demand among the PHP user base for a fool-safe way to secure database calls against SQL injections. PHP application developers seem to want to move the application developers task of securing their application towards the storage layer. Strictly speaking mysqli_real_escape_string() and disabling MySQL’s multi-query (disabled by default with mysqlnd) is all a PHP developer needs to secure his database calls against SQL injections. But for convenience reasons the use of prepared statements has been announced as a best practice. If you ask me its not best practice. It would be better to continue to secure an application against attackers instead of (ab)using a side-effect of prepared statements.

Brian has given prepared statements a criticial look from the perspective of resources. Lukas has added the aspect of security to the discussion. Both presented dashboard ideas for a complete redesign. As long as a new solution is not available, I would appreciate if everybody recommending prepared statements as a best practice teaches the pros and cons of the technology in the context of PHP applications: neither does a portable standard definition for server-side prepared statements exist nor did PDO manage to provide a convincing solution for client-side (emulated) prepared statements. Consider that when talking about the best practice “prepared statements”…

11 Comments

A few points:
1) the list of constructs that support PS is based on 5.1
2) the fix to enable the query cache with PS is in 5.1 and it will only work via the prepared statement API (not using PREPARE/EXECUTE statements)
3) 5.1 is still not GA and most of us will be stuck on 5.0 for at least a while
4) that transparent retry for constructs that are not supported via PS in PDO is a bit scary in terms of performance. it should throw a notice or something

A side issue unrelated to PS:
mysql_real_escape_string() is not meant to handle identifier quoting. So if we take this issue from the perspective of the need for a security tool, then identifier quoting should also be put on the table.

Oh I should also note that PS are not part of the SQL standard. I think there is something similar in the standard for embedded SQL (only a hand ful of old school languages have gotten official host status in the standard) that I can see in my copy of “SQL99 complete really!”, but its not the same as what we all have become accustomed to calling PS.

I don’t think it matters what statements are supported by MySQL’s prepared statements. The list of supported statements is not that bad in 5.0 either. But its no secret that overall the list of issues is long – thanks for adding some details to it.

My point is that focussing on prepared statements only in PDO is very questionable due to the many issues both with server-side prepared statements and the prepared statement emulation of PDO itself. Let’s be honest about the status of PDO. Let’s be honest about the pro’s and con’s when advocating prepared statements for PHP or PDO.

As we discussed in a private mail exchange earlier: its time for something new. Its time to re-think PS itself and the PS advocacy of the past.

A litte side-note for everybody who tries to tell me that PDO is widely used and/or well tested.

Two weeks ago we catched a wonderful bug in PDO_MYSQL[ND] causing wrong results when fetching numbers. As explained in the blog posting PDO_MYSQL does return all results as strings. The calculation of the required string buffer to hold a value from a numeric column did not take into account the minus sign required for negative values. What happened was that, for example, PDO_MYSQL has allocated a buffer for 4 numbers to represent -9999 to 9999 as a string. Look at -9999 – you need a buffer of 4 + 1 = 5 signs to represent it as a string. Fetching -9999 from the database would have returned 9999.

Did I miss the bug report or did nobody ever notice this?

Yes, I don’t like PDO 1.0. I want PDO 1.x ASAP. It would be a great thing for PHP.

I disagree with this statement: “Preventing SQL injections should be an application task not a task of the (interchargable) storage layer”. The storage layer should handle preventing SQL injections, /because/ it is interchangeable. I check my data for correctness for my purposes, and then I want to simply store and retrieve it, not worry about nuances of versions of SQL languages my particular current storage engine has. Thus there should be a nice way (maybe call it Prepared Statements lite), where you separate the SQL and the VALUES in two separate variables. This is really what people are trying to do by using server side prepared statements, they want to separate the database specific SQL stuff and the stuff they really care about, their data. mysql.execute(“INSERT into tTable(name, description) VALUES(?,?)”, mysql.String(a), mysql.String(b));

Basically combining the prepared statement and the bindings into a single request to the server. The server then does all the magic, does not cache anything, everyone is happy, and we achieve separation of CODE (SQL) and DATA, which really really should have been done in the first place 😉

I was asked by a developer at the MySQL Conference if Zend_Db (the framework I worked on) protected against SQL Injection. I tried to tell him about how to use escaping functions, but he seemed to want something to do it all for him. This shows that some developers have a fundamental misunderstanding of how to solve SQL Injection.

Neither mysql_real_escape_string() nor prepared statements are guarantees of secure query design. The developer has responsibility to do the right thing and not interpolate tainted application variables into a SQL query. This applies to SQL values, SQL identifiers, or any other SQL syntax generated dynamically in the application.

Functions to escape interpolated values or parameterize values in a query are tools that the developer must *use*. These tools are not an automatic “proof” against SQL Injection. The developer must use his or her judgement and employ these tools appropriately.

There is no way to protect against SQL Injection automatically, if you allow dynamic SQL at all. Like it or not, developers have to use their brains sometimes!

Regarding Brian’s concern about resource management in the server, if prepared statement handles were scoped to a transaction, instead of the connection, could the resources on the server be freed when the transaction finishes? This would reduce the accumulation of resources on the server for statement handles that might be needed.

Freeing statement handles promptly at the finish of a transaction would also reduce the anomalies that Lukas is concerned about, when prepared statements are used in long-running connections.

Regarding the performance overhead of the extra round-trip for prepare, could this be mitigated if MySQL surfaced an execute-direct API call with support for bound parameters? This type of call has been specified in the SQL CLI forever (well, since ODBC 1.0) but MySQL doesn’t support such a mechanism. This would allow the security benefit of prepared statements, without the network overhead.

The server-side statement handle could also be freed immediately during a SQLExecDirect() call, so there’d be zero resource accumulation.

I think it should be noted that PDO isn’t designed to force the use of server-side prepared statements; the API is intended to force you to separate your parameters from your SQL for the purpose of avoid SQL injection problems and the tedious manual coding required to properly secure them.

It just so happens that this marries up to native APIs used by many database client APIs, and for the high end commercial databases it tends to be the optimal way to express commonly recurring queries.

Both the mysql and postgres PDO drivers did not initially support native prepared statements, and when that code was added (due to popular demand) the next request was for a way to disable it… because both databases suffer(ed) from various breakages in their implementations, either on the server side or in the client library implementation. And yes, I admit that PDO does have bugs too.

You, as a driver implementor, are free to decide whether you default to using server side prepares or not, or even whether you allow their use at all; whatever works best.