Pitfalls of Transactions with PHP

Why Transactions?

Transactions -- a database management system (DBMS) feature that enables data
recovery on internal errors -- raise significant controversy among developers.
Some view transactions as a crutch, while others view them as a necessity.

Using
transactions can save time and money, because transactions enable databases to
restore themselves. But inappropriate use of transactions means wasted overhead
and impeded performance. And transactions do not replace programming;
sadly, some developers have foregone error trapping and data validation,
thinking that transactions will catch all errors.

Transaction support in a DBMS or its table types is defined using the ACID test:

Atomicity is an all-or-none proposition.

Consistency guarantees that a transaction never leaves your database in a half-finished state.

Isolation keeps transactions separated until they're finished.

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.

Initially, the MySQL DBMS did not support transactions. As of version
3.23-max, MySQL supports transactions with two additional table types, InnoDB
and BDB. InnoDB is recommended for its CPU efficiency and support of row-level
and multiple-concurrency locking. Now that MySQL has full transaction support,
PHP developers need to incorporate them effectively. Although PHP 4.x lacks
native transaction functions, transaction statements can be used the same way
as standard SQL queries.

When to Use Transactions

To maximize the use of transactions, developers should follow these simple
guidelines:

Use transactions for mission-critical actions that must all complete. Completing an online purchase is mission-critical. Browsing a list of top sellers is not.

Use transactions for data subject to race conditions. Even though the
database server and operating system are multi-threaded and can process
multiple queries concurrently, the specific group of queries must be processed
in order and cannot be processed independently out of order.

Use transactions in conjunction with, not as a replacement for data
validation and programmatic error trapping.

By supporting transactions, the DBMS provides an extreme level of
reliability and drastically reduces the need for manual recovery and data
integrity validation. However, any transaction, by definition, adds overhead
because at some level, the DBMS must lock data and take a snapshot of the data
at the outset. So transactions must be used appropriately.

Transactions Versus Errors

Developers can sometimes misunderstand the definition of a transaction
failure. The transaction will fail when the database engine itself has an error
that causes a query or queries not to complete. The most common misconception
is that 0 rows returned or affected are considered a failure. For example, if a
query of DELETE FROM tablex WHERE fieldy = 10 returns 0 affected
rows, that is considered a successful query according to the transaction
definition because the query completed and returned its affected rows. Whether
the number of affected rows matches the expected result is a matter for
programming, not transactions.

Some developers believe transactions mean the elimination of programming,
even using transactions in lieu of programmatic data validation. This is wrong:
transactions are intended to recover errors made internally by the database
engine or externally by extreme measures such as a power outage. Transactions
were never intended to account for user errors.

Web developers must trap errors because transactions do not prevent data
from being lost between the database and the web application. Theoretically, a
transaction could complete successfully, but the results of the transaction
could fail to reach the web application. If the web application connects to the
database server through a local host connection (using the UNIX socket), this is
generally not an issue unless the socket becomes corrupted or disabled.
However, if the web application connects to the database via the TCP/IP socket,
a packet can be dropped in transmission. From the database standpoint, the
transaction has succeeded, but if the web application has lost data, it has
failed.

For example, the database could execute a SELECT query
successfully and return a result set, but if those data rows are lost in the
transfer to the web application, the application would show 0 as the number
of rows. This is why before looping through a result set, a developer always
needs to verify that mysql_num_rows(result identifier) is non-zero
to make sure the web application has received the data successfully.

Assuming transactions cover all possible failures is a mistake. Programmers still need
to program!

As well, database developers still need to plan. Before the programmers
begin to code the web application, the database developers must make informed
decisions about when to use a transaction-safe table. MyISAM table types
typically execute SELECT queries much faster than InnoDB types.
According to recent PHP Oracle/MySQL benchmarks, InnoDB table types handle more inserts per second than BDB table types.

For this example we will use InnoDB for our
transaction-essential tables and MyISAM for tables not requiring transactions.

An Illustrated Example

Let's model a fictional e-commerce site that sells MP3 files. The database
will need a table to store customer contact information and another table to
represent each individual MP3 file the customer is purchasing. We will also
assume the use of temporary data to hold the user's shopping cart information. We will then transfer the information to permanent tables upon checkout. Any
shopping cart information that is not transferred to the permanent tables can
be purged. We will use temporary tables instead of sessions to hold the
shopping cart data because we want to keep statistics on user behavior, such as placing items in a shopping cart but not completing the transaction. (Once our
fictional web site is as successful as Amazon, we may switch to sessions that hold shopping cart data).

Transactions in the Example

Transfer data from the temporary order tables to the permanent order tables.

Update the permanent order tables to reflect their relationships to the customer table.

Delete data from the temporary order tables (if necessary).

This process exemplifies a transaction, as all stages contribute to a single
goal — the completion of a sale. If the database crashes during a single
step of this process, we immediately want to recover and restore the
information. For example, if the customer information fails to load, we want
to revert the database state back to the previous state, preserving the
shopping cart data in the temporary order tables.

These queries must execute in the specified order. If the deletion of data
from the temporary tables executes before the data is transferred to the
permanent tables, the operation will fail. Using transaction-safe tables will
prevent that problem. To ensure that these tables use the InnoDB handler, the
developer can use the ALTER statement to change the table type
after creation, or tack the table type on at the end of the CREATE
statement. The latter is preferred.

One database can use numerous table types. Assuming we are selling MP3 files
legitimately for download, we can assume a table to hold the name, description,
and price of each song. We will also assume separate linking tables to link
songs to categories (assuming a many-to-many relationship), songs to artists,
songs to CDs, and songs to record labels. These tables will be searched
frequently. The process of adding to the inventory is done in batch format
when traffic is slow. Also, since these are MP3 files, the purchasing of a song
does not affect inventory; adding a song to a shopping cart does not impact the
song table whatsoever. Therefore, these tables should be MyISAM, which
processes SELECT queries the fastest. This the default type in
MySQL as of version 3.23.

The programming logic can catch any SQL queries that fail or return no
results. To make these SELECT queries that search the inventory
part of a transaction would be a waste of overhead.

The best strategy for working with transactions is to apply the try/catch
programming strategy from Java. Let's examine a model of troubleshooting with PHP.