Introduction

Databases are central to many web applications. A database can hold almost any collection of information you may want to search and update, such as a user list, a product catalog, or recent headlines. One reason why PHP is such a great web programming language is its extensive database support. PHP can interact with (at last count) 17 different databases, some relational and some not. The relational databases it can talk to are DB++, FrontBase, Informix, Interbase, Ingres II, Microsoft SQL Server, mSQL, MySQL, Oracle, Ovrimos SQL Server, PostgreSQL, SESAM, and Sybase. The nonrelational databases it can talk to are dBase, filePro, HyperWave, and the DBM family of flat-file databases. It also has ODBC support, so even if your favorite database isn't in the list, as long as it supports ODBC, you can use it with PHP.

If your data storage needs are simple and you don't need to serve many users, you may be able to use a plaintext file as a makeshift database. This is discussed in Recipe 10.2. Text files require no special database software but are appropriate only for lightly used, basic applications. A text file can't handle structured data well; if your data changes a lot, it's inefficient to store it in a plain file instead of a database.

DBM flat-file databases, discussed in Recipe 10.3, offer more robustness and efficiency than flat files but still limit the structure of your data to key/value pairs. They scale better than plaintext files, especially for read-only (or read-almost-always) data.

PHP really shines, though, when paired with a SQL database. This combination is used for most of the recipes in this chapter. SQL databases can be complicated, but they are extremely powerful. To use PHP with a particular SQL database, PHP must be explicitly told to include support for that database when it is compiled. If PHP is built to support dynamic module loading, the database support can also be built as a dynamic module.

Many SQL examples in this chapter use a table of information about Zodiac signs. The table's structure is:

The specific functions required to talk to the database differ with each database, but each follows a similar pattern. Connecting to the database returns a database connection handle. You use the connection handle to create statement handles, which are associated with particular queries. A query statement handle then gets the results of that query.

This example retrieves all the rows from the zodiac table with Oracle, using the OCI8 interface:

The OCILogin( ) function connects to a given Oracle instance with a username and password. You can leave out the third argument (the instance) if the environment variable ORACLE_SID is set to the desired Oracle instance. A statement handle is returned from OCIParse( ) , and OCIExecute( ) runs the query. Each time OCIFetch( ) is called, the next row in the result is retrieved into a result buffer. The value of a particular column of the current row in the result buffer is retrieved by OCIResult( ) .

In this case, pg_connect( ) connects to PostgreSQL using the provided database name, user, and password. The query is run by pg_exec( ) . There's no need for a separate parse and execute step as with Oracle. Because pg_fetch_row( ) retrieves a specific row from the result set into an array, you loop over all the rows (using pg_numrows( ) to get the total number of rows) and print out each element in the array.

First, mysql_connect( ) returns a database handle using the provided hostname, username, and password. You then use mysql_select_db( ) to indicate which database to use. The query is executed by mysql_query( ) . The mysql_fetch_row( ) function retrieves the next row in the result set and NULL when there are no more rows; use a while loop to retrieve all the rows.

Each example prints out all the data in the zodiac table, one row per line, with spaces between each field, as shown here:

Recipe 10.5 through Recipe 10.9 cover the basics of sending queries to the database and getting the results back, as well as using queries that change the data in the database.

There are a number of options and optimizations for each database PHP supports. Most database interfaces support persistent connections with separate connection functions. In the previous three examples, you would use OCIPLogon( ) , pg_pconnect( ), and mysql_pconnect( ) for persistent instead of single-request connections.

If you require a database-specific set of functions, the PHP online manual section for each database has many useful tips for proper configuration and use. If you can, use a database abstraction layer instead. Starting with Recipe 10.4, all the SQL examples use the PEAR DB database abstraction layer, which minimizes the amount of code that has to change to make the examples work on different databases. Here's code that can display all the rows in the zodiac table using DB and MySQL:

The only thing that needs to change to make this code work on another database is the argument passed to DB::connect( ) , which specifies what database to connect to. However, a database abstraction layer doesn't make SQL completely portable. Each database vendor generally has custom SQL extensions that enable handy features on one database and don't work at all on another database.

While it's possible to write SQL that works on different databases with a minimum of changes, tuning a database for speed and efficiency is not portable. Having portable database interactions can be a useful goal, but it needs to be balanced with the likelihood of your code being used with multiple databases. If you're writing code for wide distribution, working with many databases is a plus. If your code is an internal project, however, you probably don't need to be as concerned with database independence.

Whatever database you're using, you're probably going to be capturing information from HTML form fields and storing that information in the database. Some characters, such as the apostrophe and backslash, have special meaning in SQL, so you have to be careful if your form data contains those characters. PHP has a feature called "magic quotes" to make this easier. When the configuration setting magic_quotes_gpc is on, variables coming from GET requests, POST requests, and cookies have single quotes, double quotes, backslashes, and nulls escaped with a backslash. You can also turn on magic_quotes_runtime to automatically escape quotes, backslashes, and nulls from external sources such as database queries or text files. For example, if magic_quotes_runtime is on, and you read a file into an array with file( ), the special characters in that array are backslash-escaped.

For example, if $_REQUESTS['excuse'] is "Ferris wasn't sick," and magic_quotes_gpc is on, this query executes successfully:

Without the magic quotes, the apostrophe in "wasn't" signals the end of the string to the database, and the query produces a syntax error. To instruct magic_quotes_gpc and magic_quotes_runtime to escape single quotes with another single quote instead of a backslash, set magic_quotes_sybase to on. Recipe 10.10 discusses escaping special characters in queries. General debugging techniques you can use to handle errors resulting from database queries are covered in Recipe 10.11.

The remaining recipes cover database tasks that are more involved than just simple queries. Recipe 10.12 shows how to automatically generate unique ID values you can use as record identifiers. Recipe 10.13 covers building queries at runtime from a list of fields. This makes it easier to manage INSERT and UPDATE queries that involve a lot of columns. Recipe 10.14 demonstrates how to display links that let you page through a result set, displaying a few records on each page. To speed up your database access, you can cache queries and their results, as explained in Recipe 10.15.

Using Text-File Databases

Problem

You want a lightweight way to store information between requests.

Solution

Use a text file with advisory locking to prevent conflicts. You can store data in the text file in any useful format (CSV, pipe-delimited, etc.) One convenient way is to put all the data you want to store in one variable (a big associative array) and then store the output of calling serialize( ) on the variable:

$data_file = '/tmp/data';
// open the file for reading and writing
$fh = fopen($data_file,'a+') or die($php_errormsg);
rewind($fh) or die($php_errormsg);
// get an exclusive lock on the file
flock($fh,LOCK_EX) or die($php_errormsg);
// read in and unserialize the data
$serialized_data = fread($fh,filesize($data_file)) or die($php_errormsg);
$data = unserialize($serialized_data);
/*
* do whatever you need to with $data ...
*/
// reserialize the data
$serialized_data = serialize($data);
// clear out the file
rewind($fh) or die($php_errormsg);
ftruncate($fp,0) or die($php_errormsg);
// write the data back to the file and release the lock
if (-1 == (fwrite($fh,$serialized_data))) { die($php_errormsg); }
fflush($fh) or die($php_errormsg);
flock($fh,LOCK_UN) or die($php_errormsg);
fclose($fh) or die($php_errormsg);

Discussion

Storing your data in a text file doesn't require any additional database software to be installed, but that's pretty much its only advantage. Its main disadvantages are clumsiness and inefficiency. At the beginning of a request, you've got to lock your text file and haul out all your data from it, even if you're only using a little bit of the data. Until you unlock the file at the end of the request, all other processes have to wait around, doing nothing, which means all your users are waiting too. One of the great assets of databases is that they give you structured access to your data, so you only lock (and load into memory) the data you actually care about. The text file solution doesn't do that.

What's worse, the locking you can do with a text file isn't nearly as robust as what you can do with a database. Because flock( ) provides a kind of file locking called advisory locking, the only thing that prevents multiple processes from stepping on each other and trashing your data is politeness and diligent programming. There's no guarantee your data is safe from an innocently incompetent or intentionally malicious program.

Discussion

PHP can support a few different kinds of DBM backends: GDBM, NDBM, DB2, DB3, DBM, and CDB. The DBA abstraction layer lets you use the same functions on any DBM backend. All these backends store key/value pairs. You can iterate through all the keys in a database, retrieve the value associated with a particular key, and find if a particular key exists. Both the keys and the values are strings.

The following program maintains a list of usernames and passwords in a DBM database. The username is the first command-line argument, and the password is the second argument. If the given username already exists in the database, the password is changed to the given password; otherwise the user and password combination are added to the database:

The dba_open( ) function returns a handle to a DBM file (or false on error). It takes three arguments. The first is the filename of the DBM file. The second argument is the mode for opening the file. A mode of 'r' opens an existing database for read-only access, and 'w' opens an existing database for read-write access. The 'c' mode opens a database for read-write access and creates the database if it doesn't already exist. Last, 'n' does the same thing as 'c', but if the database already exists, 'n' empties it. The third argument to dba_open( ) is which DBM handler to use; this example uses 'gdbm'. To find what DBM handlers are compiled into your PHP installation, look at the "DBA" section of the output from phpinfo( ). The "Supported handlers" line gives you your choices.

To find if a key has been set in a DBM database, use dba_exists( ) . It takes two arguments: a string key and a DBM file handle. It looks for the key in the DBM file and returns true if it finds the key (or false if it doesn't). The dba_replace( ) function takes three arguments: a string key, a string value, and a DBM file handle. It puts the key/value pair into the DBM file. If an entry already exists with the given key, it overwrites that entry with the new value.

To close a database, call dba_close( ) . A DBM file opened with dba_open( ) is automatically closed at the end of a request, but you need to call dba_close( ) explicitly to close persistent connections created with dba_popen( ).

You can use dba_firstkey( ) and dba_nextkey( ) to iterate through all the keys in a DBM file and dba_fetch( ) to retrieve the values associated with each key. This program calculates the total length of all passwords in a DBM file:

The dba_firstkey( ) function initializes $k to the first key in the DBM file. Each time through the while loop, dba_fetch( ) retrieves the value associated with key $k and $total_length is incremented by the length of the value (calculated with strlen( )). With dba_nextkey( ), $k is set to the next key in the file.

You can use serialize( ) to store complex data in a DBM file, just like in a text file. However, the data in the DBM file can be indexed by a key:

While this example can store multiple users' data in the same file, you can't search, for example, a user's last access time, without looping through each key in the file. Structured data like this belongs in a SQL database.

Each DBM handler has different behavior in some areas. For example, GDBM provides internal locking. If one process has opened a GDBM file in read-write mode, other calls to dba_open( ) to open the same file in read-write mode will fail. The DB3 handler, however, provides no such internal locking; you need to do that with additional code, as discussed for text files in Recipe 18.25. Two DBA functions are also database-specific: dba_optimize( ) and dba_sync( ). The dba_optimize( ) function calls a handler-specific DBM file-optimization function. Currently, this is implemented only for GDBM, for which its gdbm_reorganize( ) function is called. The dba_sync( ) function calls a handler-specific DBM file synchronizing function. For DB2 and DB3, their sync( ) function is called. For GDBM, its gdbm_sync( ) function is called. Nothing happens for other DBM handlers.

Using a DBM database is a step up from a text file but it lacks most features of a SQL database. Your data structure is limited to key/value pairs, and locking robustness varies greatly depending on the DBM handler. Still, DBM handlers can be a good choice for heavily accessed read-only data; for example, the Internet Movie Database uses DBM databases.

Discussion

After loading the DB functions from DB.php, connect to the database with DB::connect( ), execute the query with $dbh->query( ) , and retrieve each row with $sth->fetchRow( ) . The Solution example connects to MySQL. To connect to Oracle instead, you just need to change $dsn. This variable holds the data source name (DSN), a string that specifies which database to connect to and how to connect to it. Here's the value for Oracle:

$dsn = 'oci8://david:foo!bar@ORAINST';

For PostgreSQL, $dsn is:

$dsn = 'pgsql://david:foo!bar@unix(/tmp/.s.PGSQL.5432)/test';

The PostgreSQL DSN is a little more complicated because it specifies that the connection should be made using a local Unix socket (whose pathname is /tmp/.s.PGSQL.5432) instead of a TCP/IP connection. In general, the form of a data source name is:

database_interface://user:password@hostname/database

The database_interface part of the DSN is the kind of database you're using, such as Oracle, MySQL, etc. Currently, PEAR supports 10 database backends, as listed in Table 10-1.

Table 10-1. PEAR DB backends

Name

Database

fbsql

FrontBase

ibase

Interbase

ifx

Informix

msql

Mini-SQL

mssql

Microsoft SQL Server

mysql

MySQL

oci8

Oracle (using the OCI8 interface)

odbc

ODBC

pgsql

PostgreSQL

sybase

Sybase

To use a particular PEAR DB backend, PHP must be built with support for the database that corresponds to the backend. Note that to use the Oracle OCI8 backend, PHP must have the OCI8 extension (--with-oci8 when building). The older PHP oracle extension (--with-oracle) isn't compatible with PEAR DB.

user and password are the username and password to use to connect to the database. hostname is usually the hostname that the database is running on, but it can also be the name of an instance (for Oracle) or the special syntax used previously to indicate a local socket. database is the name of the logical database to use, such as what you'd specify with the dbname parameter in pg_connect( ) or the argument to mysql_select_db( ).

Discussion

The fetchRow( ) method returns data, while fetchInto( ) puts the data into a variable you pass it. Both fetchRow( ) and fetchInto( ) return NULL when no more rows are available. If either encounter an error when retrieving a row, they return a DB_Error object, just as the DB::connect( ) and DB::query( ) methods do. You can insert a check for this inside your loop:

Recipe 10.10 goes into detail about when you need to quote values and how to do it.

By default, fetchRow( ) and fetchInto( ) put data in numeric arrays. You can tell them to use associative arrays or objects by passing an additional parameter to either method. For associative arrays, use DB_FETCHMODE_ASSOC:

Whatever the fetch mode, the methods still return NULL when there is no more data to retrieve and a DB_Error object on error. The default numeric array behavior can be specified with DB_FETCHMODE_ORDERED. You can set a fetch mode to be used in all subsequent calls to fetchRow( ) or fetchInto( ) with DB::setFetchMode( ) :

Discussion

All these functions return a DB_Error object if an error occurs in executing a query or retrieving the results. If the query returns no results, getRow( ) and getOne( ) return NULL; getAll( ), getCol( ), and getAssoc( ) return an empty array.

When returning results, getRow( ) returns an array or object, depending on the current fetch mode. The getAll( ) method returns an array of arrays or array of objects, also depending on the fetch mode. The single result getOne( ) returns is usually a string, because PHP database drivers generally cast retrieved results into strings. Similarly, getCol( ) returns an array of results whose values are usually strings. The results from getAssoc( ) are returned as an array. The type of elements of that array are controlled by the fetch mode.

Like DB::query( ), you can pass these functions a query with placeholders in it and an array of parameters to fill the placeholders. The parameters are properly quoted when they replace the placeholders in the query:

The parameter array is the second argument to each of these functions, except getCol( ) and getAssoc( ). For these two functions, the parameter array is the third argument. The second argument to getCol( ) is a column number to return if you don't want the first column (column number 0). For example, this returns the values of the planet column:

$cols = $dbh->getCol('SELECT symbol,planet FROM zodiac',1);

The second argument to getAssoc( ) is a boolean that tells the function whether to force the values in the associative array it returns to be arrays themselves even if they could be scalars. Take this query for example:

Because the query passed to getAssoc( ) asks only for two columns, the first column is the array key, and the second column is the scalar array value. Here's how to force the array values to be one-element arrays:

Just as fetchRow( ) and fetchInto( ) do, getRow( ), getAssoc( ), and getAll( ) put data in numeric arrays by default. You can pass them a fetch mode (the third argument to getRow( ) or getAll( ), the fourth argument to getAssoc( )). They also respect the fetch mode set by DB::setFetchMode( ).

Modifying Data in a SQL Database

Problem

Solution

With PEAR DB, use DB::query( ) to send an INSERT, DELETE, or UPDATE command:

$dbh->query("INSERT INTO family (id,name) VALUES (1,'Vito')");
$dbh->query("DELETE FROM family WHERE name LIKE 'Fredo'");
$dbh->query("UPDATE family SET is_naive = 1 WHERE name LIKE 'Kay'");

You can also prepare a query with DB::prepare( ) and execute it with DB::execute( ):

$prh = $dbh->prepare('INSERT INTO family (id,name) VALUES (?,?)');
$dbh->execute($prh,array(1,'Vito'));
$prh = $dbh->prepare('DELETE FROM family WHERE name LIKE ?');
$dbh->execute($prh,array('Fredo'));
$prh = $dbh->prepare('UPDATE family SET is_naive = ? WHERE name LIKE ?');
$dbh->execute($prh,array(1,'Kay');

Discussion

The query( ) method sends to the database whatever it's passed, so it can be used for queries that retrieve data or queries that modify data.

The prepare( ) and execute( ) methods are especially useful for queries that you want to execute multiple times. Once you've prepared a query, you can execute it with new values without re-preparing it:

$prh = $dbh->prepare('DELETE FROM family WHERE name LIKE ?');
$dbh->execute($prh,array('Fredo'));
$dbh->execute($prh,array('Sonny'));
$dbh->execute($prh,array('Luca Brasi'));

Discussion

In the Solution, the first execute( ) runs the query:

SELECT sign FROM zodiac WHERE element LIKE 'fire'

The second runs:

SELECT sign FROM zodiac WHERE element LIKE 'water'

Each time, execute( ) substitutes the value in its second argument for the ? placeholder. If there is more than one placeholder, put the arguments in the array in the order they should appear in the query:

$prh = $dbh->prepare(
"SELECT sign FROM zodiac WHERE element LIKE ? OR planet LIKE ?");
// SELECT sign FROM zodiac WHERE element LIKE 'earth' OR planet LIKE 'Mars'
$sth = $dbh->execute($prh,array('earth','Mars'));

Values that replace a ? placeholder are appropriately quoted. To insert the contents of a file instead, use the & placeholder and pass execute( ) the filename:

To tell execute( ) not to quote a value, use the ! parameter. This is dangerous when applied to user input; it's useful, however, when one of the values is not a scalar, but a database function. For example, this query uses the NOW( ) function to insert the current date and time in a DATETIME column:

To execute a prepared statement many times with different arguments each time, use executeMultiple( ) . Instead of just passing it one array of arguments as with execute( ), you pass it an array of argument arrays:

You must declare the array first and then pass it to executeMultiple( ), or PHP gives an error that says you are passing executeMultiple( ) a parameter by reference. Although executeMultiple( ) loops through each argument in the array, if it encounters an error part-way through, it doesn't continue on with the rest of the arguments. If all queries succeed, executeMultiple( ) returns the constant DB_OK. Because executeMultiple( ) never returns a result object, you can't use it for queries that return data.

The Interbase and OCI8 DB backends take advantage of native database features so that prepare( )/execute( ) is more efficient than query( ) for INSERT/UPDATE/DELETE queries. The Interbase backend uses the ibase_prepare( ) and ibase_execute( ) functions, and the OCI8 backend uses the OCIParse( ) , OCIBindByName( ), and OCIExecute( ) functions. Other database backends construct queries to execute by interpolating the supplied values for the placeholders.

Discussion

The number of rows in a result set is a property of that result set, so that numRows( ) is called on the statement handle and not the database handle. The number of rows affected by a data manipulation query, however, can't be a property of a result set, because those queries don't return result sets. As a result, affectedRows( ) is a method of the database handle.

If $planet is Melmac, $dbh->quote($planet) if you are using MySQL returns 'Melmac'. If $planet is Ork's Moon, $dbh->quote($planet) returns 'Ork\'s Moon'.

Discussion

The DB::quote( ) method makes sure that text or binary data is appropriately quoted, but you also need to quote the SQL wildcard characters % and _ to ensure that SELECT statements return the right results. If $planet is set to Melm%, this query returns rows with planet set to Melmac, Melmacko, Melmacedonia, or anything else beginning with Melm:

Because % is the SQL wildcard meaning "match any number of characters" (like * in shell globbing) and _ is the SQL wildcard meaning "match one character" (like ? in shell globbing), those need to be backslash-escaped as well. Use strtr( ) to escape them:

strtr( ) must be called after DB::quote( ). Otherwise, DB::quote( ) would backslash-escape the backslashes strtr( ) adds. With DB::quote( ) first, Melm_ is turned into Melm\_, which is interpreted by the database to mean "the string M e l m followed by a literal underscore character." With DB::quote( ) after strtr( ), Melm_ is turned into Melm\\_, which is interpreted by the database to mean "the string Melm followed by a literal backslash character, followed by the underscore wildcard."

A quote method is defined in the DB base class, but some of the database-specific subclasses override that method to provide appropriate quoting behavior for the particular database in use. By using DB::quote( ) instead of replacing specific characters, your program is more portable.

Quoting of placeholder values happens even if magic_quotes_gpc or magic_quotes_runtime is turned on. Similarly, if you call DB::quote( ) on a value when magic quotes are active, the value gets quoted anyway. For maximum portability, remove the magic quotes-supplied backslashes before you use a query with placeholders or call DB::quote( ):

Discussion

When they encounter an error, most PEAR DB methods return an DB_Error object. The DB::isError( ) method returns true if it's passed a DB_Error object, so you can use that to test the results of individual queries. The DB_Error class is a subclass of PEAR::Error, so you can use methods such as getMessage( ) to display information about the error. If you want to display everything in the error object, use print_r( ) :

Using setErrorHandling( ) lets you define a behavior that's invoked automatically whenever there's a database error. Tell setErrorHandling( ) what to do by passing it a PEAR_ERROR constant. The PEAR_ERROR_PRINT constant prints the error message, but program execution continues:

To print out an error message and then quit, use PEAR_ERROR_DIE . You can also use the PEAR_ERROR_CALLBACK constant to run a custom function when an error is raised. This custom function can print out even more detailed information:

When the incorrect SQL in the $dbh->query( ) method raises an error, pc_log_error( ) is called with the DB_Error object passed to it. The pc_log_error( ) callback uses the properties of the DB_Error object to print a more complete message to the error log:

DB Error: no such field (SELECT aroma FROM zodiac WHERE element
LIKE 'fire' [nativecode=Unknown column 'aroma' in 'field list'])

To capture all the data in the error object and write it to the error log, use print_r( ) with output buffering in the error callback:

With the PEAR_ERROR_TRIGGER constant, setErrorHandling( ) uses PHP's trigger_error( ) function to generate an internal error. This error is handled by PHP's default error handler or a user-defined error handler set by set_error_handler( ). By default, the internal error is an E_USER_NOTICE :

<br />
<b>Notice</b>: DB Error: no such field in <b>/usr/local/lib/php/PEAR.php</b> \
on line <b>593</b><br />

Make the error an E_USER_WARNING or E_USER_ERROR by passing a second argument to setErrorHandling( ):

Building Queries Programmatically

Problem

You want to construct an INSERT or UPDATE query from an array of field names. For example, you want to insert a new user into your database. Instead of hardcoding each field of user information (such as username, email address, postal address, birthdate, etc.), you put the field names in an array and use the array to build the query. This is easier to maintain, especially if you need to conditionally INSERT or UPDATE with the same set of fields.

Solution

To construct an UPDATE query, build an array of field/value pairs and then join( ) together each element of that array:

Discussion

The DB::autoPrepare( ) method is concise and easy to use if you have a recent version of DB. PHP 4.2.2 comes with DB 1.2. Newer versions of DB can be downloaded from PEAR. Use method_exists( ) to check whether your version of DB supports autoPrepare( ):

If you can't use DB::autoPrepare( ), the array-manipulation techniques shown in the Solution accomplish the same thing. If you use sequence-generated integers as primary keys, you can combine the two query-construction techniques into one function. That function determines whether a record exists and then generates the correct query, including a new ID, as shown in the pc_build_query( ) function in Example 10-1.

The switch statement controls what action the program takes based on the value of $_REQUEST['cmd']. If $_REQUEST['cmd'] is add or edit, the program displays a form with textboxes for each field in the $fields array, as shown in Figure 10-1. If $_REQUEST['cmd'] is edit, values for the row with the supplied $id are loaded from the database and displayed as defaults. If $_REQUEST['cmd'] is save, the program uses pc_build_query( ) to generate an appropriate query to either INSERT or UPDATE the data in the database. After saving (or if no $_REQUEST['cmd'] is specified), the program displays a list of all zodiac signs, as shown in Figure 10-2.

Figure 10-1. Adding and editing a record

Figure 10-2. Listing records

Whether pc_build_query( ) builds an INSERT or UPDATE statement is based on the presence of the request variable $_REQUEST['id'] (because id is passed in $key_field). If $_REQUEST['id'] is not empty, the function builds an UPDATE query to change the row with that ID. If $_REQUEST['id'] is empty (or it hasn't been set at all), the function generates a new ID with nextId( ) and uses that new ID in an INSERT query that adds a row to the table.

If you don't have DB_Pager or you do but don't want to use it, you can roll your own indexed link display using the pc_indexed_links( ) and pc_print_link( ) functions shown in the Discussion in Examples 10-2 and 10-3.

Discussion

DB_Pager is designed specifically to paginate results that come from a PEAR DB query. To use it, create a DB_Pager object and tell it what query to use, what offset into the result set to start at, and how many items belong on each page. It calculates the correct pagination.

The $pager->build( ) method calculates the appropriate rows to return and other page-specific variables. DB_Pager provides a fetchRow( ) method to retrieve the results in the same way the DB class operates. (You can also use fetchInto( ) with DB_Pager). However, while it provides all the data you need to build appropriate links, it also leaves it up to you to build those links. The offset the previous page starts at is in $data['prev'], and $data['next'] is the offset of the next page. The $data['pages'] array contains page numbers and their starting offsets. The output when $offset is is shown in Figure 10-3.

Figure 10-3. Paginated results with DB_Pager

All the page numbers, "<<Prev" and "Next>>," are links. "<<Prev" and "1" point to the current page; the others point to their corresponding pages. On page 4, the "Next>>" link points back to page 1. (But on page 1, the "<<Prev" link doesn't point to page 4.) The numbers in the links refer to page numbers, not element numbers.

If DB_Pager isn't available, you can use the pc_print_link( ) and pc_indexed_links( ) functions shown in Examples 10-2 and 10-3 to produce properly formatted links.

After connecting to the database, you need to make sure $offset has an appropriate value. $offset is the beginning record in the result set that should be displayed. To start at the beginning of the result set, $offset should be 1. The variable $per_page is set to how many records to display on each page, and $total is the total number of records in the entire result set. For this example, all the Zodiac records are displayed, so $total is set to the count of all the rows in the entire table.

The SQL query that retrieves information in the proper order is:

SELECT * FROM zodiac ORDER BY id

Use modifyLimitQuery( ) to restrict the rows being retrieved. You'll want to retrieve $per_page rows, starting at $offset - 1, because the first row is 0, not 1, to the database. The modifyLimitQuery( ) method applies the correct database-specific logic to restrict what rows are returned by the query.

The relevant rows are retrieved by $dbh->getAll($sql), and then information is displayed from each row. After the rows, pc_indexed_links( ) provides navigation links. The output when $offset is not set (or is 1) is shown in Figure 10-4.

Figure 10-4. Paginated results with pc_indexed_links( )

In Figure 10-4, "6-10", "11-12", and "Next>>" are links to the same page with adjusted $offset arguments, while "<<Prev" and "1-5" are greyed out, because what they would link to is what's currently displayed.

Discussion

Using Cache_DB is almost the same as using DB, but there are some crucial differences. First, Cache/DB.php is required instead of DB.php. The Cache/DB.php file then loads the appropriate DB classes. Instead of creating a database handle with the DB::connect( ) method, you instantiate a Cache_DB object with the new operator and then call the object's connect( ) method. The syntax of $cache->connect( ) is the same, however, so you just pass it the DSN that identifies the database. The query( ) method of Cache_DB works just like that of DB, however there are no prepare( ) and execute( ) methods in Cache_DB. query( ) returns a statement handle that supports fetchRow( ) and fetchInto( ), but the default fetch mode is DB_FETCH_ASSOC, not DB_FETCH_ORDERED.

The first time a particular SELECT statement is passed to $cache->query( ) , Cache_DB executes the statement and returns the results, just like DB, but it also saves the results in a file whose name is a hash of the query. If the same SELECT statement is passed to $cache->query( ) again, Cache_DB retrieves the results from the file instead of running the query in the database.

By default, Cache_DB creates its cache files in a subdirectory of the current directory called db_query. You can change this by passing a directory name as part of an options array as a second argument to the Cache_DB constructor. This sets the cache directory to /tmp/db_query:

$cache = new Cache_DB('file',array('cache_dir' => '/tmp/'));

The first argument, file, tells Cache_DB what container to use to store the cached data. file is the default, but you need to include it here to specify the container options in the second argument. The relevant container option is cache_dir, which tells Cache_DB where to create the db_query subdirectory. Including a trailing slash is required.

By default, entries stay in the cache for one hour. You can adjust this by passing a different value (in seconds) when creating a new Cache_DB object. Here's how to keep entries in the cache for one day, 86,400 seconds:

Because the expiration time is the third argument, you have to pass the defaults for the first two arguments as well.

The cache isn't altered if you change the database with an INSERT, UPDATE, or DELETE query. If there are cached SELECT statements that refer to data no longer in the database, you need to explicitly remove everything from the cache with the $cache->flush( ) method:

$cache->flush('db_cache');

It's very important to include the db_cache argument to flush( ). The PEAR Cache system supports dividing up the cached items into different groups, and the Cache_DB object puts everything it's keeping track of in the db_cache group. Leaving out the group argument results in deleting the files in the base cache directory (which is probably the directory you're running your script from).

The file container stores each result in a file whose name is based on an MD5 hash of the query that generated the particular result. Because MD5 is case-sensitive, the file container is case-sensitive, too. This means that if the results of SELECT*FROMzodiac are in the cache, and you run the query SELECT*fromzodiac, the results aren't found in the cache, and the query is run again. Maintaining consistent capitalization, spacing, and field ordering when constructing your SQL queries results in more efficient cache usage.

Although this recipe focuses on the file container, the PEAR Cache system supports a number of other containers that hold cached data, such as shared memory, PHPLib sessions, databases via the dbx library, and msession sessions. To use a different container, pass the appropriate container name as the first argument when creating a new Cache_DB object:

$cache = new Cache_DB('shm');

See Also

Program: Storing a Threaded Message Board

Storing and retrieving threaded messages requires extra care to display the threads in the correct order. Finding the children of each message and building the tree of message relationships can easily lead to a recursive web of queries. Users generally look at a list of messages and read individual messages far more often then they post messages. With a little extra processing when saving a new message to the database, the query that retrieves a list of messages to display is simpler and much more efficient.

The primary key, id, is a unique integer that identifies a particular message. The time and date that a message is posted is stored in posted_on, and author, subject, and body are (surprise!) a message's author, subject, and body. The remaining four fields keep track of the threading relationships between messages. The integer thread_id identifies each thread. All messages in a particular thread have the same thread_id. If a message is a reply to another message, parent_id is the id of the replied-to message. level is how many replies into a thread a message is. The first message in a thread has level 0. A reply to that level message has level 1, and a reply to that level 1 message has level 2. Multiple messages in a thread can have the same level and the same parent_id. For example, if someone starts off a thread with a message about the merits of BeOS over CP/M, the angry replies to that message from CP/M's legions of fans all have level 1 and a parent_id equal to the id of the original message.

The last field, thread_pos, is what makes the easy display of messages possible. When displayed, all messages in a thread are ordered by their thread_pos value.

Here are the rules for calculating thread_pos:

The first message in a thread has thread_pos = 0.

For a new message N, if there are no messages in the thread with the same parent as N, N's thread_pos is one greater than its parent's thread_pos.

For a new message N, if there are messages in the thread with the same parent as N, N's thread_pos is one greater than the biggest thread_pos of all the messages with the same parent as N.

After new message N's thread_pos is determined, all messages in the same thread with a thread_pos value greater than or equal to N's have their thread_pos value incremented by 1 (to make room for N).

To properly handle concurrent usage, pc_message_save( ) needs exclusive access to the msg table between the time it starts calculating the thread_pos of the new message and when it actually inserts the new message into the database. We've used MySQL's LOCK TABLE and UNLOCK TABLES commands to accomplish this. With other databases, the syntax may vary, or you may need to start a transaction at the beginning of the function and commit the transaction at the end.

The level field can be used when displaying messages to limit what you retrieve from the database. If discussion threads become very deep, this can help prevent your pages from growing too large. For example, here's how to display just the first message in each thread and any replies to that first message: