Sep26

Persistent Database Connections

Persistent connections are links that do not close when the
execution of your script ends. When a persistent connection is
requested, PHP checks if there's already an identical persistent
connection (that remained open from earlier) - and if it exists, it
uses it. If it does not exist, it creates the link. An 'identical'
connection is a connection that was opened to the same host, with
the same username and the same password (where applicable).
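
The lookup described above can be sketched in plain PHP. This is only an illustration of the idea, not PHP's actual internals: FakeLink and persistent_connect() are hypothetical names, no database is contacted, and the key is assumed to be built from host, username and password as the text states.

```php
<?php
// Hypothetical stand-in for a database link; not a real PHP API.
final class FakeLink
{
    public $host;
    public $user;

    public function __construct(string $host, string $user)
    {
        $this->host = $host;
        $this->user = $user;
    }
}

// Hypothetical helper: returns an existing identical link or creates one.
function persistent_connect(string $host, string $user, string $password): FakeLink
{
    // Links that stay open between calls, keyed by the connection parameters.
    static $pool = [];

    // Hash the password so the key does not hold it in clear text.
    $key = $host . "\0" . $user . "\0" . hash('sha256', $password);

    if (!isset($pool[$key])) {
        $pool[$key] = new FakeLink($host, $user); // no identical link yet: create it
    }
    return $pool[$key]; // an identical link exists: reuse it
}

$a = persistent_connect('db.example.test', 'app', 'secret');
$b = persistent_connect('db.example.test', 'app', 'secret');
$c = persistent_connect('db.example.test', 'report', 'secret');

var_dump($a === $b); // bool(true): same host, user and password
var_dump($a === $c); // bool(false): different user, separate link
```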

People who aren't thoroughly familiar with the way web servers work
and distribute the load may mistake persistent connections for what
they're not. In particular, they do not give
you an ability to open 'user sessions' on the same link, they
do not give you an ability to build up a
transaction efficiently, and they don't do a whole lot of other
things. In fact, to be extremely clear about the subject,
persistent connections don't give you any
functionality that wasn't possible with their non-persistent
counterparts.

Why?

This has to do with the way web servers work. There are three ways
in which your web server can utilize PHP to generate web pages.

The first method is to use PHP as a CGI "wrapper". When run this
way, an instance of the PHP interpreter is created and destroyed
for every page request (for a PHP page) to your web server.
Because it is destroyed after every request, any resources that it
acquires (such as a link to an SQL database server) are closed when
it is destroyed. In this case, you do not gain anything from trying
to use persistent connections -- they simply don't persist.

The second, and most popular, method is to run PHP as a module in a
multiprocess web server such as Apache. A
multiprocess server typically has one process (the parent) which
coordinates a set of processes (its children) that actually do the
work of serving up web pages. When a request comes in from a
client, it is handed off to one of the children that is not already
serving another client. This means that when the same client makes
a second request to the server, it may be served by a different
child process than the first time. Once a child process has opened a
persistent connection, every later page served by that same child
that requests SQL services can reuse the established connection to
the SQL server.

The last method is to use PHP as a plug-in for a multithreaded web
server. Currently PHP 4 has support for ISAPI, WSAPI, and NSAPI (on
Windows), which all allow PHP to be used as a plug-in on multithreaded
servers like Netscape FastTrack (iPlanet), Microsoft's Internet Information
Server (IIS), and O'Reilly's WebSite Pro. The behavior is essentially
the same as for the multiprocess model described before.

If persistent connections don't have any added functionality, what
are they good for?

The answer here is extremely simple -- efficiency. Persistent
connections are good if the overhead to create a link to your SQL
server is high. Whether or not this overhead is really high depends
on many factors, such as what kind of database it is, whether or not
it sits on the same computer on which your web server sits, how
loaded the machine the SQL server sits on is, and so forth. The
bottom line is that if that connection overhead is high, persistent
connections help you considerably. They cause the child process to
connect only once for its entire lifespan, instead of every
time it processes a page that requires connecting to the SQL
server. This means that every child that opened a persistent
connection will have its own open persistent connection to the
server. For example, if you had 20 different child processes that
ran a script that made a persistent connection to your SQL server,
you'd have 20 different connections to the SQL server, one from
each child.
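
To put numbers on the paragraph above, here is a small simulation that only counts connection attempts. It is a sketch under stated assumptions: count_connects() is a hypothetical helper, requests are assumed to be spread round-robin over the children, and no database is contacted.

```php
<?php
// Hypothetical simulation: counts connection attempts only.
function count_connects(int $children, int $requests, bool $persistent): int
{
    $connects = 0;
    $hasLink = array_fill(0, $children, false); // one slot per child process

    for ($i = 0; $i < $requests; $i++) {
        $child = $i % $children; // assumption: round-robin distribution

        if ($persistent) {
            if (!$hasLink[$child]) {
                $hasLink[$child] = true; // first request in this child: connect once
                $connects++;
            }
        } else {
            $connects++; // non-persistent: connect on every request
        }
    }
    return $connects;
}

echo count_connects(20, 1000, true), "\n";  // 20
echo count_connects(20, 1000, false), "\n"; // 1000
```

With 20 children and 1000 requests, the persistent case connects 20 times (once per child) and leaves 20 links open, while the non-persistent case connects 1000 times.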

Note, however, that this can have some drawbacks if you are using a
database with connection limits that are exceeded by persistent
child connections. If your database has a limit of 16 simultaneous
connections, and in the course of a busy server session, 17 child
threads attempt to connect, one will not be able to. If there are
bugs in your scripts which do not allow the connections to shut
down (such as infinite loops), the database with only 16 connections
may be rapidly swamped. Check your database documentation for
information on handling abandoned or idle connections.
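
The limit scenario above can be modelled the same way. This is a hypothetical model rather than a real database API: try_connect_children() simply assumes that each child holds its persistent link open for its whole lifetime and that the server refuses anything beyond its limit.

```php
<?php
// Hypothetical model of a server-side connection limit.
function try_connect_children(int $children, int $maxConnections): array
{
    $open = 0;
    $refused = [];

    for ($child = 1; $child <= $children; $child++) {
        if ($open < $maxConnections) {
            $open++; // the persistent link stays open, so the slot is never freed
        } else {
            $refused[] = $child; // limit reached: this child cannot connect
        }
    }
    return ['open' => $open, 'refused' => $refused];
}

$result = try_connect_children(17, 16);
echo $result['open'], "\n";           // 16
echo count($result['refused']), "\n"; // 1
```

In practice this suggests keeping the maximum number of web server children at or below the database's connection limit.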

Warning

There are a couple of additional caveats to keep in mind when
using persistent connections. One is that when using table
locking on a persistent connection, if the script for whatever
reason cannot release the lock, then subsequent scripts using the
same connection will block indefinitely and may require that you
either restart the httpd server or the database server. Another is
that when using transactions, a transaction block will also carry
over to the next script which uses that connection if script execution
ends before the transaction block does. In either case, you can
use register_shutdown_function() to register a
simple cleanup function to unlock your tables or roll back your
transactions. Better yet, avoid the problem entirely by not using
persistent connections in scripts which use table locks or
transactions (you can still use them elsewhere).
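
A minimal sketch of the cleanup approach for transactions, assuming PDO with an in-memory SQLite database as a stand-in for your real server. The helper name rollback_if_open() and the account table are made up for illustration. Some drivers already roll back at the end of a request, so check yours before relying on this. For table locks, the same pattern would apply with your database's unlock statement.

```php
<?php
// Sketch: in-memory SQLite through PDO stands in for the real database.
// In real use you would pass PDO::ATTR_PERSISTENT => true in the options array.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)');

// Hypothetical helper: rolls back whatever transaction is still open.
function rollback_if_open(PDO $pdo): void
{
    if ($pdo->inTransaction()) {
        $pdo->rollBack();
    }
}

// Runs when the script ends, including after exit() or a fatal error.
register_shutdown_function('rollback_if_open', $pdo);

$pdo->beginTransaction();
$pdo->exec('INSERT INTO account (balance) VALUES (100)');
// ... suppose the script ends here, before commit() is reached ...
```

Because the shutdown function receives the connection as an argument, the connection is still alive when the cleanup runs.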

An important summary. Persistent connections were designed to have
one-to-one mapping to regular connections. That means that you
should always be able to replace persistent
connections with non-persistent connections, and it won't change
the way your script behaves. It may (and
probably will) change the efficiency of the script, but not its
behavior!
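
One way to picture this one-to-one mapping is a connection helper where persistence is a single switch. The sketch below assumes PDO with in-memory SQLite; open_db() and answer() are hypothetical names, and the query is only a placeholder.

```php
<?php
// Hypothetical helper: persistence is the only option that differs.
function open_db(bool $persistent): PDO
{
    return new PDO('sqlite::memory:', null, null, [
        PDO::ATTR_PERSISTENT => $persistent,
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Placeholder for the script's real work.
function answer(PDO $pdo): int
{
    return (int) $pdo->query('SELECT 40 + 2')->fetchColumn();
}

// The script logic is identical for both kinds of connection.
var_dump(answer(open_db(true)) === answer(open_db(false)));
```

If flipping the flag changes what a script does, the script is probably relying on state left over on the connection.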

Comments

If anyone ever wonders why the number of idle db processes (open connections) seems to grow even though you are using persistent connections, here's why:

"You are probably using a multi-process web server such as Apache. Since
database connections cannot be shared among different processes, a new
one is created if the request happens to come to a different web server
child process."

This one took quite a chunk out of my you-know-what. It seems that if you're running multiple database servers on the same host (e.g. MySQL on a number of ports), you can't use pconnect, since the port number isn't part of the key for database connections, especially if you use the same username and password for all the database servers running on different ports. You might get a connection for an entirely different port than the one you asked for. This might be specific to PHP's MySQL extension, though.

To those using MySQL and finding a lot of leftover sleeping processes, take a look at MySQL's wait_timeout directive. By default it is set to 8 hours, but almost any decent production server will have it lowered to the 60 second range. Even on my testing server, I was having problems with too many connections from leftover persistent connections.

For the oci8 extension it is not true that "[...] when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does." The oci8 extension does a rollback at the end of scripts using persistent connections, thus ending the transaction. The rollback also releases locks. However, any ALTER SESSION command (e.g. changing the date format) on a persistent connection will be carried over to the next script.

It seems that using pg_pconnect() will not persist temporary views/tables. So if you are trying to create temporary views/tables with the query results and then access them from the next script of the same session, you are out of luck. Those temporary views/tables are gone after each PHP script ends. One way to get around this problem is to create a real view/table with the session ID as part of the name and record the name and creation time in a common table. Have a garbage collection script drop the views/tables whose sessions have expired.

There's a third case for PHP: run on a fastCGI interface. In this case, PHP processes are NOT destroyed after each request, and so persistent connections do persist. Set PHP_FCGI_CHILDREN << mysql's max_connections and you'll be fine.

The IBM_DB2 extension v1.9.0 or later performs a transaction rollback on persistent connections at the end of a request, thus ending the transaction. This prevents the transaction block from carrying over to the next request which uses that connection if script execution ends before the transaction block does.

I've been looking everywhere for a benchmark or at least a comparison of the overhead of oci_connect and oci_pconnect.
Just saying "oci_connect is slower because of the overhead..." is not enough for me, so I wrote a couple of scripts to compare performance.
In the end I found that oci_connect took on average 34% more time than oci_pconnect, using a query of 50 rows and 100 columns.
Obviously this wasn't a real benchmark, but it gives a simple idea of the efficiency of each function.

One additional note regarding odbc_pconnect and possibly other variations of pconnect:

If the connection encounters an error (bad SQL, incorrect request, etc.), that error will be present in odbc_errormsg for every subsequent action on that connection, even if subsequent actions don't cause another error.

For example:

A script connects with odbc_pconnect.
The connection is created on its first use.
The script calls a query "Select * FROM Table1".
Table1 doesn't exist and odbc_errormsg contains that error.

Later (days, perhaps), a different script is called using the same parameters to odbc_pconnect.
The connection already exists, so it is reused.
The script calls a query "Select * FROM Table0".
The query runs fine, but odbc_errormsg still returns the error about Table1 not existing.

I'm not seeing a way to clear that error using odbc_ functions, so keep your eyes open for this gotcha or use odbc_connect instead.