Programming with the DBI becomes a matter of habit. You may choose to code directly with the DBI rather than using on of the many wrappers available on CPAN because of efficiency concerns, or because you are dealing with legacy code, or simply because you want to have a grip at the core of things. Whichever reason for using the DBI directly, the time comes when you have to face one of the simple problems listed here. If you were looking for answers, this is the place to go. If you found the answer on your own, let's compare notes. Either way, enjoy the reading.

If you want to initiate action on a table, or a group of tables from a specific database, it would be better to make sure that such table exists. Sometimes you don't need to worry, since your scripts only apply to a well designed and never-changing database. Sometimes, though, you are not so lucky, or you simply need to check if a table - perhaps a temporary one - was already created.

Standard SQL does not have a way to tell you that, even though some dialects provide a useful idiom (MySQL allows CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS). If your database of choice doesn't give you such facility, or you want to create something portable to more than one DBMS, this technique could be handy.

This routine uses DBI's tables method as a primary way of checking. Unfortunately, not every DBMS answers to tables in a consistent way. Therefore, I use a double standard. If tables returns a non empty array, then I test each element until I find one matching the table being searched. If the array is empty, it could be either that the database is empty, or that the DBMS fails to report. In this case, just to be on the safe side, I use the second method, trying to issue a SELECT statement with a false WHERE clause. If this method returns an error, it means that the table does not exist. Using eval and localizing the database handler's attributes ensures some degree of safety.

I can't guarantee that it will work with every database driver, but it is the best I could think of.

There are basically two ways of inserting records through the DBI. The easiest one is using the do method, passing the whole query, without worrying about parameters. Basically, the same query that you'd type in a native database client would also go through a do method. This is something you should do if you have to insert just one record. If you need to insert many of them within one loop, then it is strongly advisable to use prepare//execute.

Using prepare and /execute for insertion means (most of the times) using placeholders, which is the recommended method of dealing with this problem. Placeholders are efficient and they guarantee that the values are properly quoted. The only thing sort of unpleasant with them is that you can get confused about how many placeholders to enter or about the order of the values to submit. In the first case, don't worry, since the DBI will complain about a wrong number of placeholders. In the second case, it could become tricky, if you swap things like first and last name. But you don't have to rely only on your patience and accuracy. You can use some idioms to fortify your query.

Your first priority when inserting records is creating an insertion query that has the right list of columns and a matching number of placeholders. To ensure that you are always referring to the right list of fields, it is useful to store such list into an array.

In this example, there are only three columns, but the idiom would work the same way if we had 100 or more. From the @fields array it is easy to create a string of fields to be used either in an INSERT or a SELECT statement. In this case, the INSERT query is

It is obvious that executing a query with scalars is only useful when the scalars are updated. For example, within a loop, we may call a function that fills the values with appropriate contents, so that we can call execute with valid items.

Syntactically, a lol is perhaps the most natural structure to pass to an execute method. This is because execute expects a list of values. Therefore an array is the logical candidate for this function.

The only issue with this insertion is that we must insure that the array elements are in the same order as the fields to insert. No shortcuts available here. Either we pay attention when we encode the array, or we ensure that the items are filled in the proper order when we get the data from an external source. Using a list of lists we trade speed for accuracy. However, if we have some means of making sure that the order is correct, this is the fastest insertion method available from the DBI.

I must mention that recent versions of the DBI allow also the method execute_array with the related bind_param_array, and it is useful for mass insertion. However, unless your DBMS supports such a method directly, execute_array is actually slower than other methods.

If we want to be sure that each value is going to the corresponding column in the database, we can use a data structure where the values are referred by name, rather than by position. A list of hashes looks like the best solution for this problem.

The naive approach would be to iterate the list and access the items by name.

But this style of coding, although correct, may take long for a large number of columns, and we can still make a mistake and swap positions, thus resulting in an incorrect insertion. It's better to take advantage of our column list and exploit Perl capability of creating hash slices.

In the above example, not only we write less, but we are also sure that the order of values comes according to the columns list, which was the one we used to create the insertion query in the first place. For large datasets with many columns, this idiom is definitely the best way to go. Read more about this method in chromatic's article about the DBI (off site).

Once you have successfully inserted records into a database, you'd like to get them back, possibly in a format that suits your current needs, rather than the needs you had when the data was created. Fetching data from a database is a collaborative effort. First, you instruct the database about what you want to get, and the database will eventually comply and prepare for you a dataset ready for consumption. It's important to remember that, no matter how complex your query, the ultimate result will always be a bi-dimensional array, rows and columns filled with the data you asked for.

But you don't want always bi-dimensional arrays. Your application needs may be less (or more) demanding than that. Therefore you need to transform the result through the DBI, converting the dataset into the data structure more suitable for your current task. Even though you can't force the DBMS to change the only structure it can deal with, it will assist you in your translation.

The following sections examine a few of the idioms offered by the DBI when fetching records. Unlike the DBI manual, where you get help on specific instructions, the matter in this post is presented by task. Hopefully, you can find the recipe tailored for your application.

Notice that we could have just selected "name", like we did in the previous example. The reason for the extra columns is to demonstrate how to use this method in cases where you have limited control over the columns returned by the DBMS. For example, the output for the DESCRIBE command in Oracle or MySQL will return multiple columns, and you may want to isolate only one of them.

Coming back to the example at hand, the result from selectall_arrayref is a reference to a bi-dimensional array. Using the @{} construct it is turned into an array, and map will only filter the second element from each row. As a result, we get a flat array containing only the names.

The DBI has also a specialized function to accomplish this task. selectcol_arrayref returns a reference to an mono-dimensional array. If we have a query that should return only one row, or having the wanted row in first position, then this method returns what we want. The only thing we have to do is de-referencing the array.

The cryptic looking code in this example is just for educational purposes. I wouldn't do it in a real application, but again it could be useful to know that it is possible. fetchall_arrayref accepts an optional parameter, a "slice," telling the DBI which columns we want and in which format. If we use an array slice, as in this example, its indexes mark the columns to fetch. [0, -2] means take the first and second-to-last column. Therefore, our dataset will have two columns. Then, map will filter the second column, which is the name. Of course, in a three-column dataset, [0, -2] is the same as [0, 1], or [-3, -2]. Just to show what you can do.

No sweat here. If we want a list of lists, that's what a RDBMS is always happy to provide you and the DBI is glad to help.

There are two methods to get a lol from the DBI. Either you use the prepare /execute / fetchall sequence, or you use selectall_arrayref, which is a handy shortcut. The following snippets are equivalent.

A list of hashes is a common need. For example, if you are using HTML::Template, you should find this method useful. You can either build the loh manually, fetching a hash and pushing it to an array, but the DBI can do it for you quite efficiently.

The values you pass are not important. You could also say {name=> undef, salary => undef} and it would have been the same. Only the keys are relevant. Notice also that the order of your hashref can be different from the order of the resulting hashref, since hashes in Perl don't have a guaranteed order (You all should know that, but a reminder shouldn't hurt :) ).

You can build a more complex structure, for example assigning to each hash value an array reference. Depending on which column you want to transform into a key, you can either assign the key directly or use shift if the desired key is in first position.

If you wanted to assign "name" as the hash key, then you could either modify the query to have "name" as first column in the dataset or modify the map code to go with your wishes ( map {$_->[1], [ $_->[0], $_->[2] ] }).

The default DBI method for a hash of hashes is selectall_hashref, which needs as arguments the query to execute and the column to use as key, indicated by position. The position starts with 1. Don't get confused with the array index that would be 0 to get the same column from one row of a lol.

Unlike the manual example above, where the hash was created with map, you can only create a hash of hashes, and the key used is also repeated among the values. The usefulness of such structure is left entirely to the reader imagination.

fetchall_arrayref is sometimes the fastest method for retrieving records. It depends on the number of rows to fetch and on their size. (see Speeding up the DBI for a comparison.)

The fastest method, which you can always count on, is fetchrow_arrayref. It is even faster when you combine it with binding.

What is binding? To understand what it is and why it is convenient, let me tell you how the DBI fetches records from the DBMS. Whenever a dataset is ready, the DBI uses one variable to transport the record from the DBMS to your application. Then, the data is copied from that variable into yours.

With binding, you are telling the DBI "Don't use your bucket to fetch water. Use mine, so we don't waste time pouring from one to the other."

The DBI has two methods to bind column results to your variables.

bind_col establishes a relationship between one column and your variable.

Binding scalars is straightforward. Just have your scalar variables ready and call bind_col for each column of the dataset. Columns are specified by position. The variable bound must be a reference to your scalar.

Apart from speed, the great advantage of binding scalars is that you can call columns by name, rather than by position, making your application clearer.

Notice that the binding must happen after the call to execute. According to the DBI docs, this is not mandatory and a DBD driver could modify this behavior. If you write code with portability in mind, try to follow this rule, though.

Let's say you are willing to relinquish the easy way of referring to the columns by name and want to bind the result to an array, because you don't want to pollute your application with many variables. Therefore you decide to bind the result set to an array. You can't use the syntax in the previous section, since \@array is a reference to an array, while bind_columns requires a list of references.

Once more, map to the rescue, to help passing the reference of each array element to bind_columns. The only tricky part here is to establish the right number of elements for the array

You can also combine scalars and arrays and do something hybrid like this one, even though I don't know what good it can do to you. If you have such an array already established for other purposes, remember that also this idiom is available. Notice that, since the array elements are already references, map intervention is no longer needed.

In another article I measured the relative speed of fetching methods, and fetchrow_hashref came as the slowest one. So much slow, in fact, that you often choose not to use it, reasoning that you'd rather call columns by position than suffer the speed penalty.

However, if your only concern is to refer to columns by name, you can use a hash with almost the same efficiency you'd get with an array reference.

There are two distinct advantages here. The first is speed. You are fetching records at the same speed as fetching an array reference. And second, you are referring to columns by name. Not only that, but you are using the names that you have chosen, not the ones coming from the database, as in the case of fetchrow_hashref.

The difference between this idiom and fetchrow_hashref that you should keep in mind, is that with fetchrow_hashref you are creating a new hash at each loop, while here you are using the same data structure over and over. Therefore, if your purpose was to use a bound hash to push it into an array, forget it, because you would create an array of references where each element is a reference to the same hash. Use this idiom only if you want to improve your application readability, and referring to the columns by name.

Datasets can be quite large. Actually it is in such cases that you want to use hashes instead of arrays, because it is easy to use a record by position when you have just a handful of columns. If your result has fifty columns, instead, referring to them by position is a secure recipe for disaster.

On the other hand, binding a large hash to a result set could be a long task. Therefore we can use an idiom similar to the ones that have simplified our life when inserting records.

To carry out this task, we create an array of column names that must be in the same order as the ones in the SELECT statement. It could be the same list (recommended!), or an equivalent list with the names of our choice. After that, our ubiquitous map will get the job done.

If you want to be convinced of this hash fetching method efficiency, here is a simple comparison. Feel free to modify this code to suit your needs. The important thing to note is that a bound hash is almost as fast as an array reference.

Before closing this long review, I would like to warn against another hidden problem with hashes. It is not something that you find in your everyday applications, but it can happen. So here it goes.

The problem arises when your result set, by mean of a JOIN, has one or more columns with the same name. In this case, an arrayref will report all the columns without even noticing that a problem was there, while a hashref will lose the additional columns. Let's consider a simple example (which is meaningless in its stand-alone format, but it's just for the sake of the example).

The CROSS JOIN in this query will return one row with two columns, both containing '1'. Using a hashref, you'll get only one, while using an array-based fetching method you get both of them.

Of course, you can solve this problem by using aliases in your query (SELECT a.id AS id1, b.id AS id2) but you never
know when code written by somebody else could end up this way. Just be aware of this remote possibility if you find less columns than you were expecting.

This is hardly the end of it. Perl is such a rich language and DBI is such a vast module that you can discover new idioms every day. Nevertheless, the list of idioms in this article should be enough to satisfy the average user's curiosity.

A variation on the hash-binding theme is to use the contents of the statement handles NAME attributes as the hash keys. Since in most instances you'll be interested in all the keys you are selecting this can be a useful shortcut. For example (untested code):

Too bad you aren't checking whether your actions succeed,
and neither are you using transactions when you are modifying
your database. Now your examples are equivalent of Perl code
that isn't strict, doesn't have warnings turned on, and isn't
checking the return values of system calls.

Pity.

There's no execuse to not wrap your database modifying
statements inside a transaction, and to not rollback on
failure.

As for checking for the existance of a table, that's
rather easy in Sybase and MSSQL:

SELECT 1
FROM sysobjects
WHERE name = "whatever" -- Name of the table
AND type = "U" -- User defined table.
-- Use "S" for a system table.

There's no execuse to not wrap your database modifying statements inside a transaction, and to not rollback on failure.

Nonetheless, I'll give you some.

I had error checking and transaction support in my test script, but I removed them on purpose.

Why?

This post is focused on idioms, and I didn't want to burden the examples with something that doesn't belong there.

Error checking is an idiom in itself. You can check for errors by testing the object, by creating a handle sub, by using eval. Depending on your needs, the error checking could take several shapes, and I have approached this subject elsewhere.

Transactions are not a mandatory DBI feature. The DBI can deal with the big boys (Oracle, MS SQL, IBM DB2, and so on) but it can also handle DBF files, Excel spreadsheets, CSV files, and others where transactions are not supported.

When transactions are supported, while it is true that every update BELONGS TO a transaction, it is not true that every update IS a transaction (Update meaning that the transaction-related code could be around a group of statements rather than around a single one.) So, in addition to my goal of keeping the code focused on the example, I didn't want to give the impression that the transaction code is necessary for that particular idiom to work.

When transactions are supported, while it is true that every update BELONGS TO a transaction, it is not true that every insertion IS a transaction.

I'm nitpicking, but with Oracle, DB2, Sybase, etc. every DML operation is run in a transaction. The transaction may automatically commit at the end of the operation (i.e. if AutoCommit is on), but there is always at least an implicit transaction as even a single row insert may in fact generate more operations through triggers (inserts to a shadow table, updates of summary tables, etc.) and these all will be guaranteed to perform as a single operation, even in the absence of explicit transactions in the DML.

That being said I think that keeping the transaction logic out of the examples is a good thing, as long as their use and functionality is explained somewhere in the document - after all transactions are pretty central to RDBMS systems...

Like many monks rewrite code to
improve it, I am going to rewrite the first three paragraphs
in the attempt to add a touch of tact to them:

Thank you for contributing gmax, that is an impressive
start of a cookbook. But ... shouldn't you have incorporated
more error checking and transaction support? These are,
IMHO, an integral part of DBI - actually, I strongly feel
that there's no excuse to not wrap your database modifying
statements inside a transaction, and to not rollback on
failure. Along those lines, one could accuse your examples
of being equivalent to Perl code that isn't strict, doesn't
have warnings turned on, and isn't checking the return
values of system calls.

By the way, Re: Style, *again* appears to be the only node where you actually thanked someone here. Many, many have thanked
you in the past - you should return the favor more often, if nothing more than for being appreciated.

Years ago my mother used to say to me, she'd say, "In this world, Elwood, you must be" -- she always called me Elwood -- "In this world, you must be oh so smart or oh so pleasant." Well, for years I was smart. I recommend pleasant. And you may quote me.

Just to answer Abigail-II's concerns mentioned earlier, I'll mention that there is a section in the DBI docs on Transactions that goes over an idiomatic method of error handling. I won't repeat it in detail, but to summarize, set RaiserError on, wrap all DBI operations in one big eval block, check $@ after exiting the eval block, and rollback the transaction if there was an error (update: and commit if there was no error). You can even set RaiseError during the connect (in which case you would not rollback if there were an error during the connect), and do the connect inside the eval block, which the example in the docs does not do.

I try the sample of "Binding a large hash" because perl is ten times slower than php to get data...and I lost 5 seconds!
- 349s in perl with "Binding a large hash" methodwhile ($sth->fetchrow_hashref)
- 344s in perl with while ( my $hash_ref = $sth->fetchrow_hashref)
- 32s in php with while ($row = db_fetch_array($statement,OCI_ASSOC))