Background

So, the other day, I opened up a MySQL dump file just to graze over it and see how they did things (you can call be weird if you want to, I prefer to think of it as “curious”). Anyway, I noticed something very cool that I thought was absolutely ingenious. They were inserting multiple rows in a single SQL statement, sort of like this…

I instantly had tons of places (and reasons) I wanted to do this. First my reasons:

Most of the overhead of a SQL
statement (especially an insert) is time spent sending the statement
to the server. Each statement has overhead, so doing several things in
a single statement can significantly improve performance.
This is generally true even if the statement is significantly more complex
(so long as it isn’t sort intensive).

Updating indexes is often the slowest part of inserting rows into a table,
and inserting several at one time only rebuilds the index once (after all the rows are
added) rather than once per row. This can be a significant performance boost.

If you are making a create-database SQL script, or generating
scripts to import data from an external source (like an excel file) you can
decrease the total size of the script if you eliminate hundreds (or thousands) ofINSERT INTO item (id, title) … and can tie them altogether.

So, I started using the multi row insert statements, and have really loved it. I was using them in MySQL and PostgreSQL, and they worked perfectly just as described above, so I sort of assumed it was a SQL standard. Then I tried using it on an Oracle database (11g) and it failed, which is odd because Oracle prides itself in being the first database to implement all the new SQL standards as soon as possible. It turns out this is an extension not a standard.

But, I decided it was worth knowing how to insert multiple rows in a single statement using each of the SQL dialects that I usually work with.

Of course, my ultimate goal isn’t just to see if I could do it with each engine, but to come up with a database neutral way to do it. I wanted to find the holy grail of SQL developers: the plain vanilla way that works everywhere.

Comments on the Plain Vanilla Way

Using UNION ALL to build one big table with all the rows you want to INSERT is not elegant,
you have to look at it for a while and do a little bit of head scratching before you
say oh, and then you will probably say why in the heck did they do that?

But it works great, and has two of the three benefits from inserting multiple rows at once.

5 points for all database engines that can do it.

Oracle gets only 3 points because they still require that stupid from dual effectively ruining the possibility of having a truly vanilla way of doing things. That means, if your server is running Oracle, you have to insert it like this:

Comments on the Better Way

The better way is easy to read, easy to write, easy to maintain, easy to add more rows to the existing query with minial typing. It also has the added benefit of being faster.

In short, it is absolutely elegant. 5 more points for engines that support it.

Comments on the Oracle Way

The INSERT ALL method probably enjoys some speed optimization above using the UNION ALL method, which is probably worth a point (that is why Oracle gets a 4 instead of just 3). I just can’t bring myself to offer more points than that in this case.

INSERT ALL still has the flexibility of allowing (our should I say requiring) a field signature to be provided for each row, this added flexibility probably equates to extra processing time before the insertions take place, but I can’t look under the hood to see for myself so don’t quote me.

This is an example of Oracle taking a cool statement for inserting into multiple tables at the same time, and using it to satisfy the multiple rows in the same table too. I agree it should work (it makes sense that you might want to insert several lines in a child table as you insert one line in a parent table), but I don’t think that that is a good excuse for not supporting the Better way. In fact, I think it is a real cop out.

Why I used UNION ALL instead of UNION

UNION ALL will offer significantly better performance than UNION, because the UNION command assumes that if there are duplicates they should be removed (mathematically speaking, the union of two sets includes only one copy of the overlapped region.) If you use UNION ALL, it keeps both copies of the overlapped region.

Of course, you shouldn’t have any overlapped region, and you certainly don’t want the db engine to sort and then compare all the records to make sure you don’t have any duplicates before it moves forward. UNION ALL doesn’t do any of that sorting and filtering so it will have significant performance improvements over just using UNION.

Points are awarded as follows:

MySQL (v 5.1)

10 out of 10

PostgreSQL (8.3)

10 out of 10

Microsoft SQL Server (2008)

10 out of 10

Sqlite (3)

5 out of 10

Oracle (11g)

4 out of 10

Oracle, catch up already.

Does anyone want to place bets about whether Sqlite will support the better insert statement before Oracle does?

For that matter, does anyone want to place bets about whether Oracle will ever stop requiring the FROM DUAL statement that no one else seems to need?

Maybe I should blog about one or more of the features that oracle really shines in just so people don’t think I have an anti-oracle bias.