TECHNOLOGY: Ask Tom

On Redefinition, Nature, and Triggers

I have a table that contains millions of records, and I need to update it regularly. I want to perform an UPDATE and a COMMIT for every, say, 10,000 records. I don't want to do this in one step, because I may end up with rollback segment issues. Any suggestions?

Well, this is much more complex than it looks. Suppose you do break this large transaction up into many small ones, and halfway through, something, such as an ORA-01555 (snapshot too old) error caused by your committing frequently or a system failure, goes wrong. So now your batch update is partway doneit is "somewhere in the middle"and needs to be restarted. Unless you wrote lots of code to make it restartable, you might have a huge mess on your hands. How do you pick up where you left off?

is that the odds of an ORA-01555 are very high, because you are reading the table you are modifying and the SELECT * FROM T must be as of the time when the query was started. As you are modifying this table, the probability of an ORA-01555 goes upand you are the cause of it. The undo you generate with your UPDATE of table T is likely to be needed by your SELECT on table T; however, when you execute a COMMIT, you enable the database to reuse your generated undoand if it does (because the undo retention is set too small or because there is insufficient undo space allocated to hold all of the undo you are generating), you will almost certainly get the ORA-01555 error.

Additionally, when you do run into the ORA-01555 error and the code block fails, how do you restart it? You might need either a column in that table that could tell you if it had been bulk-updated already or another tracking table into which you inserted the primary keys of rows already modified, using "SELECT * FROM T WHERE PK NOT IN (select pk from tracking_table)" or a similar approach to do this.

So you are faced with writing lots of code to accomplish this. My preference would be to do one of the following things:

Just update the table in a single statement.This approach will be by far the most efficient in terms of resource usage. The concern about possible rollback segment issues is very much offset by the fact that youneedthat undo for your query to complete successfully.

Use DBMS_REDEFINITION to process the update.This will have the possible advantage of being an online operation (with no locking conflictswhich can be good or bad, depending on the circumstances). It will also avoid any row migration that might happen with an UPDATE that makes rows "larger" than they were. It will result in the most-compact data structures afterward as well. You can even add an ORDER BY to the table redefinition in Oracle Database 10g and above to resequence the rows on disk if that makes sense.

Use CREATE TABLE AS SELECT to select the "updated data" into a new table.This is similar to the DBMS_REDEFINITION option but is an offline operation (so no modifications can be permitted to the original source table while you are doing this) and is more manual. Whereas the DBMS_REDEFINITION option automates the creation of indexes, grants, and constraintseverything you need to create a copy of an objectthe CREATE TABLE AS SELECT method requires you to do all of that manually.

Let's look at using DBMS_REDEFINITION to create an "updated copy" of a table. The goal is to create a new column that is a concatenation of three existing columns, rid the table of the three existing columns, and sequence the rows on disk by another column (to have the existing data sorted by that column on disk). I'll start with a copy of ALL_OBJECTS to test with, as shown in Listing 1.

So I have a table T with a constraint (and possibly more, such as grants, indexes, triggers, and so on). As you can see in Listing 1, the data is stored randomly on diskit is definitely not sorted by OBJECT_NAME, as the full table scan I started with a SELECT * FROM T shows. I would like to take three columnsTEMPORARY, GENERATED, and SECONDARYand concatenate them into a new FLAGS column. Further, I would like to "drop" the TEMPORARY, GENERATED, and SECONDARY columns as well as the SUBOBJECT_NAME, DATA_OBJECT_ID, and TIMESTAMP columns from the new table and, last, organize theexistingdata by OBJECT_NAME. Note that any newly added data will not be stored in sorted order in the table.

Now I am ready to begin the UPDATE, using the DBMS_REDEFINITION packageavailable with Oracle9i Database and aboveto do an online table redefinition, as shown in Listing 2. (The ability to sort the data during a redefinition was added in Oracle Database 10g Release 1.)

The COL_MAPPING parameter in the START_REDEF_TABLE procedure is what does the UPDATE and DROP column magic. You basically execute a SELECT on the data, using the COL_MAPPING parameter, which can include functions (which you would have used in the SET clause of the UPDATE). The ORDERBY_COLS parameter accomplishes the resequencing of the existing table data on disk. If you enable SQL_TRACE=TRUE when executing the call to START_REDEF_TABLE, you will see an INSERT like this being executed:

Note that the TEMPORARY, GENERATED, SECONDARY, SUBOBJECT_NAME, DATA_OBJECT_ID, and TIMESTAMP columns do not get copied to the T_INTERIM table but the TEMPORARY, GENERATED, and SECONDARY columns get concatenated into the new FLAGS column.

Because that INSERT includes APPEND in the hint, you can not only bypass UNDO (which happens by default with APPENDno undo is generated for the T_INTERIM table during this initial load) but can also, if you choose, bypass REDO generation for this table as well by altering it to be NOLOGGING before performing the START_REDEF_TABLE procedure. (If you bypass REDO generation, make sure to coordinate with the group responsible for backupsbeforedoing that! They'll need to schedule a backup of the affected data file shortly after this operation to make the new data recoverable).

That START_REDEF_TABLE procedure I just invoked did a couple of things. It copied the data from T to T_INTERIM, copied only the data of interest, sorted the data during the load, and did it all efficiently (bypassing UNDO and optionally REDO). It also set up just enough replication between T and T_INTERIM to enable me to keep them in sync, so that at the end of the redefinition, the two tables are logically equivalentthey have the same number of rows.

Now I need to copy over the dependent "things"such as indexes, constraints, and grants. I can use the COPY_TABLE_DEPENDENTS API call (a feature of Oracle Database 10g and above) to perform this, or I can copy over the table dependents myself, using DDL (along with any option I'd like: NOLOGGING, PARALLEL, and so on). In this example, I use the COPY_TABLE_DEPENDENTS API call:

That did it. If you were to query the data dictionary now, you'd see two tables with equivalent constraints, grants, triggers, indexes, and so on. I am ready to finish the redefinition nowit will involve synchronizing the tables and then locking both T and T_INTERIM for a brief moment to swap their object names in the data dictionary so T will become T_INTERIM and T_INTERIM will become T:

As you can see, table T is now the updated table, with the new FLAGS column and with the existing rows in the table sorted on disk by OBJECT_NAME.

As previously mentioned, using CREATE TABLE AS SELECT to select data into a new table could produce a result similar to using DBMS_REDEFINITION, but the process, including the creation of indexes, grants, and constraints, would be more manual. However, in the standard edition of Oracle Database, where online redefinition is not available, using CREATE TABLE AS SELECT might be the most efficient way to accomplish this.

Natural or Surrogate Keys

The project I'm currently working on has the database design rule that all tables must have a surrogate key. This is required even if a perfectly good natural key exists.

The primary motivation appears to be to improve join efficiency by removing the possibility of having to join two tables on more than one column.

Personally I'm not a fan of surrogate keys in general or of this sort of blanket policy in particular. I believe that there is often much to be gained by having the child table inherit the primary key of the parent table as part of the primary key of the child table.

Do you have any comment about natural keys versus surrogate keys in general? Do you favor surrogate keys over natural keys? How tight would performance considerations have to be to justify such a scheme?

Ahh, the age-old debateone that neither side will ever "win." Having a rule such as this is the perfect example of why I don't really like the termbest practice, or ROTs (rules of thumb). One person's best practice is another person's nightmare.

If you have a natural key, by all means, use it. The natural key should be immutable and sensible; they are sometimes rare in real life, but they do exist.

For example, if I had a DOCUMENT table and a DOCUMENT_VERSION table, I would definitely use document_id (which might be a surrogate) as the primary key of one table and the combined document_id,version# as the primary key of the other table (and DOCUMENT_VERSION might have an association with authors, so its primary key is a foreign key elsewhere in the system too).

for the reasons you just listed. One thing to be very sure of, however, is that the primary key is immutable. Never changing. In this case, the surrogate key in the DOCUMENT table is immutable and the natural key (which happens to include a surrogate key from something else) in the DOCUMENT_VERSION table is as well.

The natural key would have to be present in my table from the get-go, with NOT NULL and UNIQUE constraints on it. The use of a surrogate key here would only add to the work of all INSERT operationshaving to now generate a surrogate key as well as uniquely constrain both the surrogate keyandthe natural key. So, if the natural key is both immutable and reasonable, by all means use it. (In this case,reasonablemeans, for example, that it does not take 17 columns to store the natural keyhaving 2, 3, 4, or maybe even 5 columns is reasonable.) To read the many and varied discussions of this divisive topic, see asktom.oracle.com/pls/ask/search?p_string=%22natural+key%22.

DDL in Triggers

Everything I have ever read about triggers explicitly states that DDL, because of an implicit COMMIT, cannot be used within a trigger. I have, though, seen in blogs and other places that people have claimed to have gotten it to work through various hacks, but I have never been successful in doing so. I have two questions relating to this:

1. Why doesn't using pragma autonomous_transaction solve this within a procedure called from a trigger?

2. Do you know of a workaround for executing DDL in a procedure called from a trigger?

Well, to answer No. 1, pragma autonomous_transaction would permit you to do DDL in a trigger, butthankfullyyou never succeeded. Be very thankful of that. In answer to No. 2, in the extremely rare caseextremely rare casewhere this is actually desirable and necessary, I suggest using DBMS_JOB to schedule the CREATE statementafteryour transaction has committed.

First, think for a moment about the ramifications of doing nontransactional work in a trigger. What happens when you need to roll back? Well, of course, the DDL would not roll backyou would be left "halfway there." The DDL would have happened, but the transaction that caused the DDL would not have happened (having been rolled back). You would be left with a mess.

Whenever you are tempted to do something nontransactional in a trigger, think 500 times more about it and then always decide against it. It can lead only to really bad things.

DO_DDL_SAFELY is a stored procedure you write that does DDL, catching errors if needed, notifying people, and correctly doing whatever needs to be done. It will executeshortlyafter you commit.

And best of all, if you roll back, the INSERT into the job queue does too. You are protectedthe DDL won't happen. Use this approach anytime you are thinking of doing something nontransactional in a trigger.

See this series on a related topic (write consistency), including how triggers might fire more than once for a given statementyet another reason to avoid nontransactional operations in a trigger:

Tom Kyteis a database evangelist in Oracle's Server Technology division and has worked for Oracle since 1993. He is the author ofExpert Oracle Database Architecture: 9iand 10gProgramming Techniques and Solutions(Apress, 2005) andEffective Oracle by Design(Oracle Press, 2003), among others.