
Monday, November 26, 2007

One of the new features of 11g is Flashback Data Archive, or in marketing terms: Oracle Total Recall. With earlier flashback technologies it was possible to access Oracle's undo information for as long as it was available. With flashback data archives you can extend this period for as long as you specify. This makes it possible to implement journaling using this new technique. This blog entry describes how to do this and a number of things you need to be aware of.

Before 11g, journaling was typically implemented using a separate table with the same columns as the base table, plus three additional columns: jn_operation (Insert/Update/Delete), jn_user and jn_datetime. Three after row triggers on the base table made sure every DML was recorded in the journaling table. So let's see how the same can be achieved using flashback archives.
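A minimal sketch of such a setup (the archive name, tablespace, retention period and table are all made up for this example):

create flashback archive fba_1year
  tablespace users
  quota 100m
  retention 1 year;

create table emp_jn_demo
( empno number(4)
, ename varchar2(10)
, sal   number(7,2)
) flashback archive fba_1year;

insert into emp_jn_demo values (7839, 'KING', 5000);
commit;

update emp_jn_demo set sal = 5500 where empno = 7839;
commit;

delete from emp_jn_demo where empno = 7839;
commit;

-- the journaling information: every version of every row
select versions_operation
     , versions_starttime
     , empno
     , ename
     , sal
  from emp_jn_demo versions between timestamp minvalue and maxvalue
 order by versions_starttime nulls first;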

So indeed all journaling information is available here. All but one piece: the user who modified the data is not there. So let's audit the table to get this piece of information as well, and create a journaling view that joins the audit information with the journaling info.
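A sketch of that idea, continuing with the made-up objects from above; the join between the row versions and the audit trail is a rough assumption and would need refining for real use:

audit insert, update, delete on emp_jn_demo by access;

-- the query a journaling view could be based on
select e.versions_operation as jn_operation
     , a.username           as jn_user
     , e.versions_starttime as jn_datetime
     , e.empno
     , e.ename
     , e.sal
  from emp_jn_demo versions between timestamp minvalue and maxvalue e
     , user_audit_trail a
 where a.obj_name = 'EMP_JN_DEMO'
   and a.extended_timestamp between e.versions_starttime
                                and nvl(e.versions_endtime, systimestamp);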

Remember that DDLs that alter the structure of a table (such as drop/modify column, move table, drop partition, truncate table/partition, and add constraint) invalidate any existing undo data for the table. If you try to retrieve data from a time before such a DDL executed, you will get error ORA-1466.

Now this is a serious showstopper: in my experience the tables that need journaling are also the tables that require regular maintenance. And if that maintenance would invalidate existing undo ...

Most of the time, maintenance is done by adding one or two columns, with some additional constraints. Adding a column is not a problem for flashback archives, but adding constraints was also listed as being impossible. Let's verify that as well, by adding a primary key, a unique key, a foreign key and a check constraint:
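A sketch of that test, again with the made-up table from above (the constraint names are invented; the foreign key would need a parent table):

alter table emp_jn_demo add constraint emp_jn_demo_pk primary key (empno);
alter table emp_jn_demo add constraint emp_jn_demo_uk unique (ename);
alter table emp_jn_demo add constraint emp_jn_demo_ck check (sal >= 0);
-- a foreign key needs a parent table, for example:
-- alter table emp_jn_demo add constraint emp_jn_demo_fk
--   foreign key (deptno) references dept_jn_demo (deptno);

-- and then check whether the versions query still returns the history
select versions_operation, versions_starttime, empno, ename, sal
  from emp_jn_demo versions between timestamp minvalue and maxvalue;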

So if you make the mistake of adding a constraint, you lose all journaling information... This sure is something to be aware of. I think the adding of constraints to a table with a flashback archive should also raise an ORA-55610. Hopefully Oracle will fix this in a future release or patch set.

The only way I can think of to keep the journaling information while still being able to do the necessary maintenance is to create a journaling table in which you dump the information from the journaling view before invalidating the existing undo. Far from ideal, but doable as a workaround.
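A sketch of that workaround, with a made-up table name:

-- preserve the journaling information in a plain table
-- before the maintenance invalidates the existing undo
create table emp_jn_demo_hist as
select versions_operation as jn_operation
     , versions_starttime as jn_datetime
     , empno
     , ename
     , sal
  from emp_jn_demo versions between timestamp minvalue and maxvalue;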

But what do we gain by implementing journaling this way?

The answer is simple: performance. It is no longer necessary to have database triggers. And because database triggers are evil, you win a lot, as can be seen in a little test in which I create three tables: one regular table, one table using a flashback archive and auditing, and one regular table with an after row trigger to populate a second journaling table:
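A sketch of the test setup (all names and the number of rows are made up; the timing harness is omitted):

create table t_regular (id number, name varchar2(100));

create table t_archive (id number, name varchar2(100)) flashback archive fba_1year;
audit insert, update, delete on t_archive by access;

create table t_trigger (id number, name varchar2(100));
create table t_trigger_jn
( id           number
, name         varchar2(100)
, jn_operation varchar2(1)
, jn_user      varchar2(30)
, jn_datetime  date
);

create or replace trigger t_trigger_ari
  after insert on t_trigger
  for each row
begin
  insert into t_trigger_jn
  values (:new.id, :new.name, 'I', user, sysdate);
end;
/

-- then time, for example, the same bulk insert against each of the three tables
insert into t_regular select level, lpad('x', 100, 'x') from dual connect by level <= 100000;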

As expected, there is almost no overhead when using flashback data archives, and it is therefore a lot faster than the old-fashioned way of journaling. But the maintenance issue described earlier is something to worry about in advance.

Thursday, November 15, 2007

Since version 9 it is possible to do bulk DML using FORALL with the SAVE EXCEPTIONS clause. It makes sure that all invalid rows - the exceptions - are saved into the sql%bulk_exceptions array, while all valid rows are still processed. This array stores a record for each invalid row, containing an ERROR_INDEX, which is the iteration number during the FORALL statement, and an ERROR_CODE, which corresponds to the Oracle error code.

The error text is not stored. The documentation says:

The individual error messages, or any substitution arguments, are not saved, but the error message text can be looked up using ERROR_CODE with SQLERRM ...

Looks reasonable, but in our shop we validate lots of business rules with triggers. When a business rule is violated we do a RAISE_APPLICATION_ERROR(-20000,'APP-12345');

On the client side (Webforms) the error message is looked up in the messages table and a friendly message is shown. When coding a FORALL with SAVE EXCEPTIONS in such an environment, the error messages become useless, as can be seen in the next example:
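A minimal sketch of the problem; the table, the trigger logic and the constraint name are made up, only the APP message codes come from the situation described here:

create table orders
( id     number
, amount number constraint orders_amount_ck check (amount > 0)
);

create or replace trigger orders_bri
  before insert on orders
  for each row
begin
  if :new.id > 100
  then
    raise_application_error(-20000, 'APP-12345');
  elsif :new.id < 0
  then
    raise_application_error(-20000, 'APP-98765');
  end if;
end;
/

declare
  type t_numbers is table of number;
  l_ids     t_numbers := t_numbers(1, 101, -5, 2);  -- 101 and -5 violate the trigger rules
  l_amounts t_numbers := t_numbers(10, 10, 10, -1); -- -1 violates the check constraint
  e_bulk_errors exception;
  pragma exception_init(e_bulk_errors, -24381);
begin
  forall i in 1 .. l_ids.count save exceptions
    insert into orders (id, amount) values (l_ids(i), l_amounts(i));
exception
  when e_bulk_errors then
    for i in 1 .. sql%bulk_exceptions.count
    loop
      dbms_output.put_line
      ( 'iteration ' || sql%bulk_exceptions(i).error_index || ': ' ||
        sqlerrm(-sql%bulk_exceptions(i).error_code)
      );
    end loop;
end;
/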

Note how the SQLERRM message doesn't return anything useful and that the name of the check constraint has disappeared. This is really annoying and can't be circumvented easily in 9i. For better error messages we would have to go back to row by row processing. And that means: very slow.

However, version 10gR2 introduced a feature called DML error logging. I remember reading about it more than two years ago here on Tom Kyte's blog. In this entry and in the documentation you only see examples using SQL, not PL/SQL examples using FORALL. But luckily this works as well:
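A sketch of the same load using DML error logging, still with the made-up names from the previous example; the error logging table err$_orders is created with dbms_errlog:

begin
  dbms_errlog.create_error_log(dml_table_name => 'ORDERS');
end;
/

declare
  type t_numbers is table of number;
  l_ids     t_numbers := t_numbers(1, 101, -5, 2);
  l_amounts t_numbers := t_numbers(10, 10, 10, -1);
begin
  forall i in 1 .. l_ids.count
    insert into orders (id, amount) values (l_ids(i), l_amounts(i))
      log errors into err$_orders ('bulk load') reject limit unlimited;
end;
/

select ora_err_number$, ora_err_mesg$ from err$_orders;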

And you do get to see the error messages (APP-12345 and APP-98765) and the name of the check constraint. Unfortunately, our shop still uses 9.2.0.7...

UPDATE

It is worth knowing that there are some restrictions when working with the DML error logging clause. From the 11g documentation (same as in 10gR2 documentation):

Restrictions on DML Error Logging

* The following conditions cause the statement to fail and roll back without invoking the error logging capability:

o Violated deferred constraints.

o Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation.

o Any update operation UPDATE or MERGE that raises a unique constraint or index violation.

* You cannot track errors in the error logging table for LONG, LOB, or object type columns. However, the table that is the target of the DML operation can contain these types of columns.

o If you create or modify the corresponding error logging table so that it contains a column of an unsupported type, and if the name of that column corresponds to an unsupported column in the target DML table, then the DML statement fails at parse time.

o If the error logging table does not contain any unsupported column types, then all DML errors are logged until the reject limit of errors is reached. For rows on which errors occur, column values with corresponding columns in the error logging table are logged along with the control information.

Wednesday, November 14, 2007

Well, I didn't have a good name for this common SQL problem, so I decided to call it "interval based row generation". If someone out there knows a better or more common name for the problem, then please say so by leaving a comment.

An example says more than a thousand words, so here is an example of what I mean by the term:
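A sketch of the kind of table I mean; the names and dates are made up, chosen so that the three groups span 10, 1 and 3 days respectively:

create table intervals
( grp        varchar2(10)
, start_date date
, end_date   date
);

insert into intervals values ('group1', date '2007-11-01', date '2007-11-10');
insert into intervals values ('group2', date '2007-11-05', date '2007-11-05');
insert into intervals values ('group3', date '2007-11-08', date '2007-11-10');
commit;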

And I want to generate rows for all dates between and including the start_date and the end_date. The query would have to generate 10 rows for group1, 1 row for group2 and 3 rows for group3. I know of six ways to do this type of query. There may be more of course. Here are the six alternatives:
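As an illustration, the cartesian join idea behind alternatives 1 and 2 can be sketched like this (the row generator is simply made "large enough"; the original queries differed in the details):

with numbers as
( select level as rn
    from dual
  connect by level <= 100  -- "large enough" for the longest interval
)
select i.grp
     , i.start_date + n.rn - 1 as dt
  from intervals i
     , numbers n
 where n.rn <= i.end_date - i.start_date + 1
 order by i.grp
        , dt;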

Alternative 3 uses a trick: line 8 prevented the "ORA-01436: CONNECT BY loop in user data" error in versions 9 and 10, but no longer does so in 11.

Alternative 4 shows a new error message introduced in Oracle11g: ORA-30007. Apparently the use of the connect_by_root function has been restricted further. I don't know why, although Oracle probably has a good reason for the restriction.

In this recent OTN-thread Adrian Billington (of the excellent oracle-developer.net site) showed me alternative 5, which I had not seen before. He also did a small performance test showing that alternative 5 is slightly better than alternative 6, the model clause solution. I decided to investigate a little more by tracing/tkprof and by comparing latches with Tom Kyte's runstats_pkg.
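For reference, a sketch of the multiset technique (Adrian's actual query may differ in the details); the collection expression generates exactly as many rows as each interval needs:

select i.grp
     , i.start_date + n.column_value - 1 as dt
  from intervals i
     , table
       ( cast
         ( multiset
           ( select level
               from dual
            connect by level <= i.end_date - i.start_date + 1
           ) as sys.odcinumberlist
         )
       ) n
 order by i.grp
        , dt;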

So, the clunky cartesian joins - alternatives 1 and 2 - seem to outperform the multiset and model clause solutions. These alternatives just generate all rows possibly needed and filter out the unneeded ones.

The conclusion is that Adrian's multiset solution is the close winner. The cartesian product solution will appeal more when the difference in intervals is small and the percentage of filtered rows is small. The model clause solution only wins in readability, I guess, although some of you probably won't agree :-)

Monday, November 12, 2007

A few weeks ago, I received a question from a BI-colleague who had a problem with a query using the model clause in a materialized view. He wanted to generate new sets of data according to some rules, but he got stuck. The query he used was huge: approximately 680 packed lines containing a model with 100+ rules. It used 12 dimension columns and operated on a table containing 90K rows.

The question was very interesting though and I’ll try to simulate it with a smaller table like this:
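A reduced sketch of such a table; the names, key values and measure values are made up, chosen to match the description and calculations below, and only two sets per set type are spelled out:

create table t
( code  varchar2(5)
, col1  number
, col2  number
, col3  number
, value number
);

-- set type 1: MW001 .. MW005 all share the same four key combinations
insert into t values ('MW001', 1, 1, 1, 11);
insert into t values ('MW001', 1, 1, 2, 12);
insert into t values ('MW001', 1, 2, 1, 13);
insert into t values ('MW001', 1, 2, 2, 14);
insert into t values ('MW002', 1, 1, 1, 21);
insert into t values ('MW002', 1, 1, 2, 22);
insert into t values ('MW002', 1, 2, 1, 23);
insert into t values ('MW002', 1, 2, 2, 24);
-- ... MW003, MW004 and MW005 along the same lines

-- set type 2: MW011 .. MW013 all share the same three key combinations
insert into t values ('MW011', 9, 9, 1, 110);
insert into t values ('MW011', 9, 9, 2, 120);
insert into t values ('MW011', 9, 9, 3, 130);
-- ... MW012 and MW013 along the same lines
commit;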

In this table there are 8 sets of data, identified by the column called code. The col1, col2 and col3 values identify the rows within a set. Together with the code column they can be considered the primary key. The purpose is to generate new sets of data, using the same key values col1, col2 and col3, but with a simple formula to calculate the value column. In the example I will use the following rules to simulate the original problem:
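For the sketches below, assume rules along these lines (the new set names are made up where the text does not mention them): a new set MW006 is the cell-by-cell sum of MW001 and MW002, matching on col1, col2 and col3; a new set MW007 is calculated in the same way from MW006 and one of the other sets; and so on, each rule combining only sets of the same set type.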

The value 32 in the required output's first row is the result of 11 + 21, and 34 = 12 + 22, and so on. Note that the col1, col2 and col3 values match exactly within each set.

If you look back at the create table statement, you will notice that there are two set types: MW001 to MW005, all having four identical key values, and MW011 to MW013, having three identical key values. Of course it is only possible to use sets within one set type in the rules. This is a precondition for this problem and not something we have to validate.

My colleague had read the documentation very carefully and used the same construction as recommended in it. Here is a link to that part of the documentation, just scroll down a little bit. He used the construct as described in the part about adding new sales for Poland, which resulted in this query:

And the four rows of set MW007 are not generated. It does not work because the model clause does not insert rows into the table: it only generates a result set. So selecting rows “where code = ‘MW006’” will never retrieve a row. So this line of thought is clearly wrong. To make this query work, it is absolutely necessary to know which base set should be used to generate the rows. If we change the query for the second rule to use the predicate “code = ‘MW001’” instead, the query works:

But why does Oracle advise to use a separate query to generate a new set? It is an extra query that has to be executed, and in the original case there would be 100+ of them. Now if it were the only way to achieve the result ... but it's not: this is what the UPSERT ALL clause is all about! In my experience the UPSERT ALL clause is rarely used, but this is a showcase for it. However, you have to know exactly how it works, or you might get surprised by this behaviour:
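For instance, with the sketch table from above, a seemingly natural UPSERT ALL formulation of the first rule would be something like this (a reconstruction, not necessarily the literal query):

select code, col1, col2, col3, value
  from t
 model
       return updated rows
       dimension by (code, col1, col2, col3)
       measures (value)
       rules upsert all
       ( value['MW006', any, any, any] =
            value['MW001', cv(col1), cv(col2), cv(col3)]
          + value['MW002', cv(col1), cv(col2), cv(col3)]
       );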

Here 7 rows are selected instead of 4 because, as documented, all unique combinations that satisfy the non-positional references (col1, col2 and col3) are identified. Since I want only four, I have to introduce an extra dimension value code2, based on code:
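A sketch of that variant; the code2 expression and the mixed positional and symbolic left-hand side are my assumptions about how to write it down:

select code, col1, col2, col3, value
  from ( select code
              , case
                when code <= 'MW005' then 1
                else 2
                end code2
              , col1, col2, col3, value
           from t
       )
 model
       return updated rows
       dimension by (code, code2, col1, col2, col3)
       measures (value)
       rules upsert all
       ( value['MW006', code2 = 1, any, any, any] =
            value['MW001', 1, cv(col1), cv(col2), cv(col3)]
          + value['MW002', 1, cv(col1), cv(col2), cv(col3)]
       );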

For this solution it is, just as with the solution using a query per rule, also absolutely necessary to know against which base set the rows should be generated. This is a tedious task in a situation with 100+ rules, but there is no way around that. At least a lot of inner queries are avoided by using UPSERT ALL.

I ended up suggesting a third alternative. However, looking back on it, I am not at all happy with this advice, because the third alternative doesn't relieve you of having to know the base set. I guess I was so disappointed about not getting to the point of not having to know the base set, that I turned down the UPSERT ALL variant altogether. The third alternative was this:

The key point here is to use the FOR loop to generate the rows. To do so I need to index all entries by means of the ROW_NUMBER analytic function, and use a reference model to know how many rows to generate. It works, but it is more complex compared to the UPSERT ALL alternative and there are a lot more rules this way, because of the increase in measure values.

Luckily they turned down the third alternative, but they could have used the UPSERT ALL perfectly. The advice came too late however, so they headed towards a totally different approach. Very likely something which does not involve using materialized views with a model clause query.
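
UPDATE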

The big advantages of the query they now use are that it is no longer necessary to refer to an original set and that, compared to the solution using a query per rule, it is a lot faster.

However, it is slower than using the extra dimension and referring to the base set. A lot of unnecessary rows are generated that are filtered out at the end by the predicate "value is not null". It feels a lot like forgetting a join condition, resulting in having lots of duplicate rows, and filtering them out by using the DISTINCT keyword.

Very ugly, but it has its advantages.

UPDATE TWO

Volder has left a comment showing an alternative query. By using the col1, col2 and col3 columns as the partition, the rules become very easy. And as can be seen in the motto of this blog, I really like simple solutions. His solution is this:
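A sketch of Volder's solution with the made-up table from above; his original query listed one rule per new set, and only the first two are shown here:

select *
  from ( select code, col1, col2, col3, value
           from t
          model
                return updated rows
                partition by (col1, col2, col3)
                dimension by (code)
                measures (value)
                rules
                ( value['MW006'] = value['MW001'] + value['MW002']
                , value['MW007'] = value['MW006'] + value['MW003']
                  -- ... one simple rule for each of the six new sets
                )
       )
 where value is not null;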

The query creates seven partitions for the seven unique combinations of col1, col2 and col3. And for each of those partitions six new cells are calculated, resulting in 42 rows. By inlining the query and filtering the ones with value NULL, the query returns the 22 requested new rows.

So in essence it is the same as the query that is currently being used, which I described in the first update section. That query also generated too many rows - also 42 - that had to be filtered out in the outer where clause. However, you have to agree that Volder's query looks much, much easier.

Personally I still like the query using UPSERT ALL and an extra dimension "code2" the most, because only the necessary rows are generated. The new one also feels like forgetting a join condition and using distinct. But Volder's query sure is superior in every aspect to the query currently being used.

Tuesday, November 6, 2007

Today I visited a session at Centraal Boekhuis by Toon Koppelaars about his new product RuleGen. In short, the product is capable of generating all code needed for the implementation of business rules.

The easy ones (attribute and tuple rules) should be implemented using the declarative integrity constraints offered by Oracle: primary key, unique key, foreign key, not null and check constraints. However, the more challenging static constraints, the entity and inter-entity constraints, can be handled perfectly with RuleGen. Change event rules (if A happens, then you should do B) should be considered business logic instead of static database constraints, and are best handled in a calling api.

Examples of the type of constraints that were shown in RuleGen are:
- There can only be zero or one employee with job 'PRESIDENT'.
- Each department with a PRESIDENT or MANAGER should have at least one ADMIN.

Although I have stated before that database triggers are evil, the main disadvantages I mentioned there seem to be addressed by this framework:

1) Database triggers make things happen "automagically"

This does not apply here, since RuleGen only considers the static database constraints.

2) Database triggers are rarely implemented correctly

Here the appropriate locking code, based on a dbms_lock call, is generated. In the version shown it was optional, but it will soon be mandatory, and it can be derived from the definition.

3) Database triggers slow down DML

This will still be true, since row-level database triggers are generated and thus the dreaded context switches have to be performed. However, the trigger code itself seems to be relatively lightweight - I'll have to test this. And more importantly, with multirow DML the rule is checked only once instead of "for each row". Undoubtedly RuleGen will outperform CDM RuleFrame. In the near future, I hope to be able to investigate how well RuleGen performs compared to the api approach.

To be continued...

UPDATE

Friday, November 2, 2007

Yesterday I visited the AMIS "knowledge center event" about Oracle11g for Developers. They had already done a compressed version of the session twice before, as described here on their blog. Yesterday's was an extended version. Almost all new features were discussed and demonstrated with illustrative example scripts.

The features I am most enthusiastic about are (for now):

- flashback data archive
- pivot/unpivot
- for update skip locked
- the removal of the PLS-00436 implementation restriction I mentioned earlier here

Another lesson learned was why you shouldn't improvise by messing up EMP's data during a session like this ...

Anyway, thank you Lucas and Alex for a very interesting evening. And for the pizza and pasta of course :-)

Pierre has translated an OTN-post of mine called "When your query takes too long ...". The post was originally meant for myself, to address all OTN-questions where only a query is given together with a question like "Please tell me how to tune this query". It basically tells them to start investigating by collecting some data using explain plan and tkprof. It is nice to see it in French; since I had six years of French lessons at school, I am able to read it as well. And the fact that I already know what is being said helps too, of course.