Monitoring Changes to Table Data


March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes. One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit-to name and address changed without the change being detected.

What approach would you take to solve the above problem? A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
*
FROM
VENDOR;

Now, just wait a while.
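The snapshot step can be sketched as follows. This is a minimal illustration using SQLite through Python (not the OP's environment); the table and column names follow the article's VENDOR example. SQLite, like Oracle, supports CREATE TABLE ... AS SELECT; SQL Server has no such syntax and would use SELECT * INTO VENDOR_SAVE FROM VENDOR instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table, mirroring the article's VENDOR example.
cur.execute("CREATE TABLE VENDOR (ID INTEGER PRIMARY KEY, NAME TEXT, ADDR_1 TEXT)")
cur.executemany(
    "INSERT INTO VENDOR VALUES (?, ?, ?)",
    [(1, "ACME SUPPLY", "123 MAIN ST"),
     (2, "WIDGETS INC", "45 OAK AVE")])

# Refresh the snapshot: drop it if it already exists, then
# recreate it from the source table.
cur.execute("DROP TABLE IF EXISTS VENDOR_SAVE")
cur.execute("CREATE TABLE VENDOR_SAVE AS SELECT * FROM VENDOR")

saved = cur.execute("SELECT COUNT(*) FROM VENDOR_SAVE").fetchone()[0]
print(saved)
```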

The question then becomes, how do we detect:

A new row (record) added to the original source (VENDOR) table.

An old row (record) deleted from the original source (VENDOR) table.

A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points. I will write the SQL statements so that they should work with Oracle Database 9.0.1 and above, and with SQL Server with very few changes, even if I feel a bit ANSI about doing so. I will use the COALESCE function rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE). We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
COALESCE(V.ID,V2.ID) AS ID,
COALESCE(V.NAME,V2.NAME) AS NAME,
COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
CASE WHEN V.ID IS NULL THEN 'VENDOR DELETED' ELSE 'VENDOR ADDED' END AS PROBLEM
FROM
VENDOR V
FULL OUTER JOIN
VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
V.ID IS NULL
OR V2.ID IS NULL;
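The added/deleted detection can be demonstrated with a small sketch (SQLite via Python, with hypothetical vendor rows). Older SQLite releases lack FULL OUTER JOIN, so the sketch emulates it with two LEFT JOIN anti-joins combined by UNION ALL; the logic is the same, with a row appearing only when the ID is found in exactly one of the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE VENDOR (ID INTEGER PRIMARY KEY, NAME TEXT, ADDR_1 TEXT)")
cur.execute("CREATE TABLE VENDOR_SAVE (ID INTEGER PRIMARY KEY, NAME TEXT, ADDR_1 TEXT)")

# The snapshot holds vendors 1 and 2; since it was taken,
# vendor 2 was deleted and vendor 3 was added.
cur.executemany("INSERT INTO VENDOR_SAVE VALUES (?, ?, ?)",
                [(1, "ACME SUPPLY", "123 MAIN ST"),
                 (2, "WIDGETS INC", "45 OAK AVE")])
cur.executemany("INSERT INTO VENDOR VALUES (?, ?, ?)",
                [(1, "ACME SUPPLY", "123 MAIN ST"),
                 (3, "NEW VENDOR", "9 ELM ST")])

# Rows only in VENDOR (added) plus rows only in VENDOR_SAVE (deleted).
rows = cur.execute("""
    SELECT V.ID, V.NAME, V.ADDR_1, 'VENDOR ADDED' AS PROBLEM
    FROM VENDOR V LEFT JOIN VENDOR_SAVE V2 ON V.ID = V2.ID
    WHERE V2.ID IS NULL
    UNION ALL
    SELECT V2.ID, V2.NAME, V2.ADDR_1, 'VENDOR DELETED'
    FROM VENDOR_SAVE V2 LEFT JOIN VENDOR V ON V.ID = V2.ID
    WHERE V.ID IS NULL
    ORDER BY 1""").fetchall()
for row in rows:
    print(row)
```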

So, the full outer join shown earlier satisfies the first two bullet points. The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor. If we have two row sources with identical columns and UNION the row sources together, the resulting row source will contain no fully duplicated rows (two identical rows are reduced to a single row). If there were no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for that primary key column value. If any columns were changed, there will be two rows containing the primary key column value: the before image and the after image.
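This UNION behaviour is easy to verify with a quick sketch (again SQLite via Python, with made-up vendor rows). Here a GROUP BY stands in for the analytic COUNT(*) OVER (PARTITION BY ...) used in the article's Oracle syntax; the result is the same per-ID row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE VENDOR (ID INTEGER, NAME TEXT, ADDR_1 TEXT)")
cur.execute("CREATE TABLE VENDOR_SAVE (ID INTEGER, NAME TEXT, ADDR_1 TEXT)")

# Vendor 1 is unchanged; vendor 2's remit-to address was altered
# after the snapshot was taken.
cur.executemany("INSERT INTO VENDOR_SAVE VALUES (?, ?, ?)",
                [(1, "ACME", "123 MAIN ST"), (2, "WIDGETS", "45 OAK AVE")])
cur.executemany("INSERT INTO VENDOR VALUES (?, ?, ?)",
                [(1, "ACME", "123 MAIN ST"), (2, "WIDGETS", "99 SHADY LN")])

# UNION removes fully identical rows: an unchanged vendor collapses
# to one row, while a changed vendor keeps two rows (before image
# and after image) for the same ID.
rows = cur.execute("""
    SELECT ID, COUNT(*) AS CNT
    FROM (SELECT * FROM VENDOR UNION SELECT * FROM VENDOR_SAVE)
    GROUP BY ID
    ORDER BY ID""").fetchall()
print(rows)
```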

Let’s build a SQL statement that UNIONs the rows from the two tables together and counts the number of rows for each primary key value, and then, as a final step, join that resultset to the earlier added/deleted resultset using UNION ALL:

SELECT
COALESCE(V.ID,V2.ID) AS ID,
COALESCE(V.NAME,V2.NAME) AS NAME,
COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
CASE WHEN V.ID IS NULL THEN 'VENDOR DELETED' ELSE 'VENDOR ADDED' END AS PROBLEM
FROM
VENDOR V
FULL OUTER JOIN
VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
V.ID IS NULL
OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
VL2.ID,
VL2.NAME,
VL2.ADDR_1,
'VALUE CHANGE' PROBLEM
FROM
(SELECT
COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
VL.ID,
VL.NAME,
VL.ADDR_1
FROM
(SELECT
*
FROM
VENDOR
UNION
SELECT
*
FROM
VENDOR_SAVE) VL) VL2
WHERE
VL2.CNT>1
ORDER BY
ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined because the column values in those rows changed (or because rows were added or deleted).

Another, potentially more useful approach involves setting up a logging trigger and logging table. I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table. A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.
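The VBS script itself is not reproduced here, but the general shape of a logging trigger and logging table can be sketched. This is a minimal illustration using SQLite via Python; the table, column, and trigger names are made up for the example, not the script's actual output, and a real implementation for Oracle or SQL Server would use that engine's trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE VENDOR (ID INTEGER PRIMARY KEY, NAME TEXT, ADDR_1 TEXT)")

# Illustrative logging table: before/after images of the monitored
# column, plus a DML type and a timestamp.
cur.execute("""CREATE TABLE VENDOR_LOG (
    ID INTEGER,
    OLD_ADDR_1 TEXT,
    NEW_ADDR_1 TEXT,
    LOG_TRANSACTION_TYPE TEXT,
    DML_DAT TEXT DEFAULT CURRENT_TIMESTAMP)""")

# Log a row only when the remit-to address actually changes.
cur.execute("""CREATE TRIGGER VENDOR_AU AFTER UPDATE OF ADDR_1 ON VENDOR
    WHEN OLD.ADDR_1 <> NEW.ADDR_1
    BEGIN
        INSERT INTO VENDOR_LOG (ID, OLD_ADDR_1, NEW_ADDR_1, LOG_TRANSACTION_TYPE)
        VALUES (OLD.ID, OLD.ADDR_1, NEW.ADDR_1, 'UPDATE');
    END""")

cur.execute("INSERT INTO VENDOR VALUES (1, 'ACME', '123 MAIN ST')")
cur.execute("UPDATE VENDOR SET ADDR_1 = '99 SHADY LN' WHERE ID = 1")

log_rows = cur.execute(
    "SELECT ID, OLD_ADDR_1, NEW_ADDR_1 FROM VENDOR_LOG").fetchall()
print(log_rows)
```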

What are the other options? Oracle Database’s built-in auditing capabilities. Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns). Any other options? (The mess that someone can create with a bit of idle time on their hands…)

It appears that the approach mentioned in your blog article is similar to my approach – thanks for leaving the link.

I remember thinking several years ago: “I need to create another silly logging table and trigger that is almost the same as the last 10 that I manually created?” As hard as I tried to standardize my naming convention, it seemed that every logging table that I manually created was just a bit different from the last. That is about the time that I decided to create the Configure Data Change Log feature in my program (https://hoopercharles.files.wordpress.com/2012/03/heopm_configure_data_change_log.jpg).

Consider triggers on the table for all of the DML actions, to allow you to put in any logic to prevent “unauthorized” (defined your way) inserts/updates/deletes. This should work for any engine: Oracle, Sybase, MSSQL.

It is also not very clear whether you want just monitoring, or to prevent unauthorised changes from going through. Certainly triggers can be used for either, with the trigger rolling back the action if it is not deemed authorised.
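A prevention trigger of the kind described above can be sketched as follows (a minimal illustration using SQLite via Python; the “authorized” rule here, which simply blocks any address change, is a stand-in for whatever logic you define, such as checking the session user or an approval table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE VENDOR (ID INTEGER PRIMARY KEY, NAME TEXT, ADDR_1 TEXT)")
cur.execute("INSERT INTO VENDOR VALUES (1, 'ACME', '123 MAIN ST')")

# Reject any change to the remit-to address; RAISE(ABORT, ...)
# rolls back the offending statement.
cur.execute("""CREATE TRIGGER VENDOR_BU BEFORE UPDATE OF ADDR_1 ON VENDOR
    WHEN OLD.ADDR_1 <> NEW.ADDR_1
    BEGIN
        SELECT RAISE(ABORT, 'unauthorized address change');
    END""")

try:
    cur.execute("UPDATE VENDOR SET ADDR_1 = '99 SHADY LN' WHERE ID = 1")
except sqlite3.DatabaseError as exc:
    print(exc)  # the update is rejected

addr = cur.execute("SELECT ADDR_1 FROM VENDOR WHERE ID = 1").fetchone()[0]
print(addr)
```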

Given that some prerequisites are met (flashback, undo retention, small data volume, probably not too much undo generated in the whole database during your check intervals), one could regularly subtract (MINUS operator) the existing table data from the table data as of some time ago (and vice versa) with the AS OF syntax to check for changed data.
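The same MINUS idea also works against the saved copy when flashback is not available. A sketch with SQLite via Python (SQLite spells MINUS as EXCEPT; the vendor rows are hypothetical): subtracting each table from the other surfaces the before and after images of any changed row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE VENDOR (ID INTEGER, NAME TEXT, ADDR_1 TEXT)")
cur.execute("CREATE TABLE VENDOR_SAVE (ID INTEGER, NAME TEXT, ADDR_1 TEXT)")

# Vendor 1's address was changed after the snapshot was taken.
cur.execute("INSERT INTO VENDOR_SAVE VALUES (1, 'ACME', '123 MAIN ST')")
cur.execute("INSERT INTO VENDOR VALUES (1, 'ACME', '99 SHADY LN')")

# Current rows not present in the snapshot (new or changed rows)...
changed_now = cur.execute(
    "SELECT * FROM VENDOR EXCEPT SELECT * FROM VENDOR_SAVE").fetchall()
# ...and snapshot rows no longer present (deleted or changed rows).
changed_was = cur.execute(
    "SELECT * FROM VENDOR_SAVE EXCEPT SELECT * FROM VENDOR").fetchall()
print(changed_now)
print(changed_was)
```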

You could also use “Database Change Notification”, which is called “Continuous Query Notification” in 11g and available even in SE.

Nice ideas. “Continuous Query Notification” is a feature that I was unaware of – probably a sign that I need to stop reading/reviewing books and instead read the official Oracle Database documentation. :-)

I see that your method makes use of two table triggers – does that present more of a performance overhead than a single trigger?

Two of your _HISTORY columns are defined as follows:

dml_usr varchar2(48),
dml_dat date

I would probably create those two columns with the following definitions so that I would not need to populate those columns in a trigger (that is not to say that this method is better than your method):

dml_usr varchar2(30) DEFAULT USER,
dml_dat date DEFAULT SYSDATE

I have not experimented yet, but could a DEFAULT value also be declared for your lv_module column?
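The DEFAULT idea above can be checked with a quick sketch (SQLite via Python; SQLite has no USER function, so only the date column is shown, with CURRENT_TIMESTAMP standing in for Oracle's SYSDATE default, and the emp_history table name is borrowed from the comment's example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A DEFAULT value populates the audit column without any trigger
# code having to set it; Oracle would use DEFAULT SYSDATE (and
# DEFAULT USER for the user column).
cur.execute("""CREATE TABLE EMP_HISTORY (
    EMPNO INTEGER,
    SAL REAL,
    DML_DAT TEXT DEFAULT CURRENT_TIMESTAMP)""")

# The insert does not mention DML_DAT at all.
cur.execute("INSERT INTO EMP_HISTORY (EMPNO, SAL) VALUES (7839, 5000)")

dml_dat = cur.execute("SELECT DML_DAT FROM EMP_HISTORY").fetchone()[0]
print(dml_dat is not None)
```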

What I have implemented in several real-life applications is a little bit different from the example I mentioned in my blog. In these applications, the table to be audited (emp here) contains the columns dml_usr, dml_dat, and dml_pgm, so the historical table (emp_history here) does not need any trigger at all.

Yes, your suggestion to use DEFAULT values could be envisaged and might give the same results.

As regards performance, I can assure you this audit history has never caused us any noticeable performance problem. In addition, what was initially implemented as an audit task (to maintain a change history) has revealed itself as a wonderful tool for debugging and understanding bugs in PRODUCTION, just by looking at those dml_usr, dml_dat, dml_pgm columns (which in reality have been implemented as user_ins, dat_ins, pgm_ins and user_upd, dat_upd, pgm_upd).

I am not sure that I understand your question. My VBS script and EXE referenced above allow the user to select which columns of a table to log, and which columns that undergo changes will trigger the logging. The existing and the new values may be captured in the same logging table row. If you capture the existing and the new values, you can determine how the value has changed, and you can determine how the row changed with the assistance of the LOG_TRANSACTION_TYPE column in the logging table.

I think that there are a lot of options here. The first is prevention: stop the changes from being made; sometimes this is harder than it seems when specific privileges, sweeping privileges, and privileges granted via roles are taken into account. You should also use core auditing to detect attempted changes (assuming the table is now locked down). Don't copy the table, for two reasons. First, a copy is just as good to steal from: imagine that the copy is not locked down, and someone forces the app to read the copy instead of the real table by use of the owner account via the app, or via ALTER SESSION SET CURRENT_SCHEMA, or via private or public synonyms… Second, you should not duplicate the data per se anyway, as two copies then generally both need to be secured. Other options include redo mining to detect change, using flashback to detect change, or, if you already clone the database, comparing the clone to the real database if you can open the clone. For me it's lockdown, audit, triggers… this seems simplest.

I agree with Pete that having another table (call it an audit table) is also vulnerable to being changed by anyone who has access to it. Obviously there is a need to record these changes somewhere, and excluding flat files, a table seems to be the best option.

My understanding was that the OP was working on MSSQL, so log mining, flashback, etc. are more appropriate for Oracle. It is also important to note that audit data of this type should be easily available; in other words, it should be routine to access the audit information at any time.

To this end, I would think that having triggers on the main table, with an audit table kept in a schema accessible to DBA users only, would be best.

I also note that Charles wrote his SQL to be somewhat generic (ANSI compliant) and database agnostic. I decided to write something similar for Sybase that I believe should also work for MSSQL. Anyway, here we go:

First, let us have a table with two columns only. The first column is an identity (read: sequence) column and the second column is varchar(30). I kept all columns small for the sake of this demo.

inserted and deleted are temporary tables created in the transaction log (redo + undo) of the database, in the image of the original table. They keep the after image and before image of rows, much like :new and :old in Oracle. suser_name() is a function that returns the login name of the process, like USER. program_name and hostname are the application and host name, much like v$session.program and v$session.machine. These are all obtained from master..sysprocesses, much like sys.v$session.

You are correct that the OP was using Microsoft SQL Server – he was previously running Oracle Database but switched to SQL Server to save money.

My comment was intended to state that while I probably could not write a trigger in SQL Server (or Sybase), I could understand what your trigger was accomplishing (even though the trigger syntax appears a bit different from that used by Oracle). I used to be able to say the same about Fortran, COBOL, ARREX and a couple of other programming languages, but those languages are just distant memories now.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and less than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: &lt; for the less than sign and &gt; for the greater than sign.