If you are comparing two tables with similar columns, something like a transactional table versus a staging table, then perhaps you can add to both tables a persisted computed column derived from a CHECKSUM of the column list. Once done, add index on PID + the checksum column. If my assumption is correct, the join operation could potentially be covered by this 2 column index, which would be very efficient.

If you are comparing two tables with similar columns, something like a transactional table versus a staging table, then perhaps you can add to both tables a persisted computed column derived from a CHECKSUM of the column list. Once done, add index on PID + the checksum column. If my assumption is correct, the join operation could potentially be covered by this 2 column index, which would be very efficient.

If you are comparing two different tables, then when adding the checksum column, the column list and datatypes should be specified exactly the same to insure successful matching. I don't know what maximum number of columns are supported in the list, but it's far more than 20 at least.

In your JOIN or WHERE clause, you would still be doing a seperate exact match on the key column(s) like PID or ID. So if you want to write a query that compares tables A and B, both keyed on PID, and return those rows for which non-key columns are different, it would be something like this:

select *from Ajoin B on B.PID = A.PID and B.xchecksum != A.xchecksum;

For that reason, the index on each table would be on PID and xchecksum.

If you are comparing two tables with similar columns, something like a transactional table versus a staging table, then perhaps you can add to both tables a persisted computed column derived from a CHECKSUM of the column list. Once done, add index on PID + the checksum column. If my assumption is correct, the join operation could potentially be covered by this 2 column index, which would be very efficient.

curious_sqldba (5/16/2013)[quote]Ok makes sense, i am joining on a table and a view, i might have to add this column on all the underlying tables...

I don't know what lay beneath your view, but indexed hash columns are an efficient alternative to performing non-equal joins on non-key and non-indexed columns. It's useful in a scenario where you need to know if at least one of X columns differ, but you arn't concerned with which specific column.

curious_sqldba (5/16/2013)[quote]Ok makes sense, i am joining on a table and a view, i might have to add this column on all the underlying tables...

I don't know what lay beneath your view, but indexed hash columns are an efficient alternative to performing non-equal joins on non-key and non-indexed columns. It's useful in a scenario where you need to know if at least one of X columns differ, but you arn't concerned with which specific column.

If i join a view and a table, do i need to have that computed columns on all the tables used in the view?

curious_sqldba (5/16/2013)[quote]Ok makes sense, i am joining on a table and a view, i might have to add this column on all the underlying tables...

I don't know what lay beneath your view, but indexed hash columns are an efficient alternative to performing non-equal joins on non-key and non-indexed columns. It's useful in a scenario where you need to know if at least one of X columns differ, but you arn't concerned with which specific column.

If i join a view and a table, do i need to have that computed columns on all the tables used in the view?

Now it sounds as if the table you're updating, what you're calling MyTable in the provided example, is a denormalized and it's columset is derived from multiple other tables joined behind the scenes in a view.

If that's the case, then you'll need to add a column computed on checksum() on MyTable, and then index it on ID and checksum. You'll also need to add the same computed column on the view, but that one can't be persisted or indexable. Still it will be an improvement over what you're doing now comparing 20+ columns against each other.

Lynn Pettis (5/16/2013)I may be wwrong, but since all the columns used to update the table come from the columns defined in the view, I would say the computed column needs to be added to the view.

This is really sounding like a big denormalized table with columns derived from multiple other tables, and now there is a need to keep it in sync with updates.

An alternative to performing perioddic mass updates would be to implment insert / update triggers on the base tables, which would then insert and update columns in the denormalized table incrementally as changes occur.

Or perhaps that big table could be something like an indexed view rather than an actual table.