Best Practice for Data Error Handling in Informatica?

We have a business requirement to do detailed error handling on source data. If a source record violates data integrity rules, it has to be loaded into an error logging table.

We did this using a Router transformation.

Now the user is saying they need to know which column (or combination of columns) violated the data integrity rules, so we have to put a corresponding error message into another column of the error logging table. The contents of the erroneous record are to be put into one column of the same table, in the same record.

Hi,
More than a best practice for error capture, this is really about requirements and what we want.
Since you have already implemented a Router and are passing the invalid records to a table, it's quite simple to capture the reason in another column.

To implement this, add an Expression transformation after the Router (on the invalid-records group) and derive the reject reason there. DECODE works well, reusing the same conditions that categorized the record as invalid. For example, for a foreign key violation you can use
DECODE(TRUE, ISNULL(docID), 'DocId not found in DocTable', ...)
For the reject code, just use 'ForeignKey', and map the session start time to your reject timestamp column.
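The DECODE pattern above can be sketched outside Informatica as well. This is a minimal Python illustration of the same idea (the column name docID, the reject codes, and the pipe-delimited record dump are assumptions for the example, not part of any actual mapping):

```python
from datetime import datetime

def classify_reject(record, doc_table_ids):
    """Mimic the DECODE: return (reject_code, reject_reason) for the
    first failed rule, or None if the record is valid."""
    if record.get("doc_id") is None or record["doc_id"] not in doc_table_ids:
        return ("ForeignKey", "DocId not found in DocTable")
    return None

def build_error_row(record, session_start, doc_table_ids):
    """Build one row for the error logging table, with the reject code,
    reason, session start time, and the full record in a single column."""
    result = classify_reject(record, doc_table_ids)
    if result is None:
        return None
    code, reason = result
    return {
        "reject_code": code,
        "reject_reason": reason,
        "reject_timestamp": session_start,
        "record_contents": "|".join(str(v) for v in record.values()),
    }
```

A record with a missing or unknown doc_id yields a row tagged 'ForeignKey'; a valid record yields None and is not written to the error table.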

This way you get a description as well as a rejection category, and a uniform way to capture all rejects.

Instead of a table, I prefer to store the rejects in a file (say, Excel) and send it to users automatically. That way, if there are no invalid records, you won't get an email; with a table, you'd have to query it or build a report for the purpose.

Validate each column, flag it accordingly, and capture the error records based on the flag value.
Validating all columns and capturing the failures as a single error record is not the right way. Suppose you are validating 5 columns and all five fail: you should write 5 different error records to the error table, which is a much clearer way of analyzing the errors.
Even though the table will grow, it will provide a clear message on each error.
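The one-error-record-per-failed-column approach above can be sketched as follows. This is an illustrative Python sketch, not Informatica syntax; the rule list and column names are invented for the example:

```python
def validate_record(record, rules):
    """rules: list of (column, check_fn, message) tuples.
    Returns one error dict per failed rule, so a record failing
    5 validations produces 5 error rows."""
    errors = []
    for column, check, message in rules:
        if not check(record.get(column)):
            errors.append({
                "column": column,
                "error_message": message,
                # full record contents kept in a single column, per the requirement
                "record_contents": "|".join(str(v) for v in record.values()),
            })
    return errors
```

For example, with rules for `id` (not null) and `amount` (non-negative number), a record failing both yields two separate error rows, each naming the offending column.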

I think Varghese is on the right track. Global IT Solutions (GITS) has had a lot of success in the Data Quality realm by taking a multi-faceted approach to Data Quality.

Set Up

a) Identify the degree of importance and the subsequent action for each data item/error type. For instance, you may issue a warning for some FK attributes that have no parent, and reject records for others. It is interesting to note that the number of distinct data-error types is not very high, so this tends not to be a huge exercise. Moreover, each ETL data error can usually be categorized into a meaningful error-type category (e.g., Missing Parent, Data Type Mismatch, etc.)

Run Time
When an error occurs, the cost-effective method of handling DQ is to capture the error type, column name, record number in the file, the user who last updated the record, etc. in a custom error structure. This is 'outside of the box' programming. By default, most programmers simply trap the error in an error log, or track whether all the records loaded properly. Subsequently, users have to manually read through logs and figure out what went wrong.

GITS recommends a different approach. A few additional DQ structures are used to track the type of errors, the information associated with the file, and who should be notified. The interesting thing is that the DQ code is reusable: you write it once and paste it into your ETL code across applications, and you include these DQ error structures in addition to your normal error log.
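As a rough illustration of the kind of reusable DQ error structure described above, here is a minimal Python sketch. The field names are assumptions for the example, not the actual GITS implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DQError:
    """One captured data-quality error: what went wrong, where,
    and who should hear about it."""
    error_type: str        # e.g. "Missing Parent", "Data Type Mismatch"
    column_name: str       # the offending column
    record_number: int     # position of the record in the source file
    last_updated_by: str   # user who last updated the record
    file_name: str         # source file the record came from
    notify: str            # who should be notified of this error
    occurred_at: datetime = field(default_factory=datetime.now)
```

Because every error carries its type, column, and file context, reporting metrics over these records later (errors per type, per file, per user) becomes a simple aggregation.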

Post-Mortem DQ Management
The goal of DQ should always be to improve over time. If you capture errors in a smart fashion, generating reports with metrics on those errors is child's play.

Google "William Moore, Data Architect" if you need more info on this approach. Our website provides a high-level description of this methodology.