Subscribe to this blog

Follow by Email

Simplifying your data validation code with Database 12.2

Doesn’t matter who much testing you do (well, it actually does but that’s a whole different issue) you can almost guarantee that at some point your beautiful data validation code, that parses data input from a web form or loads data from some external file, will pop up with the error:

Of course, what’s is really annoying at this point is that you don’t know which column value of the record failed (assuming that you have more than one numeric column)

Managing conversion errors during data loads

What’s to do? Of course the sensible thing is to add lots of data validation checks into your code to try and catch the situations where the wrong type of data arrives from your data source. It’s likely that all the additional validation checks will slow down the process of inserting data, which is not a great result.

If your data is arriving via an external file then you can use the BADFILE clause to capture records that cannot be loaded because of data type errors. But what if the data source for your insert statement is a staging table that was populated by an ETL job or a series of values from a web form?

How to manage conversion errors during INSERTs

Panic over - Database 12c Release 2 contains important changes to the CAST and TO_xxx functions to manage the most common data conversion errors. The CAST function now has the ability to return a user-specified value if there is a conversion error.

I can deal with this situation in a couple of different ways. Firstly let’s try and discover which rows and columns in my staging table contain values that are likely to cause data conversion errors. To do this I am going to use the new VALIDATE_CONVERSION() function which identifies problem data that cannot be converted to the required data type. It returns 1 if a given expression can be converted to the specified data type, else it returns 0.

SELECT VALIDATE_CONVERSION(empno AS NUMBER) AS is_empno, VALIDATE_CONVERSION(mgr AS NUMBER) AS is_mgr, VALIDATE_CONVERSION(hiredate AS DATE) AS is_hiredate, VALIDATE_CONVERSION(sal AS NUMBER) AS is_sal, VALIDATE_CONVERSION(comm AS NUMBER) AS is_comm, VALIDATE_CONVERSION(deptno AS NUMBER) AS is_deptno FROM staging_emp;

this produces a table where I can easily pick out the rows where the data conversion is going to succeed (column value is 1) and fail (column value is 0):

I could use this information to filter the data in my staging table as I insert it into my EMP table or I could use the enhanced CAST and TO_xxx functions within the INSERT INTO ….. SELECT statements.

The CAST function (along with TO_NUMBER, TO_BINARY_FLOAT, TO_BINARY_DOUBLE, TO_DATE, TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, and TO_YMINTERVAL functions) can now return a user-specified value, instead of an error, when data type conversion errors occur. This reduces failures during an data transformation and data loading processes.

INSERT INTO emp SELECT empno, ename, job, CAST(mgr AS NUMBER DEFAULT 9999 ON CONVERSION ERROR), CAST(hiredate AS DATE DEFAULT sysdate ON CONVERSION ERROR), CAST(sal AS NUMBER DEFAULT 0 ON CONVERSION ERROR), CAST(comm AS NUMBER DEFAULT null ON CONVERSION ERROR), CAST(deptno AS NUMBER DEFAULT 99 ON CONVERSION ERROR) FROM staging_emp WHERE VALIDATE_CONVERSION(empno AS NUMBER) = 1;

which results in five rows being inserted into my EMP table - obviously this means that 4 rows were rejected during the insert process (rows 1, 4, 6 and 8) because they contain errors converting the contents to a number for the empno key. Here is the data that was loaded:

we can see that on row 1 the HIERDATE was invalid so it was replaced by the value from sys date (07-JUL-16). Row 2 the value of DEPTNO is the conversion default of 99 and on row 4 the value for MGR is the conversion default of 9999.

Conclusion

The enhanced CAST function (along with TO_NUMBER, TO_BINARY_FLOAT, TO_BINARY_DOUBLE, TO_DATE, TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, and TO_YMINTERVAL functions) can help you deal with data conversion errors without having to resort to complicated PL/SQL code or writing data validation routines within your application code.

The new VALIDATE_CONVERSION() function can be used to help you identify column values that cannot be converted to the required data type.

Comments

Post a Comment

Popular posts from this blog

This post covers one of the new SQL performance enhancements that we incorporated into Database 12c Release 2. All of these enhancements are completely automatic, i.e. transparent to the calling app/developer code/query. These features are enabled by default because who doesn’t want their queries running faster with zero code changes?

So in this post I am going to focus on the new In-Memory “cursor duration” temporary table feature. Let’s start by looking at cursor duration temp tables…Above image courtesy of wikimedia.org
What is a cursor duration temp table?
This is a feature that has been around for quite a long time. Cursor duration temporary tables (CDTs) are used to materialize intermediate results of a query to improve performance or to support complex multi step query execution. The following types of queries commonly use cursor duration temp tables:WITH Clause and parallel recursive WITHGrouping SetsStar TransformationFrequent Item Set CountingXLATE
What happens during the …

….and now it’s here in PDF format as well!The free big data warehousing Must-See guide for OpenWorld 2017 is now available for download in PDF format - click here, and yes it’s completely free. This comprehensive guide covers everything you need to know about this year’s Oracle OpenWorld conference so that when you arrive at Moscone Conference Center you are ready to get the most out of this amazing conference. The guide contains the following information:Page 8 - On-Demand VideosPage 17 - JustifyYour tripPage 19 - KeyPresentersPage 41 - Must See SessionsPage 90 - Useful MapsChapter 1 - Introduction to the must-see guide.Chapter 2 - A guide to the key the highlights from last year’s conference so you can relive the experience or see what you missed. Catch the most important highlights from last year's OpenWorld conference with our on demand video service which covers all the major keynote sessions. Sit back and enjoy the highlights. The second section explains why you need to atte…

MATCH_RECOGNIZE and predicates
At a recent user conference I had a question about when and how predicates are applied when using MATCH_RECOGNIZE so that’s the purpose of this blog post. Will this post cover everything you will ever need to know for this topic? Probably!
Where to start….the first thing to remember is that the table listed in the FROM clause of your SELECT statement acts as the input into the MATCH_RECOGNIZE pattern matching process and this raises the question about how and where are predicates actually applied. I briefly touched on this topic in part 1 of my deep dive series on MATCH_RECOGNIZE: SQL Pattern Matching Deep Dive - Part 1.
In that first post I looked at the position of predicates within the explain plan and their impact on sorting. In this post I am going to use the built in measures (MATCH_NUMBER and CLASSIFIER) to show the impact of applying predicates to the results that are returned.
First, if you need a quick refresher course in how to use the MATCH…

Keith Laker

Keith Laker

Disclaimer

Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. Do NOT take anything written here, unless explicitly mentioned otherwise, to be Oracle policy or reflecting Oracle's support policy.

About Me

I have been working with Oracle data warehouse technology for over 20 years working on a wide variety of data warehouse projects both as a consultant and an onsite support engineer. I am now part of the Data Warehouse Product Management Team where I am responsible for analytical SQL. I am based in the UK at our Manchester office.

A key part of my role is to work with our sales teams to brief our customers on data warehousing and analytical SQL: explaining the wide variety of new and exciting opportunities that our DW and analytical solutions can support.

I regularly deliver sales training for data warehousing and analytical SQL across all our sales regions and provide competitive intelligence support across all the major data warehouse vendors.