Oracle Data Warehouse Design and Architecture

Menu

TIMESTAMPs, Old Fogeys, and Data Warehouses

I scanned the structure of a data warehouse staging area yesterday, and developed a mild nervous tic. The cause: finding TIMESTAMP(6) data types scattered around willy-nilly.

Firstly, my natural caution and fear of the unknown was aroused by seeing that microsecond timings were being stored, even if it was to log times of inserts and updates of rows and not related to actual business data. My preference is usually to log all changes associated with a single load process with a single timestamp — in Informatica I derive that from the session timestamp so that it is a little easier to lookup the mapping execution that performed the load. If I want to know about performance of the order in which rows are inserted or updated then I’d probably go for turing on verbose logging in the workflow — I’ve rarely found the need to do so.

Secondly, and this is a more long-lasting and disturbing effect, have I officially turnded into an old fogey now? “I’ll hold no truck with these new-fangled data types … DATE was good enough for me, and it’ll be good enough for these young whippersnaper … if the good lord had intended us to measure in microseconds …” etc etc. Maybe I’m still having trouble letting go of the SQL*Plus COPY command, which I used just yesterday to move data from one test database to another, and which probably saved me about a week of messing around otherwise. TIMESTAMP is not supported by the sadly-deprecated COPY command.

I get really turned off when I see a TIMESTAMP data type. I had really bad luck with them when doing a bunch of export/imports and they were slow as molassas. Also the developers think they are a great idea until they have to CAST…AS DATE everytime they want to do something basic. Yuck, but everybody knows the developers know more about the database than the DBAs.

One sad followup is that I requested the replacement of all of these abominations from the schema, but the response I got from the development team suggested that they needed it in order to store hours, minutes and seconds. I didn’t pursue that line with them because the comment was withdrawn pretty quickly, but I wonder whether the 6 in the TIMESTAMP(6) specificaton was actually intended to store hours, minutes and seconds, rather than microseconds. It’s an intriguing possibility.

On the same day I had a comment from another developer that it was really hard to work with those columns because Informatica converted them to character strings, so point proven on the compatibility issue.

However that same developer was writing interesting code along the lines of:

TO_CHAR(update_dt,’MM/DD/YYYY’) > TO_CHAR($$start_date,’MM/DD/YYYY’)

After sending a corrective email I spent quite a while ranting loudly round the office and frothing at the mouth before I realised that he was sitting just a couple of cubes away.

Could be I suppose, but if it encourages people to use TIMESTAMP instead of DATE then I’d be wishing there wasn’t. It’d be a bit of an odd feature — deploy your table creation scripts on one database and get different data length to the deployment on a different database.

We use TIMESTAMP(6) lots of places, especialy in application where we trace bits of transactions – ie. BPM applications, MessageBrigding – where it matters to the end users if a 2 second duration (DATE) actually IS 2 seconds or just 10 ms because it spans the second limit.

We have not had any problems with it – but we had to read the manuals before doing date arithmatic gymnastics on them though … but that never hurt anyone, did it?