Here you will find posts that are mostly about my work as an Oracle DBA. There may occasionally be other topics posted, but by and large this blog will be about Oracle and other geeky IT topics. Perl will likely be mentioned from time to time.

Thursday, March 05, 2009

About a year ago I worked on collating and transforming data from an application so that it could be imported into another app. I've performed this exercise a number of times in the past 20 or so years, and while it is never boring, it is sometimes quite challenging.

Oft times when trying to make the data suitable for inclusion in the new applications, I ask my self "What were they thinking?"

I will leave the answer to that up to your imagination, as my answers to that particular question are not always complimentary.

One of the problems run into is when the data modeler and database designer chose to allow data with encoded special meanings.

In other words, there is key data in the database, the meaning of which can only be ascertained by asking someone that knows, or finding it in the documentation (Is there documentation?)

The example I will use is a real one I ran into.

Given a table containing projects, and revisions of those projects that are indicated as such by a revision "number".

A sensible series of revision numbers might be a numeric range beginning with 1, or even an alpha range beginning with 'A', or even a combination thereof.

Personally, I prefer the unambiguous nature of an ascending numeric range. 2 is always greater than 1. There is no case involved as there would be with an alpha range.

Which is greater, "a1" or "A1"? You know how this will sort in the database, but will it be what the users expect?

While a numeric range would have been preferable, the data I was given to work with used a range of revision "numbers" that was numeric and alpha, with a numeric value being regarding as greater than the "numeric" value. The "numeric" is in quotes, as this obviously must be stored as a string, and in this case with a leading zero.

Given this rule, a project with revisions of A,B,C,D,01,02 would have a most recent revision of "02". This is not the way it works in the standard collating order in the database.

In a perfect world, the data modeler or data architect would work with the users to create a reasonable versioning method. In this case however there is no choice but to work with what I was given.

From the data provided, only the most recent version was to be included in the imported data.With a versioning system that doesn't follow the normal collating order, this requires some code to determine what is really the greatest version.

If you know a method to accomplish this in straight SQL, please feel free to post it. I could not think of a pure SQL solution.

The following generic function was created to determine the greatest revision value given a table name and the key columns.

-- maxrev.sql-- return the maximum numeric revision-- returna alpha if no numeric exists-- maximum rev is numeric-- revs are alpha - mixed number and character