Let SQL Developer Normalize Your Tables For You

SQL Developer is a pretty mouse-heavy tool. By that I mean that many of the functions are accessed via mouse-clicks. When browsing tables in the tree you may have noticed that you can expand a table node to display the table columns.

Tables in the Connections Tree

There are some pretty cool features available under the context menu when clicking on a table column:

Table Column Context Menu

Some of these actions are available under the table right-clicks, but a couple are not, ‘Encrypt’ and ‘Decrypt.’

One that you may find useful is a dynamic table refactoring feature called ‘Normalize.’

Normalize a Table Column

Sometimes you find a table that has all the data in it. Like a column called ‘Country.’ And instead of having a countryID which points to a child country table record via a foreign key constraint, they just have the text value as part of the ‘master’ record. This isn’t ideal for many situations but it can be a pain to fix after-the-fact.

This SQL Developer feature will ask you a few questions then generate a dynamic script to split that data out to a new table, and populate new IDs in the look-up table.

Supply sequence and trigger name, we do the rest!

Wait, this won’t work! I’m trying to do this on a partitioned table. Lots of things you can’t do column wise when a table is partitioned. So let’s try again on another table – my iTunes table I created in my earlier ‘Free Beer’ post.

Not sure if they’re really “normalisation” actions but I can think of a few table modifications that I’d categorise as “refactoring”:

* split a column into two (given a delimiter character or string), and vice versa; optionally add virtual column(s) to preserve the original column(s)

* split off multiple columns to a separate table (e.g. if I have a “persons” table with some “address” fields which we want to split off into a separate “person addresses” table) – of course, this is just an extension on what you have already.

* “unpivot” – say I have several columns in a table that are the same thing – e.g. Person.Home_Number_1, Home_Number_2, Work_Number, Mobile_Phone_Number, Fax_Number; and I want to split these off into a separate table, which will have a “Phone_Number” column as well as a type column (‘Home’/’Work’/’Mobile’/’Fax’).

* “pivot” – opposite to the above, of course 🙂

* perhaps some transformations that act on an object type column – e.g. to remove the object type and replace with ordinary columns

Just ideas, not sure if all of these are necessarily going to be generally useful.

looks interesting but I have a pit to nick with the generated code, this bit particularly:

if inserting and :new.ARTIST_ID is null then

which is part of the trigger to assign a new artist_id from the sequence. I personally do not like code that allows for subtle future bugs. The code should read:

if inserting then

As it exists, it’s allowing the user to supply a value for the artist_id and as long as that is above the highest value already supplied by the sequence, all is well, until the day that the sequence supplies that same value and the trigger blows up.

Basically, if you use a sequence to supply an ID, the user must not be allowed to ever supply one manually. And by user I also include the application(s) using the data of course.

To be fair, I’d say that it doesn’t matter whether SQL Developer generates the code that way or the alternative, you’re going to have the potential for subtle bugs anyway – so they may as well aim for something that will be most generally useful.

If the trigger doesn’t check if ARTIST_ID is null, and the application were to get set ARTIST_ID from the sequence, the trigger would go ahead and overwrite it with a new value from the sequence; now, if the application uses the original id elsewhere (e.g. for saving into a child table), it will fail (e.g. with foreign key violations).

I’m not saying this is the right way to go, and I’m not necessarily disagreeing with the objection, but either way might be legitimate in certain circumstances – at the end of the day, anyone who blindly accepts the code generated by SQL Developer as is without understanding it, validating it and testing it, deserves what they get 🙂

I don’t trust any code that is written for me, not without verifying it first. I’m a super-proponent of transparency and I want to know what the tool is doing for me so I can approve or improve or even reject it outright.

So if we don’t call it normalize, what SHOULD we call it? I’m not sure that ‘refactor’ on a table object would really tell the user what to expect, whereas I think the ‘normalize’ wording is almost self-explanatory.

Agreed, a menu option that merely says “refactor” is not good enough – in my head was a normalisation/refactoring submenu or popup dialog which allows the user to choose what modification to make to the table.

My main beef with the “normalise” label is that there are a number of different changes I might want to make to a table to make it “more normalised”. What SQL Developer does right now is just one example – perhaps in the future further options will be added that assist in refactoring/normalisation.

In other words, a developer who picks the “normalise” menu option would expect to result in a table that has been “normalised”, as if this feature is the only thing you need to use to make the table such. Whereas, a developer who picks a “refactor: create parent table from column” menu option would have a better idea of what they’re going to get from it. I think it’s more intuitively obvious that the outcome of refactoring is a new set of code; that might subsequently be the subject of further refactoring.

how exactly did the application determine that that value was available? Fine if it got the next value direct from the sequence, but I’ve seen multi-user systems just check the value from user/all/dba_sequences! Go figure! (Yes, of course the app owner schema “needed” dba privs to work!)

Also, exposing the SQL is a good thing in my opinion, it’s how I was able to “nick that pit”, but I wonder how many DBAs using the feature will have less of a programming background than I do, and will simply accept what is offered, without question?

Sequences break? Surely not! 😉 I’ve seen them break in test when a database refresh was done by dropping the tables, view etc, but forgetting to drop the sequences and importing with ignore=y. Oops!

My all or nothing approach works fine for me, I admit, other mileages may vary as we all know. However, it needn’t be the application that sets up a forthcoming failure, it could be a DBA/developer script that decides to shove in a new value. I’m just interested in protecting my data.

Still, the feature under discussion seems like a good one, so I’m not completely against it! 😉

I agree with Andrew Wolfe, this is refactoring, not normalisation. To expand: SQL Developer appears to provide a tool that helps to refactor a physical data model in *one particular way* in order to deal with *one* particular symptom of denormalisation (3NF). In fact, the Relational Model doesn’t actually prescribe that you must have surrogate keys for all entities – you can have an “Country Name” right there in your Brewery relation and still satisfy 3NF – you just need a second relation “Countries” with a key on “Country Name”. It’s more for convenience (and to get around the limitations of the DBMS) that we use a surrogate key (e.g. “Country ID”) instead.
I suspect the opening paragraphs of the wikipedia article cited are not as precise as they could be (but probably simplified for general consumption).

I’m going to nitpick here, only because this is a pain point for me. It takes knowing the semantics (meaning) of a fact to know how to normalize a relation. So in this case you, a human, found a column that needed to be normalized. SQL Developer seems to do a wonderful thing in helping you implement that decision. But you did the normalizing.

I see too many people making arbitrary statements like “X number of joins is too many” (for what?) or “a table can only have 5 relationships” (what DBMS is that?). Database design requires a brain and we seem to be lacking in those in the data world….