I've reviewed some of Date's recent writings on 1NF and have concluded that the table from the article in fact does violate 1NF, but not for the reasons others have mentioned.

I came close when I mentioned that since there is exactly 1 value at each row/column intersection for the strDescription column, the table was in 1NF (this relies on the assumption that the picture in the article is a representative sample of the data).

However, I ignored the strCode column, which allows NULLs. Since a NULL is not a value, that column does not contain exactly one value for every row. So not only can you not have more than one value (of the particular domain) for the column in a row, you also cannot have fewer than one ("exactly one"). Therefore, the table isn't in 1NF.
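A quick sketch of the "NULL is not a value" point, using Python's sqlite3 as a stand-in SQL product (the column names are borrowed from the article; the rest of the schema is my assumption). SQL's own COUNT(column) skips NULLs, so a nullable column visibly fails the "exactly one value per row" test:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Codes (
        intID          INTEGER PRIMARY KEY,
        strCode        TEXT,            -- nullable: NULL is absence, not a value
        strDescription TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Codes VALUES (1, 'CA', 'CALIFORNIA')")
conn.execute("INSERT INTO Codes VALUES (2, NULL, 'UNKNOWN')")

# COUNT(*) counts rows; COUNT(strCode) counts rows with a VALUE in strCode.
total, with_code = conn.execute(
    "SELECT COUNT(*), COUNT(strCode) FROM Codes"
).fetchone()
print(total, with_code)  # 2 1 -- one row has no value in strCode at all
```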

In the SQL world, we've gotten used to accepting NULL, though, so it doesn't bother us to talk of tables being in 1st (and higher) normal forms even when they allow NULLs in some of their columns.

To close, this quote on 1NF is from "An Introduction to Database Systems" (7th Edition), Chapter 11, Section 3, by Date:

"First normal form: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute."

Well now you've gone and started another holy war! While I agree with Date's findings for the most part, I'm not quite prepared to accept his complete prohibition on NULLs, particularly since our current products are SQL (as opposed to relational).

I try to minimize their use, but since there is no workable alternative available, sometimes you have to deal with NULL.

I know that I'm allowing physical implementation to dictate logical design, but until the TransRelational DBMS becomes available (and assuming it fulfils its promise), we really have no choice.

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

If we accept NULL as a "value" that has some kind of meaning, or that we can at least "deal with", and we assume that the ordering of the columns and rows isn't important, then we might accept that in some sense a SQL table is a faithful representation of a relation... or at least as close as we're going to get with our current products.

From that perspective (which is where I was coming from originally), the article's table is [probably] in 1NF. It's just that, as stated earlier, we're dealing with an overly broad domain for the Description column, namely, something like "the set of all values that are rendered for display as choices within a particular application's interface".

Per the implementation in the article, I can add a row (under a fresh intID) of ('CA', 'CALIFORNIA', 'Postal.State') even though the row (16, 'CA', 'CALIFORNIA', 'Customer.State') already exists. I'm now storing the same information multiple times in the same database - in the same table, even. Talk about your potential for "data drift."

Additionally I can add (28, NULL, 'CALIFORNIA', 'Office.State') to the same table. I can also add 1,000 other rows for "CALIFORNIA".
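Here's a minimal sqlite3 sketch of that scenario (table and column names assumed from the thread), showing that a surrogate key on intID happily accepts the same fact any number of times:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# MUCK layout as described in the article (names assumed from the thread).
conn.execute("""
    CREATE TABLE Lookup (
        intID          INTEGER PRIMARY KEY,
        strCode        TEXT,
        strDescription TEXT NOT NULL,
        strCategory    TEXT NOT NULL
    )
""")
rows = [
    (16, 'CA', 'CALIFORNIA', 'Customer.State'),
    (17, 'CA', 'CALIFORNIA', 'Postal.State'),   # same fact, second time
    (28, None, 'CALIFORNIA', 'Office.State'),   # and again, now with no code
]
conn.executemany("INSERT INTO Lookup VALUES (?, ?, ?, ?)", rows)

# Nothing stops 'CALIFORNIA' from piling up in one table.
dup_count, = conn.execute(
    "SELECT COUNT(*) FROM Lookup WHERE strDescription = 'CALIFORNIA'"
).fetchone()
print(dup_count)  # 3 -- the same information stored three times
```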

Since the example in the article doesn't provide the DDL, you might assume that the key has been (or could be) defined across strDescription and strCategory (who names their columns like this?). In all likelihood, those who use MUCKs are going to declare the key on the intID column, but let's just pretend that this person knows enough to at least declare a unique constraint on the real key...

If that's the case then you also have to assume that there is a valid reason (we're still pretending here) for wanting to maintain the customer.state separate from postal.state. In which case, the two categories would be stored in separate tables in a properly designed database and you could have 'California' in each one.

This just highlights another logical problem with MUCKs. If a "code" in a real database truly can be applied to multiple entity types, then it's easy enough to declare multiple FK relationships. In a MUCK that has a category column it is very messy to accomplish the same thing.
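A sketch of those multiple FK relationships in sqlite3 (table names are mine, not the article's): one real lookup table per domain, with both Customer and Office declaring ordinary FKs against it, so a bogus code is rejected everywhere it's referenced:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK checks
# One real lookup table per domain; any number of tables may reference it.
conn.executescript("""
    CREATE TABLE State (
        strCode        TEXT PRIMARY KEY,     -- e.g. 'CA'
        strDescription TEXT NOT NULL UNIQUE  -- e.g. 'CALIFORNIA'
    );
    CREATE TABLE Customer (
        intCustomerID INTEGER PRIMARY KEY,
        strState      TEXT NOT NULL REFERENCES State (strCode)
    );
    CREATE TABLE Office (
        intOfficeID INTEGER PRIMARY KEY,
        strState    TEXT NOT NULL REFERENCES State (strCode)
    );
    INSERT INTO State VALUES ('CA', 'CALIFORNIA');
    INSERT INTO Customer VALUES (1, 'CA');
    INSERT INTO Office VALUES (1, 'CA');
""")

# A code that isn't in the lookup table is rejected by the DBMS itself:
try:
    conn.execute("INSERT INTO Customer VALUES (2, 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```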

/*****************


As long as we're beating the dead horse, might as well break out the shotgun:

I'm thinking along similar lines as you with the declaration of the primary key. We obviously can't use strCode as a primary key, or even part of one, since it's nullable. Ditto on the naming conventions - VB6 programmers are the only people I can think of who would name their columns like this, thanks to a Microsoft aberration of the so-called 'Hungarian notation'.

But anyway, if we declare the PK as (strDescription, strCategory), we're using the data itself as a PK. Sort of defeats the purpose of a PK and any potential optimizations it might provide.

And I agree that there's no reason to store the state information multiple times; although with this type of design you can expect a programmer to add the same attributes into the table multiple times rather than try to hunt through the table for existing codes that might be applicable. If you are listing States that have offices for your company - as opposed to a complete listing of all states - you will duplicate the data over and over using this method.

Since this board's quote feature only shows the first part of the post, here's the part I'm addressing:

"But anyway, if we declare the PK as (strDescription, strCategory), we're using the data itself as a PK. Sort of defeats the purpose of a PK and any potential optimizations it might provide."

This looks like a logical/physical confusion (the result of SQL implementing PKs as physical indexes). As a logical construct, a PK has nothing to say about performance, and nothing but the data [loosely speaking!] could inform any of the candidate keys which may be designated as the PK. (I may be misunderstanding -- if by "data" you are referring specifically to the strDescription column and no other, then I see your point.)

When we talk about the index SQL uses to implement the PK, though, then it may be time to be concerned about the size of the columns involved.

By the way, Don's point about the difficulty of assessing the design in the absence of the actual DDL for the table is a good one. Has everyone noticed how often we have to use the word "assume"?

I was going to ask Mike to clarify that comment too, because if I read it correctly, it betrays a pretty serious misapprehension of both the logical need for keys and the physical implementation of them. But I wanted to give him the benefit of the doubt.

/*****************


Sorry about the confusion. Yes I am talking about using the column strDescription as the PK, or even as part of the PK.

If we imagine a single PK in the generic reference table as being ('CA', 'CALIFORNIA'), then relating it to another table we have to include both columns in the FK; to me this defeats the purpose of the PK/FK system. Might as well go all the way and not even create a lookup table... If you're repeating 'CA', 'CALIFORNIA' in two columns of every California address in an Address table, what's the point of even having a reference table that tells us 'CA' is the code for 'CALIFORNIA' (possibly telling us this multiple times, as well as telling us that 'CA' is the code for 'CANADA' and 'CA' stands for 'COMPUTER ASSOCIATES', all in the same table)?
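To make the duplication concrete, here's a sqlite3 sketch (schema assumed, per the thread) of a composite PK on (strCode, strDescription): the FK in Address is forced to carry both columns, so every California address repeats the full description anyway:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE StateLookup (
        strCode        TEXT NOT NULL,
        strDescription TEXT NOT NULL,
        PRIMARY KEY (strCode, strDescription)
    );
    -- The FK must repeat BOTH columns, so each address row carries
    -- 'CA' and 'CALIFORNIA' anyway -- the lookup table buys us nothing.
    CREATE TABLE Address (
        intAddressID   INTEGER PRIMARY KEY,
        strCode        TEXT NOT NULL,
        strDescription TEXT NOT NULL,
        FOREIGN KEY (strCode, strDescription)
            REFERENCES StateLookup (strCode, strDescription)
    );
    INSERT INTO StateLookup VALUES ('CA', 'CALIFORNIA');
    INSERT INTO Address VALUES (1, 'CA', 'CALIFORNIA');
    INSERT INTO Address VALUES (2, 'CA', 'CALIFORNIA');
""")

copies, = conn.execute(
    "SELECT COUNT(*) FROM Address WHERE strDescription = 'CALIFORNIA'"
).fetchone()
print(copies)  # 2 -- the full description is repeated in every address row
```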

When I mentioned "optimization" (admittedly not the best choice of words since people automatically assume speed, storage and other compiler "optimizations"), I was thinking in terms of optimal logical design. Basically what normalization provides by removing repetitive data, transitive dependencies, etc. from your database design. The physical side effects in some cases *might* include physical speed, indexing, query compilation and/or storage space improvements as well; though (AFAIK) physical optimizations are not necessarily a requirement of the ANSI SQL Standard.

Sorry for the misunderstanding. As I pointed out in another post, if you are using the strDescription column as any part of the PK, then you will be duplicating the strDescription column in referenced tables under the PK/FK system. Repeating your data in tables throughout the database pretty much defeats the logical need for keys and reference tables; if you're going to do that, you might as well revert back to COBOL or flat files and call it a day.

Basically if you're going to abolish the "relational" from your "relational database" by finding ways to circumvent the benefits of the system, you may as well use another data storage and retrieval system.