There is an abundance of information in the posts of this thread, but I'm afraid most people won't read it from start to end, so the same questions, issues, comments and answers come up again and again.

Don't you think it is time to close this one and start new, more specific threads on data normalization?

Just my two cents.

Heh... nah... that would normalize the thread. Besides, it's only 3 pages long when you have it set to return 50 posts per page.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T."--22 Aug 2013

Tom.Thomson (5/10/2011) Well, presumably one of the applicable values that a null could be is 0, and it surely is not true that 0/0 delivers 0 (it should deliver something like "error: indeterminate value", or "error: zero-divide" if it's a bit less good at distinguishing between errors);

Yes but that's just a neat way to sidestep the real problem we were talking about. If I'd said z * x = 0 instead of z/x = 0 would you be any more likely to agree that MVL without tautology detection is just an accident waiting to happen?

No. A better solution to that might be to define the results of operations on NULL more carefully - perhaps so that NULL/0 throws an error and NULL*0 is 0. But the best solution of all is for the programmer to be aware of what he is doing: if he is handling data that may involve nulls, he should know that, and write (x=0) or (z=0) instead of z*x=0, or, with more complex queries, perhaps resort to maybe-joins, maybe-unions, and maybe-restrictions, and use their differences from the true-join, true-union, and true-restrict to indicate where there are cases he needs to take care of.

Of course this means that programming a database system that allows NULL is difficult, but so are many programming tasks, and all the anti-null rhetoric in the world does not detract from the fact that the avoidance of NULLs is not always possible in the real world, because we don't always have complete information. Restructuring for NULL-avoidance will often result in a schema so complex that any serious application requires extremely complex joins and extremely complex multiple record-set (aka derived relation) results - making database programming both at the relational calculus level (whatever replaces SQL) and at the application level (which has to handle the complex result structure) a dreadful nightmare, and ensuring that no currently understood technology could deliver decent performance from such a schema.
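As a concrete illustration of the point above (the z/x pair is hypothetical; this is just standard SQL three-valued logic at work):

```sql
-- Hypothetical table for illustration: z and x are nullable integers.
CREATE TABLE t (id int NOT NULL, z int NULL, x int NULL);
INSERT INTO t VALUES (1, 0, 5);    -- z = 0, so z*x = 0
INSERT INTO t VALUES (2, NULL, 0); -- x = 0, but z is NULL, so z*x is NULL

-- Returns only row 1: for row 2, NULL * 0 evaluates to NULL, the
-- predicate NULL = 0 is UNKNOWN, and the row is filtered out.
SELECT id FROM t WHERE z * x = 0;

-- Returns rows 1 and 2: the tautology is written out explicitly
-- instead of relying on the engine to detect it through arithmetic.
SELECT id FROM t WHERE z = 0 OR x = 0;
```

The second query is exactly the "(x=0) or (z=0)" rewrite suggested above: the programmer, not the engine, handles the nullable case.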

You have answered my main argument about any hypothetical system which can support both Normalization and nulls at the same time:

it won't be SQL and it won't be RM2. Nor will it be RM-T.

In other words it is none of the systems that people are likely to be talking about on SSC. If you have your own definition of such a system then please publish it somewhere so that I can understand and scrutinise it properly. The space in this forum is certainly too small for that - after all, Codd wrote 500 pages and still didn't manage it! Until you've done that I'm going to remain as sceptical as anyone else with a scientific mind should. Myself and millions of others who know of no such system are going to stick with what is presently known: that normal forms as originally defined have nothing to do with any table with a null in it.[/quote]

I'm fed up with hearing that last assertion. Heath and Codd were discussing join operations that included in the output rows that had no match in the partner relation, and the concept of a NULL value to enable these outer joins - and also what was later called "outer union", which likewise required NULL - back in 1970 and 1971, while they were working on normal forms; Codd makes that quite clear in his "extending the model to capture more meaning" paper. You can't find any work on normalisation earlier than that carried out by Codd and Heath, and they believed that some sort of NULL would be needed for technical (relational model related) reasons when they were doing that work; Codd's later formalisation of NULL was the result of his realising that this technical NULL could also be used to attack the missing data problem.

And I will state once again, for the record, that nothing in the definitions of 1NF, 2NF, 3NF, EKNF, BCNF, or PJ/NF is influenced in the slightest by permitting NULL values for non-prime attributes (attributes that are not in any candidate key), and point out that I've shown you how this works for PJ/NF, and that working for PJ/NF implies that it works for all the others. If you have a refutation of the outline proof I posted before, perhaps you could post it?

[quote]For 5NF and PJ/NF (why list both, when they are the same thing?)[/quote]

According to some sources (e.g. David Maier, Terry Halpin and this) they are not the same thing. I mentioned PJ/NF to make it clear that I was talking about Fagin's original definition and not any other.

Ok, I see that 5NF (like 6NF) has too many definitions. I need to look at the three variants of 5NF that a quick skim of that paper suggests exist and see what implications, if any, the two extra variants have for non-prime attributes allowing NULL.

[quote]Heh... nah... that would normalize the thread.[/quote]

Let's push the concept a bit forward and go back to the "single table with no indexes database model" ... don't complain, everything is in there, hello?!

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.

[quote]Don't you think it is time to close this one and start new, more specific threads on data normalization?[/quote]

Probably a good idea; this thread was hijacked to present the fundamentalist anti-null view in the 5th message, and has never really recovered - apart from a brief diversion into the definition of 1NF, the whole thread has been about that single pointless debate.

Then the address_line_index represents an attribute by which I may choose to meaningfully order the rows when selecting from the table, but does not represent an inherent ordering to the rows in the table. The conceptual relation for this table itself is thus "unordered".

Similarly, would it violate normal form to have a log table, of the form:

create table event_log ( event_datetime datetime not null, ...)

Where events are inserted into the log in date ascending order (because this is the order of events as they occur)? I don't think anyone would say this violates 1NF but, just as before, we can select the contents of the table in a meaningful order using the event_datetime column.
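As a minimal sketch of that point, assuming the event_log table above: rows come back in a meaningful order only because we ask for one, not because the relation stores one.

```sql
-- Insertion order is not an attribute of the relation; if we want the
-- events in chronological order, we must say so explicitly.
SELECT event_datetime
FROM event_log
ORDER BY event_datetime ASC;

-- Without an ORDER BY, the engine is free to return the rows in any
-- order at all - the relation itself is unordered.
SELECT event_datetime
FROM event_log;
```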

I take Gus's point on update atomicity and agree that this poses a problem for the addresses structure. I suppose one could argue that the address_line_index, being a well defined domain, contains meaningful values and it would make no sense to insert a new row between 1 and 2. Having said that, this also suggests that "line 1" really means "the street address" part, and "line 2" means "the suburb part", which begs the question as to why they are being stored on different rows as opposed to separate attributes on a single row (returning us full circle to the null/combinatorial explosion discussion).
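For illustration, the two shapes under discussion might be sketched like this (table and column names are my own, not from the thread):

```sql
-- Row-per-line: one row per address line, with address_line_index as a
-- meaningful attribute rather than an inherent row ordering.
CREATE TABLE address_line (
    address_id         int NOT NULL,
    address_line_index int NOT NULL,          -- 1 = street part, 2 = suburb part
    line_text          varchar(100) NOT NULL,
    PRIMARY KEY (address_id, address_line_index)
);

-- Column-per-part: the same information as attributes of a single row,
-- which is where the null/combinatorial-explosion debate comes back in.
CREATE TABLE address (
    address_id int NOT NULL PRIMARY KEY,
    street     varchar(100) NOT NULL,
    suburb     varchar(100) NULL              -- NULL when there is no "line 2"
);
```

The second shape makes the "line 1 means street, line 2 means suburb" semantics explicit as named attributes, at the cost of a nullable column; the first avoids the null but spreads one logical entity across rows, which is what raises the update-atomicity concern.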

Ultimately, this "row per address line" structure seems like a kind of specialized entity-attribute-value structure. The specialization, whereby all attributes are for one entity and all attributes are related (common use of the term), makes this look less "EAV-like" than a more common, generic EAV table.

My own personal opinion? This table is in 1NF, just as the event_log table is in 1NF. Our ability to extract rows in a meaningful order does not violate the principle that the relation is unordered. I do not, however, think that the EAV pattern is a good one, and the point Gus made about update atomicity is certainly one reason for that.