SQLServerCentral.com / / Latest PostsInstantForum.NET v99.99.99SQLServerCentral.comhttp://www.sqlservercentral.com/Forums/notifications@sqlservercentral.comFri, 09 Dec 2016 07:30:47 GMT20RE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote]One other question: why would you want the surrogate to be the primary key in the table that defines it? [/quote]The best example is a value domain table (I call them lookup tables. Others I've seen: options table, domain table) used for restricting an attribute's value to a pre-defined set of values (CHECK constraint in SQL Server, ENUM in MySQL and other programming languages):[code="sql"]CREATE TABLE [dbo].[zIdeaPriority]( [id] [int] IDENTITY(1,1) NOT NULL, [shortDescription] [varchar](15) NOT NULL, [longDescription] [varchar](255) NULL, [visibleFlag] [bit] NOT NULL, /* for deprecating old values with referencing rows in a main entity */ [ordering] [int] NULL, /* order to display in presentation layer */ [appKey] [varchar](20) NULL, /* magic value hard coded (w/ lookup func) in app */ CONSTRAINT [PK_zIdeaPriority] PRIMARY KEY NONCLUSTERED ([id] ASC), CONSTRAINT [IXU_zIdeaPriority1] UNIQUE NONCLUSTERED ([shortDescription] ASC) CONSTRAINT [IXU_zIdeaPriority2] UNIQUE NONCLUSTERED ([appkey] ASC))CREATE CLUSTERED INDEX IXC_zIdeaPriority2 ON [zIdeaPriority]([ordering],[shortDescription] ASC)[/code][Aside: To separate main entity tables from tables use to support application presentation, I prefix lookup tables with the letter z. Also, the appkey is the magic value in application code so users can change shortDescription and code that events off a priority value doesn't break by not hard coding a shortDescription value, nor hard coding the id value. Not hard coding the id (attribute?) of a domain value allows you to truncate and reload the table with different shortDescription values but the same appKey values with no recompile necessary. This was very important on one project during user testing of various sets of a three tier classification scheme. Note: It costs more to code every time the app needs to "look up" an appKey value to get an id value.]Here the id column is technically a surrogate. You can have a discussion all day about does the id column really represents an attribute of a domain value. If it does, then it should be a part of the natural key (along with shortDescription, or maybe not). If not, it's a surrogate and does that make shortDescription the natural key? However, there is a big problem if you say the id column is part of the natural key. For a lookup table, which represents a value domain, a composite natural key of id + shortDescription is bad, bad, bad. {1,'High'} and {2,'High'} isn't allowed for a value domain, but is a perfectly legal natural key! (I don't say the id column should be part of the natural key, so no problem). Anyway, I guess I consider the surrogate key generated for the value domain to be the domain value (mapping to it in code through the appKey attribute). Maybe you can't call that a surrogate key in that case. Maybe the IDENTITY column *is* the natural key! (I gotta stop here a sec and process that...) In any case, you need a unique on shortDescription. I guess this gets down to philosophy about how you model a value domain. I'd love to get Joe's opinion on how he fits in value domains in his models. I gotta go thumb through his books again... Is there a natural key for these tables? Or is a value domain a special enough animal to say there is no natural key. Rows in these tables are just tokens. I don't know. I just know I use a lot of these tables in applications I build so that I standardized on this model. I always use this model for value domains, without even thinking about it. Sun, 20 Nov 2011 01:59:08 GMTquickdrawRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote][b]quickdraw (11/19/2011)[/b][hr]Ouch! It sounds so dirty to break 1NF! I never took well to authority. Based on the definition in the article you linked to, I'm not sure I'm breaking 1NF:"A table is in 1NF when every column has a fixed simple data-type that is understood by the database system, and every row has a primary key."How is having multiple assignment columns, or a surrogate primary key breaking that definition? [/quote]There is absolutely nothing wrong with surrogate keys, provided they are done properly (when they are not done properly - eg when there is no UNIQUE constraint on the natural key - all hell can break lose; I've had to fix some of those that I inherited).Multiple assignment columns may of may not be a violation of the spirit of 1NF: are the multiple columns each a genuine attribue of the entity, or are they an attempt to describe a complex non-atomic attribute (the set of assignments)? In the former case they are probably not a violation of 1NF (there are some purists out there who would shoot me down). In the latter, they probably are. Why are these columns nullable? Is it because the value of the corresponding attribute is not recorded in the database, or is it because the attribute may not actually be an attribute of the entity represented by the row? In the former case, there isn't a 1NF issue with the nullability (but there are plenty of anti-null fundamentalists out there who would claim that there is); in the latter case, there certainly is. As I said it isn't clear that in your particular case there is a violation of 1NF - I suspected there was, but did say that there might not be; now that I've seen your code example with separate roles clearly identified as such it seems clear that there isn't (although there are plenty of people out there who will say that even so there is).One other question: why would you want the surrogate to be the primary key in the table that defines it? That seems just perverse to me unless in a specific case there is no use for an index on the natural key. The surrogate can be be the target of foreign keys if it is subject to a unique constraint, foreign keys are not restricted to pointing to the primary key of the target table, and the primary key doesn't have to be the index you cluster on, so the only good reason for not using the natural key as primary key is that it is not useful to have an index on the natural key (which in my experience is a pretty rare occurrence). Of course there are entities whose natural key is an identifier provided somewhere in the system (not necessarily by the database - it could be generated manually; for example a payroll number could be the identifier for a person, and payroll numbers were used for that purpose long before the invention of databases; maybe current name, name at birth, date and time of birth, and place of birth could work as a natural key, but you are fairly unlikely to have that information about all your employees).Sat, 19 Nov 2011 19:35:19 GMTTomThomsonRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote]I might have an issue with that; why should URLs have any structural relationship to the actual details of the schema? If you let that happen, then if you ever want to change the schema even a little bit you are probably going to have to change masses of application code; if you hide the base schema behind stored procedures and views, the only extras you need to change when you want to change the base schema are the stored procedures and the views - no application code; of course if the change is to provide new features, you will have some new application code - but existing application code will not be affected by the schema change.[/quote]You're absolutely right. Don't tie any part of the app directly to the schema. I'm just talking about keeping simple the code that writes the URLs. A URL for a db web app can get long as it is. Using natural keys all over the place just makes the URLs really long. I don't ever expect someone to type in a URL. But passing them in emails sure happens all the time, and you don't want a 5 line (200 character, whatever) URL in an email. You *should* use sprocs, views, etc. But the sproc needs to come up with a number to receive as a valid query, and give one for using to create a URL, and that number needs to persist in the database. That means using IDENTITY values as references to entity instances (rows, sorry for the object-speak). So why not just make the IDENTITY value be the primary key? Depending on the topology of the application&lt;&gt;db landscape, it may be hard (read: more expensive) for the app to be abstracted from the db design. If you have a local, embedded database like sqlite at the client, syncing to a central server (and in my current case, pushing and pulling changes directly between clients until the client apps connects back to internet and syncs with the central server), unifying the model between the disconnected app and the central server seems to me to make a lot of sense. [quote]Not if you have queries like "who is currently assigned to three or more ideas" and "list all the other ideas to which people assigned to this one are also assigned" or "list all the ideas to which at least 3 of this set of people are assigned". The multiple null-able columns in Idea make these a real pain to write, compared to trivially simple queries with the separate assignment table;[/quote]I agree. And if the client would pay for it, I would design it with an assignment table. The difference is more than few hours of work, so I give the pros and cons of both ways, and let the client decide, based on their own assessment of future reporting needs. If I perceive they will have more needs than the design will allow smoothly, I try and talk them into a better design. I just don't say "This is definitely how you should do it!". A lot of companies I work with are non-profits, and cash is king! Sorry if I'm wandering off topic, but I think the cost of the application and the budget today play a bigger role than we generally want to allow in these discussions. I guess this site is more about the right, or pure way to do things, for larger companies that have lots of money.This project and the assignment of people might not have been the best examples to use, as I didn't explain more how the data will be used, and how these (your) queries will never come up. I picked person assignment as an example of multiple null (initial value) columns for a record, but that might not have been the best example to use for making a general case about the use of nulls.[quote]if new ideas are the most frequently accessed ones, as seems likely, gives a net reduction in bandwidth between db and client.[/quote]I agree. If bandwidth was king, instead of cash...(boy am I setting myself for beating with all this talk which is anathema in the design community!) [quote]I'm inclined to think that your example is a violation of 1NF -[/quote]Ouch! It sounds so dirty to break 1NF! I never took well to authority. Based on the definition in the article you linked to, I'm not sure I'm breaking 1NF:"A table is in 1NF when every column has a fixed simple data-type that is understood by the database system, and every row has a primary key."How is having multiple assignment columns, or a surrogate primary key breaking that definition? -----------------------------The future scenario* I didn't share for phase two: assignment (person, to the idea) is a relation entity with its own attributes that are specific to the role:(* Could write an article or three about designing databases where the application is built in phases, and cost dictates that you don't use the mac-daddy db design from day one).Idea Assigned To:Idea (key)Person (key)Initial Assessment Due Date (date)Assessment (varchar)Rating (int)Idea Legal Reviewer:Idea (key)Person (key)Waldorf Value Sale (decimal) *Assessment (varchar)Objections (varchar)(and so on....)(* I totally made this attribute up)Sat, 19 Nov 2011 12:08:53 GMTquickdrawRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx@archie: The need to answer a question like "show me all records that Bob Smith has anything to do with" comes up so rarely, I don't think about it in the design. I use a Query By Example form, where users would enter a name into the Assigned To field and the SQL would search just that attribute for the given name. If the customer insisted on such a unified search (enter any criteria into a single search field), I would have to build a more complex query that added where clauses (logical OR) for each searchable attribute (with any necessary joins).If you want a single row in a spreadsheet style display (grid control) where one row lists all attributes (a column for each associated person), you have to join the main entity to the person table 5 times to get the people's names:[Columns all in the main entity][code="sql"]SELECT * FROM Idea AS i INNER JOIN Person AS p_ideasource ON i.ideaSource = p_ideasource.personId INNER JOINPerson AS p_assigned ON i.assigned = p_assigned.personId INNER JOINPerson AS p_budgetOwner ON i.budgetOwner = p_budgetOwner.personId INNER JOINPerson AS p_legalReviewer ON i.legalReviewer = p_legalReviewer.personId INNER JOINPerson AS p_financialReviewer ON i.financialReviewer = p_financialReviewer.personId INNER JOIN...snip...[/code][Separate relation table][code="sql"]SELECT * FROM Idea AS i INNER JOIN Idea_Person AS a1 ON i.ideaSource = a1.personId INNER JOIN Person AS p_ideaSource ON a1.personId = p_assigned.personId AND a1.role = 'Idea Source' INNER JOINIdea_Person AS a2 ON i.assigned = a2.personId INNER JOIN Person AS p_assigned ON a2.personId = p_assigned.personId AND a2.role = 'Assigned To' INNER JOINIdea_Person AS a3 ON i.budgetOwner = a3.personId INNER JOIN Person AS p_budgetOwner ON a3.personId = p_budgetOwner.personId AND a3.role = 'Budget Owner' INNER JOINIdea_Person AS a4 ON i.legalReviewer = a4.personId INNER JOIN Person AS p_legalReviewer ON a4.personId = p_legalReviewer.personId AND a4.role = 'Legal Reviewer' INNER JOINIdea_Person AS a5 ON i.financialReviewer = a5.personId INNER JOIN Person AS p_financialReviewer ON a5.personId = p_financialReviewer.personId AND a5.role = 'Financial Reviewer' INNER JOIN...snip...[/code][Disclaimer: never use SELECT * unless there is a good reason!]The model of the later query is obviously superior from purely extensibility and academic standpoints.So, you can take issue with the SQL complexity and extra joins, but really the app coding effort to do all the extra inserts and updates to the relation table is what I am trying to avoid (and mapping role name phrases to magic numbers the code will use to defend against role names being changed) . Instead, I'll just live with null-able columns in the idea table, since there is no effective cost, or none I've ever encountered anyway.Fri, 18 Nov 2011 12:36:18 GMTquickdrawRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote][b]archie flockhart (11/18/2011)[/b][hr]@QuickDraw: Not sure on the multiple fields: I don't see why you'd need multiple joins to the same table with a normalised design; and if you put 5 "Person" fields in the main table, how do you search to find a person who may appear in any one of the fields ?[/quote]If he wants to return a single record to the app listing all the people assigned to work on a particular idea he actually needs a query something like this:[code] ;WITH assigned_people(A.pID, P.name) as ( SELECT A.ideaID, name FROM People P INNER JOIN Assignment A ON P.pID=A.pID )SELECT I.ideaID, I.ideaname, AP1.name, AP2.name ,AP3.name, AP4.name,AP5.nameFROM idea I LEFT OUTER JOIN assigned_people AP1 on AP1.ideaID = I.idea_id LEFT OUTER JOIN assigned_people AP2 on AP2.ideaID = I.idea_id LEFT OUTER JOIN assigned_people AP3 on AP3.ideaID = I.idea_id LEFT OUTER JOIN assigned_people AP4 on AP4.ideaID = I.idea_id LEFT OUTER JOIN assigned_people AP5 on AP5.ideaID = I.idea_id[/code]which is indeed a 6-way (actually 7-way, as I've include the People table) join. To get the (unnormalised) result to pass back to the app from the fully normalised database he needs that horrible looking (but actually harmlesds, if the indexing is right) join.Fri, 18 Nov 2011 10:49:24 GMTTomThomsonRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote][b]quickdraw (11/17/2011)[/b][hr]And I can count on one finger the number of times over the last 15 years that I've switched RDBMS on a project). How does that approach leave the potential for getting boxed into a corner, down the road, from a modeling perspective?[/quote]In the last 20 years I've seem more such changes than I could count on 2 fingers using binary notation (indeed more than the smallest number that would allow me to say that). Since I've seen five times as many switches as you in only 133% of the time I doubt if either your frequency of seeing such changes or mine, or even the two put together and averaged, gives us a large enough sample to get any idea of how often such changes do in fact happen - - it's the usual problem, there are lies, damn lies, and statistics based on inadequate or biased samples, and here we have definitely an inadequate sample.[quote]When implementing web applications, where a record's detail is viewed using URL parameters, I'd much rather code a URL as somewhere.org/studentcourses/id/1233rather thansomewhere.org/studentcourses/student_name/smith/course_nbr/ENG101, section_nbr/2[/quote]I might have an issue with that; why should URLs have any structural relationship to the actual details of the schema? If you let that happen, then if you ever want to chacge the schema even a little bit you are porobably going to have to change masses of application code; if you hide the base schema behind stored procedures and views, the only extras you need to change when you want to change the base schema are the stored procedures and the views - no application code; of course if the change is to provide new features, you will have some new application code - but existing application code will not be affected by the schema change. And I wouldn't feel that asking the student or whoever is using the website to type a URL or to remember that 1233 is the course number for part 2 of ENG01, instead of allowing him to select ENG101 and 2 from drop down boxes, so the application might be happier with a URL like somewhere.org/studentcourses.aspx?student_name=smith&detcourse_nbr=ENG101&section_nbr=2 anyway.[quote]For my projects, I always use IDENTITY columns as primary keys on every entity (INT or BIGINT or some form of GUID as the case may dictate). I always relate entities using the primary key. If there are natural keys, they get a unique constraint. If the application's primary retrieval query (including any joins) for multiple rows needs to be sorted, and no other requirements create issues, a clustered index goes on the sort columns.[/quote]If I'm using a surrogate key, I tend to make the natural key the primary key in the defining table, and put unique and not null constraints on the surrogate (also of course for each candidate key that isn't the primary key a unique constraint on the column list plus not null constraints on each column). I believe this makes it clear to anyone who looks at the schema (my stuff is often - not always - designed for long life - other people will have to cope with it after I have moved on) that the natural primary key is exactly that - the natural primary key. If I most commonly want to deal in natural key order, I'll cluster on that (and make sure to schedule reorganising that index now an again); if I want to make sure I insert records in increasing primary key order, I'll cluster on the surrogate; and if it's clear that my most common queries will want to scan (either the whole table or a range) in an order based on some other list of attributes, I'll create an index on that list and cluster on it. I guess our only difference really is in not maing the surrogate the primary key in the defining table - it's certainly got to be the key used in all foreign key relationships. [quote]The second issue is with NULL columns. Time is money. It is too expensive to split out attributes into distinct entities for a record that has a lifecycle where, at the beginning will have many attributes containing NULL values but where, at the end of the record's lifecycle most attributes will have a domain value.[/quote]Looking at Joe's article, I don't see him saying anything nasty about nulls - he's a little less nasty about them at one point than I think is proper, but if I were writing something like that I might easily (accidentally) be a bit soft on the myself. I can't imagine how it is expensive to splt out entities except in extreme cases (but I know those extreme cases do exist - somewhere on this site is a big fun bust-up between me an one of the loony anti-null fundamentalist brigade).[quote]My current project tracks Ideas. 5 different people will eventually be assigned to work on the idea, but assignment to those 5 people will trickle in over time. Should I create a separate assignment table with a personId and roleId column? That sure is more extensible if I need to assign a 6th person. But adding a column to Idea is cheaper (on the app dev side). [/quote]Not if you have queries like "who is currently assigned to three or more ideas" and "list all the other ideas to which people assigned to this one are also assigned" or "list all the ideas to which at least 3 of this set of people are assigned". The multiple nullable columns in Idea make these a real pain to write, compared to trivially simple queries with the separate assignment table; they will also perform a lot worse than the corresponding queries on the assignment table. And if you want to get "list ideas to which Fred is assigned quickly" you will need to add five indexes to the ideas table if you don't have an assignment table - that slows down all your inserts and some of your updates, as well as making your DDL a bit more complicated.[quote]Or should I just have null-able columns in the Idea table? For economic reasons I'm doing this. No views required. No complex 6 table join (Idea to IdeaAssignment 5 times) to get all assignments for a single row select of the Idea table. No sub form on my Idea form to list assignments in a grid control, or some such. And, since I'm using Visual Studio and ADO.NET for a Win Forms app, I don't have to worry about this join getting executed a bazillion times as a form's underlying DataSet gets refreshed. I'm not getting paid to be pure in my design. I'm getting paid to solve a business problem quickly. [/quote]I guess your 6 way join consists of the five outer joins to create the alternative version of the Idea table (the one with the 5 nullable assignment columns), and yes, if you split off the assignment table and need to return that information in that form you are stuck with that six way join; but that is just one query that becomes a bit more complex, and the query is only complex because you are doing formatting (conversion from 5 rows sharing a common first column to one row with the first column and all five of what were second columns) in the database instead of in the application; and of course by not doing the join you can return a smaller record set (more rows, but less total data) to the application for any idea that doesn't yet have more than 3 people assigned to it - which, if new ideas are the most frequently accessed ones, as seems likely, gives a net reduction in bandwidth between db and client.So I think I disagree with you about where allowing nulls is appropriate - but of course it depends on all the details of the application and how often various different things are done in practise with that particular application, and we can't sensibly make hard and fast rules about it. I'm inclined to think that your example is a violation of 1NF - that this is one of those cases where using NULLs is not the best solution - look at the last paragraph of [url=http://www.sqlservercentral.com/articles/Normalization/74241/][i]my 1NF article at on this site[/i][/url] and I think you'll see what I'm getting at - but in the partcular cases where you've done it that may not be correct at all.Fri, 18 Nov 2011 10:25:06 GMTTomThomsonRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx@QuickDraw: Not sure on the multiple fields: I don't see why you'd need multiple joins to the same table with a normalised design; and if you put 5 "Person" fields in the main table, how do you search to find a person who may appear in any one of the fields ?Fri, 18 Nov 2011 02:46:14 GMTarchie flockhartRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote]1) I use the VIN in my car insurance, my DMV forms and when i get a traffic ticket or apply for a parking sticker. Now that the VIN is on the dashboardinstead of the engine block2) That deponds on the role that the person plays. The two I like is their email address (validation is regular expression; validation is a ping) and the DUNS. I had to get a DUNS to consult at DELL (it is free) and it is used internationally. 3) Unh? My artwork_id had nothing to do with the physical location of the record(s) that contain. And it was much easier to [i]use[/i] than a machine generated number of some kind. It was easy to pull out the pieces by artist to compute payout, track sales, etc.Think about using a random number as the key. I still have to have the artist and the media in the table. The locator is just extra storage. The sequence (called an accession number by librarians) was derived by sorting on submission date and alphabetical title . This made a physical inventory search a bit easier. And all of this was shorter than a GUID! [/quote]Interesting, but all 3 have some issues:1 - doesn't work the same internationally. I'm in the UK, and although I know my car has a VIN, I have no idea what it is and have never explicitly been asked for it.2 - people (especially family members) may share email addresses or have none at all; and &gt;99% certainly won't know what DUNS is. People in the same family can also share all of name, address and phone number. For a club or I'd usually allocate a meaningless membership number , and we're back to IDENTITY again as the simplest way to do that ...3 - The physical location of the artwork is "in our art collection". The sequence number that you generate is essentially "the order in which we acquired the item" and is not a property of the artwork itself .I don't see the conceptual difference between an essentially meaningless IDENTITY which the database allocates when you get an item, and an essentially meaningless artwork number that you create in a more complex way.I'm also (partly) with QuickDraw on the balancing the theory with some practicalities: real-world developments need to deal with inconvenient people who don't have a DUNS, or don't know their VIN, or who turn up to start a job without their National Insurance (Social Security) number.Fri, 18 Nov 2011 02:42:13 GMTarchie flockhartRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxFirst off- I greatly admire Joe Celko's work. I have his books. I've conversed with him on a SQL challenge years ago. He is the world's SQL guru!Two Issues:First, I echo Tom's comments on being overly harsh on surrogate keys. I guess I just don't see the economic downside to relating entities using an IDENTITY as the primary key (an int is an int on all systems, effectively. And I can count on one finger the number of times over the last 15 years that I've switched RDBMS on a project). How does that approach leave the potential for getting boxed into a corner, down the road, from a modeling perspective?To me, if there aren't significant economic reasons against using a single unique column as the primary (surrogate) key, that leaves only academic reasons, which few of us have time or cost allowance for in the real world (disk space isn't a factor today except for the largest systems). It costs me more to code the application to work with multiple column primary keys than to use a single IDENTITY column. And *not once* have I ever exclaimed: "Damn! why oh why didn't I use the natural key instead of the IDENTITY column!" (although I have wondered why I didn't take the blue pill). When implementing web applications, where a record's detail is viewed using URL parameters, I'd much rather code a URL as somewhere.org/studentcourses/id/1233rather thansomewhere.org/studentcourses/student_name/smith/course_nbr/ENG101, section_nbr/2For my projects, I always use IDENTITY columns as primary keys on every entity (INT or BIGINT or some form of GUID as the case may dictate). I always relate entities using the primary key. If there are natural keys, they get a unique constraint. If the application's primary retrieval query (including any joins) for multiple rows needs to be sorted, and no other requirements create issues, a clustered index goes on the sort columns.If anyone has a visceral reaction to what I just said, please enlighten me how I'm screwing it up. I've made a lot of money on dozens of projects over 20 years for me to give it up without a clear sense of the economic impact to the client for initial project costs, costs relating to refactoring (which happens not a majority of the time), or costs for migration to new hardware or RDBMS (which statistically never happens). -------------------------------The second issue is with NULL columns. Time is money. It is too expensive to split out attributes into distinct entities for a record that has a lifecycle where, at the beginning will have many attributes containing NULL values but where, at the end of the record's lifecycle most attributes will have a domain value.My current project tracks Ideas. 5 different people will eventually be assigned to work on the idea, but assignment to those 5 people will trickle in over time. Should I create a separate assignment table with a personId and roleId column? That sure is more extensible if I need to assign a 6th person. But adding a column to Idea is cheaper (on the app dev side). Or should I just have null-able columns in the Idea table? For economic reasons I'm doing this. No views required. No complex 6 table join (Idea to IdeaAssignment 5 times) to get all assignments for a single row select of the Idea table. No sub form on my Idea form to list assignments in a grid control, or some such. And, since I'm using Visual Studio and ADO.NET for a Win Forms app, I don't have to worry about this join getting executed a bazillion times as a form's underlying DataSet gets refreshed. I'm not getting paid to be pure in my design. I'm getting paid to solve a business problem quickly. Thu, 17 Nov 2011 11:43:38 GMTquickdrawRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxI quite like most of this article, in fact most of it is excellent, but there are some parts that aren't up to the standard of the rest of it.The section headed 5NF doesn't actually discuss 5NF at all, and what it does discuss has only the most tenuous connection with 5NF. Suggesting that the move from 3NF to BCNF is done to enable constraints to be correctly represented is strange, since BCNF is a normal form in which it is not possible to represent all simple constraints. Any normalisation beyond EKNF (that's the form generated by Bernstein's algorithm, it is intermediate between 3NF and BCNF) risks losing important FD-related constraints. The article should probably mention that, so that readers are warned to watch out for the problem and avoid going to BCNF for the cases where it is known to lead to failure to enforce required constraints, instead of going blindly to BCNF for every base relation in their database (although going blindly to BCNF would of course be better that not even getting to 1NF, which seems to be a pretty common approach).Also, I think you are too harsh on surrogate keys. I agree that using them because one can not be bothered to find a natural key is insanity. I dislike the idea of using them just to save space, too - ideally the dbms should do any necessary space saving behind the scenes. But making it easier to write queries and and simplifying the ddl for expressing referential transparency constraints (by making it possible to write a single attribute key instead of a five attribute key, for example) are desirable ends, and it is perfectly sensible to use surrogate keys for those purposes provided of course that uniqueness is enforced both for the natural key and for the surrogate key in the table defining the surrogate key and that the remaining attributes of the entity or relationship of which the natural key is a key are held in that defining table, not somewhere else as if the surrogate were an entity in its own right.Mother Celko's heuristics are great - everyone should have a copy firmly imprinted in their thinking about database design. Perhaps the penultimate one could be taken a bit too literally - it implies that nothing is normalised unless it is in Ron Fagin's DKNF, but it has been demostrated that some schema and constraint combinations have no DKNF representation. Also, I think you are not quite harsh enough on NULLs - allowing them to mean different things in different columns, when all they should ever be allowed to mean is "the value that would be here if we had it isn't here".Mon, 14 Nov 2011 03:23:24 GMTTomThomsonRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxRows, columns, fields, records: Microsoft themselves use the terms pretty interchangeably - fire up the View designer in MS Access, or try to delete something from a table, and see what terms are used ( "fields" and "records"); do the same in SQL Server and they are called columns and rows. [quote]&gt;&gt; But I've often come across situations where there is no combination of data that is *always* unique and *always* available to the people creating the records [sic], at the time they are creating them, that can be used as a key. &lt;&lt; Give me an example. I have never had that problem and I have been at this for a few decades. I have found that 90%+ of the time there is an industry standard identifier, [/quote]Here are three (there are plenty more) :1: Your own example of identifying vehicles. Outside the car repair industry, VIN is not a useful identifier as most people don't know their VIN even if they know where to find it; licence plate may work in some circumstances but there can be complications caused by transferrable number plates. 2: What unique identifier do you use for people - let's say, members of a club ?3: When you say you created an ID for artworks based on material, artist and a sequence number, were you not just creating a slightly fancier ( and harder to use) form of "count of physical insertion attempts" ?Mon, 14 Nov 2011 02:53:06 GMTarchie flockhartRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxI was momentarily lost on the abbreviations FD and MVD until I rewound to their usage. I suggest a simple edit to enclose the abbreviations in parenthesis immediately following the full words."aether" i think should be "either" - though 'marriage...does not belong to aether' either."Following this rule...is the basis for the other neuritics" &lt;-- I think this was supposed to be "heuristics" but it would be amusing if you meant some form of [url=http://www.thefreedictionary.com/neuritic]nerve pain[/url]I have a mortgage, but this Shelton did not look familiar. Google couldn't make sense of it for me either."This is why EAV does not work" - we all know EAV is entity-attribute-value, but I a link to another document explaining it might be helpful. Same with CSV list; most readers know comma-separated values - but in this context might think its something else. (If SQL ServerCentral has a rule against such links, I question the value of such a restriction but C'est La Vie)The Create Table examples are sufficient for those who are already trained to "think in abstractions." I would like to have seen a grid-view of actual data next to those DDL statements. I think many of us also natively think in pictures.If a discussion on normal form(s) is going to start with relational algebra [imo: it should] then I agree with the intent to dissuade thinking of primary key as the auto-increment number column thoughtlessly added as the first field of any table. After the concept of a primary key is formally established, it might be worth noting that many schema architects will use the identity column as a crutch because it's so easy to implement. Of course that becomes an article in itself. Fri, 11 Nov 2011 08:14:49 GMTMike Dougherty-384281RE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxI'm sitting here thinking about this article, and trying to decide how much I liked it.I think that if I hadn't had any previous knowledge on normalization and database design, I would have found this article a little too academic and difficult to follow.I would have found the layout a little confusing. From my point of view, the article would have been even better if it had contained some bullet lists with the rules of the Normal forms, as an example.On the other hand, this is an excellent article someone with a little technical background on the subject. The black holes are filled with in depth knowledge, from someone who knows what he's doing.Fri, 11 Nov 2011 05:06:06 GMTHåvardRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspx[quote]The ignorant will declare an IDENTITY as the key and all other columns are NULL-able. They do not know that since the table's IDENTITY property is not a column nor an attribute of any possible data model, it cannot ever be used as a key. It is the count of the physical insertion attempts. It is a way for a noob to mimic the record numbers he knows from magnetic tapes and other sequential files.[/quote]All very well in theory, and in the example Joe quotes, there's a way to change the design to avoid the need to have a table where the non-null requirements of the key fields get in the way of being able to insert data.But of course, it isn't true that an IDENTITY "cannot ever be used as a key". Joe's personal view seems to be that an IDENTITY [i]should [/i]never be used in this way, but putting the statement in that way, in an otherwise educational article, may confuse "the ignorant".For me, a key is anything that lets me uniquely identify a record in a table. If that can be done with a unique social security number or equipment serial number or email address or some other combination of fields, that's fine. But I've often come across situations where there is no combination of data that is *always* unique and *always* available to the people creating the records, at the time they are creating them, that can be used as a key. In that case, either the users need to make up a unique key, or the database does. The database is better at it, and I don't really much care whether it is the count of the physical insertion attempts or a GUID or anything else.Fri, 11 Nov 2011 02:27:35 GMTarchie flockhartRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxGreat work Joe. My question is "why is this so late in the design process?" It is so hard (expensive and frustrating) to have to refactor something that is in production.When we decided that we were going to take everything we had learned and build a robust product suite it became obvious that the database design had to be the foundation. During the design meetings we did normalization from the start. Now when we have to add features and capabilities it is not such a big deal. We face an additional set of challenges. Mobile workers. Each carries a subseted replica of the database. Looking at the deployed fleet statistics I was plesently surprised at how small of a footprint it actually takes. Normalization is at the heart of that. It might not have happened had we not done it right up front.Wed, 21 Sep 2011 08:34:31 GMTCharles KincaidRE: Stairway to Database Design Level 9: Normalizationhttp://www.sqlservercentral.com/Forums/Topic1156633-1604-1.aspxExcellent!!!Thanks Joe,ThomasTue, 20 Sep 2011 20:32:38 GMTThomas LeBlanc