
Suppose we have a large set of statistical data for a record, e.g. 20-30 INT columns. Is it better to keep the entire set in one table, since it all belongs to one record, or to create another table connected by a one-to-one relationship?

The advantage of the former is avoiding a JOIN and having quick access to all the statistical data for the corresponding record.

The advantage of the latter is keeping the columns tidy. The first set of columns is read-intensive, the second write-intensive. Of course, I expect this to have no significant effect on performance, since I use InnoDB with row-level locking.

In general, I want to know whether it is practically useful to separate different sets of data for a single record.
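To make the two alternatives concrete, here is a minimal sketch of both designs using SQLite (the table and column names are hypothetical, and only two of the 20-30 stat columns are shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Option A: one wide table holding the record and all its stats
CREATE TABLE record_wide (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    stat_1 INTEGER,
    stat_2 INTEGER  -- ... up to stat_30
);

-- Option B: stats split into a 1:1 companion table whose
-- primary key doubles as the foreign key to the parent row
CREATE TABLE record (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE record_stats (
    record_id INTEGER PRIMARY KEY REFERENCES record(id),
    stat_1    INTEGER,
    stat_2    INTEGER
);
""")
con.execute("INSERT INTO record VALUES (1, 'example')")
con.execute("INSERT INTO record_stats VALUES (1, 10, 20)")

# Reading the full record under option B requires a join
row = con.execute("""
    SELECT name, stat_1, stat_2
    FROM record JOIN record_stats ON record_stats.record_id = record.id
""").fetchone()
print(row)  # ('example', 10, 20)
```

Making the companion table's primary key also the foreign key is what enforces the 1:1 (at most one stats row per record); without it you would have an ordinary 1:many relationship.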

'Normalized' means first normal form (1NF) and is a fundamental requirement of the relational model. 'Fully normalized' means 5NF or higher. Your proposed 'one-to-one relationship' table has a better chance of being in a higher normal form (possibly even in 6NF) than your current one because it is decomposed! What normal forms does your existing table satisfy?
– onedaywhen, Mar 22 '12 at 14:29

@onedaywhen Like many others, I do not follow normalization step by step, as de-normalization is sometimes also helpful. In general, the entire database should sit at a normalization level between 3NF and 5NF (I always have trouble with 4NF!).
– All, Mar 22 '12 at 15:32

2 Answers

If it fits within the rules of normalization, then a 1:1 relationship can be normalized (by definition!). In other words, there is nothing about 1:1 relationships that makes it impossible for them to obey the normal forms.

To answer your question about the practicality of 1:1 relationships, there are times when this is a perfectly useful construct, such as when you have subtypes with distinct predicates (columns).

The reasons you would use 1:1 relationships depend on your point of view. DBAs tend to think of everything as being a performance decision. Data modelers and programmers tend to think of these decisions as being design or model oriented. In fact, there is a lot of overlap between these points of view. It depends on what your perspectives and priorities are. Here are some examples of motivations for 1:1 relationships:

You have some subset of columns that are very wide and you want to
segregate them physically in your storage for performance reasons.

You have some subset of columns that are not read or updated
frequently and you want to keep them apart from the frequently used
columns for performance reasons.

You have some columns that are optional in general but they are
mandatory when you know that the record is of a certain type.

You have some columns that logically belong together for a subtype
and you want to model them to fit well with your code's object model.

You have some columns that can only apply to some subtype(s) of an
entity super-type, and you want your schema to enforce the absence of
this data for other subtypes.

You have some columns that belong to an entity but you need to protect
these particular columns using more restrictive access rules (e.g. salary
on an employee table).

So you can see that sometimes the driver is performance, sometimes it is model purity, and sometimes it is simply a desire to take full advantage of declarative schema rules.
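The subtype cases above can be enforced declaratively. Here is one sketch of a common pattern, again in SQLite with hypothetical names: the parent carries a subtype discriminator, and a composite foreign key plus a CHECK constraint guarantee that the 1:1 `salaried_employee` table can only hold rows for salaried employees.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
con.executescript("""
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('salaried', 'hourly')),
    name TEXT NOT NULL,
    UNIQUE (id, kind)  -- target for the composite FK below
);
-- Salary lives in a 1:1 subtype table; this also lets a DBA grant
-- access to it separately from the rest of the employee data.
CREATE TABLE salaried_employee (
    employee_id INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL CHECK (kind = 'salaried'),
    salary      INTEGER NOT NULL,
    FOREIGN KEY (employee_id, kind) REFERENCES employee (id, kind)
);
""")
con.execute("INSERT INTO employee VALUES (1, 'salaried', 'Ann')")
con.execute("INSERT INTO employee VALUES (2, 'hourly', 'Bob')")
con.execute("INSERT INTO salaried_employee VALUES (1, 'salaried', 90000)")  # accepted

try:
    # Bob is 'hourly', so the composite FK cannot find (2, 'salaried')
    con.execute("INSERT INTO salaried_employee VALUES (2, 'salaried', 90000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("salary row for hourly employee rejected:", rejected)
```

The schema itself, not application code, enforces the absence of salary data for the other subtype, which is the point of the fifth bullet above.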

You have some subset of columns that are very wide and you want to segregate them physically in your storage for performance reasons. How does segregating them improve performance (assuming the columns are always accessed every time the main table is)?
– Gili, Aug 21 '14 at 12:37

@Gili - If your assumption were true then this case wouldn't apply. Segregating large and infrequently needed columns allows more rows to fit on a page, thereby allowing faster retrieval of the commonly used columns. Obviously reading the segregated columns along with the commonly used columns would be slower since a join is necessary.
– Joel Brown, Aug 22 '14 at 12:03

I want to split the table along the boundary of the commonly used columns for design reasons (separation of concerns, increased code reuse). Has anyone posted an estimate of the cost of such joins? Are they negligible or something I should worry about long-term?
– Gili, Aug 22 '14 at 14:38

@Gili - re: the cost of joins: There is no right answer to that question aside from "it depends". Join cost is affected by many factors, and whether it is negligible is even harder to answer, because that is ultimately subjective. The best way to answer your question is to mock up some test data and do volume testing. Try it both ways and see if you can tell the difference using real-world data volumes (whatever that implies for your application).
– Joel Brown, Aug 23 '14 at 12:56

I did, and got surprising results: dba.stackexchange.com/q/74693/4719 I admit this isn't a typical example of normalization, but it does highlight that JOINs are (still) very expensive.
– Gili, Aug 24 '14 at 13:42