If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Enjoy an ad free experience by logging in. Not a member yet? Register.

If you truly have the limitations you specified and you 100%-promise-to-delete-the-entire-site-if-you-change-your-mind-kind-of-sincerity that those limitations won't changed, then a little bit of denormalization as you propose would surely not hurt.

Would it help? I tend to doubt it.

I would like to point out that if you do denormalize then there are several possible future queries that get tougher. Example:
"How many bookmarks use both tag13 and tag93?"
[WHERE (tag1=13 AND tag2=93) OR (tag1=93 AND tag2=13)]
[ugh!]

"How many bookmarks have only one tag?"
[WHERE (tag1 IS NOT NULL AND tag2 IS NULL) OR (tag1 IS NULL AND tag2 IS NOT NULL]
[double ugh!]

"How many bookmarks use exactly the same two tags as another bookmark?"
[much more code than I want to show]

All those--and more--are trivial in the fully normalized ("Toxi") case.

An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

I would like to point out that if you do denormalize then there are several possible future queries that get tougher. Example:
"How many bookmarks use both tag13 and tag93?"
[WHERE (tag1=13 AND tag2=93) OR (tag1=93 AND tag2=13)]
[ugh!]

I'm misunderstanding why I would need that query? If a user can only search by one tag eg 13, wouldn't...

WHERE BookmarkID = $bookmark
AND Tag1 = 13 OR Tag2 = 13

...do the job?

"How many bookmarks have only one tag?"
[WHERE (tag1 IS NOT NULL AND tag2 IS NULL) OR (tag1 IS NULL AND tag2 IS NOT NULL]
[double ugh!]

...I'm thinking that if only one tag is inserted, I would always make sure it's in `tag1`, so again, wouldn't a search like above for 13 do the job?

"How many bookmarks use exactly the same two tags as another bookmark?"
[much more code than I want to show]
I'm unsure what this means - are you getting at the fact that there would be double ups of the same tag pairs, so therefore extra data? Wouldn't this be offset by the fact that a "Toxi" type system would introduce roughly the same amount of extra data.

I'm not meaning to be argumentative at all - just trying to understand the pros and cons, and pick the right system to give me the FASTEST results when querying.

I would go for the "Toxi" type system, only that if would introduce another join into my queries that already have have one or sometimes two, so I'm researching to see which would be faster - another join, or a denormalized tag system.

To round my questions out, I have other tagging conditions to add:

When searching, user gets to choose...

- one tag from condition one eg 15 (bookmark has two tag columns for this)
- one tag from condition two eg 19(bookmark has one tag column for this)
- one tag from condition three eg 2 (bookmark has one tag column for this)

(queries would be dynamically built to include or exclude ANDs based on what conditions user choose to include)

You know, a lot of this depends on how confident you are that you will *NEVER* want to do anything more than you are planning as of now.

I tend to avoid locking myself into a design just because my experience has always been that the customer (or my boss...same thing, really) comes back 3 months (or 3 years) later and says, "You know, I really want to also do ..."

And if I designed the DB right in the first place, I can say "trivial, let me just change this parameter." But if I did it wrong, then typically it means making changes to the entire DB schema. Which, even if it's not hard, can lead to hard-to-find errors weeks or months after then changes.

You seem so supremely confident that your specifications will *NEVER* change that I would say it doesn't much matter which way you go. I do not think you will see a significant performance difference in this particular case between the fully normalized and the somewhat de-normalized versions. But the only way to really know is run benchmarks.

An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

Users who have thanked Old Pedant for this post:

I would suggest that unless the benchmark tests indicate that the normalized version may have performance issues AND that the other version is SIGNIFICANTLY faster then there is no reason for undoing any of the normalizations and you should stick with the normalized version.

Never heard of the "Toxi" system - whatever it is has nothing to do with normalizing a database. Normalizing a database is called "Database Design".

To normalize a database you need to design it so that:

First normal form (1NF)
Sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Your alternative breaks this first and most important rule.

Second normal form (2NF)
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third normal form (3NF)
Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF)
Every determinant must be a candidate key.

At this point you have taken care of functional dependencies - it is the minimal normalization that you should apply.

Fourth normal form (4NF)
Removes multi-value dependencies

Fifth normal form (5NF)
Every join dependency in the database is implied by the candidate keys

At this point you have taken care of all join dependencies. This is usually as far as you need to go with normalization

Sixth normal form or Domain/Key Normal Form(6NF/DKNF)
decompose relation variables to irreducible components so that changes in values can be tracked over time

Are any of the fields in each table unique apart from the id? If so they you might make that the primary key and get rid of the id (unless the id has some meaning apart from providing a unique value to use as the key).

Users who have thanked felgall for this post:

I finally got around to looking at the URL you gave for the "Toxi" scheme.

In case you didn't notice, "Toxi" is just the name (or handle) of a *PERSON* who submitted that scheme to the author of that page.

The page is, quite frankly, ridiculous.

Just for starters, he doesn't address the *HUGE* disadvantage that his "SQLlicious" scheme has. To wit: It CAN NOT take advantage of INDEXES. (It could do so using full text indexes, but those have problems of their own, such as finding matches where none exist, because they treat various forms of a word as the same as the main word.)

It's essentially a useless scheme. ANY good database design book will warn you away from it.

You seem enamored of the second scheme there, the "Scuttle" solution. I've already given my opinion of it. ANY CHANGES you may make in the future *WILL* require changes to your database schema. And to all the code you use to access the database. And... Again, if you really are 100% confident that you are never never going to change how things work, then this method could work.

But I 100% agree with Felgall: The *ONLY* excuse for using this is if you find, via EXTENSIVE testing and benchmarking, that there are potential problems in the normalized scheme *AND* the unnormalized version performs significantly better. To me, here "significantly" would need to be at least one-third faster and I'd rather see at least one-half faster. Again, your tests and benchmarks should include variations in the indexes used. That is, make sure you are using the best indexes you can for MySQL.

An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

all three columns in `bookmark` are together a unique key
(itemVersion is for display data, it doesn't actually represent another item)

So from what you are saying...

- if the tag names are small enough, you could replace tag_id in tagmap with name (wouldn't this render the tag column useless?)
- get rid of id in tagmap, and make bookmarkID/name the primary key in it

@Old Pedant

I trust your call, but am trying to get my head around how future changes would be more of a hassle with the "scuttle" solution?

For example, to add in a new tag at a later date, wouldn't I need to update MORE tables?

I'm a little confused, as, unless I'm misunderstand, Felgall's suggestion seem to revert the system back to a "scuttle" one, which is against what Old Pedant suggests.

What I have been trying to say is that if the data already contains a relatively small field that contains a unique value for each record then you can use that for the primary key and not add an artificial ID field to use as the key. You should only add artificial fields as the key when there is either no natural key that can be used or the natural key is too long to be practical.

A table can quite reasonably hold just one field where it is a list of those values that are valid in a field in another table.

I don't think I have any id fields at all in any of the databases I have designed (I am aware of a few situations where they would be needed but have never actually designed any system that has the rare requirement of needing one).