The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Tags and Database Structure

I'm in a situation where I need to create database structure to tag items and display various statistics connected to tagging. Basically, the most similar thing I need is something that You Tube is doing, having tags on video's and connecting them with categories as well.

At first, I thought that delicious database structure presented would be easiest and should be good enough. However, I'm concerned about its speed.

Now, I'm thinking more about normalized solution with 3 tables, items, tags and m:n relation between them with a slight denormalization of putting tags in one field in items table in character form just to have original order once they get inserted from the form field.

I don't like the fact that I will have to unlink all tags connected with an item once they need to be updated.

Anyway, I need your opinion. What would you do, which database structure would you choose and why?

Now, I'm thinking more about normalized solution with 3 tables, items, tags and m:n relation between them with a slight denormalization of putting tags in one field in items table in character form just to have original order once they get inserted from the form field.

I don't see why you have to denormalize like that. If you insert the tags/taggings in order you should be able to get them back the same way with an ORDER BY.

I'd go with your 3-table structure minus the denormalization since I don't particularly see a good reason for it.

Off Topic:

If you're doing this in Ruby on Rails, the acts_as_taggable plugin uses the 3 table structure and it's really easy to add to models.

What do you think about counting how many items are connected with tags? Would it be good to have a counting field in tags table? I'm concerned about COUNT function speed if table gets too large.

What are the odds of your table getting large right away? If you expect that kind of activity within your first month then go ahead and add the count field, but if this is a new site/app starting from scratch, then don't worry about it until slowdowns actually happen. Get to 3NF now, then denormalize later when you really experience bottlenecks.

I would go with the three table option, normalized. The guy's "disadvantage" about there being possible orphan tags on a deletion of an item is a little overplayed. If you are using a newer version of MySQL and the InnoDB table type then you can implement ON DELETE CASCADE for things, so the problem goes out the window. So my design would be as follows:

Now that I look at it, the only reason that you would have orphan tags is if you did not write good delete code.

It would require some conditional checking to see if the tag is still linked to some other item though. Deleting the tag itself should delete tag/item references, and deleting items should delete tag/item references too, but wouldn't get rid of tags without that checking.

Personally I don't see the big deal with orphan tags unless you have tons of them. But in that case, a quick query once in a while can remove them pretty easily:

Code:

DELETE FROM tags
WHERE id NOT IN
(SELECT DISTINCT tag_id FROM items_tags)