Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Then you need to do stuff in your application code to split that comma list into individual numbers, then query the database seperately:

SELECT DealCategory.Name
FROM DealCategory
WHERE DealCategory.DealCategoryID IN (<<that list from before>>)

This design antipattern stems from either a complete misunderstanding of relational modelling (You don't have to be scared of tables. Tables are your friends. Use them), or a bizarrely misguided belief it's faster to take a comma-separated list and split it in application code than it is to add a link table (it never is). The third option is that they're not confident/competent enough with SQL to be able to set up foreign keys, but if that's the case they shouldn't have anything to do with the design of a relational model.

SQL Antipatterns (Karwin, 2010) devotes an entire chapter to this antipattern (which he calls 'Jaywalking'), pages 15-23. Also, the author has posted on a similar question over at SO. Key points he notes (as applied to this example) are:

Querying for all deals in a specific category is rather complicated (the easiest way to solve that problem is a regular expression, but a regular expression is a problem in and of itself).

You can't enforce referential integrity without foreign key relationships. If you delete DealCategory nr. #26, you then, in your application code, have to go through each deal looking for references to category #26 and delete them. This is something that should be handled at the data layer, and having to handle it in your application is a very bad thing.

Aggregate queries (COUNT, SUM etc), again, vary from 'complicated' to 'almost impossible'. Ask your developers how they'd get you a list of all categories with a count of the number of deals in that category. With a proper design, that's four lines of SQL.

Updates become much more difficult (i.e. you have a deal that's in five categories, but you want to remove two and add three other ones). That's three lines of SQL with a proper design.

Eventually you'll run into VARCHAR list length limitations. Although if you have a comma-seperated list that's over 4000 characters, chances are parsing that monster is going to be slow as hell anyway.

Pulling a list out of the database, splitting it up, and then going back to the database for another query is intrinsically slower than one query.

TLDR: It's a fundamentally flawed design, it won't scale well, it introduces additional complexity to even the simplest queries, and right out-of-the-box it slows your application down.

I wasn't entirely sure whether they wanted to have a many-to-many relationship between Deals and Categories, or some sort of heirarchy of Categories. Either way, it was a sideline to the main point, that being comma-delimited fields instead of a link table are a bad idea.
–
Simon RighartsMay 14 '12 at 22:47

That's actually a good design if you only need to query for the categories for a given deal.

But it's terrible if you want to know all the deals in a given category.

And it also makes it really difficult and error-prone to do anything else -- like updates, counts, joins, etc.

Denormalization has its place, but you have to keep in mind it optimizes for one type of query at the expense of all others you might make against the same data. If you know you will always be querying in one pattern, then it might give you an advantage to use the denormalized design. But if there's any chance you could need more flexibility in the types of queries, stick with a normalized design.

Like any other form of optimization, you need to know what queries you are going to run before you can decide if the denormalization is justified.

Do you really think a string with comma separated child IDs is helpful? I mean, the application had to read first, then parse the IDs and query all the children, like select * from DealCategories where DealId in (1,2,3,4,...). You have more experience, regarding database design, than me, so maybe you have good reason in some cases for such "extreme tuning" in very specific cases. My only idea to justify this is a very high select load on Deal/DealCategory. This looks to me much like some outsource team without any DB design knowledge, beyond creating tables, created it.
–
Erik HartJan 1 '14 at 14:48

1

@ErikHart, this is denormalization, and it can be helpful, but my point is that it depends entirely on the queries you need to run. You're right that denormalization makes all queries perform worse, except the one query it optimizes for. If you only need to run that one query, and you don't care about the other queries, it's a win. But these are rare cases, because typically we want flexibility to query the data in a variety of ways.
–
Bill KarwinJan 1 '14 at 16:36

1

@ErikHart, if that outsource team was given project specifications that included only one query against this data, they could have designed an optimization for that specific query only. In other words, "you asked for it, you got it." But the outsourcing provider has no reason to plan for future uses of the data -- they implement the application to the letter of what's written in the spec.
–
Bill KarwinJan 1 '14 at 16:40

It's also absolutely no speed gain, since the tables are to be linked in the database. You have to read and parse a string first, then select all categories for the "Deal".

The correct implementation would be a junction table like "DealDealCategories", with DealId and DealCategoryId.

Bad hierarchy implementation?

Also, an FK in DealCategories to another DealCategory looks like a bad implementation of a hierarchy/tree of DealCategories. Working with trees through a Parent ID (so called adjacency list) relation is a pain!

Check for Nested Sets (good to read, but hard to modify) and Closure Tables (best overall performance, but possibly high memory usage - probably not too much for your DealCategories) when implementing hierarchies!