If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Hi everyone!
I have what might seem like a simple/stupid question, but I'm new so please bear with me.
I'm trying to build a products database with the following relationships for a web application. Each product will have certain features that can change fairly rapidly and I'm wondering how to store the features information. Basically I have two ideas as shown below:

In this setup we have one Product_Features table with all data in it which means if I want to find all products with two features I have to do a join as follows (note this may be erroneous thinking on my part):
SELECT Product_ID FROM Product_Features
WHERE

Product_ID IN (SELECT Product_ID FROM Product_Features WHERE Feature_ID = 2)

Disadvantages: This can get really redundant since some of the products will have up 15 features and we might want to search on all of them. It just seems like bad planning to have to join a table to itself at the design phase.
It also gets tricky if a feature has anything other than numerical values. For example if a product is available in several colors.

Product_Features(Product_Feature_ID, Product_ID[FK], Feature_1, Feature_2, Feature_3, etc (Not sure if this is necessary, but hold information as to whether a certain product has a feature with only yes or no values.

..... (Lots of different individual feature columns)

In this set up we break each feature into its own table for example, the Product_Color table would look like this (comma delimited, bold=column names, ends with ):Product_ID[FK], Product_Red, Product_Black, Product_White
1, Y, Y, N
2, N, Y, Y
... etc

Disadvantages: If the features are changing fairly regularly the database structure would also be constantly changing (which I'm told is bad).

QUESTIONS: Performance will also be very important, and I really don't know which would do better performance wise.

NOTES: I honestly prefer option 2, it seems cleaner and more efficient to me, but I really want it to be right. I learned most of what I know about databases and design while I worked for a company that had a horrible data model (eg there were tables that had columns called date_added, date_modified and date_deleted that would all update when a record was modified, empty code tables, and absolutely no documentation -- most of the time to find data you had to use the testing application to reverse engineer data ). Since working there I am now terrified that I will fall into the same mistakes that they did in order to get things working.

this approach also allows you to select products which have a minimum number of features... for example, the IN list could contain 6 feature ids, and the HAVING clause could specify COUNT(*) >= 4

you can't do that easily at all with joins

Thank you so much!!! This makes sense.

I have one more question, how do I deal with features that are not numerical, such as color? Would I have to add a row for every color in the features table?
I guess i could add a feature group column which will allow me to group feature id's by type such as color, style etc?

Depends. I'd like to know what "features" are in this model. Are they really a collection of the same type of attribute, same data types, same constraints and logic that apply in each case? If so, then option 1 looks like it could be on the right track. Not sure why the min and max columns are necessary though.

If however the features are truly different things with different data types or constraints or other logic then they are truly different attributes of a product. In that case option 2 looks much better and I would try to avoid option 1.

In either case, don't be afraid of change because change is good. If a business isn't changing then it is dying. Data models are a representation of some portion of the business and therefore it is natural that data models must change too. The only question is how you manage change, not how to avoid it.

Depends. I'd like to know what "features" are in this model. Are they really a collection of the same type of attribute, same data types, same constraints and logic that apply in each case? If so, then option 1 looks like it could be on the right track. Not sure why the min and max columns are necessary though.

The reason that I preferred option 2, was because not all features will have the same data types. I didn't post the specifics is because of the nature of the products and I was afraid of getting banned. Here is an example that hopefully wont offend anyone.
One feature is speed, not all products will have a speed setting, some will only have one speed and others may have 3. However on some products the speed functions are customizable and the min and max are to reflect that. For Example
If a product has a max feature speed of 0 then it doesnt have a speed feature
If has its min and max equal and not equal to 0 then speed is not customizable
If it has its max > the min then it is customizable
(guess there could be a Product_Customizable table that could store that information)

On the other hand the unit, Max and Min mean nothing to a feature like color. In that case there would have to be a different feature id for every possible color. (note: I'm thinking that we would have to have Feature_Group Table and add a Feature_Group_ID to the features table that way we can always tell when a feature has a group associated with it. Which may solve the speed type problem)

I hope I'm making sense. I'm getting to the point where I'm thinking aloud but it helps to have a sounding board.
THANKS SO MUCH FOR THE HELP!