If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Multiple Values in One MySQL Column?

I have a site with a product table in a MySQL Database. The basic structure, simplified for example, is as follows:

Item Name | Item Description | Color

My problem is, some items have multiple color choices. For example, I might have something similar to the following in my Database:

Item One | This Is Item One | Blue
Item Two | This Is Item Two | Green
Item Three | This Is Item Three | Blue, Purple, Green
Item Four | This Is Item Four | Green, Red

There are several colors that need to be included for items, but I am unsure how to store them. I need the user to be able to select the color from a drop down next to the item, which will then fill in a PHP generated PayPal button.

Multiple data elements in one column is a no-no (or "non-normal" if you want to get more technical ).

You probably want to create a separate table which would be a many-to-one relation to the item table. It would, at a minimum, have two columns: item_id and color. You could even have a color table that would have an auto-increment key and the color name, in which case the color relation table's columns would be item_id and color_id. You then would just use a JOIN or two in your queries to get the item and its color.

I'm still wondering, what would the structure of the Color table be? I think I've been guilty of the "Spreadsheet Syndrome" in the past, as the article you posted describes, and now "Normalizing" is throwing some learning curves my way, haha.

Also, how would I go about joining the tables? If I have multiple colors for some items..

I'll continue researching, but any additional information you could provide would be greatly appreciated.

Before you spend hours normalizing your database and writing complex queries to deal with normalized data, you need to determine whether the intended normalization is beneficial. (sometimes its not) People can get carried away with normalization. And, before you can normalize your data in a beneficial way, you need to determine which of your fields or attributes are really also objects.

Some questions questions to consider:

Does each color (like "blue") have it's own properties? (that you need to store)

Will you ever need to query based on the a single item from the list of attribute X? (do you ever need to select all products where color contains blue?)

Multiple data elements in one column is a no-no (or "non-normal" if you want to get more technical ).

... points to an important question: Is it a list of colors or string of colors? To illustrate the point, consider the Item Name field. Is it a list of characters or a string of characters? Or, is it a list of words or a string of words? To assist in making this decision, refer to previous list of questions ...

Now, in your case, NogDog is probably right. Color may best be reference to another entity. But, before you make the design decision--be sure. And, if you're just getting into database design and normalization, you need to take care to think in terms of how you intend to query the database and think about the data. You can't correctly normalize without first philosophizing.

And then, after crafting the ideal image of your data, you need to reconcile it with efficient operation. There's no sense thinking of colors as a separate entity if it only serves to complicate your queries and/or force you to pull more rows (which you'll ultimately recompile into a single record).

... sorry. I don't mean to sew confusion--even less do I mean to contradict NogDog. He's absolutely right. But, I also mean to spread a word of caution when normalizing data: it can become overzealous and detrimental very quickly ...

you need to determine whether the intended normalization is beneficial. (sometimes its not)...

Care to venture an example? I can't think of any situation in which you'd want to be using text fields with csv data, or loads of order1, order2, order3 etc columns in a table, or directly writing things like colour names into a field which could just hold a simple indexed integer foreign key.

I would be very wary about following advice which disagrees with the importance of normalizing database tables. To quote the article posted above, "with this goal come benefits in the form of reduced redundancies, fewer anomalies, and improved efficiencies" - not sure why anyone would advise against that?

I'd like to semi-firmly disagree (if any experts would like to shut me up, please do so) with the idea that multiple values in a single field is a bad idea. I know, most everybody here said it's 'usually' a bad idea or something along those lines, which I completely agree with.

But what about the 'set' type of a column in mysql? It was created for a reason, wasn't it? I've recently worked on a project in which the client gathers large amounts of data about lots of different people in which they can dynamically create fields which are, among other things, drop down, or multiple select fields. I found set and enum fields very useful for this type of project...

Last edited by aj_nsc; 03-11-2010 at 03:06 PM.

I've switched careers...
I'm NO LONGER a scientist,
but now a web developer...
awesome.

Before I head off to my meeting, I'd like to quickly point out that I'm not advocating widespread denormalization here. I wasn't originally intending to advocate denormalization at all. My point more along the lines of taking care when distinguishing between what is and is not an entity: entities are independent of other entities in some significant capacity.

A list of colors ... well, each of those colors may be a normalizable entity. On the other hand, maybe it's just a string ... Again--not saying it IS just a string in this case, just that it could be.

And, even if it IS a set of entities, you may want to denormalize for the sake of efficiency. I mean, are you really worried about having to perform a mass-update to change all occurrences of blue to ... not-blue?

... just ... don't get zealous. Application performance suffers under two conditions: under-zeal and over-zeal. Don't zeal too much in either direction. that's all ... let's not blow **** out of proportion here.

I think in large part it comes down to whether you may be (now or in the likely foreseeable future) wanting to search the database based on one of those individual values. If so, it should be more efficient to use a separate relational table with individual values, allowing an index on that column to help optimize searches and sorting. If you clump them into one field, say "blue,red,green" and then want to search for all items which include "green", the DBMS will likely have to do a full table read each time in order to parse each field based on some sort of regexp, LIKE, or other parsing mechanism.

If, on the other hand, all you expect to do is extract the colors and display them in some manner (say, a select form object to choose which color to purchase) and you have no reasonable expectation to use it in any other manner where you would need to search/sort/group by color, then I would have no real problem with keeping it "clumped" (though probably I'd still separate it into another table, as it's really not all that difficult one you get the hang of doing simple JOINs, and then I'm ready for any future enhancements where I might decide I need it in a separate table).

I'm still wondering, what would the structure of the Color table be? I think I've been guilty of the "Spreadsheet Syndrome" in the past, as the article you posted describes, and now "Normalizing" is throwing some learning curves my way, haha.

Also, how would I go about joining the tables? If I have multiple colors for some items..

I'll continue researching, but any additional information you could provide would be greatly appreciated.

I'd like to semi-firmly disagree (if any experts would like to shut me up, please do so) with the idea that multiple values in a single field is a bad idea. I know, most everybody here said it's 'usually' a bad idea or something along those lines, which I completely agree with.

But what about the 'set' type of a column in mysql? It was created for a reason, wasn't it? I've recently worked on a project in which the client gathers large amounts of data about lots of different people in which they can dynamically create fields which are, among other things, drop down, or multiple select fields. I found set and enum fields very useful for this type of project...

Using ENUM is not the same thing - ENUM fields are stored internally as separate, integer values - data is not actually stored as text in the same way as stuffing csv data into a field. The performance issues alone with the CSV approach is enough to make it a bad idea, and that's before you even get in to the other implications such as future expansion and changes. It's just a bad idea, good, normalized database tables are always best.

The performance issues alone with the CSV approach is enough to make it a bad idea, and that's before you even get in to the other implications such as future expansion and changes. It's just a bad idea, good, normalized database tables are always best.

There's a lot more debate out this in scholarly settings than you're letting on (or are possibly even aware of). Denormalization is a very common, well-accepted practice. If you're going to insist that it's not acceptable, you need to justify the position.

Please explain why it is always inefficient to store comma-delimited data in an table column. Take various access patterns into account.

For instance, I would argue that storing colors as a comma-delimited string lowers the complexity of accessing a produce from O((M/N)*Log(M)*Log(N)) to (Log(M)) where N is the total number of color assignments used (relationships between a color and a product) and M is the number of products in the products table.

Log(N) is the cost of the product lookup, assuming a BTREE index.
Log(M) is the cost of the color lookup, assuming a BTREE index.
M/N is the average cost of joining the product to its colors.

So, by denormalizing the data, you're looking at saving a M/N Log(N) lookups, each of which will be a higher cost than the single Log(M) lookup needed for the denormalized form. The cost is the inability to efficiently search, update, or remove single colors globally across all product rows. Of course, making those processes efficient requires yet another table, which introduces another lookup ...

So, if you don't need to do global color updates on a regular basis or search for products by single colors, it's probably more efficient and sensible to store them in a single field. For this application, I'm assuming the ability to do these things needs to be present, making this approach incorrect--but you cannot generalize and say that this a universally incorrect approach. That is not only incorrect, but irresponsibly mis-informative!!!