The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

I don't think that would work too well. A product could potentially have over 10 multiple choice attributes, each with potentially over 10 values - assuming they all modify the price in some way, you start hitting 1000's of possible combinations (in my solution it would only store the 100 possible values, not all the combinations). Pre-calculating all of the combinations and storing them in a table is incredibly inefficient - but on that note, if you're dealing with SKU's that's essentially what you have to do anyway. But on top of that, how would you know which product/price goes with which combination of attributes?

I just see your solution as being more complex to achieve a less flexible result - in fact I don't think it would work at all when you consider variable pricing.

Could you also explain why you think putting prizes in the product table wouldn't work?

I would think that if you have a green T-shirt in size XL, then that's one product with one prize. So that T-shirt has an unique Id and a unique prize. Put the Id, product name and prize in one table. Following normalization rules, put the attributes in different tables with foreign keys.

One way or the other, these 1000's of possible combination's will be there. Whether you put them in one table or in 10 tables. I don't know the specific requirements of the project of the original poster. Depending on those, you would choose a very normalized approach or otherwise a less normalized one. I'm also not familiar with the prizing scheme used. Probably depending on that you could choose a different approach

@allspiritseve: yes, I can see how an AEV approach makes the search queries a bit more difficult. Maybe looking into how different tagging schema's are used in the bigger web apps like Delicious would be interesting.
A somehwat older article, but still interestinghttp://www.pui.ch/phred/archives/200...nce-tests.html

Could you also explain why you think putting prizes in the product table wouldn't work?

I would think that if you have a green T-shirt in size XL, then that's one product with one prize. So that T-shirt has an unique Id and a unique prize. Put the Id, product name and prize in one table. Following normalization rules, put the attributes in different tables with foreign keys.

So you're basically saying, with your solution, you can't have attributes change the total price of an item (and instead, need to add a completely different product for each price combination).

The great thing about my solution (simply as an example), is that it's very user friendly. They don't need to sort through 100 different t-shirts to try and find the one with the right, size, colour and material. Instead, they can just find the t-shirt product, and select exactly what they want out of the available options. It creates a much simpler, friendlier user experience.

So you're basically saying, with your solution, you can't have attributes change the total price of an item (and instead, need to add a completely different product for each price combination).

No. Attributes do change the prize of the product. The only difference is in how to store the different "products/combination's" in the db.

Originally Posted by Wardrop

The great thing about my solution, is that it's very user friendly. They don't need to sort through 100 different t-shirts to try and find the one with the right, size, colour and material. Instead, they can just find the t-shirt product, and select exactly what they want out of the available options. It creates a much simpler, friendlier user experience.

The user friendliness has nothing to do with it. I thought we were talking about the model and database schema, not the user interface. I assume that in the application to be build the user interface and how it looks and functions is in a separate layer and completely independent from the back end.

Allspritieve: did you make any progress on your problem or find out anything new? Have you also asked on one of the db-forums?

One way or the other, these 1000's of possible combination's will be there.

You don't want to be storing all the possible combinations though. You only want to be storing the available options which result in all the different combinations.

Originally Posted by matthijsA

No. Attributes do change the prize of the product. The only difference is in how to store the different "products/combination's" in the db.

Could you please demonstrate that then, as so far you haven't shown how that's achieved.

Originally Posted by matthijsA

The user friendliness has nothing to do with it. I thought we were talking about the model and database schema, not the user interface. I assume that in the application to be build the user interface and how it looks and functions is in a separate layer and completely independent from the back end.

It has more to do with it than you obviously think. How your store your data can have a big impact on the different ways the data can be displayed.

By the way, can I ask why you refer to the price as the "prize" (if that's what you're referring to). I don't mean to have a go at you, but I want to make sure that we're both talking about the same thing here.

Again, I really can't say which db schema is "better". That will depend on so many specifics of the project I don't know. Like how a "product" is defined, does it have a unique number to identify it, on what is that number based, what attributes are possible, how are they going to change in the future, what technical limitations are there, which databases are available, etc. I am just suggesting one possible scheme.

It has more to do with it than you obviously think. How your store your data can have a big impact on the different ways the data can be displayed.

Not sure I understand what you mean. In a normalized db schema, it's possible to get out whatever you want joining tables together in various ways. Of course, depending on which schema you choose the queries you have to write to get something out of the db might be more or less complicated, but I'll leave that to the programmers to figure out

By the way, can I ask why you refer to the price as the "prize" (if that's what you're referring to). I don't mean to have a go at you, but I want to make sure that we're both talking about the same thing here.

This example demonstrates a lack of dynamic pricing (or a flawed model if anything). What you're doing is linking an attribute value to a product with a set price, which will fail as soon as you add two separate attributes to a product, which have a conflicting price.

I'll rewrite your example using the same schema, so you hopefully understand what I'm saying.

Now, assuming you would group the products by title (so Shirt is displayed as a single product, not as 5 - you'd have to do this, as otherwise you wouldn't have any sort of variable pricing), what would the price of the shirt be if you take into account what's in the Product2Attributes table?

Let me explain what's happening here. Seeing as though all the shirt products would essentially be treated as a single product in the front end, you'd get a conflict when someone tried to order a green shirt of any size. That's because the green colour value is linked to the $22 shirt, but all the shirt sizes are linked to the $20 shirt. So how much does the shirt cost? As you can imagine, it gets even worse when you introduce more than two attributes.

Do you see my point matthijsA?

Originally Posted by matthijsA

Not sure I understand what you mean. In a normalized db schema, it's possible to get out whatever you want joining tables together in various ways. Of course, depending on which schema you choose the queries you have to write to get something out of the db might be more or less complicated, but I'll leave that to the programmers to figure out.

As a simple example. If you stored a large green shirt, small red shirt and medium orange shirt as separate products in the database (so as separate rows in the products table), you would be forced to display those products as three separate items. If however you stored just a shirt as a product, and all the different attributes (size, colour) were stored in an attributes table, you'd be able to display only one item, with drop-down menus to select from the available attribute values. In both scenarios, you don't really have to many other options. The schema of the database determines how you can display the data.

No, I'm sorry, I don't understand it. Maybe the confusion is caused by the fact that you think that an attribute is linked to a price. But it's not. A single, unique product has only one single price.
(also, don't pay attention to the exact numbers in my example schema, they are not correct)

What is a single product? Think about a real physical t-shirt. That single t-shirt has several attributes. Color, size, material and price. Of all those attributes there's one which will always be there for every product: price. So we can leave that attribute in the db table products.

For all the other attributes we face the problem that each product can have one or more attributes. A t-shirt has maybe 3 attributes, a bottle of wine has maybe 10 different attributes. So we can't put them all in the same products table, as that would mean you would have a) too many columns b) a lot of repeating data

So therefore you create a separate table for the attributes. And a look-up table to link each single product to more attributes. One product has one price but many attributes.

.. The schema of the database determines how you can display the data

No it does not and should not. By joining tables, having GROUP BY closes, or doing multiple queries, you can get anything you want out of the db. Sure, it can be difficult sometimes, but it's always possible

Maybe I'm the only one who read the original post. allspiritseve said that each attribute was to have a different price. I provided a solution that allowed any number of attributes and values to be combined, while at the same time, allowing attributes to dynamically change the price of a product depending on the values that were selected.

This whole time I've been trying to prove to you that your solution didn't support such dynamic pricing, and only supported one price altering attribute (when there could potentially be many). However it's clear now that you weren't trying to achieve such a thing, which makes me ask what relevance your solution has to the original post, as the whole point of this thread I thought, was about linking prices to attributes.

On that note however, I found it odd that allspiritsteve said that your EAV-like solution was "the most flexible solution", which makes me wonder if even he knows what he's after. Maybe he didn't realise the fact that your schema only support a single price altering attribute?

No it does not and should not. By joining tables, having GROUP BY closes, or doing multiple queries, you can get anything you want out of the db. Sure, it can be difficult sometimes, but it's always possible.

I completely disagree. You tell me then how I could display a t-shirt product, with drop-down boxes for the available attributes (colour, size, etc), if all the different t-shirts were stored as separate products with different titles (ie. Large Green T-Shirt, Small Orange T-Shirt, Large Orange T-Shirt).

The database schema (or how the data is stored, take your pick as to which wording suits you best) determines to some extent, the format the data can be displayed in. If you're storing all product variations (colour, size) as separate products, then it's virtually impossible to provide an interface which offered a number of attribute selections (drop-down menus, radio buttons, etc).

The same data, stored in different ways, alters how that data can be displayed. It's simple.

If you're storing all product variations (colour, size) as separate products, then it's virtually impossible to provide an interface which offered a number of attribute selections (drop-down menus, radio buttons, etc).

Try displaying those 5 separate products to the user as a single item, with multiple multi-choice attributes. Clearly, in this circumstance (which is a more extreme example - intended that way to get my point across), how the data is being stored determines the ways it can be displayed.

You just proved my point. You had to change the schema of the database in order to change how the products can be displayed. Proof that how the data is stored (despite being the same data), directly affects how the data can be displayed to the user.

Maybe I'm the only one who read the original post. allspiritseve said that each attribute was to have a different price. I provided a solution that allowed any number of attributes and values to be combined, while at the same time, allowing attributes to dynamically change the price of a product depending on the values that were selected.

This whole time I've been trying to prove to you that your solution didn't support such dynamic pricing, and only supported one price altering attribute (when there could potentially be many). However it's clear now that you weren't trying to achieve such a thing, which makes me ask what relevance your solution has to the original post, as the whole point of this thread I thought, was about linking prices to attributes.

On that note however, I found it odd that allspiritsteve said that your EAV-like solution was "the most flexible solution", which makes me wonder if even he knows what he's after. Maybe he didn't realise the fact that your schema only support a single price altering attribute?

of course each t-shirt with different attributes has a different price. In my example solution that is also the case. I don't know why you think that isn't so.

Let us start with one single product. A t-shirt. Which is: green, large, cotton and costs $20. we fill in the db tables and end up with this:

So you see that we end up having one row in the products table for that shirt, which costs $20. We end up having a couple of rows in the Attributes table and a couple of rows in the product2attribute table, linking that one product to a few attributes

Now we add another t-shirt, this time also green and cotton, but size medium. Now we end up with:

On that note however, I found it odd that allspiritsteve said that your EAV-like solution was "the most flexible solution", which makes me wonder if even he knows what he's after. Maybe he didn't realise the fact that your schema only support a single price altering attribute?

Look, if all you're going to do is bash someone else for offering an idea, I'd rather you posted elsewhere.

The fact is, both of your solutions are EAV-based. Both are flexible for storing data, but are detrimental for searching. We've established that Magento uses EAV, and we've established Magento is horribly slow. Therefore, I'm trying to find out what schema is NOT horribly slow when doing searches, while still allowing for as many attributes and values of attributes as needed.

Have you measured performance problems for real already? (maybe in a test db full of dummy data)
I don't know anything about the inner workings of magento so can't tell why it would be slow. There could be more reasons then the db schema. And maybe, if there are performance problems caused by the db schema, there are other solutions to handle those (caching, memcache, seperate search db's, etc)

I'm reading up on tagging and category db schema's and performance, if I find anything interesting I'll post it here

You should know that a major player in the CMS industry, eZPublish, also hosts a kind of an EAV-solution – and is, incidentally enough, about as slow as a pack of mollusks put together. Not very fast indeed.

You should know that a major player in the CMS industry, eZPublish, also hosts a kind of an EAV-solution – and is, incidentally enough, about as slow as a pack of mollusks put together. Not very fast indeed.

Yep I know a lot about ezPublish. They split EAV into four tables-- entities, attribute types, attributes, and attribute values. Probably even slower than a normal EAV solution, though less duplication.Edit: Oh, maybe you weren't talking to me!

Look, if all you're going to do is bash someone else for offering an idea, I'd rather you posted elsewhere.

I'm not bashing him for thinking of an idea. I've been trying to prove that his schema doesn't fulfil the requirements you mentioned in your original post. If I posted a proposed schema (which I have), I'd personally want to know if I didn't fulfil the requirements.

@matthijsA...
I've come this far, I may as well keep going.

Your schema does not allow any more than one price changing attribute. If you had a product which had the sizes small, medium and large available, and it also had two different types of material, cotton and polyester, your schema WOULD allow the price to change depending on the size selected, but if you also wanted to charge an extra $2 for the shirt in cotton, then you wouldn't be able to. To achieve that, you'd have to create a totally separate product, and you'd have to display it like so in the frontend of the application.

In summary, my proposed schema supports fully customisable products; exactly like Dell allows when you purchase a computer from their website, you can add any number of options to your purchase, each of which add or substract from the final price. Your system on the other hand only supports a single price altering attribute with as many additional non-price altering attributes as you want). That's the point I've been trying to make.

How I understood the original post, was that allspiritsteve wanted a schema that allowed fully configurable/customisable products, with the ability to have multiple optional attributes affect the final price of that product. That's my reason for trying to prove the point I'm trying to prove - your schema does not meet those requirements. Unfortunately it's taken me many posts to try and explain that.

On a completely separate note, everyone keeps saying that EAV-like schemas slow down searching. Can anyone explain this in more detail as it doesn't make much sense to me as why this would be the case? What exactly would you be searching for which would cause the slow down.

Your schema does not allow any more than one price changing attribute. If you had a product which had the sizes small, medium and large available, and it also had two different types of material, cotton and polyester, your schema WOULD allow the price to change depending on the size selected, but if you also wanted to charge an extra $2 for the shirt in cotton, then you wouldn't be able to. To achieve that, you'd have to create a totally separate product, and you'd have to display it like so in the frontend of the application.

No. I am not sure how to explain it any better. In my schema you can have as many attributes as you want, and each combination has it's own price. Below, a shirt in 2 sizes and 2 materials, which gives us 4 variations of the product:

And about the display in the front end: we shouldn't keep discussing that, as it's not relevant. With each schema you can display what you want in the front end.

Is this the right solution for this specific problem? I don't know, maybe not. I hope some other people will come up with other possible solutions and with their experiences in different schema's and their (dis)advantages to help allspiriteve with his problem

With the example you posted above, what you're doing is essentially storing possible attribute combinations, which ultimately makes the data less flexible and results in having to store more data. As I mentioned earlier, if you had a product which a lot of customisable attributes (for example, a car - which may include wheels, colour, engine, interior, etc), you could potentially have 1000's of possible combinations, all of which would have to be stored in the database.

Let me demonstrate. Let's say we have a single product (T-Shirt) which has a size attribute, which consists of the values: small ($19), medium ($20), and large ($21). Let's also say we have a material attribute which consists of the values: polyester, cotton (an extra $2), satin (as extra $4). Let me first demonstrate how this would look in my proposed schema (a schema that doesn't store possible combinations)...

Two tables, 7 rows total, easily understood by humans at a glance (which is always a bonus) and complete flexibility over how the data can be displayed (and as I've proven, with the help of r937, the database schema can affect the format in which data can be displayed).

Now let's look at how the same thing is achieved in your schema (a schema that stores possible combinations - and that also stores each attribute value which makes up those combinations as a separate row - that's the number of combinations multiplied by the number of attributes)...

You can't tell me that schema is suitable for storing products with any more than 1 price-altering attribute. Notice how I needed 18 rows in the products2attributes tables (the number of combinations (9) * the number of attributes (2)). Also notice how I needed to add what appears to be duplicate products (same title, same price, but different ID).

I'm not saying your schema doesn't serve a purpose, it's just it's not a practical solution when you need to store multiple attributes with multiple price-altering values per attribute. A schema like yours used in such an application as this, would also be very slow.

Now we are at last getting some where. You finally seem to understand that my schema is possible. It is storing all the products/variations of products. Just in a different way.

You are correct that in my schema there are more rows. But they are rows of simple integers. In your schema, you have a lot of duplication of attribute and value names (which goes against normalization rules). But honestly, I really don't know enough about the domain model of allspiriteve's project to say what is "better". I just proposed an alternative schema. I really don't feel like or need to defend it.

Wardrop, please explain to me, using your proposed schema, how choosing 'cotton' as the material could increase the price by $2 if the shirt is in medium, and by $3 if large is selected. Not that you'd necessarily need to, but I still think that's another potential shortcoming of your table structure.

Oh, and don't be fooled into thinking that more rows = less efficient, or that more rows = lack of flexibility for displaying the data at the front end. If data is stored in an efficient manner, you will be able to display it however you want. Trust me on this

Now we are at last getting some where. You finally seem to understand that my schema is possible. It is storing all the products/variations of products. Just in a different way.

I don't think there was a stage where I didn't understand your schema and that it was "possible", it was just so damn impractical I thought that maybe you were missing something.

Originally Posted by matthijsA

In your schema, you have a lot of duplication of attribute and value names (which goes against normalization rules).

I disagree, and not just because we've always disagreed :-)

The attribute column is probably the only true form of duplication in my schema (more specifically, in the attributes table), but it's absolutely pointless to separate a single column out into another table - and there's a reason I why I wouldn't pair it up with the 'value' and 'price_diff' columns to be separated into another table, let me explain in list form...

An attribute may have a different set of values between products. For example, one product may have sizes small, medium and large, but another may only have medium and large, or short, medium and long.

Different attributes may contain the same values. For example, the value '1 litre', may be associated with the attributes: size, capacity, content, etc.

An attribute with the a certain set of values, which are identical between multiple products, may have different pricing modifiers. For example, we may have a t-shirt, and a pair of pants which share the same attribute and associated set of values (ie. Size, which consists of small, medium and large), but the price modifiers for the t-shirt may be +$1 for the medium and +$2 for the large, where as the price modifiers for the pair of pants may be +$2 for the medium and +$5 for the large.

As you can see, data integrity may be more at stake if those three columns WERE normalised, than if they weren't. God knows how a catalogue manager would be able to select and maintain attribute sets which only had a single difference between them. It would be extremely impractical. Take the following four attribute/value sets as an example...

It's completely impractical to normalise such data, and where's it's impractical to normalise, you generally SHOULD NOT normalise. If I did normalise those columns, you can see that there's still plenty of data duplication. This is an example of why normalisation shouldn't be just simply enforced everywhere. A database designer should use their own experience and judgement to determine where it should and shouldn't be used.