Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. Join them; it only takes a minute:

I am doing some work for a client and they are creating a database of products of which I need to consume product data when it's created and updated to store in a third party service and then present on a website.

They have come up with 2 different data models and I am trying to understand why they believe them to be a good idea and would appreciate people's opinions on the pros and cons. I personally do not think they are sensible approaches to managing product data, but I am up against 2 architects and a couple of developers who seem to think the approaches they've come up with are good long term solutions.

Background

This is a pure product catalogue. It doesn't contain price/stock/promotion data. They are intending to use MongoDb and are taking data from excel spreadsheets which they will map into their db. They are basically intending to pass me the same data model as they use for the schema of the product database to consume my third party service.

The types of products are complex. They include food, health supplements and an example might be something like protein shakes. Data would include complex models such as nutrition information, ingredients, allergen info alongside the standard product name, brand, description etc. The data is multi-lingual and each product could be in up to four languages. However the different languages would still be products for a single store and so there is no need to worry about them being for different markets, currencies etc.

I have discussed with them the idea of parent product data which is data that would be the same across all variants/skus of a product. This could include product id, brand, product name, category, overview description and how this could be represented with a dictionary where it's multilingual such as

And then data which is specific to variants such as flavour, size, ingredients, nutrition information.

The product catalogue is not designed to be consumed directly via api/websites, but acts more like a master catalogue which is given to third party services which store their own copies. These third party services would be where product data was exposed to websites etc.

Model 1

Their first model was a flat document for each variant and language. For a single variant this would include all of the parent information in a single language.

This would mean that each document would include the common data such as brand, product name etc as well as the variant specific information.

This model would be for a single language and so would be repeated in four separate documents if there were four different languages.

If there were multiple variants imagine different flavours and different sizes all the common data would be repeated for each flavour in each size so the total number of documents would be flavours x sizes x languages - 100 documents for 5 flavours, 5 sizes and 4 languages.

Pros

Its quick to get product data by sku and with it comes all the parent
data

Cons

I worry about maintaining the data, with all the parent data repeated
across all the variants in all the languages, if they go out of sync,
which one is the truth.

For me consuming the data means repeating the same operations on parent data
every time a sku is changed. If a
parent field is changed they will have to update every document for
every variant in every language, this seems like a lot of work.

It
seems inefficient for storage as there is a lot of repeated data, but
their argument is that storage is cheap.

Model 2

They have suggested a recursive model. Basically they would use the same flat structure as in Model 1, however they would populate different fields depending on the level of the data. Imagine a product having 2 levels, level 1 being the common parent data, level 2 being the variant.

They have designed this so it can have 'n' levels and be able to have an intermediate level which could create groups of skus so grouping by flavour of protein shake for example.

To be clear they would use exactly the same data structure at each level, but would create a reference to a parent document.

They would fill in the fields they had at each different level of the product data so brand would be filled in at each level. Product name might be the root product name at the parent level but then have a flavour appended to it at the variant level. Any fields which weren't used would be included but null.

There would be rules so if a product model didn't have a referenced parent then it would be a parent/root product model. If it had a sku then it would be the bottom level. If it had a productId and a parent productId but no skuId then it would be one of those intermediate levels such as a flavour group.

They would still repeat all the documents for all the different languages. So this could still be four copies of all the documents for each product at each level. I don't know what they'd do if they had an intermediate level with no language specific data.

Pros

They can separate out the parent data from the variant data.

Possibly easier to maintain than model 1 as they would only have to edit parent data in the parent documents.

This is very flexible as they can include any piece of data at any level.

Cons

I think this is a very complex model, it seems strange to include fields at the parent level model which would surely never be used. I don't really understand why they can't just decide which fields are parent fields and have a set parent level model.

I think this is hard to consume in an api. Especially as their initial idea would be to send every document separately so you'd just get a parent level document, or a variant level document or a group level document. Without being able to see the entire product model their could be complications as the platform I input the data to only has a product parent and then variants underneath it. Also for consuming things like product names, where it changes as it goes down the model this could be complex to understand the rules.

I think this data is harder to understand for humans and anyone who maintains the database and could lead to problems in the future although I'm struggling to articulate the exact issues they may run in to.

I'm worried about referential integrity and the possibility of orphaned records, but I guess this is possible anywhere where you separate variants from parent data.

My suggestions

I have suggested they look at their products and figure out which data is common to all products with some fields being required and some optional. They put all this common data into a model which is the product parent - product id, brand, product name, overview description, warnings etc.

I would create any fields which would be translated into a map as I described at the beginning to stop duplication of documents for different languages.

This parent model might include a field which references how products are grouped with a field such as "groupOn": "flavour" for display on a website.

I have suggested they create a separate product variant model which includes all the data which is specific to a concrete product. This would include sku id, size, flavour, weight, nutrition, ingredients etc. I do think they could normalise this further but I really don't think it would be accepted. The product variant would reference a product parent via a productId.

In theory they could just include all the variants in an array inside the parent model if they want to have a single document for each product, but I don't see a big issue with splitting them up into separate entities.

Pros

This is much simpler to understand. It's clear what's common and what's variant specific making it easy to consume via an api.

This is easier to maintain. All the parent product data is in a single document. All the variant specific data is in a single document for each variant. The relationship is clear and simple.

This won't repeat data and so take up space to store.

Cons

If something is common to a few variants e.g. a flavour that has different sizes then you would obviously have to update multiple documents.

It may be slightly more difficult to retrieve a parent and all of it's skus but this could be arranged differently to have an array of skuIds under the parent instead of a productId on the sku.

The Question

I'd really appreciate some input specifically if there is anything I am missing that means that what they're doing is a really good idea or clever in a way that I haven't understood.

If there are solid reasons against the two models they've suggested I'd really appreciate them as well. The client doesn't seem to have anyone else able to cast a critical eye over what's been done, but it's difficult to argue against a seemingly experienced team on my own - of course I could be wrong about all this and it may make perfect sense to more experienced people.

Is there another way which I may have missed. I appreciate we could go fully down the relational database route but I think that's out for now on their end.

Also is there anything good or wrong about the model I've suggested. I think it's probably a bit closer to a standard way of doing things, but is this a bad thing?

If you've made it this far, well done and thanks for sticking with me!