Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community.

3 Answers

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, it compensates for the cost of joins in a normalized schema: even a well-tuned, fully normalized relational database can impose significant access overhead on physical storage.

The answer is always "it depends", so here's my rule of thumb:

If ...

- the amount of data is not large
- you aren't already doing a ton of joins
- and/or database performance is not currently a bottleneck

then stay normalized. Yes, denormalized reads are faster, but denormalization also means you have redundant data in the system -- data that has to be maintained and kept in sync. There is no longer "one source" for that data, but multiple sources that can deviate. This is risky over time, so you shouldn't do it without very good reasons, backed by some benchmarks.

I would only denormalize when ...

- the amount of data is very large
- joins are expensive and you have to do a lot of them to answer even trivial queries
- database performance is a bottleneck and/or you want to go as fast as possible
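To make the trade-off concrete, here is a minimal sketch using Python's `sqlite3` and a hypothetical `customers`/`orders` schema (the table and column names are illustrative, not from the question): the denormalized copy makes reads a single-table lookup, at the cost of keeping two copies in sync on every update.

```python
import sqlite3

# Hypothetical schema: a normalized design keeps the name only in customers;
# a denormalized design copies it into orders so reads can skip the join.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    customer_name TEXT NOT NULL,  -- redundant copy: must be kept in sync
    amount REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (10, 1, 'Acme', 99.0);
""")

# The read path is a single-table lookup -- no join needed.
row = con.execute(
    "SELECT customer_name, amount FROM orders WHERE id = 10"
).fetchone()
print(row)  # ('Acme', 99.0)

# The price: every rename must touch both tables, or the copies deviate.
con.execute("UPDATE customers SET name = 'Acme Ltd' WHERE id = 1")
con.execute("UPDATE orders SET customer_name = 'Acme Ltd' WHERE customer_id = 1")
```

In a real system that second `UPDATE` would live in a trigger or in the business logic, and forgetting it anywhere is exactly the "sources that can deviate" risk described above.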

While the join version seems less concise, a good query planner will see what you are trying to do and run a correlated sub-query as that join anyway; a bad query planner, however, may end up doing an index scan of payments.id_customer for every outer row (assuming you have a relevant index), or worse, a table scan, instead of doing things the more efficient way. Even a good query planner may fail to see the optimisation if this query is wrapped in something more complicated. Expressing the relationship as a join rather than a sub-query may make more difference than changing your data structure.
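The original queries from the answer are not shown here, so the following is a hedged reconstruction in Python's `sqlite3`, using the `payments.id_customer` column the paragraph mentions (the rest of the schema and data are made up): the correlated sub-query and the join return the same rows, but may be planned very differently.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    id_customer INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
CREATE INDEX idx_payments_customer ON payments(id_customer);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO payments VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Correlated sub-query: logically re-evaluated once per customer row.
sub = con.execute("""
    SELECT c.name,
           (SELECT SUM(p.amount) FROM payments p WHERE p.id_customer = c.id)
    FROM customers c
    ORDER BY c.id
""").fetchall()

# Equivalent join: one grouped pass, easier for most planners to optimise.
join = con.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customers c
    JOIN payments p ON p.id_customer = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(sub == join)  # True -- same results, potentially different plans
```

Prefixing either query with `EXPLAIN QUERY PLAN` (in SQLite) or `EXPLAIN` (in most other engines) shows whether the planner actually rewrote the sub-query into the join form.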

As Jeff says, any denormalising should be considered with care - it can bring easy performance boosts, particularly for some reporting purposes, but can lead to inconsistency due to bugs in the supporting business logic.

As a side note: Obviously I don't know your business so I could be missing something, but your table relationships seem odd to me. They imply that you can never have more than one project with the same customer which is usually not true in my experience, at least over a long period.

@Brian: very valid point. And as well as the potential performance implications, avoiding * in SELECT clauses also avoids problems with column ordering in views-on-views in MSSQL if sys.depends gets out of kilter due to DROP VIEW + CREATE VIEW being used instead of ALTER VIEW.
– David Spillett, Jan 10 '11 at 10:43

The project is thought of more as an independent application with its own domain, and it belongs to different customers, so a customer cannot have the same account on different projects.
– solomongaby, Jan 10 '11 at 11:05

In some databases you have the possibility to create "materialized views" instead of complex views over a large amount of data, based on a complex query. This can be used to avoid denormalization in a historically grown application system. If you decide to use materialized views, you have to have a clear idea of the refresh methods and the amount of storage the materialized view will use.
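SQLite has no materialized views, so the sketch below simulates the idea in Python's `sqlite3`: a summary table populated from the expensive query, rebuilt by an explicit refresh function (the schema and the `refresh_customer_totals` helper are both hypothetical). Engines such as PostgreSQL and Oracle provide this natively via `CREATE MATERIALIZED VIEW` and a refresh command, managing the storage for you.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE payments (id INTEGER PRIMARY KEY, id_customer INTEGER, amount REAL);
INSERT INTO payments VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
-- The "materialized view": precomputed results stored as an ordinary table.
CREATE TABLE customer_totals (id_customer INTEGER PRIMARY KEY, total REAL);
""")

def refresh_customer_totals(con):
    """Full refresh: rebuild the summary table from its base query."""
    con.execute("DELETE FROM customer_totals")
    con.execute("""
        INSERT INTO customer_totals
        SELECT id_customer, SUM(amount) FROM payments GROUP BY id_customer
    """)

refresh_customer_totals(con)
totals = con.execute(
    "SELECT * FROM customer_totals ORDER BY id_customer"
).fetchall()
print(totals)  # [(1, 25.0), (2, 7.5)]
```

This makes the two costs named above visible: the summary table occupies its own storage, and its contents are only as fresh as the last call to the refresh function.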