Tables are created with appropriate normalization rules; however, the database is performing slowly. (I.e., SELECT and INSERT statements are taking a long time to complete.) What areas do we need to look at to improve database performance?

I understand this is a vague question. Please provide some suggestions. Also, what are the areas we need to look at in the query execution plan?

Sharing your research helps everyone. Tell us what you've tried and why it didn't meet your needs. This demonstrates that you've taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer. Also see How to Ask
–
gnat Oct 8 '13 at 6:32

He was probably just trying to find out if you knew what an index is. Though my spidey-senses would tingle if they actually implied the queries were slow 'despite' appropriate normalization.
–
GrandmasterB Oct 8 '13 at 6:47

The more you normalize, the less efficient your SELECTs get. Normalization makes for efficient INSERTs; denormalization is the path to the most efficient SELECTs. Most applications must strike a balance between the two.
–
Graham Oct 8 '13 at 13:46

If you deal with a sufficiently large database, you will learn that you have to skip normalization in favor of performance...
–
Rig Oct 8 '13 at 15:45

Nah, denormalization is not that important. RDBMSs are optimized for joins and can handle pretty much everything if they're configured correctly for the task. This is an arcane art, especially on Oracle. Give that thing enough hash area size and it does not care, in terms of performance, how many large tables you join.
–
Falcon Oct 8 '13 at 21:13

4 Answers

I would talk about how there are many things that can be done to improve performance. The first is always to investigate whether the correct indexes are in place. Of particular concern in a normalized database is making sure foreign keys are indexed; that alone is likely to fix many performance issues.
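For instance, a minimal sketch (table and column names are illustrative; most engines do not index foreign key columns automatically):

    -- Orders.CustomerID is a foreign key used in joins and lookups,
    -- but no index is created for it automatically.
    CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);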

Other things to look at would be rewriting the SQL code to use more performant techniques, such as getting rid of cursors and correlated subqueries and making WHERE clauses sargable. You would want to review the worst-performing queries individually. You would also want to review queries that are run frequently (especially if multiple users run them simultaneously), as a small change to those can multiply through the system. If your worst queries are coming from an ORM, they might need to be rewritten as stored procedures so they can be performance-tuned.
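As a sketch of what a sargable rewrite looks like (hypothetical Orders table; the rewritten predicate leaves the column bare so an index on OrderDate can be used):

    -- Non-sargable: the function wrapped around the column defeats the index.
    SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 2013;

    -- Sargable rewrite: compare the bare column against a range instead.
    SELECT OrderID
    FROM Orders
    WHERE OrderDate >= '2013-01-01'
      AND OrderDate <  '2014-01-01';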

You might also want to make sure you actually have a performance problem. What you might really have is a blocking problem, where otherwise performant code is being blocked by other processes and has to wait.
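How you check for blocking depends on the engine. On SQL Server, for example, something like this shows requests currently waiting on another session (a quick diagnostic sketch, not a full analysis):

    -- Sessions whose requests are blocked by another session.
    SELECT session_id, blocking_session_id, wait_type, wait_time
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0;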

Then you would look at hardware: if you have underpowered hardware or network connections, likely no other change is going to fix that.

In a large enterprise system, you might consider data partitioning.
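Partitioning syntax varies by engine; a MySQL-style sketch with an illustrative Sales table might look like this:

    -- Range-partition by year so queries that filter on SaleDate
    -- only touch the relevant partitions.
    CREATE TABLE Sales (
        SaleID   INT NOT NULL,
        SaleDate DATE NOT NULL,
        Amount   DECIMAL(10, 2),
        PRIMARY KEY (SaleID, SaleDate)  -- partition column must be in every unique key
    )
    PARTITION BY RANGE (YEAR(SaleDate)) (
        PARTITION p2012 VALUES LESS THAN (2013),
        PARTITION p2013 VALUES LESS THAN (2014),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );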

Denormalization is a technique to improve performance, but it is the last thing you would want to consider. First, changing the structure that drastically is a risk to the data: converting the data to the new structure can go very badly wrong if a mistake is made, and this type of structural change is more time-consuming than any of the other possible performance improvements. It would also be irresponsible to denormalize without creating triggers to make sure the data stays in sync as it is changed in the denormalized tables. And it may mean SELECTs are improved but action queries are slower, so performance may not improve as much as you think. Finally, denormalizing can make the tables significantly wider, and wide tables can affect performance negatively.
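A sketch of what such a trigger might look like on SQL Server, assuming a denormalized CustomerName column has been copied into Orders (all names hypothetical):

    -- Keep the denormalized copy in Orders in sync whenever the
    -- authoritative value in Customers changes.
    CREATE TRIGGER trg_Customers_SyncName
    ON Customers
    AFTER UPDATE
    AS
    BEGIN
        UPDATE o
        SET o.CustomerName = i.Name
        FROM Orders AS o
        JOIN inserted AS i ON o.CustomerID = i.CustomerID;
    END;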

It sounds to me like your interviewer was not looking for a data-scientist answer but was simply looking to make sure you understand that "normalization" != "performance". So I'll keep this answer at the level I'm guessing he wanted.

Normalization means minimizing redundancy in stored data. Instead of duplicating values, you set up relationships (often with foreign key constraints) between multiple tables. However, while normalization might lead to a smaller amount of stored data, it often creates performance problems, because many queries now end up joining multiple tables. The same goes for adding data, where you might now have to update multiple tables at once.
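For example, a minimal normalized pair of tables (illustrative names) keeps the customer's name in exactly one place and links orders to it with a foreign key:

    CREATE TABLE Customers (
        CustomerID INT PRIMARY KEY,
        Name       VARCHAR(100) NOT NULL
    );

    CREATE TABLE Orders (
        OrderID    INT PRIMARY KEY,
        CustomerID INT NOT NULL REFERENCES Customers (CustomerID),
        OrderDate  DATE NOT NULL
    );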

Often, speed gains can be achieved by denormalizing the data. You are storing more, and there might be duplicates, but when it comes to running the most frequently used queries, all your data is now in one table. Getting results from one table is usually much easier on the hardware than having to join multiple tables.
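Continuing the sketch above, the frequently run report goes from a join to a single-table read (OrdersReport is a hypothetical denormalized table):

    -- Normalized: the hot report joins two tables on every run.
    SELECT c.Name, o.OrderID, o.OrderDate
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID;

    -- Denormalized: the customer name is copied into one wide table,
    -- so the same report becomes a single-table read.
    SELECT CustomerName, OrderID, OrderDate
    FROM OrdersReport;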

I would never suggest denormalization as the first performance fix. This is the fix of last resort as it often creates data integrity problems. And I certainly would never mention it without discussing how I would avoid data integrity problems and how denormalizing could slow inserts/updates/deletes but speed up selects. Likely proper indexing would make the normalized tables perform well without the risks of denormalizing.
–
HLGEM Oct 8 '13 at 13:50

@HLGEM: It is not "the first" performance fix and I never meant to imply that. But IMO "level of normalization" vs. "ease of query" is one of the most basic balances in schema design.... after you already looked at everything else, disk, transaction logs, indexes, etc... But this question isn't asking about all that, the context is normalization vs. performance
–
DXM Oct 8 '13 at 14:51

I would disagree. If I asked this question in an interview, I would expect the interviewee to say a lot of other things before mentioning denormalization. Denormalizing an already existing database is a serious risk, so I would expect the person not to suggest it as the first alternative, although given the way the question is worded I would expect them to mention denormalization along with both its pros and cons.
–
HLGEM Oct 8 '13 at 14:59

They could've talked for 2 hours in that interview, but the OP asked about one (of what I assume were many) questions. And look at the title: slow performance even after normalizing tables. If I ask that question in an interview, and I have before, I would expect the person to immediately say that the question is wrong: performance and normalization aren't the same thing, and are often at odds. Then talk for 2-3 minutes supporting that answer, not go into every nut and bolt and talk for an hour about DB optimization.
–
DXM Oct 8 '13 at 15:25

Making INSERT statements run faster is a bit of an arcane art. But that's probably not the focus. The point of a database isn't putting data into it; it's getting it back out in interesting and useful ways. So the main things to focus on are SELECT statements.

The first thing I would look at is checking the query plans on slow queries. See if you have any table scans that are taking up a significant percentage of your time. A table scan is when the database engine has to examine every row individually to see if it meets a WHERE criterion. If you find one of these, you can make the query run faster by indexing the table on the appropriate WHERE criteria. This can take search times from O(N) down to O(log N) or even O(1).
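A minimal sketch of the fix (illustrative names): once the column in the WHERE clause is indexed, the engine can seek instead of scanning every row.

    -- Without an index on LastName, this forces an O(N) table scan.
    SELECT * FROM Employees WHERE LastName = 'Smith';

    -- Adding the index lets the same query use an O(log N) index seek.
    CREATE INDEX IX_Employees_LastName ON Employees (LastName);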

Some databases will make it easy on you: their query plan analyzer will point out that you're missing an index and suggest what you ought to create.
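On MySQL or PostgreSQL, for instance, prefixing a query with EXPLAIN shows the plan the engine intends to use; SQL Server exposes the same information through its graphical execution plans.

    -- Shows whether the engine plans a full scan or an index lookup.
    EXPLAIN SELECT * FROM Employees WHERE LastName = 'Smith';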

Also, check the joins in your query. Make sure they aren't using overly broad join criteria, and be careful that you're not using outer joins where an inner join would do. Both of these issues can cause a poorly written query to produce too many rows and take longer to run.
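For example (hypothetical tables), an outer join keeps every row from the left table even when nothing matches, which can inflate the result when all you wanted were the matched rows:

    -- LEFT JOIN: keeps customers with no orders (OrderID comes back NULL).
    SELECT c.Name, o.OrderID
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID;

    -- INNER JOIN: returns only customers that actually have orders.
    SELECT c.Name, o.OrderID
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID;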

If you don't have missing indices or bad joins, a more advanced trick is denormalization: setting up columns on tables that duplicate data that can be found in other tables, to allow you to avoid joins or aggregates that can be expensive. This has to be done carefully, though, with triggers so that the data remains in sync, and is best done only if you know what you're doing and if there aren't any better alternatives available.

One of the "better alternatives" to de-normalization in many, though perhaps not all, cases is a materialized view (aka indexed view).
–
JasonTrue Oct 8 '13 at 5:00

And check to see that all the joins are actually used. I have seen this especially when people use views that call other views, where lots of unneeded work was being done that would not have happened if they had written the code directly. But I have seen this in non-view code as well. And of course, don't select information you don't need (such as using SELECT * in any query with a join).
–
HLGEM Oct 8 '13 at 14:08

Specifically, in the query execution plan, look for operations that are table scans instead of index seeks. That is a hint that you might want to add an index to, say, a column that represents a foreign key (indexes on foreign key columns don't get created automatically).

Other options would be to put your data files on different physical disks. Using RAID for your partitions might work as well. At the very least, you want to separate the log files from the data files, so that writing to the log does not impact write times to the data files.
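On SQL Server, for example, data and log file placement is chosen when the database is created (a sketch; the database name, drive letters, and paths are illustrative):

    -- Put the data file and the transaction log on different physical drives
    -- so log writes do not contend with data writes.
    CREATE DATABASE SalesDb
    ON      (NAME = SalesDb_data, FILENAME = 'D:\Data\SalesDb.mdf')
    LOG ON  (NAME = SalesDb_log,  FILENAME = 'E:\Logs\SalesDb.ldf');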

More advanced scenarios include clustering and sharding to allow the load for searches to be spread across multiple nodes.