Wednesday, February 25, 2009

Is SQL Slow?

Last time I demonstrated a case where stored procedures are slow when they have to do computationally expensive work. The more interesting question, to me at least, is: how slow is SQL? The answer to that is far more complex.

For example, this simple SQL takes 2.2 seconds to run on MySQL. This is painfully faster than the 696 seconds it took a stored procedure to produce similar results.
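As a rough sketch of what such a query looks like, assuming a sales table with hypothetical product_id and amount columns along the lines of the previous article's setup, the aggregation is along these lines:

    -- A minimal sketch of the kind of query being timed (table and
    -- column names are assumptions, not the actual schema).
    SELECT   product_id,
             SUM(amount) AS total_amount,
             COUNT(*)    AS sale_count
    FROM     sales
    GROUP BY product_id;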

As demonstrated in the previous article, the equivalent C# code took 1.4 seconds to produce the same results. Some may find it surprising that it takes less time to ship 1 million rows out of the database and then summarize them in code than it does for MySQL to summarize the same data in the database.
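As a rough sketch of that approach, assuming MySQL Connector/NET and the same hypothetical sales table (the connection string and names below are placeholders, not the original benchmark code):

    // A minimal sketch, not the original benchmark: stream the raw
    // rows out of MySQL and do the GROUP BY equivalent in C#.
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using MySql.Data.MySqlClient;   // MySQL Connector/NET

    class SummarizeInCode
    {
        static void Main()
        {
            var totals = new Dictionary<int, decimal>();
            var timer = Stopwatch.StartNew();

            using (var conn = new MySqlConnection(
                "Server=localhost;Database=test;Uid=user;Pwd=secret;"))
            {
                conn.Open();
                var cmd = new MySqlCommand(
                    "SELECT product_id, amount FROM sales", conn);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        int productId = reader.GetInt32(0);
                        decimal amount = reader.GetDecimal(1);
                        decimal sum;
                        totals.TryGetValue(productId, out sum);
                        totals[productId] = sum + amount;
                    }
                }
            }

            timer.Stop();
            Console.WriteLine("{0} groups in {1} ms",
                              totals.Count, timer.ElapsedMilliseconds);
        }
    }

The Dictionary plays the role of the GROUP BY: one sequential scan ships the rows out, and the summing happens in memory on the client.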

In this simple case the performance difference isn't much and is not worth the extra code complexity. For more complex SQL, with more joins, perhaps nested case statements and temp tables, and similar standard SQL techniques, it is often much faster to do the transformation logic in non-SQL code. I've found cases where I was able to improve performance by more than a factor of ten by using C# or Java instead of SQL. Still, my inclination is always to see whether I can get acceptable performance from SQL first, since it generally requires less code, and usually far less. SQL is an exceptionally expressive language for certain types of problems.

Plus, the performance advantage of C# or Java won't hold in all cases. Stating the obvious, shipping out all the sales data makes no sense for more selective queries that touch only a few sales. In those cases it makes far more sense to stay in SQL.
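For example, a lookup along these lines (again, table and column names are assumptions, not the actual schema) reads only a handful of rows through an index, so there is nothing to gain by moving the work out of the database:

    -- Hypothetical selective query: with an index on customer_id,
    -- MySQL reads only a few rows, so summarizing in the database
    -- is the obvious choice.
    SELECT SUM(amount)
    FROM   sales
    WHERE  customer_id = 42;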

Deciding where to execute code, in the database or elsewhere, is a complex problem that would take a series of articles to answer reasonably (a hint: think about using sequential IO as much as possible). For now I just want to point out that running everything in SQL isn't always the best-performing approach.

One of the problems with moving query processing to the client is that optimization from that point on is purely manual. Depending on your data set size, what is fast now may be very slow later on when the data volumes or distributions change. Also, you will find that there are very subtle dependencies between clients and the underlying database schema that will make maintenance a lot harder. All in all it seems as if you are better off using alternative strategies like caching in the client or fixing queries so they generate better plans.

Robert, I agree: if you don't have high-volume, complex problems, this type of solution would be overkill. I also agree that it results in more complexity, as I stated in the article, and should only be done when performance is an issue.

But there are times when the data volumes are high and the problem to be solved is complex, times when non-SQL solutions are required for performance reasons. Even the best SQL execution plans are not fast enough for some problems.

How does one create that cache in the first place if it takes hours to build in well-tuned SQL and one only has 30 minutes to create it?

Map/reduce is a great example of moving processing out of the database that fits your description perfectly. For conventional applications I try to stick with the DBMS as much as possible, because it minimizes work over time.

Yes, you are right. They promise a disk version within two months, but for now they only have an in-memory one. Nevertheless, there are lots of databases that fit in memory. I'll check their site in two months. Their tests are very encouraging.