MySQL Stored Procedures Ain’t All That

I give quite a lot of presentations. A whole lot less than I used to, but still quite a few per year. Most of the time, the presentations are on performance tuning MySQL.

Almost every time I give a presentation on MySQL performance tuning — and this happens 100% of the time if I am presenting to a Windows SQL Server crowd — I get the following question:

Why don’t you cover using stored procedures in order to increase performance? Wouldn’t that be the easiest way to get better performance since the stored procedures will only be parsed once and then the compiled bytecode would be efficiently executed from then on?

Every person that asks this question assumes something about MySQL’s stored procedure implementation; they incorrectly believe that stored procedures are compiled and stored in a global stored procedure cache, similar to the stored procedure cache in Microsoft SQL Server[1] or Oracle[2].

This is wrong. Flat-out incorrect.

Here is the truth: Every single connection to the MySQL server maintains it’s own stored procedure cache.

This means two very important things that users of stored procedures should understand:

If you operate in a shared-nothing environment — for example, the majority of PHP and Python applications that do not use connection pooling or persistent connections — if your application uses stored procedures, the connection is compiling the stored procedure, storing it in a cache, and destroying that cache every single time you connect to the database server and issue a CALL statement

If you use stored procedures, the memory usage of every single connection that uses those stored procedures is going to increase, and will increase substantially if you use many stored procedures

Ooops, I Invalidated Everything Again

So, what happens when you CREATE, ALTER, or DROPany stored procedures? Since MySQL stores all stored procedure execution code on the connection threads, each of those connection threads must invalidate the procedure in its caches that has changed, right?

No, it’s worse. Every time ANY stored procedure is added, dropped, or updated, ALL stored procedures on ALL connection threads will be invalidated and must be re-compiled. Here is how the “caches” are invalidated:

from /sql/sp_cache.cc, lines 193-197, in MySQL 5.5

/*
Invalidate all routines in all caches.
SYNOPSIS
sp_cache_invalidate()
NOTE
This is called when a VIEW definition is created or modified (and in some
other contexts). We can't destroy sp_head objects here as one may modify
VIEW definitions from prelocking-free SPs.
*/void sp_cache_invalidate(){
DBUG_PRINT("info",("sp_cache: invalidating"));
thread_safe_increment(Cversion, &Cversion_lock);}

It’s a bit misleading, since it actually doesn’t invalidate anything at all. What the above code does is increment the global “Cversion” variable. When a connection thread attempts to execute, drop or insert a new procedure, it will notice that it’s local cache’s version number is less than this Cversion number and will destroy the entire cache and rebuild it gradually as procedures are affected or executed.

So, Should You Use Stored Procedures in MySQL?

The above warning doesn’t necessarily mean that you should never use stored procedures? No. What it means (besides being a bit of a rant on the implementation of MySQL’s stored procedures) is that you should be aware of these issues and use stored procedures where they make the most sense:

When you know that you will be executing the stored procedure over and over again on the same connection — for instance, in a bulk loading script or similar

When you know that you will not be disconnecting from the MySQL server at the end of script execution — for instance, if you use JDBC connection pooling

When you know that you have a limited number of stored procedures and the memory usage of connections won’t be an issue

Finally, if you see benchmarks that purport to show a huge performance increase from using stored procedures in MySQL, be careful to understand what the benchmark is doing and whether that benchmark represents your real-world environment. For instance, if you see a huge performance increase in sysbench when using stored procedures, but you have a PHP shared-nothing environment, understand that those benchmark results mean very little to you, since sysbench connections don’t get destroyed until the end of the run…

Your last points on when to use stored procedures are spot on — again, assuming that the goal is “faster due to compiling once”.

I am onsite at a client today (a rarity for Pythian) and they have an ETL server that's CPU bound that processes 40,000 queries per second, with about 6,000 commands sent to the mysql server (basically on average a stored procedure does about 7 queries)…..

They aren't using persistent connections or calling the same queries over and over in one connection, but they found that with the *volume* of queries they were able to do, actually having less network overhead made a huge difference for them (ie, instead of making 7 calls to the server, making 1 call to a stored procedure to do those 7 tasks).

Everyone's mileage may vary, and you're 100% correct with your points….it's ironic that today in particular I ran across someone using stored procedures correctly and improving performance but not because of compilation.

Keith Murphy

Jay,

I would be curious as to your thoughts on this from a Drizzle perspective. I know there are plans on implementing stored procs in Drizzle — will it be with the same centralized cache architecture as SQL Server and Oracle?

thanks,

Keith

http://joinfu.com/ Jay Pipes

Hi Keith!

Drizzle has no stored procedures. I do not know about a timeframe for adding them back, if at all.

Cheers!

jay

http://twitter.com/billkarwin Bill Karwin

Great post! It gives a reason for why Oracle and Microsoft database developers are advocates of having large libraries of stored procedures, at least more often than MySQL database developers are.

And Sheeri is spot on: there are performance scenarios that benefit enormously from using a stored procedure, but this has little to do with procedure compilation. I've seen this happen too. But it's hard to come up with a generalized, concise rule for when to use a stored procedure, because it involves too many factors.

brooksjohnson

There are reasons to use stored procs, but performance is usually not one of them (there are always exceptions). There are typically other, more standard (cross database), ways to achieve performance roughly equivalent to stored proc performance.

For example, batch sql, the sending of multiple sql statements to the database at once, is often just as fast as a stored proc that is equivalent to multiple sql statements.

Even with Oracle and MsSQL, both have a sql execution plan cache, which results in reusable complied sql statements that are usually just as fast as a stored proc.

The performance benefits of stored procs have generally been greatly misunderstood and overstated. This explains why ORMs, like hibernate, have become so popular. If you know what you are doing, hibernate is fast enough for most database interactions.

http://kostja-osipov.livejournal.com Kostja

<quote>Every time ANY stored procedure is added, dropped, or updated, ALL stored procedures on ALL connection threads will be invalidated and must be re-compiled.</end quote>

This is no longer the case in 5.5. We invalidate stored procedures on demand. The dependency tracking mechanism is still primitive, but there is no more sweeping invalidation of all caches.And we're working on a global cache as well. No forward-looking statements though (not because of Oracle, but to be truthful).

http://kostja-osipov.livejournal.com Kostja

Let me clarify: the sp cache version is still global, and invalidation is global.But you no longer take a hit trying to flush the entire cache whenever the version changes.You only flush the procedure that is about to be used. Thus the overhead of invalidation is distributed.It was an easy thing to do and we did it as part of 5.5 bugfixing.

http://joinfu.com/ Jay Pipes

Gotcha. Thanks for the clarification

http://joinfu.com/ Jay Pipes

Heya, Kostja!

Good to hear on a global cache; that's very good news! And, yes, I understand about forward-looking statements

As for the “on-demand” invalidation, it is still the case that any modification of any stored procedure will cause all other stored procedures to *eventually* be invalidated, though. Unless, of course, you're talking about code in 5.5 that's not on Launchpad? Or am I misunderstanding the point of the Cversion code?

I saw that you were the original author: “Added rewards_report stored procedure, submitted by Jay Pipes”.

“The rewards_report stored procedure generates a customizable list of the top customers for the previous month.”

This produces an “Empty set” since the data is from 2006!

Sorry if this is not the best way to pass this comment on…

http://joinfu.com/ Jay Pipes

No, not the best way to comment on it, but no worries!

That stored procedure was meant as an example of a reporting-type stored procedure, nothing more. You can modify either the data in the database (just run an UPDATE on the dates of things, updating from 2006 to 2010) or you may modify the stored procedure to accept a data or date range to pull the older data.

Cheers!

jay

http://pulse.yahoo.com/_R5BYQPWQWTILFU66NLZGURYZSU Jane

Thanks: I’m an Oracle/SQL Server coder being asked to work in MySQL, and I’d been wondering if I’d get the same performance increases I’m used to. This plus the comments gives me exactly what I needed to know.

http://twitter.com/hotproject Hot Project

Do you have any information on the security implications of Stored Procedures? I have read a lot about stored procs for MSSQL improving security for posted data – and have assumed the same for MySQL. Is that true?