I heard an excellent presentation by CF giant Charlie Arehart yesterday. It was one of the "unconference" sessions, titled CFMythbusters. Later that day I was privileged to share the mic with Charlie and talk about CF troubleshooting. Anyway, while discussing cfqueryparam Charlie said something that made me sit up a bit and say "huh?". It has long been the contention of myself and others that the use of cfqueryparam benefits both security and performance when used against an RDBMS like SQL Server or Oracle. While this is a generalization, it usually holds true. Charlie, however, illustrated a case where cfqueryparam might be detrimental to performance, and he was so convincing I thought I would share it with you.

NOTE: Check out the comments for some caveats and opposing viewpoints. Also note that the tip on constants may not hold water. See the discussion on Brad Wood's blog for more insight on that item.

Execution Plans

First, a quick primer to refresh your memory. The reason that cfqueryparam is often faster is that it allows an RDBMS to take advantage of pre-compiled execution plans. Consider this query.
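A sketch of the kind of query being discussed (the datasource name and the #idList# variable are placeholders; the table and column names come from the pattern quoted below):

```cfml
<!--- hypothetical example: "myDSN" and idList are assumed names --->
<cfquery name="getUsers" datasource="myDSN">
    SELECT fname, lname
    FROM users
    WHERE id IN (
        <!--- list="true" expands the list into one parameter per element --->
        <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
    )
</cfquery>
```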

The first time the DB server sees it, it will pick out the best way to execute the query and save the query pattern along with the execution plan. The DB server might say to itself:

DB, every time you see a query that matches "select fname, lname from users where ID IN (*an array of ids*)" make sure to use the xyz index.

Once that plan is in place the DB listener simply looks for the same pattern of query. When it sees "select fname, lname from users where id in (*an array of ids*)" it says "Aha!" (or possibly "Eureka!" if you are using the Greek character set), "I know what to do with this one." So cfqueryparam usually does result in performance gains, because it allows the DB server to figure out, well in advance, the best way to execute your queries and then use that method (the "cached" execution plan) without the need to figure it out again and again. Seems very simple, right? What could go wrong?

One Possible Problem

Even given the facts above, there are times when the use of cfqueryparam does not improve performance. At least one reason has to do with that idea of the "first time" mentioned earlier. Think about it for a minute and it will become clear. If the efficiency of the execution plan is determined the first time the query pattern is established, then that first query is important. It becomes the basis for any subsequent queries that match the pattern. But as you might have guessed, the most efficient way to run a query sometimes depends on the variables you pass in to it. Take the example above. If the first time it is run the "idlist" is sequential, as in 1,2,3,4, then hypothetically the system might say to itself, "DB, the id is the primary key, clustered and unique - let's use a table scan" (again, hypothetically - this is not a discussion of DB execution plan algorithms).

Now consider what would happen if a subsequent query is a non-sequential list, as in (55,18,20038,72,1). Even though an index might be a better choice for this second query, the cached execution plan will stick with the first plan. Do you see the problem? Subsequent queries that might benefit from a different plan are going to be shoehorned into whatever the DB came up with to begin with. The DB is going to sit in its rocking chair like an old geezer and say, "I a-been a-doing it this way for 10 hours and I ain-a-gonna change fur you or anybody!"

The Fix

I'm still wrapping my head around this issue, but here are a couple of possibilities. First, make sure your queries use the same "type" of data in the same way. For example, I've seen functions that are set up for search also be used to retrieve individual records. Consider the likely groups of params that will be passed into the query and build your queries to service that particular type or group of params. Don't be so married to code reuse that you build enormous queries that do everything for you in one query when 2 or 3 might actually be more efficient (do you really need to join the same table 4 times?). Finally, there is the Calvinist method: add index hints to predetermine the execution plan for the DB engine. I'm looking forward to some suggestions from the savvy CF/SQL readers in the audience.
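For that last option, here's a rough sketch of what an index hint might look like in SQL Server (the index name ix_users_id is assumed; hints like this override the optimizer entirely, so use them sparingly):

```sql
-- force the optimizer to use a specific index regardless of the cached plan
SELECT fname, lname
FROM users WITH (INDEX(ix_users_id))
WHERE id IN (55, 18, 20038, 72, 1);
```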

An Additional Free Tip

NOTE: I used to have a tip here about making sure that static variables (i.e. "active = 1") were also paramed to ensure the use of the plan cache. It turns out I was mistaken about that. See the comments below for some more definitive information. The comment from Christopher Secord explains a way to test whether your cache is updated or not (very useful - thanks Chris).

Hopefully this post gives you one more way to analyze your queries and a new insight into cfquery. As always, if you have constructive input I welcome and look forward to your comments.

You really need to change your hack-proofing script!! The message "this seems to be a hacking attempt" doesn't give me any information about why or how I might fix it if it's legitimate, it just makes it nigh-impossible for me to participate.

Something that Brad mentioned in particular that stuck with me was that the qep was more likely to vary when the cardinality of the data is low. So for example, if the column being evaluated is the primary key, it's not likely to change the execution plan much. On the other hand if you're evaluating a foreign key column from your "product" table to your "client" table and you've got millions of products but only a few dozen clients, then you've got really low cardinality (lots of dups in the clientid column) and there's a good chance that the execution plan that gets cached might not be the best plan for all those clients.

But I really don't know how often this sort of thing happens honestly... My gut says it's not very often.

I'm wondering too about the active = 1 causing it to be unable to cache the execution plan... I'm not convinced that's necessarily true. I'm under the impression that it actually would cache the plan, but that once you changed the 1 to a 0 then it would re-evaluate the query and either get a different cached plan or it would cache a new plan. I would think it would also have to do something similar if for example you asked it for active = othertable.othercolumn or if you asked it to filter on a column in (subquery). And of course there is a limit (albeit high) on the number of plans it will cache for a given database, so there's some very small risk of generating too many unique sql statements.

Doesn't really matter in my case because DataFaucet automatically adds cfqueryparam tags for all my filters. ;) I'm just playing devil's advocate. But I'd have to put together a test case to know for certain whether that's accurate.

Everything I've read about query optimizers leads me to believe that, given the way CF does parameterized queries, no optimizer could generate 2 different plans for the same parameterized query. That is, in the case of a parameterized query with type information, the optimizer is going to pretty much ignore the data in the actual parameters and pay attention only to the columns the parameters are for. In fact, you can have the optimizer show you the execution plan for a parameterized query without ever setting the values for the parameters. I can't see a scenario where changing the value of the parameters would make the optimizer switch to a different plan. Honestly though, I'd love to see a specific example of how I've got this wrong.

Second, I'm also pretty sure you are wrong about unbound types causing the plan to not cache. Anecdotal testing as I write this leads me to believe that my plans are getting cached just fine, even with them. Again, I'd love to see some specific examples of it happening.

@Rick O: You don't have to wonder what plans are cached if you run SQL Server 2005. SQL server will tell you what it has cached. The post I linked to above has a SQL statement you can run that I tried to post, but it "appears to be a hack attempt" Mark is too smart for me :)

I've run into a different type of performance problem with CfQueryParam. I was looping over a CSV file with 60000+ records and recording information from each one into a database, CfQueryParam was eating up memory pretty fast, until there was nothing left and Coldfusion shuddered to a halt.

Removing CfQueryParam made a huge difference in memory use (as did creating smaller SQL, buffering SQL to every 50 records, and removing any use of ColdFusion components). I did quite a lot of testing and decided that when doing that many queries against trustable data, processed by simpler queries such as INSERTs, UPDATEs and small SELECTs, it's best to avoid CfQueryParam as much as possible.

Anything that's not trustable or on a smaller scale I definitely use the tag. :)

Regarding active = 1: the 1 is not going to be bound, is it? If that is the case, isn't the SQL server going to have to evaluate the variable for type so it knows what to do with it?

I need to find a way to test this through the drivers (maybe tracing would help). I can analyze in query analyzer and see the execution plans, but typing into query analyzer means everything is a constant so I'm seeing the plan after compiling.

I can view the cache hits in perfmon, but not on a per-DB basis (I don't think). So I'm struggling with finding a way to test whether I'm hitting the cache or not without cfqueryparam.

I'm not surprised by the results of a 60k record import. CF is not the best tool for this task, and yes, it requires more memory to use cfqueryparam because the SQL passed from the driver now includes an array of variables and is more of an object instead of just a string - so I could see how this would be the case for such a task. What you have really done is a division of labor. Your web server has some breathing room and your DB works harder - and maybe this is good and accomplishes what you want.

However, I still hold that in most cases cfqueryparam will result in better performance - and my experience bears me out. What database server were you using?

@David & Mark - I agree about the bulk import work. I rarely find myself in the situation of needing bulk imports but if I needed them for more than a fairly small number of records (100 or so) I would definitely start looking for a method of handling them that involved minimal ColdFusion, such as BCP.

@Mark re: active = 1 -- as opposed to looking at the actual plan, wouldn't just executing the query a few times and seeing if the execution time changes tell you if it's using a cached plan? Otherwise I'm told that using the sp_executeSQL stored procedure in SQL Server with input parameters will use the plan-caching the same way, so that may show up in query analyzer.

I suppose execution time might tell me, if I could completely isolate the environment - but many, many things can affect query execution time. Regarding sp_executeSQL ... I thought SPs had a separate compilation method and caching. I gather you are saying that, while the SP execution plan is one thing, when used to execute dynamic SQL it actually affects the "other" cache (the query plan cache) - yes?

re: stored procedures -- well, what I've been told is that the sp_executeSQL stored procedure very specifically uses the prepared statement caching *instead of* the normal stored procedure caching... that is of course, assuming I haven't misinterpreted what I've heard. :)
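For reference, a minimal sketch of the sp_executesql call being discussed (the statement and parameter borrow the "active = 1" example from the post):

```sql
-- parameterized one-off execution; the plan is keyed on the statement text,
-- so repeated calls with different @active values can reuse the same cached plan
EXEC sp_executesql
    N'SELECT fname, lname FROM users WHERE active = @active',
    N'@active int',
    @active = 1;
```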

re: just checking the time -- I do most of my work on my notebook these days, so I test these things locally where I'm the only user, and although there are other things that could affect it, the environment isn't as open as it would be on even a company dev server. So I'll run it, restart the CF/SQL services and run it again just to be sure. I know it's not the most precise method, but it works for me. :)

Certain changes in a database can cause an execution plan to be either inefficient or invalid, given the new state of the database. SQL Server detects the changes that invalidate an execution plan, and marks the plan as invalid. A new plan must then be recompiled for the next connection that executes the query. The conditions that cause a plan to be invalidated include:

- Any structural changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).
- New distribution statistics generated either explicitly from a statement such as UPDATE STATISTICS or automatically.
- Dropping an index used by the execution plan.
- An explicit call to sp_recompile.
- Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).
- For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.

I can also speak to SQL Server 7 and DB2.

For SQL Server 7 it was a known bug that if you altered the underlying table you had to go in and alter the text inside your CFQuery, or CF would try to keep using an invalid statement handle (cached QEP). I believe this was worked around in CF, but I believe it was also fixed in SQL Server 2000 or 2005.

For DB2/400 .... woof. Query caching will be the death of me. DB2/400 is much, much, much more sensitive about these things. The optimizer is also significantly dumber than SQL Server's, so it'll do things like keep trying to reuse a QEP when it really really shouldn't. Occasionally it gets so confused that you have to actually go in and delete the QEP cache. Conversely, woe unto you if you delete the QEP cache and don't restart anything that was touching the database at the time (including the CF service).

As for how to see the plan without entering the variables, check out two things in SQL Server. You can do it manually (SET SHOWPLAN_TEXT ON), or you can do it automagically in the Query Analyzer by highlighting a block of text and hitting Ctrl-L, which is a shortcut for Query - Display Estimated Execution Plan. Put a tracer on and let CF do its thing and you'll see how it walks through sp_prepare and that family of procedures. You can then go in and try one on your own, and get it to show you a plan without ever entering a single value. (I'd copy and paste it here, but I understand your comment filter is tricky.)
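For anyone following along, the manual method looks roughly like this (using the example table from the post; with SHOWPLAN_TEXT on, the statement is not actually executed):

```sql
-- show the estimated plan as text instead of running the query
SET SHOWPLAN_TEXT ON;
GO
SELECT fname, lname FROM users WHERE id IN (1, 2, 3, 4);
GO
SET SHOWPLAN_TEXT OFF;
GO
```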

I'm using SQL Server 2000. Managed to get things running great when I grouped all the SQL into a stored procedure and buffered the calls into an array for executing every 50 together in one CFQuery tag with no cfqueryparams. Monitoring memory and performance this seemed by far the best from the Coldfusion side of things.

I totally agree with using CfQueryParam for everything else though :)

@ike

Haven't done much with BCP before, I'll take a look at it but from what I can tell it might not do enough for the job.

Awesome, awesome information - thank you so much for posting. That is really helpful. I never saw that "estimated execution plan" choice before. I always set it to on in the menu and then ran an actual query. Feel free to enter your comment too ... I have altered how the "comments" field works.

David,

It sounds like you have your process well in hand. I would only add one tip. For reading in a text file you should consider creating a DTS package. You can still call it from CF if you like. There are lots of possibilities including dynamic file names and even automatic FTP retrieval. We do a DTS package for option quotes that consumes a 500k+ record file in about 12 seconds using the SQL DTS task scheduler. The fact is that a file DB type driver is much better at reading and parsing than anything you can do in CF (at least that's my experience).

@David - I haven't used it much either... once or twice and it's been several years. My understanding is that BCP is pretty efficient for dumping a CSV into a db table. If you need more than that, like translating column names, etc. then I would lean in the direction of getting the CSV into a temp table using BCP and then running a stored procedure or possibly a DTS package (if you have really complex needs) to translate and move the data from the temp table to wherever you need it.

Or if you wanted to be really slick, you could put a trigger on the temp table and have the trigger do whatever you needed on insert into that table -- and that could even include executing DTS packages if necessary, since you can use a stored procedure to execute them.

Hi: I'm really unsure about this assertion that the values being passed by the <cfqueryparam> tag can influence how the execution plan is created. Certainly *looking* at the execution plans generated by SQL 2005 does not bear this out (I haven't checked Oracle yet).

If one has a hard-coded SQL statement which has WHERE x IN (1,2,3,4), then I could see how the execution plan might go "right, well it's those exact rows I need to fetch, and they're the first four rows (how it divines this, I don't know), so I'll just do a table scan". Looking at the plan generated from this sort of thing, no such inferences are made about the proximity to the "start" of the table, but still: I could see how in theory it could work that way.

However, if one's passing in parameters for the list elements, e.g. WHERE x IN (?,?,?,?), then the DB is not going to consider what the data could be (that would just be dumb); it's going to base its plan on the schema of the objects involved ("x has got a unique index on it, I'll use that").

However, in considering all this, one interesting thing I've discovered - which didn't occur to me before - is that if the length of the list in the <cfqueryparam> varies, a new execution plan is generated for it, because there's a different number of parameters being passed in. Each element in the list is a separate parameter; it's not simply one parameter (I knew this, I just didn't see the ramifications of it). This means the SQL string being passed in is different, and the reuse of an execution plan is keyed on the SQL string (or, I imagine, a hash of it).
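To illustrate that last point (statement texts sketched from the example query): a two-element list and a four-element list produce different SQL strings, so each gets its own cached plan:

```sql
-- two parameters: one statement text, one cached plan
SELECT fname, lname FROM users WHERE id IN (?, ?);

-- four parameters: a different statement text, hence a separate cached plan
SELECT fname, lname FROM users WHERE id IN (?, ?, ?, ?);
```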

Mark, there's some incorrect information in this blog post: "active = 1 is going to cause the SQL engine to have to recompile this query each time it is passed. "

That isn't true, at least not with Microsoft SQL Server. The handles that point to the plans in SQL Server are stored in the view sys.dm_exec_cached_plans. Count the rows in that view, then execute the example query you have in your blog (a query with a constant, like "active=1", and a cfqueryparam). You'll see that the row count increases by 1. The query has been cached.

Now change the value of the param. This will tell us if SQL server actually *uses* the plan. You'll see that the row count doesn't increase this time. However, if you change the constant to "active=2" then you'll see the row count increase.
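A sketch of the test Christopher describes, using the SQL Server 2005 DMVs (the LIKE filter is just an assumed way to pick out the example query):

```sql
-- count cached plans before and after running the query
SELECT COUNT(*) AS plan_count FROM sys.dm_exec_cached_plans;

-- or inspect the cached statement text and reuse counts directly
SELECT st.text, cp.usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%active%';
```

If the same plan is being reused, usecounts climbs while plan_count stays flat; changing the constant (active=2) should add a new row.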

Regarding the post David made below... the severe memory use is due to some bug in CF/MSSQL... this usage of memory and crashing of the server will happen to any of you using CFQUERYPARAM for high traffic sites IF you have the CFA DataSource setting:

Max Pooled Statements

set to 1000 (which I think is the default)... I am not sure what the effect of other settings are, but if you set this to ZERO you will no longer have the memory leak crash your server... I had a server that was running great... over time we started adding CFQUERYPARAM to all our SQL and then our server slowly became less and less stable ultimately requiring 4 to 5 restarts of CF a day (and RAM usage going to 900+ mb)...

We set this to zero and the server has not been restarted since and CF hovers at 625mb of RAM... it has made a world of difference on every CF server we have set this to...

Someone who knows better than I do may say that setting that to zero means CF cannot actually cache any execution plans... that is fine by me until this bug is fixed... :)

This happens in CF8 and maybe CF7 (but haven't verified)...

A bigger deal should be made about this in the CF community... :)

Patrick

----- david's original post -----
I've run into a different type of performance problem with CfQueryParam. I was looping over a CSV file with 60000+ records and recording information from each one into a database, CfQueryParam was eating up memory pretty fast, until there was nothing left and Coldfusion shuddered to a halt.

I think the cached execution plans that SQL Server uses internally are separate from the pooled statements controlled by that setting... I believe the pooled statements are something that happens in the Java area just outside the db server in say the area of the driver, so it shouldn't impact the server's internal plan caching. But you can actually test to be sure -- leave the connection pooling set to zero and run a query against a big table several times with and without cfqueryparam. If you get better performance with the params in, then SQL Server is still caching its execution plans. I'm pretty sure I've done this on a DSN with the statement pooling turned off like that -- as a matter of fact I think I tend to turn off statement pooling when I create a new DSN just as a matter of course and have never noticed that it changes performance any... so if it's positively associated with a memory leak under load then I'd think there's definitely no reason to leave it turned on.

@Ike: You beat me to this comment. I was still Googling to try and make sure I had my facts halfway straight. :)

@Patrick: You said, "Someone who knows better than I do may say that setting that to zero means CF cannot actually cache any execution plans". From my understanding that would be incorrect. Whether or not you allow pooled statements, SQL Server will continue to use a cached execution plan and will not recompile. (I just tested this and watched the results in SQL Profiler.)

As Ike said, pooled statements != cached execution plans. Execution plans are created and cached at the database level. You can see all your database's cached plans along with some stats in the master.dbo.syscacheobjects table. Pooled statements are something that J2EE apps do (among others) in which they prepare a SQL statement using sp_prepexec (or possibly sp_prepare), which returns an integer "handle" that can be used to re-execute that same prepared SQL at a later date via sp_execute using the same database connection. When you set your "Max Pooled Statements" to 0, ColdFusion un-prepares the statement every time it is called with sp_unprepare. Frankly, at this point I'm not sure why CF doesn't just use sp_executesql instead, which is for a single-use parameterized query.
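The prepare/execute/unprepare cycle described above can be sketched like this (the statement and parameter names are assumed for illustration):

```sql
DECLARE @handle int;

-- prepare once; SQL Server hands back an integer handle
EXEC sp_prepare @handle OUTPUT,
    N'@active int',
    N'SELECT fname, lname FROM users WHERE active = @active';

-- re-execute by handle, without resending the SQL text
EXEC sp_execute @handle, 1;
EXEC sp_execute @handle, 0;

-- roughly what CF does after each call when "Max Pooled Statements" is 0
EXEC sp_unprepare @handle;
```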

Even though the DB is still holding on to (and using) the same execution plan, the Application is not bothering to use the resources to keep track of all the prepared statements it has used. It simply discards them immediately.

Other than the fact that a call to a prepared statement does NOT pass the entire SQL over the wire, but rather the integer handle that points to it, I'm not 100% positive what specifically is supposed to be faster about the prepared statement, since compilation and caching of the execution plan still only happens the first time you call it.

Also, a random note on the topic. The "Max Pooled Statements" setting is only useful when you have the "Maintain Connections" box checked. If you don't maintain connections, ColdFusion logs in and out of the database with every request. Before logging out, it must unprepare any prepared statements it had lying around.

And one last thing (sorry, this got so long)... "the severe memory use is due to some bug in CF/MSSQL"

I have never seen evidence that this is a bug. In my opinion, the reason for implementing queryparamed cfqueries as prepared statements was under this assumption: most applications have a limited number of SQL statements that they will use, and generally most apps will wish to reuse the same statements over and over. Therefore, it is worth the overhead to keep track of up to a default of 1000 such statements per connection.

These sorts of assumptions wouldn't hold true if you had some sort of import process running thousands upon thousands of DIFFERENT statements (which is what started this sub-discussion). I think this is one of those things where CF makes it INSANELY simple to do parameterized prepared statements, but with a few assumptions made that apply to 98% of all use cases.

Hmmm, perhaps this warrants a full blog entry instead of filling up Mark's comments. :)

@mark - queryparams, yes, absolutely do affect query performance. I think you misread my comment -- I was saying I'd never personally seen a change in the CF Admin pooled statement setting affect performance... but then I've not fiddled with it much other than turning it off, and I'm not sure but I think I generally leave Maintain Connections unchecked too, at least on my dev machines, which given Brad's comments may be part of the reason I've never seen it affect performance.

Ah... I see. Well that would be interesting to test to be sure. I would think that connection pooling would definitely impact performance - especially on a busy SQL server... eliminating the overhead of actually making the connection, finding the right IP address, permissions etc. Right? To be honest I never turn it off :)

@Mark - I think this may also be a case of confusing terminology... I'm not certain but I think the pooled statements is also separate from "connection pooling". I don't believe connection pooling actually can be turned off, although honestly that's not one of my stronger areas of expertise. :)

@Ike: They are in fact separate, yet related. Connection pooling is simply keeping open connections to the database for use across multiple requests. Pooled statements is keeping a list of prepared statement handles lying around that have been created on one or more of your pooled connections for reuse.

My understanding is that unchecking the "Maintain Connections" checkbox in your data source settings disables connection pooling (since each database connection is disconnected at the end of the page request) and as a side effect also renders statement pooling useless, since there are no connections left open and prepared statements are connection-specific.

I'm not so sure. I read the post and took a look at the example. I have a great deal of code like this running successfully (large import queries from a loop and using cfqueryparam) - so I suspect there is something else going on.

Wow, I'm only coming upon this discussion more than a year late. Bummer, as some of the early comments want to assert that I (and Mark and Brad) are wrong in our assertions about the impact of CFQUERYPARAM and possibility of it caching a plan that's sub-optimal for later values. I do realize that the suggestion seems heretical.

Some did mention in later comments that there are ways to prove this. You don't need to take our word for it. I'll point especially to a talk I did at cf.objective 2007 (http://www.carehart.org/presentations/#dbplancachi...) where I not only offer demonstrable code examples but, more important, offer useful SQL Server monitoring statements that can show clearly if/when this is happening, and I show things for SQL Server 2005 and 2000. I also mention some related monitoring for other DBMSs, as indeed the issue is definitely not limited to SQL Server.

I'll say also that some comments have hinted at another useful point: there can be differences in specific SQL used that can affect how likely this issue will happen. In my talk, besides all that I try to summarize, I also point to several substantial resources from others (engineers related to the DBMS's themselves and docs) which demonstrate and explain this in far more detail.

And let's clarify that it's definitely not a CF-specific problem, either, as I explain in the talk. CFQUERYPARAM just creates a prepared statement (or parameterized query) and other languages support these as well.

You may wonder, then, why this isn't more well known and broadcast. I honestly don't know. Part of it is that coders often don't pay attention to what's going on "behind the curtain". Or they accept as fact assertions made by the majority. In this case, I'm not asking anyone to "believe me". I offer clear evidence that you can test for yourself.