execution plans

Grant Fritchey has a book on execution plans, revised in 2012, available as a PDF for free from the Redgate site where print and Kindle editions are also available for purchase. There are MANY free books available on Redgate’s site, highly recommended reading!

And there’s SQL Sentry PlanExplorer, in both a free and paid version, it’s an amped-up version of the basic SSMS tools. Like Mladen’s tools mentioned in the previous post, it is invoked by a right-click on an execution plan object and opens in its own program window. There’s a lot of interesting aspects to this tool, such as the Join Diagram that shows you sort of a relational diagram of your query. One thing that’s quite nice is that each operator’s cost is shown above the icon with the most expensive operator in red and the next in yellow. If you hover your mouse pointer over any tool you’ll get the same information as what you’d see in SSMS, but if you hover over the first (top left) icon, it will tell you if a good enough plan was found or if the optimizer timed out.

Interesting stuff. With the student information system that I’m developing, there’s one query that’s central to all users, it’s called vwStudentsFiltered. Since I need to enforce row filtering based on what district(s) a user has access to, this became the core. The original view looked something like this:

SELECT st.*
FROM Students AS st
JOIN SeekerUsers AS u
ON u.SeekerUserDBLogin = SYSTEM_USER
AND st.StatusCode in ('0', '1', '9')
AND (u.SeekerUserRole = 1 --superuser sees all, otherwise filter
OR st.DistrictNum IN (SELECT td.SeekerUserDistrict
FROM SeekerUsersDistricts AS td
WHERE u.SeekerUserid = td.SeekerUserid)
);

Obviously I wasn’t doing a Select *, but it’s a long field list and irrelevant to the discussion.

Student Status Codes of 0, 1, and 9 indicated an active student. District users were only allowed to see students that were not assigned to a district, or students assigned to their district. And SeekerUserRole 1 is a superuser who is allowed to see everyone, regardless of status code. The problem is that if you looked at it through SSMS Tools Pack by Mladen Prajdić, it would report that the query had an estimated cost of 200%. (SSMS Tools Pack is pretty neat, I recommend checking it out) The SeekerUsersDistricts table consists of two fields: SeekerUserID and a district number. For district users, they’d have one record in this table, teachers could have more than one district number.

The problem is that the u.SeekerUserRole = 1 OR st.DistrictNum IN confuse the optimizer: it could execute either path, and you’ll only know at runtime. Kind of hard to optimize for something like that. Turning on client statistics gives us the following:

There are three types of users: superusers, district users, and teachers. The front-end app gives them different menus to control what they can do and what data they can see. In this post I’ll be looking at how to improve the view but only for the superuser.

Here’s the code that I came up with:

--Variation 1
SELECT st.*
FROM Students AS st
WHERE 1 = (SELECT 1
FROM SeekerUsers AS u
WHERE u.SeekerUserDBLogin = SYSTEM_USER
AND u.SeekerUserRole = 1);

--Variation 3
SELECT st.*
FROM Students AS st
WHERE EXISTS (SELECT 1
FROM SeekerUsers AS u
WHERE u.SeekerUserDBLogin = SYSTEM_USER
AND u.SeekerUserRole = 1);

Running all three in the same query window shows each result set having the same number of rows returned, 939 students. So far, so good.

Variation 3 made sense to me. There are no common fields between the Student table and the SeekerUsers table, so a join, on the surface, doesn’t make sense. So why not a Where Exists? The devil, as they say, is in the details. The most glaring thing to show in the graphic execution plan was an operator that I had never seen before: a Row Count Spool (Lazy Spool) with a cost of 78%. With all three queries running together, Variation 3 had a cost relative to the batch of 70%, the other two each being 15%. The Row Count Spool was doing rebinds and rewinds, which are not good. The estimated number of rows and the estimated row size on that operator were farcical: it estimated one row and returned 939 rows (no, updating statistics and reindexing did nothing to improve that mismatch). On top of that, the entire estimated subtree cost was 0.12, the other two variations were each 0.026.

So variation 3 is out. What’s the difference between 1 and 2? The graphic plans are identical, as are the numbers on every operator, but there’s a subtle difference: the cache plan size for #2 is 96 kb, #1 is 88 kb. Kind of trivial, but this plan is going to be called regularly, so it’ll live in the cache pretty much forever. At this point there’s no way to forecast how much cache memory pressure there will be, but 8k is 8k, so since every statistic between the two variations is identical, I’ll go with #1. Aesthetically, I personally prefer #2, but I can live with #1.

Here’s the execution plans:

Yes, I’m getting scans on my Students base table. Right now the queries are returning 99% of students, so a seek doesn’t improve things since a covering index would require every field in the table. As the number of students grow with diverging status codes, then indexing might come in to play.