Pages

Friday, July 13, 2012

Optimize Queries

Microsoft’s
Measured Skill description: This objective may include but is not
limited to: understand statistics; read query plans; plan guides; DMVs; hints;
statistics IO; dynamic vs. parameterized queries; describe the different join
types (HASH, MERGE, LOOP) and describe the scenarios in which they would be
used

What
I see:

·understand statistics

·query hints

·statistics IO

·join types (HASH, MERGE, LOOP)

Understand Statistics

Statistics are the way that SQL
Server records and uses the data distribution for tables and indexes. They allow the query optimizer to choose an
appropriate plan based off of row count, histograms, or page density. Fresh statistics are necessary for the
process to make the best possible decision, but stale statistics can fool SQL
Server into thinking it has found the best plan, when in fact it is a
sub-optimal plan. For a great read and
more information on statistics, see this Idera
post by Donabel Santos on Understanding SQL Server Statistics.

Query Hints

Query hints are a way to tell
the optimizer what to do, regardless of what the optimizer might have done
originally. A few popular query hints
are KEEP PLAN, MAXDOP, OPTIMIZE FOR, and RECOMPILE. For instance, MAXDOP will override the
configured instance max degree of parallelism.
RECOMPILE will cause SQL Server to discard the query execution plan
after the query has completed as opposed to persistently storing it for later
use. Please see BOL for a full list of
Query Hints and corresponding explanations.
All of these query hints are probably fair game on the exam, so a
cursory knowledge of what they do will benefit you.

STATISTICS IO

The set statement, SET
STATISTICS IO, is used to output statistics regarding disk activity for the
executed T-SQL queries. To see a working
example of this, execution the below T-SQL and view the Messages window to see the disk/cache activity:

useAdventureWorks2012;

go

setstatisticsioon;

select*

fromHumanResources.Department;

setstatisticsiooff;

This gives
us information such as scan count, logical reads (from the data cache/memory),
physical reads (from disk), read-ahead reads (read from disk to cache for
future page reads), and the LOB equivalents to the aforementioned statistics. This is a great way to see if a query or a
subset of queries is hitting the disk too often.

Join Types

There are three particular join
types the optimizer can choose to utilize:

Hash Join – this join takes the smaller
of the two sets to join and makes a hash table and fits that in the memory
grant. Then it takes the other set and
probes by computing a hash value for each row and comparing it to the hash
table. To see this join in action,
utilize the following query (notice a relatively small table that can easily
fit into memory as a hashed table):

useAdventureWorks2012;

go

select*

fromHumanResources.Departmentd

innerjoinHumanResources.EmployeeDepartmentHistoryedh

ond.DepartmentID=edh.DepartmentID;

The
execution plan should look like the following:

Merge Join – this join goes through the
inputted rows only once, and this can show performance gains through sorted
data:

useAdventureWorks2012;

go

select*

fromPerson.Personp

innerjoinPerson.BusinessEntitybe

onp.BusinessEntityID=be.BusinessEntityID;

The
execution plan will resemble the following:

Loop Join – this join does just as its
name states: one of the data sets will have every row of data iterated for each
row of the other data set: