Optimizing SQL Strategies

How many of you have come across SQL code that, for whatever reason, wasn’t written as good as it could have been? I’m not talking about going into the inner workings of your database’s engine and milking every possibly nanosecond off the execution plan.

I’m talking about code where you see something more like the following. This code represents about half the update statements (the other half left out for brevity). The stg1Table table being updated had several million records with no indexes. Can anyone guess before reading any further why my jaw dropped when I saw this?

The Bad SQL Code

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

-- beginning of the updates

UPDATEstg1Table

SETAccType='RE'

WHERECHARINDEX('Romeo',CatchPhrase)>0

UPDATEstg1Table

SETAccType='TL'

WHERECHARINDEX('Tango',CatchPhrase)>0

UPDATEstg1Table

SETAccType='VA'

WHERECHARINDEX('Victor',CatchPhrase)>0

UPDATEstg1Table

SETAccType='WC'

WHERECHARINDEX('Whiskey',CatchPhrase)>0

UPDATEStg1Table

SETAccType='BT'

WHERECHARINDEX('Bravo',CatchPhrase)>0

UPDATEStg1Table

SETAccType='SC'

WHERECHARINDEX('Sierra',CatchPhrase)>0

UPDATEStg1Table

SETAccType='EC'

WHERECHARINDEX('Echo',CatchPhrase)>0

-- : more similar updates omitted for brevity

-- : This was unbelieveable!

I constantly come across SQL code where performance is obviously not considered. So that’s when this curious consultant decided to document some basic Optimization SQL Strategies in hopes of helping some SQL coders “lift their game”. 🙂

With that, here we go in no particular order:

Whenever possible, DELETE first, UPDATE second, INSERT third

I came across a stored procedure that was operating on a table with approximately 26 columns and several million records. There was no index on the table other than the primary key which was just an IDENTITY column. The stored procedure was run approximately once per week. With each run, here’s what it was coded to do:

INSERT several thousand records marking them as “new”.

DELETE “expired” records

UPDATE “old” records

The records were mutually exclusive of each other. That is, any records that were “new” wouldn’t be affected by the UPDATE or DELETE operation.

Ask yourself why this isn’t the best approach. Where is the inefficiency? Answer: in the order of operations.

By INSERTing several thousand new records first, that means the database engine has to search through that many more records to find those to DELETE and UPDATE later. If the INSERT were performed last, that obviously saves the database engine from having to scan through all the potential new records that are being INSERTed.

Therefore, you should always DELETE first whenever possible as this will present the database engine with less records to search when performing subsequent UPDATEs.

Next, UPDATE current records before performing an INSERT whenever possible because again, that’s less records the database will have to scan through.

Finally, perform your INSERT last whenever possible.

Use @TableVars for small amounts of table data instead of #temp tables

They’re small. Fast. Require zero disk I/O.

Here’s the perfect example I came across in numerous stored procedures for one client:

Unnecessary SQL #temp table

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

CREATETABLE#DaysOfWeek

(

[Day]char(3)

)

INSERTINTO#DaysOfWeek([Day])VALUES('Sun'),('Tue'),('Fri')

--

-- a few queries with joins on #DaysOfWeek

--

DROPTABLE#DaysOfWeek

I promptly modified the code as follows, which actually increased the overall stored procedure runtime performance:

Converted to a @Table var

Transact-SQL

1

2

3

4

5

6

7

-- Replaced the #table with @table! Why waste time on

-- disk I/O for 3 records?? Seriously!

DECLARE@DaysOfWeekTABLE

(

[Day]char(3)

)

INSERTINTO@DaysOfWeek([Day])VALUES('Sun'),('Tue'),('Fri')

Eliminate Joins On Massive Tables When Most Columns Aren’t Used

I was modifying a stored procedure which was updating fact tables in a data warehouse in Database1. This stored procedure had several joins to a significantly larger, staging table (3 dozen columns, several million records, no indexes on the columns being joined on or selected) in Database2 on the same server.

In this scenario, I managed to shave several minutes off the stored procedure runtime by creating a smaller #temptable from the joined table in Database2, using only the 3 columns that were utilized. I then created a CLUSTERED index on the one column in the #temptable that was contained in the WHERE clause.

Here’s the old code:

Old SQL Code

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

-- This was the original code.

-- Several queries similar to this where only

-- the 3 columns from [Database2].[dbo].DB2Table1

-- are used out of the 36.

INSERTINTOOurDataTable

SELECTDISTINCT

CAST(J.idasint)asid,

cast(J.codeasCHAR(1))ascode

FROM[Database2].[dbo].DB2Table1ASJ–has36columnsandmillionsofrows

LEFTOUTERJOIN[Database1].[dbo].DB1Table1ASDONJ.id=D.id

WHERED.R_ManISNULL

ANDJ.branch_codeNOTin('44','61','63','64')

The new code thus becomes:

New improved SQL code

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

-- Added for performance gains.

-- Only get the 3 columns we're concerned with.

-- Takes just a few seconds

select

CAST(Idasint)asId,

cast(Codeasvarchar(1))asCode,

cast(Branch_Codeasvarchar(2))asBranch_Code

into#tempTable

FROM[Database2].[dbo].DB2Table1ASJ

-- Create an index on the one column

-- that's used in the WHERE clauses

CREATENONCLUSTERED

INDEX[UIX_tempTable]ON[#tempTable]([Branch_Code])

WITH

FILLFACTOR=100

,STATISTICS_NORECOMPUTE

ON[PRIMARY]

-- Now do what we need to do joining

-- on the smaller #temp table.

-- It runs significantly faster.

INSERTINTOOurDataTable

SELECTDISTINCT

CAST(J.idasint)id,

cast(J.codeasCHAR(1))ascode

FROM#tempTableASJ-- for performance. Smaller table with only the columns we need + an index

LEFTOUTERJOIN[Database1].[dbo].DB1Table1ASDONJ.id=D.id

WHERED.R_ManISNULL

ANDJ.branch_codeNOTin('44','61','63','64')

See what I’ve done? It made a HUGE difference in the run time!

As an exercise to the reader, you can get the SQL code to execute faster if you add a WHERE clause to the SELECT statement when creating the #tempTable.

If a #temptable is used once, consider putting it in a join in the query to increase parallelism

Conversely, I’ve found #temptables in code where it hasn’t been advantageous. More disk I/O and resources were wasted, as well as limiting the database server’s ability to implement parallelism.

Here’s an example of code I rewrote:

Example SQL code rewritten

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

-- They created a #temptable

SELECTColA,ColB,ColC,ColD,Col4,Col5

INTO#table2

FROM[Database2].[dbo].SomeTable

-- Lots of SQL code here, then

-- a join query. Only place where #table2 is used.

SELECTa.Col1,a.Col2,a.Col3,b.ColA,b.ColB,b.ColC,b.ColD

FROMTable1a

innerjoin#table2bona.Col5=b.Col5anda.Col4=b.Col4

innerjoinTable3cona.Col1=c.Col1anda.Col3=c.Col3

-- Some more miscellaneous SQL was here

DROPTABLE#table2

-- -- -- -- -- -- -- -- -- -- -- -- -- --

-- So instead of doing a single query to create #table2,

-- which forces the code to execute in serial,

-- and also wasting time with the associated explicit Disk I/O,

-- rewrite the query as follows to increase the server's ability

-- to implement parallelism:

SELECTa.Col1,a.Col2,a.Col3,b.ColA,b.ColB,b.ColC,b.ColD

FROMTable1a

innerjoin

(

-- While the server is performing this query for this join...

SELECTColA,ColB,ColC,ColD,Col4,Col5

FROM[Database2].[dbo].SomeTable

)bona.Col5=b.Col5anda.Col4=b.Col4

-- ...it can perform this join between Table1 and Table3

innerjoinTable3cona.Col1=c.Col1anda.Col3=c.Col3

Indexes on #temptables

This trick has to be determined on a case-by-case basis because basically you have to weigh up the time difference between:

Creating and loading the #temptable; then performing the queries AND

Creating and loading the #temptable; creating the index on the necessary columns; then performing the queries.

Sometimes the time and associated costs for the server to perform #1 is practically the same as performing #2. This especially happens when the #temptable as a relatively low number of records it takes just as much time to scan the table as it would to create then scan the index for the values.

My general rule of thumb for considering whether or not to do this is:

The #temptable has at least 100,000 records in it

The #temptable is in several JOINs.

Use LIKE instead of LEFT() or SUBSTRING() in WHERE clauses when Indexes are present

Unless absolutely necessary. Why? LEFT, SUBSTRING, and similar functions are just that – functions. They do not allow the database engine to take advantage of indexes.

SQL

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

-- Do not do this if there is an index on Column3

SELECTColumn1,Column2

FROMTable1

WHERELEFT(Column3,1)='a'

-- Do not do this either if there is an index on Column3

SELECTColumn1,Column2

FROMTable1

WHERESUBSTRING(Column3,1,1)='a'

-- Instead, use LIKE

SELECTColumn1,Column2

FROMTable1

WHEREColumn3LIKE'a%'

As a hint, if you’re going for your SQL Server MCSE Data Platform certification, questions similar to the above scenario were on the SQL Server 2012 exams, so you should know this. 🙂

Don’t Sort Data in a Table Contrary to a Clustered Index

Obviously pay this no mind if the clustered index is solely on an identity column. Otherwise, it generally just wastes time.

Let’s say you have two columns in a table forming a clustered index as follows:

SQL

Transact-SQL

1

2

3

4

5

6

CREATECLUSTEREDINDEXIX_TheIndex

ONTestTable

(

NameDesc,

DateAddedASC

)

And if you’re lucky like me (let me roll my eyes for you at that sarcasm), you find an insert statement similar to the following:

Filter by Subqueries in JOINs instead of WHERE clauses

The majority of the time, filtering data by using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. I believe this is because an SQL server engine can optimize and implement parallelism within JOIN clauses more efficiently than in WHERE clauses.

SQL

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

--Most of the time, a query structured like this:

SELECTa.Col1,b.Col2

FROMTable1a

INNERJOINTable2bona.Col4=b.Col4

INNERJOIN

(

selectCol3

fromTable3

)conc.Col3=a.Col4

--executes faster than a query structured like this:

SELECTa.Col1,b.Col2

FROMTable1a

INNERJOINTable2bona.Col4=b.Col4

WHEREa.Col4in

(

selectCol3

fromTable3

)

Combine multiple UPDATE statements

From my opening example. Yes, I seriously came across this production code:

SQL example of bad use of multiple updates

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

-- beginning of updates

-- Each UPDATE requires an entire table scan!

UPDATEstg1Table

SETAccType='RE'

WHERECHARINDEX('Romeo',CatchPhrase)>0

UPDATEstg1Table

SETAccType='TL'

WHERECHARINDEX('Tango',CatchPhrase)>0

UPDATEstg1Table

SETAccType='VA'

WHERECHARINDEX('Victor',CatchPhrase)>0

UPDATEstg1Table

SETAccType='WC'

WHERECHARINDEX('Whiskey',CatchPhrase)>0

UPDATEStg1Table

SETAccType='BT'

WHERECHARINDEX('Bravo',CatchPhrase)>0

UPDATEStg1Table

SETAccType='SC'

WHERECHARINDEX('Sierra',CatchPhrase)>0

UPDATEStg1Table

SETAccType='EC'

WHERECHARINDEX('Echo',CatchPhrase)>0

-- and yet more similar update statements were here

Rewritten by doing some simple optimization SQL magic:

SQL rewritten to use one UPDATE statement

Transact-SQL

1

2

3

4

5

6

7

8

9

10

11

12

-- Rewrite the code; just do the UPDATE once!

UPDATEstg1Table

SetAccType=Case(CatchPhrase)

WhenLIKE'%Romeo%'Then'RE'

WhenLIKE'%Tango%'Then'TL'

WhenLIKE'%Victor%'Then'VA'

WhenLIKE'%Whiskey%'Then'WC'

WhenLIKE'%Bravo%'Then'BT'

WhenLIKE'%Sierra%'Then'SC'

WhenLIKE'%Echo%'Then'EC'

-- remaining When clauses

End

‘nuf said. 🙂

I hope you enjoyed this article!

Do you have any SQL optimizing tips and techniques? If so, leave a comment and share with your fellow readers! Let’s make the SQL world a better, happier place for everyone. 🙂