
Last week at TechCon2013, sponsored by the SQLBangalore User Group and other user groups in Bangalore, I gave a talk on how to optimize your backup/restore. I focused on the three main aspects of backup/restore and talked about some of the things that can be done to optimize them:

Read from Disk

Store in Memory

Write to Disk/Tape

For the demos, I used three databases, described below.

Database 1

Size: 12 GB

DB Name: AdventureWorks2012SingleFile

Number of Data Files: 1

Disk Allocation Size: 4 KB

Database 2

Size: 12 GB

DB Name: AdventureWorks2012MultiFile

Number of Data Files: 4 (spread across 4 different physical disks)

Disk Allocation Size: 64 KB

Database 3

Size: 4 GB

DB Name: AnotherSingleFileDatabase

Number of Data Files: 1

Disk Allocation Size: 64 KB

The slide deck for the presentation is included below.

Optimizing Reads

To optimize reads, I focused mainly on the following:

Reading from a single MDF file vs. reading from multiple data files

Using a higher disk allocation unit size (I will write another post on the effect of disk allocation unit size on SQL backups)

All backups were performed to a Null device. The idea was to test the read performance. I saw the following backup performance for the databases mentioned above.

We got better read performance when using a 64 KB allocation unit disk and when using multiple files. Further performance improvement was observed with compression and by increasing the MAXTRANSFERSIZE for the backups.
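For reference, below is a minimal sketch of the kind of backup command used for these tests; the database name is one of the demo databases above, and the MAXTRANSFERSIZE value is purely illustrative, not necessarily the exact value from the talk.

-- Back up to the NUL device so nothing is written to disk;
-- this isolates the read (and compression) side of the backup.
BACKUP DATABASE [AdventureWorks2012MultiFile]
TO DISK = 'NUL'
WITH COMPRESSION, MAXTRANSFERSIZE = 4194304
GO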

Optimizing Memory

To optimize memory, I focused mainly on changing the BUFFERCOUNT option for the backups. This creates more buffers in memory for the backup operation.

Total Memory used = BufferCount * MaxTransferSize

While I did not have a demo for the performance improvement from increasing the number of buffers, I had a demo on the side effect of setting the BUFFERCOUNT/MAXTRANSFERSIZE values very high.

BACKUP DATABASE [AnotherSingleFileDatabase]
TO DISK = 'C:\AnotherSingleFileDatabase.bak'
WITH COMPRESSION, MAXTRANSFERSIZE = 2097152, BUFFERCOUNT = 3000
GO

-- Error Message
-- There is insufficient system memory in resource pool 'default' to run this query.

If we increase the BUFFERCOUNT and the MAXTRANSFERSIZE to very high values, we run into memory issues on the server.
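To put numbers on the example above: 3000 buffers × 2,097,152 bytes per buffer is roughly 6 GB of buffer memory for a single backup, which can easily exhaust the 'default' resource pool.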

SQL Server 2012 introduces a new database-level option, "Target_Recovery_Time", to better control and predict the time it takes to recover the database. The reason for this option was simple: the "Recovery Interval" option in sp_configure was not reliable enough.

Before we talk about the new option, let's talk a bit about the older one.

With the Recovery interval option (set in minutes, default is 0), SQL Server would schedule the Database checkpoints in such a way that the time taken to recover the database is similar to what is specified in the Recovery Interval setting. What happens in the background is that SQL Server tries to schedule the checkpoint based on the number of DML operations that are taking place, but fails to take into consideration the number of DB Pages being modified by these operations.
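For context, the older server-wide option is set through sp_configure; below is a minimal sketch (the 5-minute value is just an illustration).

-- 'recovery interval (min)' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)', 5;
RECONFIGURE;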

As shown in the video, SQL Server first calculates the checkpoint interval for the database based on the number of DML operations and the value of the "Recovery Interval" option.

When the first transaction completes (which modifies about 160 pages) and the subsequent checkpoint fires, we flush all these 160 pages to disk and move the Min_Recovery_LSN forward to the checkpoint LSN. This Min_Recovery_LSN is what is used during database recovery to determine which operations need to be repeated in the redo phase of recovery. During the second transaction (which modifies close to 12K pages), if there were no problems, the checkpoint would have flushed these 12K pages to disk, resulting in a spike in disk write activity. Assuming there was a crash just before the checkpoint happened (but the transaction did commit), the large number of pages that need to be loaded and modified means the time to recover the database might vary and would be unpredictable.

The current checkpoint algorithm induces the following problems, which are the reason why the new database option was introduced.

Massive spikes in the Disk Write activity, during checkpoint.

Unpredictable recovery times for the Database.

The idea was to smooth out the disk write activity, so that we no longer see the massive spikes and to also have better predictability of the recovery time of a database.

First, it introduces a background task to continuously flush dirty buffers to disk. What SQL Server actually does is calculate a minimum number of dirty buffers it will keep; every time this threshold is exceeded, the background process (called the background recovery writer) kicks in and flushes the dirty buffers to disk.

Second, it continuously moves the Min_Recovery_LSN forward to the LSN that was most recently flushed. Since the Min_Recovery_LSN is continuously moving forward, if a crash happens, the amount of work done during recovery is small.

The video below illustrates how this happens.

With the new setting, the writes no longer come in spikes (they are smoothed out by the background recovery writer) and the amount of work that needs to be done during the recovery phase is also reduced, providing more accurate estimates of the recovery time for the database.
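A minimal sketch of setting and checking the new option, assuming the AdventureWorks2012 sample database and an illustrative 60-second target:

ALTER DATABASE [AdventureWorks2012]
SET TARGET_RECOVERY_TIME = 60 SECONDS;
GO

-- Verify the setting (0 means the database falls back to the server-wide 'recovery interval')
SELECT name, target_recovery_time_in_seconds
FROM sys.databases
WHERE name = 'AdventureWorks2012';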

Word of Caution

If you set Target_Recovery_Time to a very small value (like 1 second) in an OLTP environment, SQL Server might overwhelm the disk with the massive number of write operations performed as part of the background recovery writes. Set this option only after carefully examining your workload pattern and your SLA commitments. Most often the older "Recovery Interval" option from sp_configure works well.

If you have been plagued in the past by slow performance of your data warehouse queries, there is some good news for you. With SQL Server 2012, we can now create ColumnStore indexes, which can improve your query performance significantly. ColumnStore indexes are targeted at queries which use aggregations over a column or a set of columns.

Traditional SQL Server Approach

SQL Server has traditionally used row-store based storage. Under this approach the complete data row is stored on a data page and is read or written as a complete unit, as shown below.

In the traditional approach, let's say we had a table with the following definition:

create table t1 ( a int, b int, c datetime, d varchar(300) )

Assume we are running the following query against the table and that the table has about 1 billion records on approximately 40 million pages.

select SUM(a), AVG(b), MAX(c) from t1

In this case, we only need the values of columns a, b and c. Given the row-store approach, we would need to read the entire row (all the columns) to execute this query, which means we would need to read all 40 million pages and then perform the aggregation operations.

How ColumnStore Index Helps

When we create a ColumnStore index on the table, it changes the way the data is organized. Data for each column included in the ColumnStore index is arranged separately. For example, if we created a ColumnStore index on the above table on columns a, b and c, the columns would be organized independently of each other on different sets of pages, as shown below.
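A minimal sketch of creating such an index on the table above (the index name is an illustrative assumption):

-- SQL Server 2012 supports nonclustered ColumnStore indexes;
-- each listed column is stored in its own set of column segments
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_t1_columnstore
ON t1 (a, b, c);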

Additionally, ColumnStore indexes compress the data by default. Values within a column are often similar, if not the same, and hence can greatly benefit from compression.

Now, with this arrangement, if we run the above-mentioned query, we would only need to read the pages for column a, which in the above case would be only about ~300K pages (assuming 40% compression) to read and sum the values of a, then another ~300K pages (assuming 40% compression) for the average of b, and finally ~500K pages (assuming 50% compression) to calculate the max of c.

As mentioned, this type of organization helps queries where we are returning only some of the columns from a table. For queries where we need to return all the columns in a row, Row-Store approach works better. Typically, OLTP workloads benefit from row storage organization of data, while warehouse queries would benefit from ColumnStore index.

Restrictions on ColumnStore Indexes

The MSDN article describes the restrictions and limitations on creating ColumnStore indexes. One thing to keep in mind is that once you create a ColumnStore index on a table, you cannot update the table. In order to update the table, the quickest way is to drop the ColumnStore index, perform your inserts, updates and deletes, and then recreate the ColumnStore index once you are done with the DML.
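A minimal sketch of that workflow on the t1 table from earlier (the index name and the UPDATE itself are illustrative assumptions):

-- 1. Drop the ColumnStore index so the table becomes updatable again
DROP INDEX idx_t1_columnstore ON t1;

-- 2. Perform the DML
UPDATE t1 SET b = b + 1 WHERE a = 42;

-- 3. Recreate the ColumnStore index
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_t1_columnstore
ON t1 (a, b, c);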

Example

I created two tables with the same schema, one with only a ColumnStore index and another with a clustered index and a nonclustered index. Both tables have about 7.7 million records.

These tables were created on the AdventureWorks2012 sample database, which can be downloaded from here.

TableWithoutColumnStoreIndex

use AdventureWorks2012
go

select * into dbo.TableWithoutColumnStoreIndex from Sales.SalesOrderDetail
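The companion table, [TableWithColumnStoreIndex], would have been created along similar lines; the sketch below is an assumption (in particular the column list for the index), not the exact script from the original post.

select * into dbo.TableWithColumnStoreIndex from Sales.SalesOrderDetail
go

-- Column list chosen for illustration only
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_TableWithColumnStoreIndex
ON dbo.TableWithColumnStoreIndex (ProductID, OrderQty, UnitPrice);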

I executed the query mentioned below on the two tables to compare the performance. In the query, I am just substituting the name of the table ([TableWithColumnStoreIndex] or [TableWithoutColumnStoreIndex]) while executing.

The MERGE statement in SQL Server 2008 can be used to perform insert, update and delete operations in a single statement. Assume, for example, that your organization has a source table and a destination table. Every day at the end of business, data from the source table is appended to the target table. With SQL 2000/SQL 2005, the only way of doing this optimally (assuming the tables are very large) was to find the delta of all the changes in the source table and then apply them to the target table. The problem was that you had to write your own logic, or use other SQL features like SSIS or CDC, to find the delta and then update the target table.

With the MERGE statement you can do the same without much hassle. We can also use the MERGE statement in an SSIS package or a job to automate the entire sequence.

Using the MERGE Statement

The syntax for the MERGE statement is simple to understand. First we have the target table, specified by the INTO clause, followed by the source table, specified with the USING clause. Then we have the join predicate. The join predicate is important, because this is what governs which records are inserted, deleted or updated in the target table. Finally, we have the WHEN clauses to define the criteria for INSERT, UPDATE and DELETE.

For example, assume that your organization maintains two tables: one for daily purchases and another for reporting, which holds the last purchase date and the total amount of purchases made by each customer.
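A minimal sketch of what such a MERGE could look like; the table and column names (dbo.DailyPurchases, dbo.CustomerPurchaseSummary, CustomerID, PurchaseDate, Amount) are illustrative assumptions, not from the original post.

MERGE INTO dbo.CustomerPurchaseSummary AS target
USING (
    -- Collapse the day's purchases to one row per customer so MERGE
    -- never tries to update the same target row twice
    SELECT CustomerID,
           MAX(PurchaseDate) AS PurchaseDate,
           SUM(Amount)       AS Amount
    FROM dbo.DailyPurchases
    GROUP BY CustomerID
) AS source
    ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
    UPDATE SET target.LastPurchaseDate = source.PurchaseDate,
               target.TotalPurchases   = target.TotalPurchases + source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, LastPurchaseDate, TotalPurchases)
    VALUES (source.CustomerID, source.PurchaseDate, source.Amount);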