Retry Logic for Transient Failures in Windows Azure SQL Database

Retry Logic for Transient Failures in Windows Azure SQL Database

If you wish to contribute to this page, use the
Edit tab at the top (sign-in required). If you wish to provide feedback for this documentation please either send e-mail to
azuredocs@microsoft.com or use the Comment field at the bottom of this page (sign-in required).

SQL Database queries can fail for various reasons – a malformed query, network issues, and so on. Some errors are
transient, meaning the problem often goes away by itself. For this subset of errors, it makes sense to retry the query after a short interval. If the query still fails after several retries, you would report an error. Of course, not all errors are
transient. SQL Error 102, “Incorrect syntax,” won’t go away no matter how many times you submit the same query. In short, implementing robust retry logic requires some thought.

The Windows Azure AppFabric Customer Advisory Team (CAT) together with the
patterns & practices team has developed a
library that encapsulates retry logic for SQL Database, Windows Azure storage, Azure Caching, and Service Bus. Using this library, you decide which errors should trigger a retry
attempt, and the library handles the retry logic. This tutorial will show how to use the library with a SQL Database.

Testing retry logic is an interesting problem, because you need to trigger transient errors in a repeatable way. Of course, you could just unplug your network cable, or block port 1433. (SQL Database uses TCP over port 1433.) But for this tutorial, I’ve opted
for something that’s easier to code: Before submitting a query, hold a table-wide lock, which causes a deadlock or a timeout. When the lock is released, the original query can be retried.

In this Article

This tutorial uses a trivial database for illustration. The sample project includes a SQL script to create the database. The script file is located in the root folder of the project
file, and is named CustomerOrders.sql.

In this lesson, you will add some basic SQL queries and then trigger a transient failure. In the next lesson, you will add retry logic.

Note: If you want to get directly to the retry logic, you can skip this lesson and go to lesson 3

Set the Connection String

Download the project files and open the solution named RetryLogicTutorial_Skeleton.sln. Right-click Form1.cs and select
View Code. Then add a SqlConnectionStringBuilder member variable to the
Form1 class.

This function is called from a background worker thread. The query returns a list of product names, which are passed to the UI thread by calling
BackgroundWorker.ReportProgress. The READCOMMITTEDLOCK clause blocks the query if the table is locked.

Add a LINQ Query

Now we’ll add a second query using LINQ. First, create a LINQ to SQL data model for the CustomerOrders database, as follows:

Add a new data connection:

If the Server Explorer window is not visible, on the View menu, click
Other Windows, then click Server Explorer.

This code creates a LINQ query that returns all of the customer orders.

Run the application and click the “ADO.NET Query” button and the “LINQ Query” button. You should see the query results populated in the UI.

Add a Blocking Transaction

One thing that’s tricky about retry logic is actually inducing a transient error for testing. For this purpose, we will intentionally cause a deadlock. Locate the function named
longTransaction_DoWork and add the following code:

This function starts a new SQL transaction and holds an exclusive lock on the Products table. It sleeps for 30 seconds before rolling back the transaction. Next, locate the function named
Deadlock and add the following code.

Run the application and click the “Deadlock” button. While the progress bar advances, click the “ADO.Net Query” or “LINQ Query” button. You should see an error message, SQL Error -2 (timeout) or SQL Error 1205 (deadlock).

Add a new class named MyRetryStrategy that implements the
ITransientErrorDetectionStrategy interface. This interface has a single method,
IsTransient, which takes an Exception object and returns
true if the exception represents a transient error:

The implementation shown here is meant only to illustrate the use of the library. But the basic pattern is typical: First, filter for exceptions of type
SqlException. Then look at the error numbers in the SqlException.Errors
collection. Return true for any error that should trigger a retry, and
false for all other errors.

Next, we'll modify the two queries to use MyRetryStrategy.

Add Retry Logic for ADO.NET

Modify the AdoQueryWorker_DoWork function as follows:

At the start of the function, create a new instance of the MyRetryPolicy class.

Change the SqlConnection.Open call to SqlConnection.OpenWithRetry.

Change the SqlCommand.ExecuteReader call to SqlCommand.ExecuteReaderWithRetry.

The RetryPolicy<T> class implements the retry logic. The parameter
T must be a type that implements ITransientErrorDetectionStrategy. In the
RetryPolicy constructor, you set the maximum number of retry attempts. Optionally, you can also set the retry interval, or use an exponential backoff.

The OpenWithRetry method is an extension method, defined in the CAT library, that adds retry logic to the standard ADO.NET
SqlConnection.Open method. If an exception is thrown while opening the connection, the
RetryPolicy object passes the exception to ITransientErrorDetectionStrategy::IsTransient. If
IsTransient returns true, the RetryPolicy waits for the interval and then retries, up to the maximum retries. Note that this blocks the calling thread, so avoid doing this on a UI thread.

Similarly, ExecuteReaderWithRetry adds retry logic to the
SqlCommand.ExecuteReader method. Extension methods are also defined for
ExecuteNonQuery, ExecuteScalar, and ExecuteXmlReader. When using these methods, you should always consider whether a particular SQL command is safe to retry.

Add Retry Logic for LINQ to SQL

Direct calls to ADO.NET are fairly straightforward. But many applications use frameworks such as WCF or LINQ to SQL, which abstract the underlying database calls. For this type of scenario, the CAT retry library provides a way to wrap a block of code into
a scope that can be retried as a unit. To see how this works, modify the
LinqQueryWorker_DoWork functions as follows:

The RetryPolicy.ExecuteAction method takes a lambda expression. The code in the lambda expression is executed at least once. If a transient error occurs, the
RetryPolicy object retries the entire code block. You should place exception handlers outside of the
ExecuteAction statement, and not inside the statement, so that the retry policy gets the first look at any exceptions.

Replace MyRetryStrategy

The implementation of MyRetryStrategy that was shown here is only to demonstrate the retry API. Here are the main transient errors that you should consider in production code:

Error Number

Description

20

The instance of SQL Server does not support encryption.

64

An error occurred during login.

233

Connection initialization error.

10053

A transport-level error occurred when receiving results from the server.

10054

A transport-level error occurred when sending the request to the server.

10060

Network or instance-specific error.

40143

Connection could not be initialized.

40197

The service encountered an error processing your request.

40501

The server is busy.

40613

The database is currently unavailable.

The CAT retry library contains a class named SqlAzureTransientErrorDetectionStrategy that you can use, either as-is, or as a starting point for your own implementation of the
ITransientErrorDetectionStrategy interface.