Understand the retry policies in Episerver

This is pretty internal stuffs – you will almost never use it yourself (you technically can, but there is no guarantee of backward compatibility whatsoever). I learned the stuff recently and found it interesting enough to share.

No matter how powerful your server(s) are, and how reliable SQL Server is, there are chances that a query can result in error. Hell, there are bunches of error codes that can be returned (and I never remember one, Google made a pretty bad habit of forgetting everything). But they can be divided into two categories:

The un-retry-able errors, such as a grammar error in your stored procedure, or a parameter is missing, or mismatch. These errors need human-interaction to correct them before continuing.

The retry-able errors, which means if you re-run the query, there is a chance that it would eventually succeed. Compared to the un-retry-able errors, these are in smaller number. A deadlock can be considered as retry-able (with the hope that the deadlock conditions are no longer present, the other threads finished their works and released the locks). A timeout error can be considered as retry-able (with the hope that the server can run the query faster, this time). In the end, your query can succeed, after some tries.

You might ask, why retry? The question is, why not? Retrying the query can make it works, and as it’s transparent from end-users perspective, it might provide a smoother experience for them. Say, they are trying to place an order. Because of your highly successful campaign, there are a lot of customers trying to do the same thing at the same time, resulting in some occasionally deadlocks. You’ll be much happy to silently retry and have another order in place, than showing “Something went wrong” to the customers. Of course, the previous errors will still be logged for further analytic, but you’re good for now.

And here the list of error codes which can be retried by Episerver database-handling:

121: A transport-level error has occurred when receiving results from the server: The semaphore timeout period has expired. ,

1205: Transaction (Process ID) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction ,

10928: Resource ID: %d. The %s limit for the database is %d and has been reached. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637.

10929: Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database.

10053 : A transport-level error has occurred when receiving results from the server. An established connection was aborted by the software in your host machine.

10054: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)

40143: The service has encountered an error processing your request. Please try again.

40197: The service has encountered an error processing your request. Please try again.

40501: The service is currently busy. Retry the request after 10 seconds.

40540: The service has encountered an error processing your request. Please try again.

40613: Database XXXX on server YYYY is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of ZZZZZ.

And it’s not query only. Opening a connection to the database receive the same level of treatment:

-2: Timeout expired

20: The instance of SQL Server you attempted to connect to does not support encryption.

64: A connection was successfully established with the server, but then an error occurred during the login process

233: The client was unable to establish a connection because of an error during connection initialization process before login.

11001: The client was unable to establish a connection because of an error during connection initialization process before login.

10060: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 – A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)”}

Again, these stuffs are internal, so you should not try to change these error codes (you technically can, but there are no guarantee that they will work). The things you can try to change are the settings related to retry policies. They are all in <episerver.dataStore> section:

<episerver.dataStore>

<siteDataSettings>

<addsiteId="string"

connectionStringName="string"

retries="int"

retryDelay="timespan"

databaseQueryTimeout="timespan"/>

</siteDataSettings>

</episerver.dataStore>

By default, retries – the number of retries before giving up – is 5 and retryDelay – the timespan to wait before retrying – is 100ms. These settings apply for the connection specified by connectionStringName. (So technically for Commerce site you can have two sets of settings for the two databases)