How to Sync SQL Server Data with Azure Search

A common request we receive in Azure Search is how to extend data in SQL Server databases to Azure Search. This may be because people wish to offload the full-text search workload from their on-premises SQL Server database to the cloud, or because they simply want to make use of Azure Search’s advanced search capabilities. In many cases, data changes need to be reflected in the search engine in close to real time. This can be challenging, because tracking changed rows can be computationally expensive if not done properly. In this blog post, I want to explain how to use a SQL Server capability called Integrated Change Tracking to efficiently sync data changes from SQL Server to Azure Search. Change tracking is a built-in feature of SQL Server that records changes (inserts, updates, and deletes) made to user tables. It is also an extremely efficient method of finding changes, with very low impact on the performance of your database. In this blog post, we will make use of the sample found at Codeplex.

Requirements

This tutorial assumes you have access to:

SQL Server 2008 or higher

An Azure Search service (the service name and API key can be found in the Azure Portal)

Visual Studio

NOTE: If you are using the default database configuration and SQL Server Management Studio, connect to the server instance as (LocalDB)\v11.0

Configuring the SQL Server to Azure Search Sample

At this point you should have downloaded the sample project and opened it in Visual Studio. In the sample Visual Studio project, you will need to add the connection information for your Azure Search service. Please also make note of the connection information for your SQL Server database in case it needs to be modified. Open app.config and update the SearchServiceName and SearchServiceApiKey values with your Azure Search service name and API key, both of which can be found in the Azure Portal.

Adding Change Tracking to Your SQL Server Database

When you launch this application, a new database called “AzureSearchSyncTest” is created, along with a table called Products that is populated with data. Once this table is created, change tracking for the table is enabled.

OPTIONAL: If you wish to try this in your own SQL Server database, there are two scripts in the \sql folder that you can use. Or simply change the connection information in the app.config file to point to your SQL Server instance.

One file that is worth reviewing is the add_change_tracking.sql file located in the \sql folder. Note that only two SQL commands are required to turn on Integrated Change Tracking for this table:

ALTER DATABASE AzureSearchSyncTest SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE Products ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF);

The first command turns on change tracking for the database and tells SQL Server to retain change information for 2 days, after which the change data is deleted to avoid consuming too much database space. The second command tells SQL Server which table to track. For this demo, we have told SQL Server to track only which rows changed, not which columns were updated. An alternative approach is to enable column-level change tracking (TRACK_COLUMNS_UPDATED = ON). You should do this if it makes more sense to send only the columns that changed rather than the entire row. Keep in mind that this increases the amount of storage allocated for change tracking; however, it might be worth it if you make a lot of data changes and those changes are usually limited to a few columns in a row.
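Once these commands have run, you can verify the configuration from SQL Server's change tracking catalog views. This is a quick sketch, assuming you are connected to the newly created database:

```sql
-- Confirm change tracking is on for the database, with retention settings
SELECT DB_NAME(database_id) AS database_name,
       retention_period,
       retention_period_units_desc,
       is_auto_cleanup_on
FROM sys.change_tracking_databases;

-- Confirm which tables are tracked and whether column tracking is enabled
SELECT OBJECT_NAME(object_id) AS table_name,
       is_track_columns_updated_on
FROM sys.change_tracking_tables;
```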

How it Works

Let’s take a closer look at this console application that does the synchronization from SQL Server to Azure Search. In the previous step, you would have opened the application in Visual Studio. If you have not done so, please open it now.

Detecting Changes

The ability to detect the changes efficiently in SQL Server is key to this application. Open the Program.cs file and move to the Main(string[] args) function. The first line we want to make note of is:

_lastVersion = -1;

SQL Server Integrated Change Tracking uses a Change Version that is incremented every time a change is made to one of the tracked tables. Using this Change Version, you can ask SQL Server to send back the changes that have occurred since a specific version. The _lastVersion variable tracks the Change Version in effect when the previous sync completed successfully. In this case, we are running the app for the first time, so we set it to -1, which tells the application to sync all of the data on the first synchronization. An optional enhancement would be to store this _lastVersion value somewhere and load it when the application starts, so that you can pick up where you left off even after closing the application.

The next lines of this function initialize change tracking in the SQL Server database and create the Azure Search index that will receive the data. Moving to the while (true) loop, we can see that the application checks for changes every 5 seconds [Thread.Sleep(5000)]. Within this loop, a ChangeEnumeratorSql is created; it holds the query used to do a full upload of the data from SQL Server to Azure Search the first time through. Next, changeEnumerator.ComputeChangeSet(_lastVersion) is called. This call does quite a bit of work, which you can see if you open the ChangeEnumerator.cs file and move to the ComputeChangeSet(Int64 lastVersion) function. First, it calls GetChangeSetVersion(con), which asks SQL Server for the most recent Change Version. We remember this value, because the next time through we want to use it to find any changes that have happened since this point. Next, we see a call to EnumerateUpdatedDocuments(con, lastVersion), which gathers all of the data changes into an IEnumerable<Dictionary>.
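Under the covers, the version lookup performed by GetChangeSetVersion amounts to a single T-SQL call. This is a sketch of the query shape, not the sample's exact code:

```sql
-- Returns the most recent Change Version for the database;
-- returns NULL if change tracking is not enabled
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;
```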
If you drill into this EnumerateUpdatedDocuments function, you will notice that the first time through (when _lastVersion is -1) it does a full select of the data. If _lastVersion is greater than -1, it retrieves only the changes. You can see that CHANGETABLE is key to retrieving the changes; it exposes an internal change table maintained by SQL Server. You might also notice the following lines, which say we only want the inserts and updates that have happened to the Products table:

sqlCmd += "and (CT.SYS_CHANGE_OPERATION = 'U' ";

sqlCmd += "or CT.SYS_CHANGE_OPERATION = 'I') ";

If you want to add the ability to sync deletes, you can query the change table in a similar way, requesting rows where CT.SYS_CHANGE_OPERATION = 'D'. SQL Azure does not currently support Integrated Change Tracking, so you would need to alter this application to implement a different change-tracking method (e.g., rowversions plus tombstones).
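To make the query shapes concrete, here is a sketch of how CHANGETABLE is typically joined to the tracked table. Only ProductID is confirmed by the sample; the other column names (Name, Color) are illustrative, and @lastVersion stands for the version saved after the previous sync:

```sql
-- Inserts and updates: join back to Products to pick up current column values
SELECT P.ProductID, P.Name, P.Color, CT.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES Products, @lastVersion) AS CT
JOIN Products AS P ON P.ProductID = CT.ProductID
WHERE CT.SYS_CHANGE_OPERATION IN ('I', 'U');

-- Deletes: the row is gone from Products, so only the key survives
-- in the change table
SELECT CT.ProductID
FROM CHANGETABLE(CHANGES Products, @lastVersion) AS CT
WHERE CT.SYS_CHANGE_OPERATION = 'D';
```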

Pushing changes into Azure Search

Now that we have reviewed the method used for getting changes, let’s go back to the while (true) loop in the Main(string[] args) function of Program.cs. Picking up where we left off, we can see a call to ApplyChanges(changes). This takes the received ChangeSet containing the data to be uploaded to Azure Search. If you drill into this function, you will see that it uploads the changes in batches of 999. It also uses an action called "mergeOrUpload". This tells Azure Search that each document being received should be inserted if its key does not exist, or have its values updated in the corresponding document if the key does exist. A key field is used to uniquely identify a document in Azure Search; in our case, the productID field is used as the key. One optional enhancement: if you are uploading rows that you are sure are new, the upload action should be faster than merge or mergeOrUpload.
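For reference, a mergeOrUpload batch sent to the Azure Search indexing endpoint looks roughly like this. This is a sketch; the fields beyond productID are illustrative:

```json
{
  "value": [
    {
      "@search.action": "mergeOrUpload",
      "productID": "680",
      "Color": "Green"
    }
  ]
}
```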

Scheduling of Sync

This is a very simple console application that uses a while loop to check for changes every 5 seconds. In an on-premises production environment, you would want to consider making this a Windows service or implementing some sort of scheduled job. If you are running this in the cloud (say, against a SQL Server VM), a WebJob or a WebRole is probably the best way to implement this functionality.

Running the Application to Upload Data Changes from SQL Server to Azure Search

At this point we are ready to launch the application. You may wish to add a breakpoint in the Main function located within the Program.cs file so that you can step through the application. You should see text in the Console as follows:

Sync Processing Started...

Creating SQL Server database with Products table...

Enabling change tracking for Products table...

Uploading 294 changes...

Sync Complete, waiting 5 seconds...

Sync Complete, waiting 5 seconds...

Notice how the first execution of the application uploaded all 294 rows. At intervals of 5 seconds, the application will check for changes and then upload them. Let’s make a change to one of the rows and see what happens. While keeping the application running, connect to the SQL Server database AzureSearchSyncTest and execute:

UPDATE [Products] set Color = 'Green' where ProductID = 680

Go back to the console window and you should see the following message:

Uploading 1 changes...

The application found the one update and uploaded it to Azure Search. This will also work if you insert a new row. Please note that Integrated Change Tracking also tracks row deletions; however, delete handling has not been added to this sample.

Verify Data in Azure Search Index

Now that data is synchronizing to your Azure Search index, let’s query the index and make sure all of the data is there. To do this, we will use Fiddler.

From Fiddler, choose the “Composer” tab and then choose the “Parsed” tab.
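In the Composer, you can then issue a simple search request against the index. This is a sketch; replace the bracketed service name, index name, and query key with your own, and use the api-version your service supports:

```
GET https://[your-service].search.windows.net/indexes/[your-index]/docs?search=*&api-version=2015-02-28 HTTP/1.1
api-key: [your-query-api-key]
```

A successful response returns a JSON document containing the matching rows that were synced from the Products table.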

Every time the application starts, _lastVersion is reset to -1, which causes a full upload of the data to Azure Search. You might want to store the last successful Change Version in your database and retrieve it when the application starts, so that the application can pick up where it left off.
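One simple way to implement this is a small state table in the same database. This is a sketch; the SyncState table and @currentVersion variable are hypothetical names, not part of the sample:

```sql
-- A one-row-per-table high-water-mark store (hypothetical)
CREATE TABLE SyncState
(
    TableName   sysname NOT NULL PRIMARY KEY,
    LastVersion bigint  NOT NULL
);

-- After each successful sync, record the Change Version that was used
MERGE SyncState AS target
USING (SELECT 'Products' AS TableName, @currentVersion AS LastVersion) AS src
    ON target.TableName = src.TableName
WHEN MATCHED THEN
    UPDATE SET LastVersion = src.LastVersion
WHEN NOT MATCHED THEN
    INSERT (TableName, LastVersion) VALUES (src.TableName, src.LastVersion);
```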

Consider converting this console application into a Windows service.

Please keep the feedback coming. Liam Cavanagh can be contacted at his blog or through Twitter.