Monday, 13 February 2012

I've recently been looking at the rate of data migration into Crm Online using SSIS, and how this can be optimised. I started with a baseline rate of 12 records per second, and have so far improved this to 430 records per second, all using one client machine.

The easiest way to migrate data into Crm Online is via a synchronous, single-threaded process that writes one record at a time, so this was the baseline. It soon became very clear that the performance bottleneck in this scenario is network latency - i.e. the round-trip time for the network packets to make a request to the Crm Online server, and to receive a response back.

So, the challenge was to improve on this, which can be done by addressing each aspect of the simple scenario:

Synchronous v. asynchronous calls

Single or multi-threaded - either within one process, or multiple concurrent processes

Sending more than one record at a time

So far, I've not tested asynchronous calls, primarily because SSIS is stream-based, and I can't see a way to write out synchronous error output if using an asynchronous pattern. It would be possible to write out asynchronous error information, but for now that would involve too much code rewriting. In general, though, I would expect an asynchronous pattern to give similar performance benefits to the multi-threaded approach, and it may be possible to multiply the benefits by combining the two.

Multiple threads

SSIS controls the threading behaviour of a package, so rather than try for a multi-threaded single process, I went for running several instances of the same package concurrently, which you can do with the dtexec tool. There are two main aspects to making this work:

You will need to be able to partition the source data, so that each package instance submits different records. For my tests, I had an integer identity column on the source data, and used the SQL modulo operator (%) to filter on the remainder from an integer division. For 10 concurrent packages, the where clause was 'WHERE id % 10 = ?' with the '?' replaced by a package variable.

The package will not be able to reference any files, whether as data sources, destinations or log files, as SSIS will attempt to get exclusive access to the files. So, I used a SQL Server source, and wrote log information to SQL Server via the SSIS SQL log provider.

I tested 10 concurrent packages, and this gave between a 7-fold and 9-fold performance improvement.
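As a rough illustration of the launch step, the concurrent instances can be started from a small console app that passes the partition number to each package via dtexec's /SET option. The package file name and variable name here are placeholders, not from my actual setup:

```csharp
using System.Diagnostics;

class PackageLauncher
{
    static void Main()
    {
        const int packageCount = 10; // must match the modulus in the source WHERE clause

        for (int i = 0; i < packageCount; i++)
        {
            // Each instance filters its source rows with "WHERE id % 10 = ?",
            // where '?' is bound to the PartitionNumber package variable
            var args = string.Format(
                "/F \"CrmImport.dtsx\" /SET \\Package.Variables[User::PartitionNumber].Value;{0}",
                i);
            Process.Start("dtexec", args);
        }
    }
}
```

Each dtexec instance runs as its own process, so SSIS's per-package threading limits don't constrain the overall throughput.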

Submitting multiple records

The Crm API is primarily designed around modifying one record at a time, with a separate request per record. However, CRM 2011 introduced the facility to pass multiple records in one request, using the RelatedEntities property of a 'parent' entity. This allows you to build (in memory) a RelatedEntityCollection of multiple records, attach this to one parent record, and submit the whole set as one request.

There are two limitations to this approach:

The entities have to be related via a relationship in CRM.

The same data operation has to apply to the parent record and to the records in the RelatedEntityCollection.

Initially I'd hoped to use the systemuser entity as the parent, as there is necessarily a relationship between it and any user-owned entity. However, this fell foul of limitation 2: I wanted to update the systemuser record but create the related records, and the operations must be the same.

Instead, I had to make schema changes. I created a new entity (e.g. exc_batchimport), and a 1-N relationship with each entity that I wanted to import. Each request would then create 1 exc_batchimport record, and a configurable number of related records.
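A minimal sketch of that pattern with the CRM 2011 SDK follows. The field and relationship names are illustrative only; the relationship schema name must match the 1-N relationship created above:

```csharp
using System.Collections.Generic;
using Microsoft.Xrm.Sdk;

class BatchImporter
{
    // 'service' is an IOrganizationService connected to the target organization
    static void CreateBatch(IOrganizationService service, List<Entity> children)
    {
        var parent = new Entity("exc_batchimport");
        parent["exc_name"] = "Import batch"; // illustrative attribute

        // Attach the child records to the parent via the 1-N relationship,
        // so the whole batch goes to the server as a single Create request
        var relationship = new Relationship("exc_batchimport_contact"); // illustrative schema name
        parent.RelatedEntities.Add(relationship, new EntityCollection(children));

        // Creates the exc_batchimport record and all related records in one round trip
        service.Create(parent);
    }
}
```

The one network round trip per batch, rather than per record, is what removes the latency cost.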

I tried various batch sizes, and had success up to a batch size of 1000, but failures with a batch of 5000. My initial view is that the failures come from the number of records, and not the total data size, but I've not tested this extensively.

This approach also gave significant performance gains, though only when network latency was a major performance factor - i.e. it helped a lot when connecting to Crm Online, but gave no appreciable benefit (and in some cases, worse performance) when connecting to an On Premise CRM server. Most of the benefit came with a batch size of 10 records, though performance did continue to improve slightly as the batch size increased up to 1000 records.

Performance results

I did the tests on a Windows Server 2008 machine of moderate capacity:

A virtual server running via Hyper-V

One processor core allocated to the server, running at 2GHz

4 GB of memory

Server was running in a hosted environment in England, connecting to the EMEA Crm Online Data Centre

The tests were done writing 100000 new records to a custom entity, writing two text fields, an integer field and an option set, and allowing CRM to generate the primary key values.

Concurrent packages | Batch size | Time (s) | Records / sec
1                   | n/a        | 832      | 12
10                  | n/a        | 1094     | 91
10                  | 10         | 276      | 362
10                  | 100        | 267      | 374
10                  | 1000       | 233      | 429

Conclusions

The main performance issue with modifying multiple records in Crm Online relates to network latency. I've successfully tested two complementary approaches which give a combined 35-fold speed improvement, and it may also be possible to gain further improvements with asynchronous calls.

The performance figures were for a custom entity. I'm intending to do further tests with the customer entities (account and contact) and the activity entities, as each of these requires more SQL updates than a custom entity.

Comments:

I worked on a project where we also imported using SSIS packages. We had a task to import approximately 50 million records across 50 entities; several entities had 10+ relationships, and some had 50+ fields. We set up a multi-threaded SSIS import job in which we had already generated all the GUIDs and relationship GUIDs, instead of letting CRM do it. We were able to import the 50 million records in 24 hours.


Hi David, if we are to read data from CRM Online and write to another CRM Online instance, how can we use the 'Package Variable' in the FetchXML while reading data from CRM? Example: I have to read data from the contact entity, where I migrate bulk data to one instance, and then run the package every night to migrate data based on modifiedon being on or after the last run, in the FetchXML. How can this be parameterized? Please suggest.



Who I am

Professionally: I'm a founder member of Excitation Ltd, a Microsoft Gold Partner in the UK that specializes in Microsoft CRM, and I've been the technical lead in over 50 CRM implementations since the release of CRM 1.2. This is a personal blog, and any views expressed here do not necessarily reflect those of Excitation; sometimes they will, but that should be treated as a happy coincidence rather than a normal state of affairs.

Personally: We'll see if I get onto this in the blog; if so, I expect it will include some permutation of mountains, snow and gravity.