Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated as per the TPC-C specifications into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code more carefully to see if it could be significantly sped up.

Among other things, I became curious about the performance impact of the amount of data copied by the WriteToServer(DataTable) method. I’ll call this amount of data the WriteToServer payload in this post, or simply the payload.

I was a bit lazy when I first coded the C# data loader, and not knowing the full implications, I set the payload to the same value I assigned to the BatchSize property. That proved to be terribly inconvenient when it came to tuning the loader’s performance.

By the way, in the Microsoft Windows SDK documentation, the description for BatchSize reads as follows:

Number of rows in each batch. At the end of each batch, the rows in the batch are sent to the server. If UseExternalTransaction is not in effect, each batch of rows is inserted as a separate transaction.

Going off topic a bit: the BatchSize description says, “At the end of each batch, the rows in the batch are sent to the server.” This is confusing at best. I could be wrong, but setting BatchSize alone will not cause the SqlBulkCopy object to send any rows over the wire to the server; it is the call to WriteToServer that initiates the transfer.

The question is how many rows should we have in the DataTable object before we send them over to the server with WriteToServer? In other words, how should we size the WriteToServer payload?

The optimal setting no doubt depends on the configuration of the environment. However, intuitively there is overhead in sending data across the wire, and you would not want the payload-to-overhead ratio to be too low. On the other hand, you probably don’t want to wait an excessive amount of time for the DataTable object to be populated with a very large payload before you send its contents. So there should be a happy medium somewhere, we hope.
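To make the payload tunable independently of BatchSize, the loading loop can be structured roughly as follows. This is a minimal sketch under my own naming, not the actual loader: the connection string, the destination table, and the FillRow helper are placeholders, and the row-generation logic is elided.

```csharp
using System.Data;
using System.Data.SqlClient;

class LoaderSketch
{
    // payloadRows: rows accumulated in the DataTable before each WriteToServer call.
    // batchSize:   SqlBulkCopy.BatchSize, now set independently of the payload.
    static void Load(string connString, DataTable template, long totalRows,
                     int payloadRows, int batchSize)
    {
        using (var bulk = new SqlBulkCopy(connString))
        {
            bulk.DestinationTableName = "customer";
            bulk.BatchSize = batchSize;           // e.g. 20,000 in the tests below

            DataTable buffer = template.Clone();  // same schema, no rows
            for (long i = 0; i < totalRows; i++)
            {
                FillRow(buffer);                  // placeholder: appends one generated row
                if (buffer.Rows.Count == payloadRows)
                {
                    bulk.WriteToServer(buffer);   // this round trip is what we are sizing
                    buffer.Clear();
                }
            }
            if (buffer.Rows.Count > 0)
                bulk.WriteToServer(buffer);       // flush the final partial payload
        }
    }

    static void FillRow(DataTable t) { /* generate one TPC-C customer row */ }
}
```

With this shape, payloadRows and batchSize can be varied independently in the tests that follow.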

Some tests should help give us a more precise idea.

I used the TPC-C customer table for my tests. The customer table was defined as follows:

(
c_id           int,
c_d_id         tinyint,
c_w_id         int,
c_first        char(16),
c_middle       char(2),
c_last         char(16),
c_street_1     char(20),
c_street_2     char(20),
c_city         char(20),
c_state        char(2),
c_zip          char(9),
c_phone        char(16),
c_since        datetime,
c_credit       char(2),
c_credit_lim   numeric(12,2),
c_discount     numeric(4,4),
c_balance      numeric(12,2),
c_ytd_payment  numeric(12,2),
c_payment_cnt  smallint,
c_delivery_cnt smallint,
c_data         char(500)
)

For each of the tests, three million rows were loaded into the customer table without any index, and the following row counts were tested for the WriteToServer method:

2, 20, 100, 200, 2,000, 20,000, and 3,000,000

Three million rows amounted to about 2.2GB of data after they were all loaded into the customer table. This amount of test data was small compared to the amount I needed to load, but it was sufficient for the test purpose and helped reduce the test time. In addition, for all the tests, SqlBulkCopy.BatchSize was set to 20,000.
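As a rough sanity check on that 2.2GB figure: every column in the table definition above is fixed-width, so the data portion of a row can simply be added up. The snippet below does that arithmetic; it deliberately ignores per-row overhead (row header and null bitmap, roughly 10 bytes) and page-level overhead, which account for the gap between the computed figure and the observed 2.2GB.

```csharp
// Fixed-width byte sizes taken from the customer table definition above.
int[] columnBytes = {
    4,                       // c_id int
    1,                       // c_d_id tinyint
    4,                       // c_w_id int
    16, 2, 16,               // c_first, c_middle, c_last
    20, 20, 20, 2, 9, 16,    // c_street_1/2, c_city, c_state, c_zip, c_phone
    8,                       // c_since datetime
    2,                       // c_credit
    9,                       // c_credit_lim numeric(12,2): 9 bytes for precision 10-19
    5,                       // c_discount numeric(4,4): 5 bytes for precision 1-9
    9, 9,                    // c_balance, c_ytd_payment numeric(12,2)
    2, 2,                    // c_payment_cnt, c_delivery_cnt smallint
    500                      // c_data char(500)
};

long rowBytes = 0;
foreach (int b in columnBytes) rowBytes += b;  // 676 bytes of column data per row
long totalBytes = rowBytes * 3_000_000;        // ~2.03GB before storage overhead
```

Adding the per-row and page overhead on top of the roughly 2.03GB of raw column data lands in the neighborhood of the observed 2.2GB.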

The other test parameter considered was the number of threads loading the data concurrently into non-overlapping ranges. I compared loading the three million rows in a single thread versus in 10 threads.

The results are in the following table:

Threads   Payload (WriteToServer rows)   Load Duration (seconds)
1         2                              >1440
1         20                             ~1440
1         100                            397
1         200                            300
1         2,000                          289
1         20,000                         278
1         3,000,000                      270
10        2                              1998
10        20                             238
10        100                            105
10        200                            90
10        2,000                          94
10        20,000                         100
10        3,000,000                      N/A

So, at least for my test environment, it does not look like a good idea for WriteToServer to send 20 or fewer rows to the server each time. We should stock up the payload a bit more before each trip.

The loader program was run on one of the nodes in a two-node cluster, and the SQL Server instance was on the other node. The network in between was 1Gbps. There was more than sufficient bandwidth and the latency was low. With extremely low payload (i.e. when the DataTable object only had 2~20 rows), the network bandwidth was barely used (4% or so).

A salient point to note from the test results is that once the payload crossed a certain point, around 200 rows in these tests, the exact payload size no longer mattered much. For instance, it made little difference whether it was 2,000 rows or 20,000 rows.

Loading data concurrently did help mask the penalty of a very low payload. For instance, when the data was loaded in a single thread, setting the payload to 20 caused the loader to take about 1,440 seconds to load the three million rows, whereas the same payload with 10 threads dramatically reduced the load time to 238 seconds.

In the next post, I’ll look at a few other implications of the payload setting more closely.
