However, that fails with an NpgsqlException: "ERROR: 42703: column "value" does not exist". After reading this question, I thought that perhaps I had to pass a DataTable object instead of an object array:
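A minimal sketch of such a DataTable attempt, assuming a table "MyTable" with a serial "Id" column and a double precision "Value" column (the query text, the "Id" name, and the `values` collection are assumptions; connection is an open NpgsqlConnection):

```csharp
using System.Data;
using Dapper;

// Build a single-column DataTable and hand it to Dapper as the parameter.
// Sketch only; as described above, this attempt still failed with the same 42703 error.
var table = new DataTable();
table.Columns.Add("Value", typeof(double));
foreach (var value in values)
    table.Rows.Add(value);

var ids = connection.Query<int>(
    "INSERT INTO \"MyTable\" (\"Value\") SELECT \"Value\" FROM unnest(@Values) RETURNING \"Id\"",
    new { Values = table });
```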

However, this fails with the exact same exception. How can I perform a bulk-insert and get the resulting serial ids out of Dapper over Npgsql?

I did notice that the casing in the exception does not match the column name. I am certain that I have quotes around the table and column names, so I'm not sure why it reports "value" instead of "Value". I mention it only in case it is somehow related to the error, since casing is easy to overlook.

The issue is that I need to be able to insert hundreds (perhaps thousands in the near future) of rows per second into "MyTable", so waiting for this loop to send each value to the database one at a time is cumbersome and (I assume, but have yet to benchmark) time-consuming. Further, I perform additional computation on the values that may or may not result in additional inserts, where I need a foreign key reference to the "MyTable" entry.

Because of these issues, I am looking for an alternative that sends all values in a single statement to the database, in order to reduce network traffic and processing latency. Again, I have NOT benchmarked the iterative approach yet... what I am looking for is an alternative that does a bulk insert so I can benchmark the two approaches against each other.

Accepted Answer

Ultimately, I came up with four different approaches to this problem. I generated 500 random values to insert into MyTable, and timed each of the four approaches (including starting and rolling back the transaction in which it was run). In my test, the database is located on localhost. However, the solution with the best performance also requires only one round trip to the database server, so the best solution I found should still beat the alternatives when the application is deployed to a different server than the database.

Note that the variables connection and transaction are used in the following code, and are assumed to be valid Npgsql data objects. Also note that the notation Nx slower indicates an operation took an amount of time equal to the optimal solution multiplied by N.
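For context, a hypothetical setup that the snippets below could share is sketched here; the connection string, the test data, and the using directives are assumptions rather than the original test harness.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Dapper;
using Npgsql;

// Hypothetical setup: an open connection and transaction as described above
// (the connection string is a placeholder).
var connection = new NpgsqlConnection("Host=localhost;Database=mydb;Username=me;Password=secret");
connection.Open();
var transaction = connection.BeginTransaction();

// 500 random values to insert, mirroring the test described above.
var rng = new Random();
var values = Enumerable.Range(0, 500).Select(_ => rng.NextDouble()).ToArray();
```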

I was shocked that this was only 2x slower than the optimal solution, but I would expect that to get significantly worse in the real environment, since this solution requires sending 500 messages to the server serially. However, this is also the simplest solution.
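A minimal sketch of what such a sequential, one-insert-per-value loop might look like (the statement text and the "Id" column name are assumptions):

```csharp
// One round trip per value: simple, but 500 serial messages to the server.
var ids = new List<int>();
foreach (var value in values)
{
    ids.Add(connection.Query<int>(
        "INSERT INTO \"MyTable\" (\"Value\") VALUES (@Value) RETURNING \"Id\"",
        new { Value = value },
        transaction).Single());
}
```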

This is getting better, but is still less than optimal because we can only queue as many inserts as there are available threads in the thread pool. However, this is almost as simple as the non-threaded approach, so it is a good compromise between speed and readability.
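One way to sketch such a thread-pool variant is shown below. This is an assumption about the approach rather than a reproduction of it: each queued insert opens its own pooled connection, because a single NpgsqlConnection cannot run commands concurrently, which also means these inserts no longer share the single transaction used elsewhere.

```csharp
// Queue each insert on the thread pool; each task uses its own pooled connection.
var tasks = values.Select(value => Task.Run(() =>
{
    using (var conn = new NpgsqlConnection(connection.ConnectionString))
    {
        conn.Open();
        return conn.Query<int>(
            "INSERT INTO \"MyTable\" (\"Value\") VALUES (@Value) RETURNING \"Id\"",
            new { Value = value }).Single();
    }
})).ToArray();

Task.WaitAll(tasks);
var ids = tasks.Select(t => t.Result).ToList();
```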

Approach #4 (134ms = 1.7x slower): Bulk inserts

This approach requires the following Postgres SQL be defined prior to running the code segment below it:
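The original definitions are not reproduced here; a sketch of the kind of SQL this approach describes might look like the following, where the composite type "MyTableType", the function name "InsertIntoMyTable", the text[] parameter, and the "Id" column are assumptions drawn from the surrounding text:

```sql
-- Composite type whose member order the calling code must mirror.
CREATE TYPE "MyTableType" AS (
    "Value" double precision
);

-- Dynamic variant: the INSERT is assembled as a string and executed per row.
CREATE OR REPLACE FUNCTION "InsertIntoMyTable"(entries text[])
RETURNS SETOF integer AS $$
DECLARE
    entry "MyTableType";
    raw   text;
BEGIN
    FOREACH raw IN ARRAY entries LOOP
        -- Each element arrives as a composite literal such as '(0.42)' and is
        -- parsed positionally against "MyTableType".
        entry := raw::"MyTableType";
        RETURN QUERY EXECUTE
            'INSERT INTO "MyTable" ("Value") VALUES ($1) RETURNING "Id"'
            USING entry."Value";
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

A calling sketch, here using Npgsql's command API directly so the whole array goes over in a single round trip (again, the names and the exact call shape are assumptions):

```csharp
// Stringify each row into a composite literal whose field order matches "MyTableType".
var entries = values
    .Select(v => "(" + v.ToString(CultureInfo.InvariantCulture) + ")")
    .ToArray();

var ids = new List<int>();
using (var cmd = new NpgsqlCommand("SELECT * FROM \"InsertIntoMyTable\"(@entries)", connection, transaction))
{
    cmd.Parameters.AddWithValue("entries", entries);   // string[] maps to text[]
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            ids.Add(reader.GetInt32(0));
    }
}
```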

There are two issues that I have with this approach. The first is that I have to hard-code the ordering of the members of MyTableType; if that ordering ever changes, I have to modify this code to match. The second is that I have to convert all input values to strings prior to sending them to Postgres (in the real code, I have more than one column, so I can't just change the signature of the database function to take a double precision[], unless I pass in N arrays, where N is the number of fields on MyTableType).

Despite these pitfalls, this is getting closer to ideal, and only requires one round-trip to the database.

-- BEGIN EDIT --

Since the original post, I came up with four additional approaches that are all faster than those listed above. I have modified the Nx slower numbers to reflect the new fastest method, below.

Approach #5 (105ms = 1.3x slower): Same as #4, without a dynamic query

The only difference between this approach and Approach #4 is the following change to the "InsertIntoMyTable" function:
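A hedged sketch of what the non-dynamic body might look like, under the same assumptions as the Approach #4 sketch above:

```sql
-- Static variant: the INSERT is ordinary embedded SQL, so plpgsql can prepare
-- and cache its plan instead of re-parsing a dynamically built string.
CREATE OR REPLACE FUNCTION "InsertIntoMyTable"(entries text[])
RETURNS SETOF integer AS $$
BEGIN
    RETURN QUERY
        INSERT INTO "MyTable" ("Value")
        SELECT (e::"MyTableType")."Value"
        FROM unnest(entries) AS e
        RETURNING "Id";
END;
$$ LANGUAGE plpgsql;
```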

The only downside to this approach is the same as the first issue with Approach #4: it couples the implementation to the ordering of "MyTableType". Still, I found this to be my second-favorite approach, since it is very fast and does not require any database functions to work correctly.

Approach #7 (80ms = VERY slightly slower): Same as #1, but without parameters

This is my favorite approach. It is only marginally slower than the fastest (even with 4000 records, it still runs in under 1 second), but it requires no special database functions or types. The only thing I don't like about it is that I have to stringify the double precision values, only for them to be parsed again by Postgres. It would be preferable to send the values in binary so they take up 8 bytes each instead of the massive 20 or so bytes I have allocated for them.
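A sketch consistent with the description: one statement, all values inlined as text literals, no bound parameters (the statement text and the "Id" column remain assumptions).

```csharp
// Build one multi-row INSERT with the values embedded as literals; a single
// round trip returns every generated id.
var sql = new StringBuilder("INSERT INTO \"MyTable\" (\"Value\") VALUES ");
sql.Append(string.Join(", ",
    values.Select(v => "(" + v.ToString("R", CultureInfo.InvariantCulture) + ")")));
sql.Append(" RETURNING \"Id\"");

var ids = connection.Query<int>(sql.ToString(), transaction: transaction).ToList();
```

Because the values are doubles rendered by the caller rather than arbitrary strings, inlining them as literals does not raise the usual injection concern; the stringify-then-parse overhead mentioned above is exactly the ToString call here.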

Approach #8 (80ms): Same as #5, but in pure SQL

The only difference between this approach and Approach #5 is the following change to the "InsertIntoMyTable" function:
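Under the same assumptions, the pure-SQL body might look like this; the only change from the Approach #5 sketch is the language, with the body reduced to a single SQL statement.

```sql
-- LANGUAGE sql variant: the final statement's RETURNING rows become the result set.
CREATE OR REPLACE FUNCTION "InsertIntoMyTable"(entries text[])
RETURNS SETOF integer AS $$
    INSERT INTO "MyTable" ("Value")
    SELECT (e::"MyTableType")."Value"
    FROM unnest(entries) AS e
    RETURNING "Id";
$$ LANGUAGE sql;
```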

This approach, like #5, requires one function per "MyTable" partition. It is the fastest because the query plan can be generated once per function and then reused; in the other approaches, the query must be parsed, planned, and then executed each time. Despite this being the fastest, I didn't choose it, because of the additional requirements it places on the database side compared with Approach #7, for very little speed benefit.