Performance Results

Here is a comparison of the performance of PostgreSQL's COPY and pg_bulkload.
Performance was measured with basic-tuned PostgreSQL server.

Table definition

Customer table in DBT-2 benchmark, that is an implementation of TPC-C.

Index definition

There are 2 indexes.
The first one is a primary key with one ascending integer column.
The second one is a non-unique index with one random integer column.

There are the following measurement patterns.

Initial data loading to an empty table for 4GB of data

Appended data loading to a table with 4GB of data for 1GB of new data

Performance efficiencies by maintenance_work_mem and FILTER features

Result 1: Initial data loading

Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY.
In PARALLEL mode, performance will be improved on multi-CPU server because reading an input file and writing rows to a table are done with two processes.

COPY also can be more fast by loading into table without indexes and create indexes after it.

Initial Load (4GB)

Item

Duration

Duration comparison

COPY with indexes

500 sec

-

COPY without indexes+ CREATE INDEX

333 sec (229 sec + 51 sec+ 53 sec)

66.7 %

pg_bulkload (DIRECT)with indexes

334 sec

66.8 %

pg_bulkload (PARALLEL)with indexes

221 sec

44.2 %

Result 2: Appended data loading

Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY in this case also.
COPY without indexes is not faster than COPY with indexes. Because it has to create indexes for total records of the table from initial.

Appended Load (1GB)

Item

Duration

Duration comparison

COPY with indexes

140 sec

-

COPY without indexes+ CREATE INDEX

187 sec (62 sec + 60 sec + 65 sec)

133.6 %

pg_bulkload (DIRECT)with indexes

93 sec

66.4 %

pg_bulkload (PARALLEL)with indexes

70 sec

50.0 %

Result 3: Influence from parameters and features

The maintenece_work_mem, PostgreSQL parameter, affects the performance of pg_bulkload.
The duration becomes almost 15 % shorter if this parameter is changed from 64 MB to 1 GB.

FILTER feature transforms input data in various operations, but it's not free.
The actual measurement shows the loading time is increased to almost 240 % with SQL functions and almost 140 % with C functions.