About gpfdist Setup and Performance

Consider the following scenarios for optimizing your ETL network performance.

Allow network traffic to use all ETL host Network Interface Cards (NICs)
simultaneously. Run one instance of gpfdist on the ETL host, then
declare the host name of each NIC in the LOCATION clause of your
external table definition (see Creating External Tables - Examples).
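As a rough sketch of this first scenario, the Python helper below builds a LOCATION clause that lists one gpfdist URI per NIC host name, all pointing at the same gpfdist port. The host names etl1-1 and etl1-2, port 8081, and the *.txt file pattern are hypothetical placeholders, and location_clause is an illustrative helper, not part of Greenplum Database.

```python
def location_clause(nic_hosts, port, pattern):
    """Return a LOCATION clause with one gpfdist URI per NIC host name."""
    if not nic_hosts:
        raise ValueError("at least one NIC host name is required")
    # One URI per NIC host name, all served by the same gpfdist port.
    uris = [f"'gpfdist://{host}:{port}/{pattern}'" for host in nic_hosts]
    return "LOCATION (" + ", ".join(uris) + ")"


# Hypothetical NIC host names; a single gpfdist instance listens on port 8081.
clause = location_clause(["etl1-1", "etl1-2"], 8081, "*.txt")
print(clause)
```

The resulting string can be pasted into the LOCATION clause of a CREATE EXTERNAL TABLE statement.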

Divide external table data equally among multiple
gpfdist instances on the ETL host. For example, on an ETL system
with two NICs, run two gpfdist instances (one on each NIC) and
divide the external table data files evenly between them to
optimize data load performance.
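One way to approximate an even division is to balance the files by size rather than by count. The sketch below uses a simple greedy assignment (largest file first, into the currently lightest group); this is an illustrative choice, not a method prescribed by Greenplum Database. The file names and sizes are hypothetical, and split_evenly is an assumed helper name.

```python
def split_evenly(file_sizes, n_instances):
    """Greedily assign files to n_instances groups with similar total size."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    groups = [[] for _ in range(n_instances)]
    totals = [0] * n_instances
    # Place the largest files first, each into the currently lightest group.
    ordered = sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        i = totals.index(min(totals))
        groups[i].append(name)
        totals[i] += size
    return groups, totals


# Hypothetical data files and their sizes in bytes.
sizes = {"a.txt": 400, "b.txt": 300, "c.txt": 200, "d.txt": 100}
groups, totals = split_evenly(sizes, 2)
print(groups, totals)
```

Each group could then be placed in the directory served by one gpfdist instance, for example the directory passed to that instance with the -d option.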

Note: Use pipes (|) to separate formatted text when you submit files to
gpfdist. Greenplum Database encloses comma-separated
text strings in single or double quotes, and gpfdist must remove the
quotes to parse the strings. Using pipes to separate formatted text avoids this extra step
and improves performance.
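If the source data arrives as quoted, comma-separated text, it can be converted to pipe-delimited text before it is handed to gpfdist. The following is a minimal sketch using Python's standard csv module. It assumes a backslash is an acceptable escape character for any pipe that appears inside a field; csv_to_pipe and the sample rows are hypothetical.

```python
import csv
import io


def csv_to_pipe(csv_text):
    """Rewrite comma-separated, quoted text as unquoted pipe-delimited text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # QUOTE_NONE writes fields without surrounding quotes; a backslash
    # (an assumed escape character) precedes any pipe inside a field.
    writer = csv.writer(out, delimiter="|", quoting=csv.QUOTE_NONE,
                        escapechar="\\", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample rows with a quoted field that contains a comma.
sample = '1,"Smith, John",42\n2,"Doe, Jane",17\n'
converted = csv_to_pipe(sample)
print(converted)
```

The external table definition would then need to declare the pipe as its delimiter so that the converted files are parsed as intended.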