The Pivotal Greenplum-Kafka Integration supports the Apache and Confluent Kafka
distributions. Refer to the Apache Kafka Documentation
for more information about Apache Kafka.

Kafka stores streams of messages (also called records) in categories called topics.
A message comprises an optional key and a value. A Kafka producer publishes records
to partitions in one or more topics. A Kafka consumer subscribes to a topic and
receives records in the order in which they were sent within a given Kafka partition.
Kafka does not guarantee the order of data originating from different Kafka partitions.

Note: Starting in Greenplum Database version 5.16, gpkafka is a wrapper around
the Greenplum Stream Server (GPSS) gpss
and gpsscli
commands. Pivotal recommends that you migrate to the GPSS commands.

The gpkafka utility is a Kafka consumer. It ingests streaming
data from a single Kafka topic, using Greenplum Database readable external tables
to transform and insert the data into a target Greenplum table. You identify
the Kafka source, data format, and the Greenplum connection options and target
table definition in a YAML-formatted load configuration file that you provide to
the utility. If the utility is interrupted or exits, a subsequent gpkafka load
operation that specifies the same Kafka topic and target Greenplum Database table
resumes ingestion from the last recorded offset.
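For example, a load configuration file might look like the following. This is a minimal sketch, not a complete reference: the host, database, topic, column, and table names are illustrative, and the options your deployment requires may differ.

```yaml
DATABASE: testdb            # target Greenplum database
USER: gpadmin               # Greenplum role that performs the load
HOST: mdw                   # Greenplum master host
PORT: 5432
KAFKA:
  INPUT:
    SOURCE:
      BROKERS: kafkahost:9092    # Kafka broker host:port
      TOPIC: customer_expenses   # the single Kafka topic to ingest
    COLUMNS:                     # names and types of the incoming fields
      - NAME: cust_id
        TYPE: int
      - NAME: expenses
        TYPE: decimal(9,2)
    FORMAT: delimited
    DELIMITED_OPTION:
      DELIMITER: '|'
    ERROR_LIMIT: 25              # abort the load after this many malformed rows
  OUTPUT:
    TABLE: data_from_kafka       # existing target Greenplum table
  COMMIT:
    MAX_ROW: 1000                # commit a batch after this many rows
```

With a file like this in hand, you pass its path to the utility on the command line; gpkafka reads the Kafka source and Greenplum target from the file rather than from command-line flags.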

Requirements

The Greenplum-Kafka Integration requires Kafka version 0.11 or newer
for exactly-once delivery assurance. You can run against an older Kafka
version, at the cost of the exactly-once guarantee, by adding the
following PROPERTIES block to your
gpkafka.yaml load configuration file: