HTTP to Kafka

The HTTP to Kafka origin listens on an HTTP
endpoint and writes the contents of all authorized HTTP POST requests directly to
Kafka.

Use the HTTP to Kafka origin to write large volumes of HTTP POST requests immediately to
Kafka without additional processing. To perform processing, you can create a separate
pipeline with a Kafka Consumer origin that reads from the Kafka topic.

If you need to process data before writing it to Kafka or need to write to a destination
system other than Kafka, use the HTTP Server origin.

You can configure multiple HTTP clients to send data to the HTTP to Kafka origin. Just
complete the necessary prerequisites before you configure the origin. Here is an example
of the architecture for using the HTTP to Kafka origin:

When you configure HTTP to Kafka, you specify the listening port, Kafka configuration
information, maximum message size, and the application ID. You can also configure SSL/TLS properties, including default
transport protocols and cipher suites.

You can add Kafka configuration properties and enable Kafka security as needed.

Tip: Data Collector provides
several HTTP origins to address different needs. For a quick comparison
chart to help you choose the right one, see Comparing HTTP Origins.

Prerequisites

Before you run a pipeline with the HTTP
to Kafka origin, configure the following prerequisites:

Configure HTTP clients to send data to the HTTP to Kafka listening port

When you configure the origin, you define a listening port number where the
origin listens for data.

To pass data to the pipeline, configure each HTTP client to send data to a
URL that includes the listening port number.

Use the following format for the
URL:

<http | https>://<sdc_hostname>:<listening_port>/

The URL includes the following components:

<http | https> - Use https for secure HTTP connections.

<sdc_hostname> - The Data Collector host name.

<listening_port> - The port number where the origin listens for
data.

For example: https://localhost:8000/

Include the application ID in request headers

When you configure the origin, you define an application ID. All messages
sent to the HTTP to Kafka origin must include the application ID in the
request header.

Add the following information to the request header for all HTTP POST
requests that you want the origin to
process:

X-SDC-APPLICATION-ID: <applicationID>

For example:

X-SDC-APPLICATION-ID: sdc_http2kafka
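The two client requirements above - sending to the listening port and including the application ID header - can be sketched with Python's standard library. The endpoint URL, payload, and application ID below are placeholder values for illustration:

```python
import urllib.request

def build_origin_request(url, payload, application_id):
    """Build an HTTP POST request that carries the
    X-SDC-APPLICATION-ID header required by the origin."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"X-SDC-APPLICATION-ID": application_id},
        method="POST",
    )

# Hypothetical origin endpoint and application ID.
request = build_origin_request(
    "https://localhost:8000/",
    b'{"event": "page_view"}',
    "sdc_http2kafka",
)
# urllib.request.urlopen(request) would submit the POST once the
# pipeline is running and the origin is listening on port 8000.
```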

Pipeline Configuration

When you use an HTTP to Kafka origin in a pipeline,
connect the origin to a Trash destination.

The HTTP to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you
cannot perform additional processing or write the data to other destination
systems.

However, since a pipeline requires a destination, you should
connect the origin to the Trash destination to satisfy pipeline validation
requirements.

A pipeline with the HTTP to Kafka origin should look like this:

Kafka Maximum Message Size

Configure the Kafka maximum message size in the
origin in relation to the equivalent Kafka cluster property. The origin property
should be equal to or less than the Kafka cluster property.

The HTTP to Kafka origin writes the contents of each HTTP POST request to Kafka as a
single message. So the maximum message size configured in the origin determines the
maximum size of the HTTP request and limits the size of messages written to Kafka.

To ensure all messages are written to Kafka, set the origin property to a value equal
to or less than the Kafka cluster property. Attempts to write messages larger than the
specified Kafka cluster property fail, returning an HTTP 500 error to the originating
HTTP client.

For example, if the Kafka cluster allows a maximum message size of 2 MB, configure the
Maximum Message Size property in the origin to 2 MB or less to avoid HTTP 500 errors for
larger messages.

By default, the maximum message size in a
Kafka cluster is 1 MB, as defined by the message.max.bytes property.
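To illustrate the 2 MB example above, the broker-side setting might look like the following sketch; the value shown is an assumption for this example:

```
# server.properties on each Kafka broker
message.max.bytes=2097152

# In the origin, set the Maximum Message Size property to 2 MB or
# less so that no accepted HTTP request exceeds the broker limit.
```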

Enabling Kafka Security

When using Kafka version 0.9.0.0 or later,
you can configure the HTTP to Kafka origin to connect securely through SSL/TLS,
Kerberos, or both.

Earlier versions of Kafka do not support security.

Enabling SSL/TLS

Perform the following steps
to enable the HTTP to Kafka origin to use SSL/TLS to connect to Kafka version 0.9.0.0 or later.

To use SSL/TLS to connect, first make sure Kafka is
configured for SSL/TLS as described in the Kafka documentation.

On the General tab of the stage, set
the Stage Library property to Apache Kafka 0.9.0.0 or a
later version.

On the Connection tab, add the
security.protocol Kafka configuration property and
set it to SSL.

Then, add the following SSL Kafka configuration
properties:

ssl.truststore.location

ssl.truststore.password

When the Kafka broker requires client authentication - when the
ssl.client.auth broker property is set to "required" - add and configure the
following properties:

ssl.keystore.location

ssl.keystore.password

ssl.key.password

Some brokers might require adding the following properties as
well:

ssl.enabled.protocols

ssl.truststore.type

ssl.keystore.type

For details about these properties, see the Kafka
documentation.

For example, the following properties allow the stage to use SSL/TLS to
connect to Kafka 0.9.0.0 with client authentication:
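A minimal sketch of such a configuration follows; the store locations and passwords are placeholders for your own values:

```
security.protocol=SSL
ssl.truststore.location=/opt/certs/kafka.client.truststore.jks
ssl.truststore.password=<truststore password>
ssl.keystore.location=/opt/certs/kafka.client.keystore.jks
ssl.keystore.password=<keystore password>
ssl.key.password=<key password>
```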

Enabling Kerberos (SASL)

When
you use Kerberos authentication, Data Collector
uses the Kerberos principal and keytab to connect to Kafka version 0.9.0.0 or later.
Perform the following steps to enable the HTTP to Kafka origin to use Kerberos to
connect to Kafka.

To use Kerberos, first make sure Kafka is configured for
Kerberos as described in the Kafka documentation.

Add the Java Authentication and Authorization
Service (JAAS) configuration properties required for Kafka clients based on your
installation type:

RPM or tarball installation - Add the properties
to the JAAS configuration file used by Data Collector - the
$SDC_CONF/ldap-login.conf file. Add the following
properties to a client login section in the file named
KafkaClient:

Cloudera Manager installation - Add the
properties to the Data Collector Advanced Configuration Snippet
(Safety Valve) for generated-ldap-login-append.conf field for
the StreamSets service in Cloudera Manager. Add the properties to the
field as
follows:
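For example, a KafkaClient login section might look like the following sketch; the keytab path and principal are placeholders for your environment:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/path/to/sdc.keytab"
    principal="sdc/<hostname>@<REALM>";
};
```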
