TCP Server

The TCP Server origin listens at the specified
port numbers, establishes TCP sessions with clients that initiate TCP connections, and
then processes the incoming data.

The origin can operate in several modes, which determine the type of data it can
process. It can process NetFlow messages, syslog messages, or
supported Data Collector
data formats passed as data separated by specified characters, or passed as
character-based data with length prefixes.

The TCP Server origin can process data from multiple clients
simultaneously, creating separate batches for each client, and sending acknowledgements
to the originating client after parsing each record or committing each batch. You can
configure the origin to use multiple threads to improve performance when processing
large volumes of data. And on 64-bit Linux systems, you can enable native epoll
transports to further improve performance.

When you configure the TCP Server origin, you specify the ports to use and the TCP mode
that indicates the type of data the origin will receive. Then you configure mode-related
properties, such as the characters that separate records.

You can optionally configure the acknowledgements that you want to send. You can also configure SSL/TLS properties, including default
transport protocols and cipher suites.

Multithreaded Processing

The TCP Server origin
performs parallel processing and enables the creation of a multithreaded pipeline.

When you enable multithreaded processing, the TCP Server origin uses multiple concurrent
threads based on the Number of Receiver Threads property. When you start the pipeline,
the origin creates the number of threads specified in the property.

As clients initiate TCP connections, the origin establishes TCP sessions and waits for
data. Upon filling a batch, the origin passes the batch to an available pipeline runner.

A pipeline runner is a sourceless
pipeline instance - an instance of the pipeline that includes
all of the processors and destinations in the pipeline and represents all
pipeline processing after the origin. Each pipeline runner processes one batch at a time,
just like a pipeline that runs on a single thread. When the flow of data
slows, the pipeline runners wait idly until they are needed.

Multithreaded pipelines preserve the order of
records within each batch, just like a single-threaded pipeline. But since
batches are processed by different pipeline instances, the order in which
batches are written to destinations is not guaranteed.

For example, say you enable multithreaded processing and set the Number of Receiver
Threads property to 5. When you start the pipeline, the origin creates five
threads, and Data Collector
creates a matching number of pipeline runners. Upon receiving data, the origin passes a batch to
each of the pipeline runners for processing.

Each pipeline runner performs the processing associated
with the rest of the pipeline. After a batch is written to pipeline
destinations, the pipeline runner becomes available for another batch of
data. Each batch is processed and written as quickly as possible,
independent of other batches processed by other pipeline runners, so
batches may be written in a different order than they were read.

At any given moment, the five pipeline runners can each
process a batch, so this multithreaded pipeline processes up to five batches at a
time. When incoming data slows, the pipeline runners sit idle, available for use
as soon as the data flow increases.

Closing Connections for Invalid Data

When the TCP
Server origin receives invalid data, it closes the connection to the TCP client that
sent the data. It also passes the data to the pipeline for error handling.

For example, when you configure the origin, you specify the maximum record size. When a
TCP client sends a message that translates to larger than the maximum record size, the
origin disconnects from the client and passes the message to the pipeline for error
handling.

Similarly, say the TCP Server origin is configured to process XML data. If the origin
receives an invalid XML document, it disconnects from the sending client and passes the
data to the pipeline for error handling.

Sending Acknowledgements

You can configure
the TCP Server origin to send acknowledgements, or acks, to the originating client.
The acknowledgement message can be a simple text message, such as "Ack". Or, you can use
the expression language to include additional information in the message.

The origin can send two types of acknowledgements:

record processed acknowledgement

When you configure a record processed acknowledgement, the origin sends acks
after it receives and processes each record. It sends the ack after parsing
a record from the incoming data.

batch completed acknowledgement

When you configure a batch completed acknowledgement, the origin sends acks
after the pipeline completes processing the batch. It sends the ack after
the batch is committed to all destinations.

Using Expressions in Messages

You can use the Data Collector
expression language to create custom acknowledgement messages. You might use expressions
to include information about Data Collector,
the pipeline, record, or batch in the message.

For example, if you have multiple Data Collectors
processing data from the same client, you might use the following record processed
message to include the Data Collector
host name and the pipeline title in the message, along with a record
identifier:
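For instance, a record processed message along these lines could work. Note that this is an illustrative sketch: the /transactionID field is an assumption, though sdc:hostname(), pipeline:title(), and record:value() are standard expression language functions:

Record ${record:value('/transactionID')} processed by ${sdc:hostname()} for pipeline ${pipeline:title()}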

You can set the time zone to use for datetime values returned by expressions. By default,
the origin uses UTC.

Note: In a batch completion message, record functions return information from the last
record in the batch.

You can use the batchSize variable to return the number
of records included in the batch. The batchSize variable can be used only with the TCP
Server origin and must be typed into the message. The batchSize variable does not appear
in the expression completion list.

For example, the following message includes the number of records in the batch, the
transaction ID of the last record in the batch, and the pipeline that performed the
processing:

Pipeline: ${pipeline:title()} committed a batch whose last record was
${record:value('/transactionID')} and included ${batchSize} messages.

TCP Modes

The TCP Server origin
processes data differently depending on the mode that you select. The origin provides
the following modes:

NetFlow messages

The TCP Server origin can process NetFlow 5 and NetFlow 9 messages. When processing NetFlow messages, the stage generates
different records based on the NetFlow version. When processing NetFlow 9,
the records are generated based on the NetFlow 9 configuration properties.
For more information, see NetFlow Data Processing.

To process NetFlow messages, set the TCP Mode property to NetFlow. Then, for
NetFlow 9 data, configure the properties on the NetFlow 9 tab. NetFlow 5
data does not require additional configuration.

syslog messages

The TCP Server origin processes syslog messages in accordance with RFC 6587, except the origin does not support
method changes.

The TCP Server origin can process the following types of syslog messages:

Non-standard common messages, such as RFC 3339 dates with no version
digit

To process syslog messages, set the TCP Mode property to Syslog and
configure the transfer framing mode.

Important: All TCP clients
must use the same transfer framing mode to transmit data.

Use one of the following transfer framing modes:

Octet counting - The frame indicates the length of the syslog
message and includes the entire contents of the message.

Non-transparent framing - The frame includes the syslog message and
user-defined trailing separator characters. The origin uses the
separator characters to create records.

Use the following Java
Unicode syntax to specify a separator character:

\u<Unicode character code>

To define multiple characters, list them consecutively, as
follows:

\u<Unicode character code>\u<Unicode character code>\u<Unicode character code>

For
example, the default separator character is line feed, whose
Unicode character code is 000A. To specify this as the separator
character, enter \u000A.
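The two framing modes can be sketched from the client side as follows. This is a minimal illustration, not part of the origin itself, and the function names are invented for the example:

```python
def frame_octet_counting(message: bytes) -> bytes:
    # RFC 6587 octet counting: the byte length of the message in ASCII
    # digits, followed by a space, then the message itself.
    return str(len(message)).encode("ascii") + b" " + message

def frame_non_transparent(message: bytes, separator: bytes = b"\n") -> bytes:
    # Non-transparent framing: the message followed by the trailing
    # separator characters (line feed, \u000A, by default).
    return message + separator
```

For example, frame_octet_counting(b"<34>1 hi") produces b"8 <34>1 hi", while frame_non_transparent(b"<34>1 hi") produces b"<34>1 hi\n".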

Separated records

The TCP Server origin can process the supported Data Collector data formats when the data is separated by the specified record separator
characters.

To process supported data formats, set the TCP Mode property to Separated
Records and specify the record separator characters. Then, configure any
data format-related properties.

Important: All TCP clients must
use the same record separator characters.
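From the client side, sending separated records can be sketched as below. This is an illustrative example that assumes JSON records and the default line-feed separator; the host and port in the usage sketch are placeholders:

```python
import json
import socket

def encode_separated_records(records, separator="\u000A"):
    # Serialize each record and terminate it with the configured
    # record separator characters.
    return "".join(json.dumps(r) + separator for r in records).encode("utf-8")

# Usage sketch (placeholder host and port):
# with socket.create_connection(("sdc-host", 9999)) as sock:
#     sock.sendall(encode_separated_records([{"id": 1}, {"id": 2}]))
```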

Character data with length prefix

The TCP Server origin can process the supported Data Collector data formats when passed as character-based data with a length prefix.

To process supported data formats, set the TCP Mode property to Character
Data with Length Prefix and specify the character set of the data.

A length prefix consists of the digits that indicate the length of the data,
followed by a space character. The data to be converted to a record must
immediately follow the space character.

Note: The length prefix must be in a
single-byte encoding, such as UTF-8. The data can be in any valid
character set, which you specify in the origin.

For example,
say a TCP client sends the following UTF-8 data:

11 hello world

The length prefix is "11 ", which indicates that the data to be converted is
11 bytes long. The origin then converts the following 11 bytes, "hello
world", to a record.

You can use this TCP mode to capture raw syslog
messages that are framed with octet counting into a string
field.
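The length-prefix framing can be sketched as follows. This is an illustrative client-side helper, not part of the origin:

```python
def frame_with_length_prefix(payload: str, encoding: str = "utf-8") -> bytes:
    # Encode the payload, then prefix it with its byte length in ASCII
    # digits followed by a single space.
    data = payload.encode(encoding)
    return str(len(data)).encode("ascii") + b" " + data
```

For the example above, frame_with_length_prefix("hello world") yields b"11 hello world". Note that the length counts bytes, not characters, so multi-byte characters in the payload increase the prefix value.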

Data Formats

In Separated Records or
Character Data with Length Prefix TCP mode, the TCP Server origin processes data
differently based on the data format.

The origin can process the following types of data when separated by the appropriate
record separator:

Avro

Generates a record for every message. Includes a "precision" and
"scale" field attribute for each Decimal field. For more
information about field attributes, see Field Attributes.

The origin writes the Avro schema to an avroSchema record header
attribute. For more information about record header attributes,
see Record Header Attributes.

You can use one of the following methods to specify the location
of the Avro schema definition:

Message/Data Includes Schema -
Use the schema in the message.

In Pipeline Configuration - Use
the schema that you provide in the stage
configuration.

Confluent Schema Registry -
Retrieve the schema from Confluent Schema Registry.
The Confluent Schema Registry is a distributed
storage layer for Avro schemas. You can configure
the origin to look up the schema in the Confluent
Schema Registry by the schema ID embedded in the
message or by the schema ID or subject specified in
the stage configuration.

Using a schema in the stage configuration or retrieving a schema
from the Confluent Schema Registry overrides any schema that
might be included in the message and can improve
performance.

Binary

Generates a record with a single byte array field at the root of
the record.

When the data exceeds the user-defined maximum data size, the
origin cannot process the data. Because the record is not
created, the origin cannot pass the record to the pipeline to be
written as an error record. Instead, the origin generates a
stage error.

Delimited

Generates a record for each delimited line. You can use the
following delimited format types:

You can use a list or list-map root field type for delimited data,
optionally including the header information when available. For
more information about the root field types, see Delimited Data Root Field Type.

When using a header line, you can allow processing records with
additional columns. The additional columns are named using a
custom prefix and integers in sequential increasing order, such
as _extra_1, _extra_2. When you disallow additional columns when
using a header line, records that include additional columns are
sent to error.

You can also replace a string constant with null values.

When a record exceeds the maximum record length defined for the
origin, the origin processes the object based on the error
handling configured for the stage.

JSON

Generates a record for each JSON object. You can process JSON
files that include multiple JSON objects or a single JSON
array.

When an object exceeds the maximum object length defined for the
origin, the origin processes the object based on the error
handling configured for the stage.

Log

Generates a record for every log line.

When a line exceeds the user-defined maximum line length, the
origin truncates longer lines.

You can include the processed log line as a field in the record.
If the log line is truncated, and you request the log line in
the record, the origin includes the truncated line.

Protobuf

Generates a record for every protobuf message. By default, the
origin assumes messages contain multiple protobuf messages.

Protobuf messages must match the specified message type and be
described in the descriptor file.

When the data for a record exceeds 1 MB, the origin cannot
continue processing data in the message. The origin handles the
message based on the stage error handling property and continues
reading the next message.

XML

Generates records based on a user-defined delimiter element. Use
an XML element directly under the root element or define a
simplified XPath expression. If you do not define a delimiter
element, the origin treats the XML file as a single record.

Generated records include XML attributes and namespace
declarations as fields in the record by default. You can
configure the stage to include them in the record as field
attributes.

You can include XPath information for each parsed XML element and
XML attribute in field attributes. This also places each
namespace in an xmlns record header attribute.

Note: Field attributes and record header attributes are
written to destination systems automatically only when you use the SDC RPC
data format in destinations. For more information about working with field
attributes and record header attributes, and how to include them in records,
see Field Attributes and Record Header Attributes.

When a record exceeds the user-defined maximum record length, the
origin skips the record and continues processing with the next
record. It sends the skipped record to the pipeline for error
handling.

Character data with length prefix - Use to process
supported Data Collector data formats passed as character data with a
length prefix.

The length prefix must be in a
single-byte encoding, such as UTF-8. For more
information, see TCP Modes.

Charset

Character set of the data to be processed.

Used only
with the Character Data with Length Prefix TCP mode.

Record Separator

One or more characters used by TCP clients to separate
records.

Specify one or more characters using the Java Unicode syntax,
as follows: \u<Unicode character code>. To specify
multiple characters, repeat the syntax for each character, as follows:
\u<Unicode character code>\u<Unicode character
code>\u<Unicode character code>.

Max Batch Size (messages)

Maximum number of messages to include in a batch and pass
through the pipeline at one time. Honors values up to the
Data Collector maximum batch size.

Default is 1000.

Batch Wait Time (ms)

Number of milliseconds to wait before sending a partial or empty batch.

Max Message Size (bytes)

Maximum message size in bytes to be converted into a
record.

When a message is larger than the maximum message
size, the origin disconnects from the originating client
and passes the record to the pipeline for error
handling.

Charset

Character set to use when sending
acknowledgements.

Ack Time Zone

Time zone to use for acknowledgement messages. Any dates
returned by functions are adjusted to the specified time
zone.

Record Processed Ack Message

Acknowledgement message to send after processing a
record. When configured, the origin sends a message after
processing each record.

Batch Completed Ack Message

Acknowledgement message to send after processing a batch.
When configured, the origin sends a message after each batch
of data is committed to all destinations.

You can use
expressions to include additional information in the
message. Record functions return information from the
last record in the batch. For more information, see
Using Expressions in Messages.

By default, no acknowledgement is
sent.

When processing data in Syslog TCP mode, on the Syslog
tab, configure the following properties:

Syslog Property

Description

Syslog Message Transfer Framing Mode

The framing mode that the TCP clients use to pass the
data. Use one of the following options:

Octet Counting - The message is entirely enclosed in
the frame.

Non-transparent-framing - The message includes
trailing separator characters to indicate the end of
the message.

Non-transparent-framing Separator

One or more separator characters used to separate
records.

Specify one or more characters using the Java Unicode syntax,
as follows: \u<Unicode character code>. To specify
multiple characters, repeat the syntax for each character, as follows:
\u<Unicode character code>\u<Unicode character
code>\u<Unicode character code>.

Used with the non-transparent framing mode
only.

Charset

Character encoding of the data to be processed.

When processing data in Separated Records or Character Data with Length Prefix
TCP mode, click the Data Formats tab and configure the
Data Format of the data.

For Avro data, on the Data Format tab, configure the
following properties:

Avro Property

Description

Avro Schema Location

Location of the Avro schema definition to use when
processing data:

Message/Data Includes Schema - Use the schema in the
message.

In Pipeline Configuration - Use the schema provided
in the stage configuration.

Overrides any existing schema definitions associated
with the message.

Schema Subject

Avro schema subject to look up in the Confluent Schema
Registry.

If the specified subject has multiple schema
versions, the origin uses the latest schema version for
that subject. To use an older version, find the
corresponding schema ID, and then set the
Look Up Schema By property to
Schema ID.

Schema ID

Avro schema ID to look up in the Confluent Schema
Registry.

For delimited data, on the Data Format tab, configure the
following properties:

Header Line

Indicates whether a file contains a header line, and
whether to use the header line.

Allow Extra Columns

When processing data with a header line, allows
processing records with more columns than exist in the
header line.

Extra Column Prefix

Prefix to use for any additional columns. Extra columns
are named using the prefix and sequential increasing
integers as follows:
<prefix><integer>.

For
example, _extra_1. Default is _extra_.

Max Record Length (chars)

Maximum length of a record in characters. Longer records
are not read.

This property can be limited by the Data Collector parser
buffer size. For more information, see Maximum Record Size.

Delimiter Character

Delimiter character for a custom delimiter format. Select
one of the available options or use Other to enter a custom
character.

You can enter a Unicode control character
using the format \uNNNN, where N is a
hexadecimal digit from the numbers 0-9 or the letters
A-F. For example, enter \u0000 to use the null character
as the delimiter or \u2028 to use a line separator as
the delimiter.

Default is the pipe character ( |
).

Escape Character

Escape character for a custom file type.

Quote Character

Quote character for a custom file type.

Root Field Type

Root field type to use:

List-Map - Generates an indexed list of data.
Enables you to use standard functions to process
data. Use for new pipelines.

List - Generates a record with an indexed list with
a map for header and value. Requires the use of
delimited data functions to process data. Use only
to maintain pipelines created before 1.1.0.

Lines to Skip

Lines to skip before reading data.

Parse NULLs

Replaces the specified string constant with null
values.

NULL Constant

String constant to replace with null values.

Charset

Character encoding of the files to be processed.

Ignore Ctrl Characters

Removes all ASCII control characters except for the tab, line feed, and carriage
return characters.

For XML data, on the Data Format tab, configure the
following properties:

Include Field XPaths

Includes the XPath to each parsed XML element and XML
attribute in field attributes. Also includes each namespace
in an xmlns record header attribute.

When not selected,
this information is not included in the record. By
default, the property is not selected.

Note: Field attributes and record header attributes are
written to destination systems automatically only when you use the SDC RPC
data format in destinations. For more information about working with field
attributes and record header attributes, and how to include them in records,
see Field Attributes and Record Header Attributes.

Namespaces

Namespace prefix and URI to use when parsing the XML
document. Define namespaces when the XML element being used
includes a namespace prefix or when the XPath expression
includes namespaces.

Output Field Attributes

Includes XML attributes and namespace declarations in the
record as field attributes. When not selected, XML
attributes and namespace declarations are included in the
record as fields.

Note: Field attributes are automatically included in
records written to destination systems only when you use the SDC RPC data
format in the destination. For more information about working with field
attributes, see Field Attributes.

By default, the property is not
selected.

Max Record Length (chars)

The maximum number of characters in a record. Longer
records are diverted to the pipeline for error handling.

This property can be limited by the Data Collector parser
buffer size. For more information, see Maximum Record Size.

Charset

Character encoding of the files to be processed.

Ignore Ctrl Characters

Removes all ASCII control characters except for the tab, line feed, and carriage
return characters.

To use SSL/TLS, click the TLS tab and configure the
following properties:

TLS Property

Description

Use TLS

Enables the use of TLS.

Keystore File

The path to the keystore file. Enter an absolute path to
the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

Template Cache Timeout (ms)

The maximum number
of milliseconds to cache an idle template. Templates
unused for more than the specified time are evicted from
the cache. For more information about templates, see
Caching NetFlow 9 Templates.