HTTP Client

When you configure HTTP Client, you specify the resource URL, optional headers, and the
method to use. For some methods, you can specify the request body and default content type.

You can configure the actions to take based on the response status and configure pagination
properties to enable processing large volumes of data from paginated APIs. You can also enable
the origin to read compressed and archived files.

The origin provides response header fields as record header attributes so you can use the
information in the pipeline when needed.

The origin also provides several different authentication types to access data. You can enter
credentials in the origin or you can secure the credentials in runtime resource files and
reference the files in the origin. You can also configure the origin to use the OAuth 2
protocol to connect to an HTTP service.

You can optionally use an HTTP proxy and configure SSL/TLS properties.

Tip: Data Collector provides
several HTTP origins to address different needs. For a quick comparison
chart to help you choose the right one, see Comparing HTTP Origins.

Keep All Fields

When using
pagination, you can configure the origin to keep all fields in addition to those in the
specified result field path. The resulting record keeps all fields in the original
structure, with the result field path containing one set of data. By default, the origin
returns only the data within the specified result field path.

For example, say we use the same sample data as above, with /results as the result field
path, and we configure the origin to keep all fields. The origin generates three records
that keep the existing record structure and include one set of data in the /results
field.
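The keep-all-fields behavior can be sketched as follows. This is an illustration, not Data Collector code; the response structure and field names are hypothetical:

```python
import copy

def paginate_records(response, result_field_path, keep_all_fields=False):
    """Generate one record per item in the result field, optionally
    keeping the rest of the original response structure."""
    field = result_field_path.strip("/")      # e.g. "/results" -> "results"
    records = []
    for item in response[field]:
        if keep_all_fields:
            record = copy.deepcopy(response)  # keep the original structure...
            record[field] = item              # ...with one set of data in /results
        else:
            record = item                     # default: only the result data
        records.append(record)
    return records

response = {"metadata": {"count": 3},
            "results": [{"id": 1}, {"id": 2}, {"id": 3}]}
records = paginate_records(response, "/results", keep_all_fields=True)
# Three records, each keeping "metadata" and holding one item in "results".
```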

HTTP Method

To request data
from an HTTP resource URL, specify the request method to use. Most servers require a GET
request, but verify the method required by the server you want to access.

You can use the following methods:

GET

PUT

POST

DELETE

HEAD

OAuth 2 Authorization

You can configure the HTTP Client origin to use the OAuth 2 protocol to connect to an
HTTP service that uses basic, digest, or universal authentication, OAuth 2 client
credentials, OAuth 2 username and password, or OAuth 2 JSON Web Tokens (JWT).

The OAuth 2 protocol
authorizes third-party access to HTTP service resources without sharing credentials. The
HTTP Client origin uses credentials to request an access token from the service. The
service returns the token to the origin, and then the origin includes the token in a
header in each request to the resource URL.

The credentials that you enter to request an access token
depend on the credentials grant type required by the HTTP service. You can define
the following OAuth 2 credentials grant types for HTTP Client:

Client credentials grant

HTTP Client sends its own credentials - the client ID and
client secret or the basic, digest, or universal
authentication credentials - to the HTTP service. For
example, use the client credentials grant to process data
from the Twitter API or from the Microsoft Azure Active
Directory (Azure AD) API.

Resource owner credentials grant

HTTP Client sends the credentials for the resource owner -
the resource owner username and password - to the HTTP
service. You can also use this grant type to migrate
existing clients using basic, digest, or universal
authentication to OAuth 2 by converting the stored
credentials to an access token.
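The token exchange described above can be sketched for the client credentials grant. This is an outline of the OAuth 2 flow, not the origin's implementation; the token endpoint and credentials are hypothetical:

```python
import base64
import urllib.parse

def client_credentials_request(token_url, client_id, client_secret):
    """Build the pieces of an OAuth 2 client-credentials token request:
    the client authenticates with its own ID and secret."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return token_url, headers, body

def bearer_header(access_token):
    """Once the service returns a token, each request to the resource URL
    carries it in an Authorization header."""
    return {"Authorization": f"Bearer {access_token}"}

url, headers, body = client_credentials_request(
    "https://example.com/oauth2/token",   # hypothetical token endpoint
    "my-client-id", "my-client-secret")
```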

For example, to use JWT to connect to a Google service, configure
the origin as follows:

On the HTTP tab, set Authentication
Type to None, and then select
Use OAuth 2.

On the OAuth 2 tab, select JSON Web
Tokens for the grant type.

In the Token URL property, enter the following URL used
to request the access token:

https://www.googleapis.com/oauth2/v4/token

Select the following algorithm to sign the JWT: RSASSA-PKCS-v1_5
using SHA-256.

Enter the Base64 encoded key used to sign the JWT.

To access the key, download the JSON key file when you generate the Google
credentials. Locate the "private_key" field in the file, which contains a
string version of the key. Copy the string into the JWT Signing
Key property, and then replace all "\n" literals with new
lines.
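The key conversion above can be sketched as a string replacement, assuming you copied the raw "private_key" value, which stores line breaks as two-character "\n" literals. The key material shown is a hypothetical placeholder:

```python
# The "private_key" string as copied from the JSON key file: line breaks
# appear as "\n" literals rather than real new lines.
copied_key = "-----BEGIN PRIVATE KEY-----\\nMIIEvQ...\\n-----END PRIVATE KEY-----\\n"

# Replace each two-character "\n" sequence with an actual newline before
# pasting the result into the JWT Signing Key property.
signing_key = copied_key.replace("\\n", "\n")
```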

You
can include the expression language in the JWT claims. For example, in the
sample claim above, both the "exp" (expiration time) claim and the "iat"
(issued at) claim include Data Collector time
functions to set the expiration time and the issue time.

Tip: Google access tokens expire after 60 minutes. As a result, set
the expiration time claim to be slightly less than 60 minutes so that HTTP
Client can request a new token within the time limit.
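The "iat" and "exp" claims can be computed as shown below, keeping the lifetime under the 60-minute limit noted in the tip. The issuer and scope values are hypothetical examples, not values from this document:

```python
import time

iat = int(time.time())   # issued-at: now, in seconds since the epoch
exp = iat + 55 * 60      # expires in 55 minutes, safely under the 60-minute cap

claims = {
    "iss": "my-service-account@example.iam.gserviceaccount.com",  # hypothetical
    "scope": "https://www.googleapis.com/auth/bigquery",          # hypothetical scope
    "aud": "https://www.googleapis.com/oauth2/v4/token",          # token URL above
    "iat": iat,
    "exp": exp,
}
```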

The following image shows the OAuth 2
tab configured for Google service accounts:

Data Formats

The HTTP Client origin processes data differently based on the data format. The origin
processes the following types of data:

Binary

Generates a record with a single byte array field at the root of
the record.

When the data exceeds the user-defined maximum data size, the
origin cannot process the data. Because the record is not
created, the origin cannot pass the record to the pipeline to be
written as an error record. Instead, the origin generates a
stage error.
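The binary behavior can be sketched as follows; this is an illustration of the rule above, not Data Collector code, and the field name is hypothetical:

```python
def binary_to_record(data: bytes, max_data_size: int) -> dict:
    """One byte-array field at the record root, or a stage error
    (no record at all) when the data exceeds the maximum size."""
    if len(data) > max_data_size:
        # No record exists yet, so this cannot become an error record;
        # the origin raises a stage error instead.
        raise ValueError("stage error: data exceeds maximum data size")
    return {"data": data}   # hypothetical field name for the sketch
```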

Delimited

Generates a record for each delimited line. The origin supports
several delimited format types.

You can use a list or list-map root field type for delimited data,
optionally including the header information when available. For
more information about the root field types, see Delimited Data Root Field Type.

When using a header line, you can allow processing records with
additional columns. The additional columns are named using a
custom prefix and sequentially increasing integers, such
as _extra_1, _extra_2. If you disallow additional columns when
using a header line, records that include additional columns are
sent to error.
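The extra-column naming can be sketched like this (an illustration of the rule above, not the parser itself):

```python
def parse_with_extra_columns(header, row, prefix="_extra_"):
    """Header columns keep their names; values beyond the header line
    are named <prefix><integer> in sequentially increasing order."""
    record = dict(zip(header, row))
    for i, value in enumerate(row[len(header):], start=1):
        record[f"{prefix}{i}"] = value
    return record

record = parse_with_extra_columns(["id", "name"], ["1", "abc", "x", "y"])
# {'id': '1', 'name': 'abc', '_extra_1': 'x', '_extra_2': 'y'}
```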

You can also replace a string constant with null values.

When a record exceeds the maximum record length defined for the
origin, the origin processes the object based on the error
handling configured for the stage.

JSON

Generates a record for each JSON object.

When an object exceeds the specified maximum object length, the origin
processes the object based on the error handling configured for the stage.

Log

Generates a record for every log line.

When a line exceeds the user-defined maximum line length, the
origin truncates longer lines.

You can include the processed log line as a field in the record.
If the log line is truncated, and you request the log line in
the record, the origin includes the truncated line.

SDC Record

Generates a record for every record. Use to process records
generated by a Data Collector
pipeline using the SDC Record data format.

For error records, the origin provides the original record as read
from the origin in the original pipeline, as well as error
information that you can use to correct the record.

When processing error records, the origin expects the error file
names and contents as generated by the original pipeline.

Text

Generates a record for each line of text.

When a line exceeds the specified maximum line length, the origin truncates
the line. The origin adds a boolean field named Truncated to indicate if the
line was truncated.
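The truncation behavior for text can be sketched as follows (illustrative only; the text field name is a stand-in):

```python
def read_text_line(line, max_len):
    """Truncate lines longer than the maximum length and flag them
    with a boolean Truncated field."""
    truncated = len(line) > max_len
    return {"text": line[:max_len], "Truncated": truncated}
```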

XML

Generates records based on a user-defined delimiter element. Use
an XML element directly under the root element or define a
simplified XPath expression. If you do not define a delimiter
element, the origin treats the XML file as a single record.

Generated records include XML attributes and namespace
declarations as fields in the record by default. You can
configure the stage to include them in the record as field
attributes.

You can include XPath information for each parsed XML element and
XML attribute in field attributes. This also places each
namespace in an xmlns record header attribute.

Note: Field attributes and record header attributes are
written to destination systems automatically only when you use the SDC RPC
data format in destinations. For more information about working with field
attributes and record header attributes, and how to include them in records,
see Field Attributes and Record Header Attributes.

When a record exceeds the user-defined maximum record length, the
origin skips the record and continues processing with the next
record. It sends the skipped record to the pipeline for error
handling.

Response Header Fields in Header Attributes

The HTTP Client
origin includes response header fields – such as Content-Encoding, Content-Type, or any
custom response header field – in records as record header attributes. The attribute
names match the original response header field name.
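As a sketch of this mapping (not Data Collector code), each response header field is copied to a record header attribute under its original name:

```python
def attach_header_attributes(record_attributes, response_headers):
    """Each response header field becomes a record header attribute
    with the same name as the original field."""
    for name, value in response_headers.items():
        record_attributes[name] = value
    return record_attributes

attrs = attach_header_attributes({}, {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",
    "X-Custom-Header": "abc",   # custom response header fields are included too
})
```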

Configuring an HTTP Client Origin

Configure
an HTTP Client origin to read data from an HTTP resource URL.

In the Properties panel, on the General tab, configure the
following properties:

General Property

Description

Name

Stage name.

Description

Optional description.

On Record Error

Error record handling for the stage:

Discard - Discards the record.

Send to Error - Sends the record to the pipeline for
error handling.

Stop Pipeline - Stops the pipeline.

On the HTTP tab, configure the following properties:

HTTP Property

Description

Resource URL

URL where the data resides.

Headers

Optional headers to include in the request. Using simple or bulk edit mode, click the
Add icon to add additional
headers.

Mode

Processing mode:

Streaming - Maintains a connection and processes
data as it becomes available.

Polling - Connects periodically to check for data.

Batch - Processes all available data, and then stops
the pipeline.

Polling Interval (ms)

Milliseconds to wait before checking for new data. Used
in the polling mode only.

Per-Status Actions

Actions to take for specific response statuses. For
example, you can configure the origin to retry the request
with an exponential backoff when it receives a 500 HTTP
status code.

Click Add to add an
action for an additional status code.

HTTP Method

HTTP method to use to request data from the
server.

Body Time Zone

Time zone to use for evaluating the request body. Use
when the request body includes datetime variables or time
functions.

Request Body

Request data to use with the specified method. Available
for the PUT, POST, and DELETE methods.

Authentication Type

Authentication type to use to connect to the server:

Universal - Makes an anonymous connection, then provides authentication credentials
upon receiving a 401 status and a WWW-Authenticate header request.

Requires a username
and password associated with basic or digest authentication.

Use only with servers
that respond to this workflow.

OAuth - Uses OAuth 1.0 authentication. Requires OAuth credentials.

Use OAuth 2

Enables using OAuth 2 authorization to request access tokens.

You can use OAuth 2
authorization with none, basic, digest, or universal authentication.

Use Proxy

Enables using an HTTP proxy to connect to the system.

Max Batch Size (records)

Maximum number of records to include in a batch and send
through the pipeline at one time.

Batch Wait Time (ms)

Maximum number of milliseconds to wait before sending a
partial or empty batch.

On the Pagination tab, optionally configure pagination
details:

Pagination Property

Description

Pagination Mode

Method of pagination to use. Use a method supported by
the API of the HTTP client.

Initial Page/Offset

The initial page for page number pagination, or the
initial offset for offset number pagination.

Next Page Link Field

Field path in the response that contains the URL to the
next page.

For link in response field
pagination.

Stop Condition

Condition that evaluates to true when there are no more
pages to process.

For link in response field
pagination.

For example, let's say that the API of
the HTTP client includes a count property that
determines the number of items displayed per page. If
the count is set to 1000 and a page returns fewer than
1000 items, it is the last page of data. So you'd
enter the following expression to stop processing when
the count is less than
1000:

${record:value('/count') < 1000}
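The stop condition can be pictured as the loop below. This is a sketch of the pagination logic, not the origin's implementation; the page-fetching function is a hypothetical stand-in for the API:

```python
def fetch_all_pages(fetch_page, per_page=1000):
    """Keep requesting pages until one returns fewer than per_page
    items, mirroring ${record:value('/count') < 1000}."""
    page, items = 1, []
    while True:
        batch = fetch_page(page)
        items.extend(batch)
        if len(batch) < per_page:   # last page reached: stop paginating
            break
        page += 1
    return items

# Hypothetical API returning 2500 items across pages of up to 1000.
data = list(range(2500))
fetch = lambda page: data[(page - 1) * 1000 : page * 1000]
all_items = fetch_all_pages(fetch)   # 2500 items gathered over 3 requests
```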

Result Field Path

Field path in the response that contains the data that
you want to process. Must be a list or array field.

The
origin generates a record for each object in the
specified field.

Keep All Fields

Includes all fields from the response in the resulting
record when enabled.

By default, only the fields in the
specified result field path are included in the
record.

Wait Time Between Pages (ms)

The number of milliseconds to wait before requesting the
next page of data.

When using authentication, on the Credentials tab,
configure the following properties:

For delimited data, on the Data Format tab, configure the
following properties:

Header Line

Indicates whether a file contains a header line, and
whether to use the header line.

Allow Extra Columns

When processing data with a header line, allows
processing records with more columns than exist in the
header line.

Extra Column Prefix

Prefix to use for any additional columns. Extra columns
are named using the prefix and sequential increasing
integers as follows:
<prefix><integer>.

For
example, _extra_1. Default is _extra_.

Max Record Length (chars)

Maximum length of a record in characters. Longer records
are not read.

This property can be limited by the Data Collector parser
buffer size. For more information, see Maximum Record Size.

Delimiter Character

Delimiter character for a custom delimiter format. Select
one of the available options or use Other to enter a custom
character.

You can enter a Unicode control character
using the format \uNNNN, where N is a
hexadecimal digit from the numbers 0-9 or the letters
A-F. For example, enter \u0000 to use the null character
as the delimiter or \u2028 to use a line separator as
the delimiter.

Default is the pipe character ( |
).

Escape Character

Escape character for a custom file type.

Quote Character

Quote character for a custom file type.

Root Field Type

Root field type to use:

List-Map - Generates an indexed list of data.
Enables you to use standard functions to process
data. Use for new pipelines.

List - Generates a record with an indexed list with
a map for header and value. Requires the use of
delimited data functions to process data. Use only
to maintain pipelines created before 1.1.0.

Lines to Skip

Lines to skip before reading data.

Parse NULLs

Replaces the specified string constant with null
values.

NULL Constant

String constant to replace with null values.

Charset

Character encoding of the files to be processed.

Ignore Ctrl Characters

Removes all ASCII control characters except for the tab, line feed, and carriage
return characters.

For XML data, on the Data Format tab, configure the
following properties:

Includes the XPath to each parsed XML element and XML
attribute in field attributes. Also includes each namespace
in an xmlns record header attribute.

When not selected,
this information is not included in the record. By
default, the property is not selected.

Note: Field attributes and record header attributes are
written to destination systems automatically only when you use the SDC RPC
data format in destinations. For more information about working with field
attributes and record header attributes, and how to include them in records,
see Field Attributes and Record Header Attributes.

Namespaces

Namespace prefix and URI to use when parsing the XML
document. Define namespaces when the XML element being used
includes a namespace prefix or when the XPath expression
includes namespaces.

Includes XML attributes and namespace declarations in the
record as field attributes. When not selected, XML
attributes and namespace declarations are included in the
record as fields.

Note: Field attributes are automatically included in
records written to destination systems only when you use the SDC RPC data
format in the destination. For more information about working with field
attributes, see Field Attributes.

By default, the property is not
selected.

Max Record Length (chars)

The maximum number of characters in a record. Longer
records are diverted to the pipeline for error handling.

This property can be limited by the Data Collector parser
buffer size. For more information, see Maximum Record Size.

Charset

Character encoding of the files to be processed.

Ignore Ctrl Characters

Removes all ASCII control characters except for the tab, line feed, and carriage
return characters.