You must register a Data Collector to work with StreamSets Control Hub. When you register a Data Collector, Data Collector generates an authentication token that it uses to issue authenticated requests to Control Hub.

A Control Hub job defines the pipeline to run and the Data Collectors or Edge Data Collectors (SDC Edge) that run the pipeline. When you start a job, Control Hub remotely runs the pipeline on the group of Data Collectors or Edge Data Collectors. To monitor the job statistics and metrics within Control Hub, you must configure the pipeline to write statistics to Control Hub or to another system.

Pipeline Management with Control Hub

After you register a Data Collector with StreamSets Control Hub, you can manage how the pipelines work with Control Hub.

You develop pipelines in Data Collector, and then publish or import them to Control Hub. Within Control Hub, you create jobs to determine the Data Collectors that run the pipelines. When you start a job on a group of Data Collectors, Control Hub remotely runs a pipeline instance on each Data Collector.

Managing pipelines with Control Hub involves completing the following tasks:

Understanding the different types of pipelines that can run on a registered Data Collector.

If you have not registered a Data Collector, you can still develop pipelines in the Data Collector and then export the pipelines for use in Control Hub.

Pipeline Types

After a Data Collector has been registered with Control Hub, you can view the following types of pipelines in the Data Collector:

Local pipelines

Local pipelines are pipelines that are managed by a Data Collector and run locally on that Data Collector. Data Collector displays local pipelines whether or not they are running.

Use a Data Collector to design, start, stop, and monitor local pipelines.

Published pipelines

Published pipelines are local pipelines that have been published to Control Hub. You can still use Data Collector to manage and locally run published pipelines on that Data Collector. Data Collector displays published pipelines with the current version number, whether or not they are running.

Control Hub controlled pipelines

Control Hub controlled pipelines are pipelines that are managed by Control Hub and run remotely on registered Data Collectors. Data Collector displays Control Hub controlled pipelines only while they are running.

Control Hub controlled pipelines include the following:

Published pipelines run from Control Hub jobs.

After you publish or import pipelines to Control Hub, you add them to a job, and then start the job. When you
start a job on a group of Data Collectors, Control Hub remotely runs an instance of the published pipeline on each
Data Collector. When the Control Hub job stops, the running published pipeline also stops, and can
no longer be viewed in the Data Collector. Use Control Hub to start, stop, and monitor published pipelines that are run
from jobs.

System pipelines run from Control Hub jobs.

Control Hub automatically generates and runs system pipelines to
aggregate statistics for jobs. System pipelines collect,
aggregate, and push metrics for all of the remote pipeline
instances run from a job. When you start a job on a group of Data Collectors, Control Hub picks one Data Collector to run the system pipeline. When the Control Hub job stops, the running system pipeline also stops, and can no
longer be viewed in the Data Collector.

Control Hub generates system pipelines as needed. Published pipelines
that are not configured to aggregate statistics do not require system
pipelines.
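The behavior described above can be modeled with a small sketch. This is only an illustration of the documented rules, not Control Hub code: the function name `plan_job`, the Data Collector names, and the naming of the system pipeline are hypothetical, and the choice of the first Data Collector to run the system pipeline is an assumption, since the text says only that Control Hub picks one.

```python
from typing import Dict, List


def plan_job(pipeline: str, data_collectors: List[str],
             aggregates_statistics: bool) -> Dict[str, List[str]]:
    """Return the pipelines each Data Collector would run for one job.

    Hypothetical model of the rules in the text above.
    """
    if not data_collectors:
        raise ValueError("a job needs at least one Data Collector")

    # One remote instance of the published pipeline per Data Collector.
    plan = {dc: [pipeline] for dc in data_collectors}

    if aggregates_statistics:
        # Assumption: the first Data Collector is picked. The text only
        # states that one Data Collector runs the system pipeline.
        chosen = data_collectors[0]
        plan[chosen].append("system pipeline for " + pipeline)

    return plan


if __name__ == "__main__":
    for dc, pipelines in plan_job("Kafka to HDFS", ["sdc-1", "sdc-2"], True).items():
        print(dc, pipelines)
```

A pipeline that is not configured to aggregate statistics yields a plan with no system pipeline on any Data Collector.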

Note: A Data Collector administrator can use Data Collector to
stop Control Hub controlled pipelines. Otherwise, you cannot modify or manage Control Hub
controlled pipelines in Data Collector.

Viewing Pipeline Types in Data Collector

Let's look at a sample Data Collector
Home page to see how Data Collector
displays local pipelines, published pipelines, and Control Hub
controlled pipelines:

The Data Collector
displays the following pipelines:

Local pipeline that was developed in this Data Collector and can be run locally on this Data Collector. Local pipelines are listed by title. In the image above, Remove Extra Fields is
a local pipeline that is not running. The Remove Extra Fields pipeline has not been
published to Control Hub, as indicated by no version number after its title.

Published pipeline that was published to Control Hub
and can still be run locally on this Data Collector. Published pipelines are listed by title and version number. In the image above,
Kafka to HDFS is a published pipeline that is not running.

Running published pipeline that was published to Control Hub, then run from a job. Published pipelines that are remotely run from jobs are
listed with a "Control Hub" label.

Running system pipeline that collects, aggregates, and pushes metrics for all of the remote pipeline instances run from the job. Running system pipelines are listed with a "Control Hub system" label.
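The listing rules above can be summarized in a short sketch. It models only what the text states: local pipelines are listed by title, published pipelines by title and version number, and job-run pipelines with a label. The class names, the `home_page_entry` function, and the exact string layout are hypothetical and do not reflect how Data Collector renders its Home page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PipelineType(Enum):
    LOCAL = "local"
    PUBLISHED = "published"
    CONTROLLED = "Control Hub"        # published pipeline run from a job
    SYSTEM = "Control Hub system"     # system pipeline run from a job


@dataclass
class Pipeline:
    title: str
    kind: PipelineType
    version: Optional[str] = None     # local pipelines have no version


def home_page_entry(p: Pipeline) -> str:
    """Build a hypothetical one-line listing for a pipeline."""
    if p.kind is PipelineType.LOCAL:
        return p.title
    if p.kind is PipelineType.PUBLISHED:
        return f"{p.title} v{p.version}"
    # Job-run pipelines carry the label named in the text.
    return f"{p.title} [{p.kind.value}]"


if __name__ == "__main__":
    print(home_page_entry(Pipeline("Remove Extra Fields", PipelineType.LOCAL)))
    print(home_page_entry(Pipeline("Kafka to HDFS", PipelineType.PUBLISHED, "1")))
```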

Tip: In the image above, the job was started on a Data Collector used to design pipelines. As a best practice, use labels within Control Hub to separate development Data Collectors from production Data Collectors. That way, you can ensure that published pipelines run only on production Data Collectors and not on a Data Collector that a developer is currently using to design pipelines. For more information about using labels, see the Control Hub online help.

Publishing Pipelines to Control Hub

After you finish developing pipelines in Data Collector, you publish the pipelines to the Control Hub pipeline repository. You can publish only valid pipelines.

Tip: When you update a published pipeline, Data Collector displays an asterisk next to the pipeline name to indicate that the pipeline has been updated since it was last published.

From the Home page, select pipelines in the list and then click the Publish Pipeline icon. Or, to publish a pipeline from the pipeline canvas, click the Control Hub Options icon, and then click Publish Pipeline.

The Publish Pipeline dialog box appears.

Enter a commit message.

As a best practice, state what changed in this pipeline version so that you
can track the commit history of the pipeline.

Note: If you are publishing multiple
pipelines from the Home page, the same commit message
is used for all of the pipelines.

Click Publish Pipeline.

Reverting Changes to Published Pipelines

If you update
a published pipeline but decide not to publish the updates as a new version, you can
revert the changes made to the pipeline configuration.

In the pipeline canvas, click the Control Hub Options icon, and then click Revert Changes.

In the confirmation dialog box, click Yes.

Viewing Pipeline Commit History

You can view the commit history of any pipeline that has been published to Control Hub. If the
pipeline has been committed multiple times, you can get an older version of the pipeline and
then continue editing the older version.

If you edit and then publish an older version, Control Hub updates the minor version number rather than the major version number. For example, suppose you have a pipeline with three versions. You get version 2 of the pipeline, edit the pipeline, and then publish the pipeline. Control Hub versions the current pipeline as 2.1. You now have four versions of the pipeline, with version 2.1 marked as the current version:

2.1

3

2

1
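The numbering rule in this example can be sketched as follows. This is a model inferred from the text, not Control Hub's actual versioning logic: the function names are hypothetical, and two behaviors are assumptions, namely that publishing from the highest version creates the next major version, and that repeated edits of an older version keep incrementing its minor number.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # (major, minor)


def parse_version(text: str) -> Version:
    """Parse "2" as (2, 0) and "2.1" as (2, 1)."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def format_version(version: Version) -> str:
    major, minor = version
    return str(major) if minor == 0 else f"{major}.{minor}"


def next_version(existing: List[str], base: str) -> str:
    """Return the version assigned when `base` is edited and published."""
    versions = [parse_version(v) for v in existing]
    base_version = parse_version(base)
    if base_version not in versions:
        raise ValueError(f"unknown base version: {base}")

    latest = max(versions)
    if base_version == latest:
        # Assumption: publishing from the highest version bumps the major number.
        return format_version((latest[0] + 1, 0))

    # Older version: keep its major number and use the next unused minor number.
    highest_minor = max(minor for major, minor in versions if major == base_version[0])
    return format_version((base_version[0], highest_minor + 1))


if __name__ == "__main__":
    print(next_version(["1", "2", "3"], "2"))  # prints 2.1, matching the example above
```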

To view pipeline commit history:

In the pipeline canvas, click the Control Hub Options icon, and then click Commit History.

The Pipeline Commit History dialog box opens. For
example, a pipeline with three versions displays the commit history as
follows:

To get an older version of a pipeline, click Get in the
Actions column for that version.

Data Collector opens the selected pipeline version in the canvas. You can make edits to the
pipeline version, and then publish the pipeline as another version.

Downloading Published Pipelines

You can download a
published pipeline from the Control Hub
pipeline repository into a registered Data Collector.
Download a published pipeline when you need to edit a published pipeline version, and
that pipeline version was originally developed on a different Data Collector.

When you download a pipeline from Control Hub, you become the owner of a local instance of the published pipeline. The
downloaded pipeline has no connection to the published pipeline.

From the Home page, click Create New Pipeline > Download Published Pipeline.

The Download Published Pipeline dialog box displays all
pipelines in the Control Hub pipeline repository.

Click Download for each published pipeline that you want
to download to this Data Collector.

Data Collector downloads all versions of the selected pipeline. You can view the pipeline
commit history in Data Collector, and get an older version of the pipeline if needed.

Click Close when you have finished downloading published
pipelines.

Exporting Pipelines for Control Hub

If you develop pipelines in a Data Collector that is not registered with Control Hub, export valid pipelines for use in Control Hub.

If you develop pipelines in a Data Collector
that is registered with Control Hub, publish the pipelines
directly to Control Hub.

You can export a single pipeline or a set of pipelines. When you export pipelines for
Control Hub, Data Collector
exports the pipelines without plain text credentials.

From the Home page, select the pipelines to export. Alternatively, to export a single pipeline, you can open the pipeline.

Click the More icon, and then click Export
for Control Hub.

Data Collector exports the pipelines without any plain text credentials and writes a
file containing the exported pipelines to your default downloads
directory:

When you export a single pipeline, Data Collector generates a JSON file named after the pipeline, as follows:
<pipeline name>.json. The generated JSON
file includes the definition of each stage library used in the
pipeline.

When you export a set of pipelines, Data Collector creates a ZIP file named pipelines.zip.
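The file naming described above can be expressed as a small sketch. The function name `export_file_name` is hypothetical, and the sketch assumes the pipeline name is used unchanged in the file name; the text does not say whether Data Collector alters characters such as spaces.

```python
from typing import Sequence


def export_file_name(pipeline_names: Sequence[str]) -> str:
    """Return the name of the file produced by an export for Control Hub.

    Models the rule in the text: one pipeline produces a JSON file named
    after the pipeline, and a set of pipelines produces pipelines.zip.
    """
    if not pipeline_names:
        raise ValueError("select at least one pipeline to export")
    if len(pipeline_names) == 1:
        # Assumption: the pipeline name is used as-is.
        return f"{pipeline_names[0]}.json"
    return "pipelines.zip"


if __name__ == "__main__":
    print(export_file_name(["Kafka to HDFS"]))
    print(export_file_name(["Kafka to HDFS", "Remove Extra Fields"]))
```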

After exporting pipelines for Control Hub,
import the pipelines into Control Hub and
reconfigure any plain text credentials removed during export.