Pipeline Maintenance

Understanding Pipeline States

A pipeline state is the current condition of the pipeline, such as "running" or "stopped". The pipeline state
can display in the All Pipelines list and can also appear in the Data Collector log.

The following pipeline states often display in the All Pipelines list:

EDITED - The pipeline has been created or modified, and has not run since the
last modification.

FINISHED - The pipeline has completed all expected processing and has stopped
running.

RUN_ERROR - The pipeline encountered an error while running and stopped.

RUNNING - The pipeline is running.

STOPPED - The pipeline was manually stopped.

START_ERROR - The pipeline encountered an error while starting and failed to
start.

STOP_ERROR - The pipeline encountered an error while stopping.

The following pipeline states are transient and rarely display in the All Pipelines list.
These states can display in the Data Collector
log when the pipeline logging level is set to Debug:

CONNECT_ERROR - When running a cluster-mode pipeline, Data Collector cannot
connect to the underlying cluster manager, such as Mesos or YARN.

CONNECTING - The pipeline is preparing to restart after a Data Collector restart.

DISCONNECTED - The pipeline is disconnected from external systems, typically
because Data Collector is restarting or shutting down.

FINISHING - The pipeline is in the process of finishing all expected
processing.

RETRY - The pipeline is trying to run after encountering an error while running.
This occurs only when the pipeline is configured for a retry upon error.

RUNNING_ERROR - The pipeline encounters errors while running.

STARTING - The pipeline is initializing, but has not started yet.

STARTING_ERROR - The pipeline encounters errors while starting.

STOPPING - The pipeline is in the process of stopping after a manual request to
stop.

STOPPING_ERROR - The pipeline encounters errors while stopping.
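
As a quick reference, the two groups of states listed above can be captured in code. The following is a minimal sketch, not part of Data Collector: the names COMMON_STATES, TRANSIENT_STATES, and is_transient are hypothetical, and the sets contain only the states defined in this section.

```python
# Hypothetical helper: groups the pipeline states defined in this section.
# These sets are illustrative and are not a Data Collector API.

# States that often display in the All Pipelines list.
COMMON_STATES = frozenset({
    "EDITED", "FINISHED", "RUN_ERROR", "RUNNING",
    "STOPPED", "START_ERROR", "STOP_ERROR",
})

# Transient states that rarely display in the All Pipelines list.
TRANSIENT_STATES = frozenset({
    "CONNECT_ERROR", "CONNECTING", "DISCONNECTED", "FINISHING", "RETRY",
    "RUNNING_ERROR", "STARTING", "STARTING_ERROR", "STOPPING",
    "STOPPING_ERROR",
})


def is_transient(state: str) -> bool:
    """Return True if the state belongs to the transient group above."""
    if state not in COMMON_STATES | TRANSIENT_STATES:
        raise ValueError(f"Unknown pipeline state: {state}")
    return state in TRANSIENT_STATES
```

A monitoring script could use a check like this to decide whether to keep polling (transient state) or to report the state as settled.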

State Transition Examples

The following examples show how pipelines can move through states:

Starting a pipeline

When you successfully start a pipeline for the first time, the pipeline
transitions through the following states:

(EDITED)... STARTING... RUNNING

When you start a pipeline for the first time but it cannot start, the
pipeline transitions through the following
states:

(EDITED)... STARTING... STARTING_ERROR... START_ERROR

Stopping or restarting Data Collector

When Data Collector shuts down, running pipelines transition through the following
states:

(RUNNING)... DISCONNECTING... DISCONNECTED

When Data Collector restarts, any pipelines that were running transition through the
following
states:

DISCONNECTED... CONNECTING... STARTING... RUNNING

Retrying a pipeline

When a pipeline is configured to retry upon error, Data Collector performs the specified number of retries when the pipeline encounters
errors while running.

When a retry succeeds, the pipeline transitions through the following
states:

(RUNNING)... RUNNING_ERROR... RETRY... STARTING... RUNNING
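
The example sequences given so far can be encoded as a small lookup of observed steps. This is a sketch for illustration only: the step set is derived from the examples in this section rather than from Data Collector itself, so it is not a complete state machine, and the names EXAMPLE_SEQUENCES, OBSERVED_STEPS, and follows_examples are hypothetical.

```python
# Illustrative only: the allowed steps are taken from the example
# sequences in this section, not from Data Collector source code.
EXAMPLE_SEQUENCES = [
    ["EDITED", "STARTING", "RUNNING"],
    ["EDITED", "STARTING", "STARTING_ERROR", "START_ERROR"],
    ["RUNNING", "DISCONNECTING", "DISCONNECTED"],
    ["DISCONNECTED", "CONNECTING", "STARTING", "RUNNING"],
    ["RUNNING", "RUNNING_ERROR", "RETRY", "STARTING", "RUNNING"],
]

# Collect every (from_state, to_state) pair that appears in an example.
OBSERVED_STEPS = {
    (seq[i], seq[i + 1])
    for seq in EXAMPLE_SEQUENCES
    for i in range(len(seq) - 1)
}


def follows_examples(sequence):
    """Return True if every consecutive step appears in the examples."""
    return all(step in OBSERVED_STEPS for step in zip(sequence, sequence[1:]))
```

A check like this can flag an unexpected jump when you read state changes from the Data Collector log, with the caveat that a step missing from the examples is not necessarily invalid.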

When a retry encounters another error, the pipeline transitions through the
following states:

Resetting the Origin

For origins that support resetting, when you stop the pipeline, Data Collector notes where it stopped processing data. When you restart the pipeline, it
continues from where it left off by default. When you want Data Collector to process all available data instead of continuing from where it stopped, reset
the origin.

You can configure the Kafka and MapR Streams origins to process
all available data by specifying an additional Kafka configuration property. For
more information, see "Processing All Unread Data" in the stage documentation. The
remaining origin stages process transient data where resetting the origin has no
effect.

You can reset the origin for multiple pipelines at the same time from
the Home page. Or, you can reset the origin for a single
pipeline from the pipeline canvas.

To reset the origin:

Select multiple pipelines from the Home page, or view a
single pipeline in the pipeline canvas.

Click the More icon, and then click Reset
Origin.

In the Reset Origin Confirmation dialog box, click
Yes to reset the origin.

Stopping Pipelines

Stop pipelines when you want Data Collector to
stop processing data for the pipelines.

When Data Collector runs a pipeline, it displays in the Data Collector UI in Monitor mode by default.

From the Home page, select the pipelines in the list and
then click the Stop icon. Or to stop a pipeline in the
pipeline canvas, click the Stop icon.

The Stop Pipeline Confirmation dialog box appears.

To stop the pipelines, click Yes.

If a pipeline remains in a Stopping state, you can force Data Collector to stop the pipeline immediately.

In some situations, a pipeline can remain
in a Stopping state for up to five minutes. For example, if a scripting
processor in the pipeline includes code with a timed wait or an infinite
loop, Data Collector waits for five minutes before forcing the pipeline to stop.

To force a pipeline to stop from the Home page, click
the More icon for the pipeline, and then click
Force Stop. Or to force a pipeline to stop from the
pipeline canvas, click Force Stop.

The Force Stop Pipeline Confirmation dialog box
appears.

To force the pipelines to stop, click Yes.

Importing Pipelines

Import pipelines to use
pipelines developed on a different Data Collector or
to restore backup files.

You can import pipelines from individual pipeline files or from a ZIP file containing
multiple pipeline files. Pipeline files are JSON files exported from a Data Collector
instance.

Importing a Single Pipeline

You can import a single pipeline from a pipeline
JSON file exported from a Data Collector
instance. When you import a single pipeline, you can rename the pipeline during the
import.

To import a single pipeline, from the Home page or
Getting Started page, click Import
Pipeline.

In the Import Pipeline dialog box, enter a pipeline name
and optional description. Browse and select the pipeline file, and then click
Open.

To import the pipeline, click Import.

Importing a Set of Pipelines

You can import a set of pipelines from a ZIP file
that contains multiple pipeline JSON files. When you import a set of pipelines, Data Collector keeps the existing pipeline names. If necessary, you can rename the pipelines
after the import.

To import a set of pipelines, from the Home page or
Getting Started page, click Import Pipelines
from Archive.

In the Import Pipelines from Archive dialog box, browse
and select the ZIP file that contains the pipeline files, and then click
Open.

To import all pipelines in the file, click Import.
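
Before importing an archive, it can help to confirm that it contains what you expect. The following sketch uses only the Python standard library to list the entries in a ZIP file that end in .json and parse as JSON. The function name list_pipeline_files is hypothetical, and the check confirms only that each entry is well-formed JSON, not that it is a valid pipeline export.

```python
import json
import zipfile


def list_pipeline_files(archive_path):
    """Return the names of .json entries in the ZIP that parse as JSON."""
    valid = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue  # ignore entries that are not JSON files
            try:
                json.loads(archive.read(name).decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError):
                continue  # skip entries that are not well-formed JSON
            valid.append(name)
    return valid
```

For example, calling list_pipeline_files("pipelines.zip") on an exported archive returns the JSON entries that passed the check, which you can compare against the pipelines you intend to import.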

Sharing Pipelines

When you
create a pipeline, you become the owner of the pipeline. As the owner of a pipeline, you
have all permissions for the pipeline, you can configure pipeline sharing, and you can
change the owner of the pipeline. A pipeline can have a single user as the owner.

Like the pipeline owner, a user with the Admin role has all permissions for all
pipelines, can configure pipeline sharing, and can change the pipeline owner.

By default, all other users have no access to pipelines. To allow other users to work
with a pipeline, you must share the pipeline with the users or their groups, and
configure pipeline permissions.

When you share a pipeline, you can configure the following permissions for each user and
group:

Read - View and monitor the pipeline, and see alerts. View existing snapshot data.

Write - Edit the pipeline and alerts.

Execute - Start and stop the pipeline. Preview data and take a snapshot.
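
The permissions above can be read as a mapping from each permission to the actions it allows. The following is a minimal sketch of that mapping, assuming hypothetical action names such as "start" and "edit_pipeline". It is a conceptual model, not how Data Collector implements access control.

```python
# Hypothetical model of the permissions described in this section.
# The action names are invented for illustration.
PERMISSION_ACTIONS = {
    "READ": {"view", "monitor", "see_alerts", "view_snapshot"},
    "WRITE": {"edit_pipeline", "edit_alerts"},
    "EXECUTE": {"start", "stop", "preview", "take_snapshot"},
}

ALL_ACTIONS = set().union(*PERMISSION_ACTIONS.values())


def can_perform(action, granted, is_owner=False, is_admin=False):
    """Return True if the action is allowed for the given permissions.

    The pipeline owner and users with the Admin role have all permissions.
    """
    if action not in ALL_ACTIONS:
        raise ValueError(f"Unknown action: {action}")
    if is_owner or is_admin:
        return True
    return any(action in PERMISSION_ACTIONS[p] for p in granted)
```

In this model, a user granted only Read cannot start the pipeline, while a user granted Read and Execute can.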

When someone shares a pipeline with you, it displays in the pipeline library under the
Shared With Me label.

Sharing a Pipeline

Share a pipeline to
allow users to perform pipeline-related tasks. You can share a pipeline with individual
users or with groups.

You can share a pipeline if you are the owner of the pipeline or
a user with the Admin role.

You can configure pipeline sharing at any time,
but pipeline permissions are enforced only when Data Collector is enabled to use pipeline access controls. The sharing configuration takes
effect once access controls are enabled.

You can share a pipeline from either of the following locations:

From the Home page, select the pipeline, click the
More icon, and click
Share.

From the pipeline canvas, click the Share icon.

In the Sharing Settings dialog box, click in the
Select Users and Groups window, select the users and
groups that you want to share with, and click Add.

Configure the permissions that you want each user and group to have and click
Save.

Changing the Pipeline Owner

As the owner
of a pipeline or a user with the Admin role, you can specify a user as the pipeline
owner.

The pipeline owner has all permissions for the pipeline and can configure
sharing for other users and groups. There can only be one pipeline
owner.

You can change the pipeline owner from either of the following locations:

From the Home page, select the pipeline, click the
More icon, and click
Share.

From the pipeline canvas, click the Share icon.

In the Sharing Settings dialog box, if necessary, add the
user that you want to use as the owner.

To select a new owner, click the More icon for the user
and click Is Owner.

Click Save to save the change.

Adding Labels to Pipelines

You
can add labels to pipelines to group similar pipelines. For example, you might want to
group pipelines by database schema or by the test or production environment.

You can use nested labels to create a hierarchy of pipeline
groupings. Enter nested labels using the following
format:

<label1>/<label2>/<label3>

For example, you
might want to group pipelines in the test environment by the origin system. You
add the labels Test/HDFS and Test/Elasticsearch to the appropriate pipelines.
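
To see how nested labels form a hierarchy, the following sketch splits each label on "/" and builds a nested dictionary. The function name build_label_tree is hypothetical, and skipping empty segments (for example, from a trailing slash) is an assumption, since this section does not say how Data Collector treats them.

```python
def build_label_tree(labels):
    """Build a nested dict from labels such as "Test/HDFS"."""
    tree = {}
    for label in labels:
        node = tree
        for part in label.split("/"):
            if part:  # assumption: ignore empty segments
                node = node.setdefault(part, {})
    return tree


# The two labels from the example share the parent grouping "Test".
tree = build_label_tree(["Test/HDFS", "Test/Elasticsearch"])
```

Here, tree contains a single top-level key, "Test", with "HDFS" and "Elasticsearch" nested beneath it.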

You can add labels to pipelines from the following locations:

From the Home page, select pipelines in the list, click the
More icon, and then click Add
Labels. Enter labels and then click
Save.

Note: Labels that have already been added to a pipeline are ignored.

From the pipeline canvas, click the General tab and then
enter labels for the Labels property.

Exporting Pipelines

Export pipelines to
create backups or to use the pipelines with another Data Collector. You can export a single pipeline or a set of pipelines at one time.

When you export a single pipeline, Data Collector generates a JSON file named after the pipeline, as follows: <pipeline
name>.json.

When you export a set of pipelines, Data Collector creates a ZIP file named pipelines.zip.

To export a single pipeline, from the Home page, select
the pipeline from the list, click the More icon, and then
click Export.

To export a set of pipelines, select multiple pipelines, click the
More icon, and then click
Export.

Or, to export a single pipeline from the pipeline canvas, click the
More icon for the pipeline and then click
Export.

Duplicating a Pipeline

Duplicate a pipeline when you want to keep the existing version of a pipeline while
continuing to configure a duplicate version. A duplicate is an exact copy of the original
pipeline.

When you duplicate
a pipeline, you can rename the pipeline and specify the number of copies to make.

From the Home page, select a pipeline in the list view and
then click the Duplicate icon. Or to duplicate a pipeline
in the pipeline canvas, click the More icon for the
pipeline and then click Duplicate.

In the Duplicate Pipeline Definition dialog box, enter a
name for the duplicate pipeline and the number of copies to make.

When you create multiple copies, Data Collector appends an integer after the pipeline name. For example, if you enter the
name "test" and create two copies of the pipeline, Data Collector names the duplicate pipelines "test1" and "test2".

Click Duplicate.

The duplicate pipelines display.
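
The naming rule for multiple copies described above can be sketched as follows. The function name duplicate_names is hypothetical, and the behavior for a single copy (keeping the name as entered) is an assumption, since this section describes the numbering only for multiple copies.

```python
def duplicate_names(base_name, copies):
    """Return the names given to the duplicates of a pipeline."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if copies == 1:
        # Assumption: a single copy keeps the name exactly as entered.
        return [base_name]
    # For multiple copies, an integer is appended after the name.
    return [f"{base_name}{i}" for i in range(1, copies + 1)]


names = duplicate_names("test", 2)  # ["test1", "test2"]
```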

Deleting Pipelines

You can delete
pipelines when you no longer need them. Deleting pipelines is permanent. To keep
backups, export the pipelines before you delete them.

From the Home page, select pipelines in the list and then
click the Delete icon. Or to delete a pipeline in the
pipeline canvas, click the More icon for the pipeline and
then click Delete.