Installation with Cloudera Manager

You can use Cloudera Manager to install Data Collector
across the cluster as an add-on service.

To install Data Collector
through Cloudera Manager, perform the following steps:

Install the StreamSets custom service descriptor (CSD).

(Optional.) Manually install the parcel and checksum files. This step is typically
needed only when the Cloudera Manager Server does not have internet access.

Download, distribute, and activate the StreamSets parcel.

Configure the StreamSets service.

After installation, you can configure Data Collector if
necessary.

Important: When you use Cloudera Manager to install Data Collector, you must use Cloudera Manager to configure Data Collector properties and environment. Manual changes to the Data Collector properties or environment files are not recognized by Cloudera Manager.

This documentation includes details about Cloudera Manager to simplify the installation
and configuration process. For more information about using Cloudera Manager, see the
Cloudera documentation.

Step 1. Install the StreamSets Custom Service Descriptor

Install the StreamSets custom service descriptor (CSD) file,
and then restart Cloudera Manager.

Copy the Data Collector CSD file
to the Local Descriptor Repository Path. By default, the
path is /opt/cloudera/csd.

To verify the path to use, in Cloudera Manager, click Administration > Settings. In the navigation panel, select the Custom Service
Descriptors category. Place the CSD file in the path configured
for Local Descriptor Repository Path.

Set the file ownership to cloudera-scm:cloudera-scm with
permission 644.

Restart Cloudera Manager so that it detects the new CSD.
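The copy and ownership steps above can be scripted. The following is a minimal sketch, not an official StreamSets or Cloudera tool: the function name install_csd and its parameters are hypothetical, the default path is the /opt/cloudera/csd default mentioned above, and changing ownership assumes the script runs as root on a host where the cloudera-scm user and group exist.

```python
import os
import shutil
from pathlib import Path


def install_csd(csd_file, repo_dir="/opt/cloudera/csd",
                owner="cloudera-scm", group="cloudera-scm", set_owner=True):
    """Copy a CSD file into the descriptor repository, then fix owner and mode.

    Hypothetical helper for illustration; verify repo_dir against the
    Local Descriptor Repository Path configured in Cloudera Manager.
    """
    target = Path(repo_dir) / Path(csd_file).name
    shutil.copy(csd_file, target)
    if set_owner:
        # Normally requires root and an existing cloudera-scm account.
        shutil.chown(target, user=owner, group=group)
    os.chmod(target, 0o644)  # rw-r--r--
    return target
```

For example, install_csd("/tmp/streamsets-csd.jar") would place the file in the default repository path; the file name here is a placeholder, not the actual CSD name. Pass set_owner=False to try the copy and permission steps without root privileges.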

Step 2. Manually Install the Parcel and Checksum Files (Optional)

Copy the StreamSets parcel and checksum files to the Cloudera
Manager Local Parcel Repository Path.

By default, the path is /opt/cloudera/parcel-repo.

To verify the path to use, in Cloudera Manager, click Administration > Settings. In the navigation panel, select the
Parcels category. Place the StreamSets parcel and checksum files in
the path configured for Local Parcel Repository Path.
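Before copying the files, it can help to confirm that the parcel matches its checksum file. The sketch below assumes that the checksum file contains a SHA-1 hex digest as its first whitespace-separated token; confirm this against the files you downloaded. The function names sha1_of and stage_parcel are hypothetical, and the default directory is the /opt/cloudera/parcel-repo default mentioned above.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_parcel(parcel_file, checksum_file, repo_dir="/opt/cloudera/parcel-repo"):
    """Verify the parcel against its checksum file, then copy both files.

    Assumption: the checksum file starts with a SHA-1 hex digest.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    actual = sha1_of(parcel_file)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    for src in (parcel_file, checksum_file):
        shutil.copy(src, Path(repo_dir) / Path(src).name)
```

Raising an error on a mismatch keeps a corrupted or partially downloaded parcel out of the repository path.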

Step 3. Distribute and Activate the StreamSets Parcel

After you add the StreamSets repository to
Cloudera Manager, you can download, distribute, and activate the StreamSets parcel
across the cluster.

Note: The StreamSets parcel repository is added to Cloudera Manager during the
installation of the CSD. If you install the parcel before the CSD, you can find the
StreamSets parcel repository URL at https://archives.streamsets.com/index.html. Download the correct parcel type for the operating
system that you use.

When working with multiple clusters, perform the following steps for each
cluster.

To view the list of available parcels, in the menu bar, click the
Parcels icon.

The StreamSets parcel displays in the list of available parcels. If it doesn't
display, click Check for New Parcels.

To download the StreamSets parcel to the local repository, click
Download.

After the parcel is downloaded, the Download button becomes the Distribute
button.

To distribute the StreamSets parcel to the cluster, click
Distribute.

After distribution, the Distribute button becomes the Activate button.

To activate the StreamSets parcel, click Activate.
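The button progression described above behaves like a small state machine: each action moves the parcel to the next stage and exposes the next button. The sketch below models only that progression. The stage names are hypothetical labels chosen for this example and are not taken from Cloudera Manager.

```python
# Hypothetical stage names for illustration; they mirror the button
# progression in the steps above, not an official Cloudera Manager list.
NEXT_ACTION = {
    "AVAILABLE": "Download",
    "DOWNLOADED": "Distribute",
    "DISTRIBUTED": "Activate",
    "ACTIVATED": None,
}

# Stage that the parcel reaches after each action completes.
RESULTING_STAGE = {
    "Download": "DOWNLOADED",
    "Distribute": "DISTRIBUTED",
    "Activate": "ACTIVATED",
}


def remaining_actions(stage):
    """List the buttons still to click, in order, starting from a stage."""
    actions = []
    while NEXT_ACTION[stage] is not None:
        action = NEXT_ACTION[stage]
        actions.append(action)
        stage = RESULTING_STAGE[action]
    return actions
```

Calling remaining_actions("AVAILABLE") returns ["Download", "Distribute", "Activate"], while a parcel that has already been downloaded needs only the last two actions.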

Step 4. Configure the StreamSets Service

When you configure the service, you assign
Data Collector to the hosts where you want it to run.

To run Data Collector in
cluster streaming mode, colocate Data Collector on
a node with the Spark Gateway role. To run Data Collector in
cluster batch mode, colocate Data Collector on
a node with the YARN Gateway role.

To write to HDFS, colocate Data Collector on a node with the HDFS Gateway role. Similarly, to write to HBase or Hive,
colocate Data Collector on nodes with the HBase or Hive Gateway roles, respectively.
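The colocation rules above can be expressed as a lookup from what you want Data Collector to do to the gateway role its host needs. The following is an illustrative sketch only: the keys such as cluster_streaming and the function eligible_hosts are hypothetical names, and the host-to-role mapping must come from your own cluster inventory.

```python
# Gateway role required for each intended use, per the rules above.
# The dictionary keys are hypothetical labels chosen for this sketch.
REQUIRED_GATEWAY = {
    "cluster_streaming": "Spark Gateway",
    "cluster_batch": "YARN Gateway",
    "write_hdfs": "HDFS Gateway",
    "write_hbase": "HBase Gateway",
    "write_hive": "Hive Gateway",
}


def eligible_hosts(host_roles, needs):
    """Return the hosts that carry every gateway role the needs require.

    host_roles maps a host name to the roles assigned to that host.
    """
    required = {REQUIRED_GATEWAY[need] for need in needs}
    return sorted(host for host, roles in host_roles.items()
                  if required <= set(roles))
```

For example, with host_roles = {"node1": ["Spark Gateway", "HDFS Gateway"], "node2": ["YARN Gateway"]}, the call eligible_hosts(host_roles, ["cluster_streaming", "write_hdfs"]) returns ["node1"].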

When working with multiple clusters, perform the following steps for each
cluster.

In Cloudera Manager, click the menu for the cluster you want to use, then click
Add a Service.

In the Service Types list, select
StreamSets, then click
Continue.

To select the hosts where you want to install StreamSets, on the
Customize Role Assignments for StreamSets page, click
Select Hosts to open a list of available hosts.

Select one or more hosts, then click OK. Click
Continue.

The Review Changes page displays the Data and Resource
directories for the Data Collector.

Optionally change the directories, then click
Continue.

The First Run Command page displays status updates as
Cloudera Manager starts Data Collector on the selected hosts.