Warehouse: Step 5. Configure the Destination Using SFTP

This topic describes how to configure Warehouse Connector to write to a remote destination using Secure File Transfer Protocol (SFTP). The remote destination can be either a remote server that is NFS-mounted to the MapR cluster or a remote staging server.

By default, in the remote destination the Warehouse Connector writes data in the following directory structure:

Where <staging_folder> is the folder on the remote server where the Warehouse Connector writes the data.

If you are using a remote staging server as the remote destination, you need to manually copy or move the directory structure to one of the following deployments:

RSA Analytics Warehouse (MapR)

Commercial MapR M5 Enterprise Edition for Apache Hadoop

Pivotal HD

Caution: To generate reports from the data written by Warehouse Connector, make sure that your Hadoop deployment maintains a directory structure similar to the one that Warehouse Connector creates in the remote destination.

The following illustration describes how you can use SFTP to write data from Warehouse Connector to a remote destination.

Prerequisites

Make sure that you have:

Installed the Warehouse Connector service or virtual appliance in your network environment.

Added the Warehouse Connector service to Security Analytics. For more information, see the Add a Service to a Host topic in the Hosts and Services Getting Started Guide.

For the SFTP destination type, the destination host must be listed in the /root/.ssh/known_hosts file used by the SSH service (sshd) running on the Warehouse Connector. To add the destination host to this file, initiate a secure connection from the Warehouse Connector host to the destination host. Perform the following steps:

Log in to the Warehouse Connector.

Enter ssh root@<SAWIP> or ssh username@<SAWIP>.

When prompted to confirm the host key, type yes, and then enter the password.

Add the host key to the /root/.ssh/known_hosts file.

Note: After you upgrade Warehouse Connector to 10.6, you must make sure that the destination host is listed in the /root/.ssh/known_hosts file used by the ssh service (i.e. sshd) running on the Warehouse Connector. If you do not perform this action, the streams configured with SFTP in Warehouse Connector will not start.
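The known_hosts prerequisite can also be checked from a shell on the Warehouse Connector before a stream is started. The following is a minimal sketch, not a procedure taken from the product: host_is_known is a hypothetical helper name, and it relies only on the standard OpenSSH ssh-keygen -F lookup, which also matches hashed entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of Warehouse Connector): succeeds when the
# given host has an entry in a known_hosts file.
#   $1 = destination host name or IP address
#   $2 = known_hosts file (defaults to root's file, as in the text above)
host_is_known() {
    ssh-keygen -F "$1" -f "${2:-/root/.ssh/known_hosts}" >/dev/null 2>&1
}

# Example (DEST_HOST is a placeholder for your destination host):
#   host_is_known "$DEST_HOST" || echo "Connect once with ssh and accept the host key"
```

If the check fails, connecting once with ssh as described in the steps above and accepting the host key adds the missing entry.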

If you want to use SFTP to write data into the destination using SSH key-based access, you need to configure SSH key-based access between the Warehouse Connector and the Warehouse host or Hadoop node. For more information, see Configure SSH Keys below.

Note: If you want to enable checksum validation to validate the integrity of the AVRO files that are transferred from the Warehouse Connector to the destinations, make sure that you generate the keys without setting a passphrase and perform a key exchange between the Warehouse Connector and the Warehouse nodes.

On the Sources and Destinations tab, in the Destination Configuration section, click the icon for adding a destination.

In the Add Destination dialog, select SFTP from the Type drop-down list.

In the Name field, enter a unique symbolic name for the destination.

Note: The Name field does not support space or special characters except underscore (_).

In the Host field, enter the remote server IP address.

In the Port field, retain the default port, 22.

In the Username field, enter the SSH username.

Note: For Pivotal HD, the username must be gpadmin. For password-based access, use the password of the gpadmin user. For SSH key-based access, use the passphrase that was used to generate the keys for the gpadmin user.

In the Password/Passphrase field, enter one of the following:

SSH password, if you are using SFTP to write data into the destination using password-based access.

SSH passphrase, if you are using SFTP to write data into the destination using SSH key-based access.

In the Remote Path field, enter the path of an existing directory on the SFTP server.

Click Save.

(Optional) If you want to enable checksum validation, perform the following:

In the Security Analytics menu, select Administration > Services.

In the Services view, select the added Warehouse Connector service, and then select View > Explore. The Explore view of Warehouse Connector is displayed.

In the options panel, navigate to warehouseconnector/destinations/sftp/config.

Set the parameter isChecksumValidationRequired to 1.

Restart the respective stream.
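The Name field rule from the procedure above (no spaces or special characters except underscore) can be checked before a name is typed into the Add Destination dialog. This is a minimal sketch under one assumption: that the allowed characters are ASCII letters, digits, and underscore, which the source does not spell out. The function name is_valid_destination_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check for the Name field: accept only ASCII letters,
# digits, and underscore (an assumed reading of the rule in the text).
is_valid_destination_name() {
    case "$1" in
        '' | *[!A-Za-z0-9_]*) return 1 ;;   # empty, or contains a disallowed character
        *) return 0 ;;
    esac
}

# Example:
#   is_valid_destination_name "sftp_staging_01" && echo "name ok"
```

Names such as "sftp staging" or "sftp-staging" are rejected by this check, while "sftp_staging_01" is accepted.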

Configure SSH Keys

Follow these steps to configure SSH key-based access between the Warehouse Connector and the Warehouse host or Hadoop node.

Generate SSH keys on the Warehouse Connector at the default location. Perform the following:

Log on to the Warehouse Connector.

Type the following command and press ENTER:

$ ssh-keygen -t dsa

The command prompts you to enter the file in which to save the generated key.

Enter file in which to save the key (/root/.ssh/id_dsa):

Enter the file in which you want to save the key and press ENTER. The command prompts you to enter and confirm the passphrase.

Enter passphrase (empty for no passphrase):
Enter same passphrase again:

The key pair is generated and saved in the location that you provided. The public key is stored in the file with the .pub extension.

Note: If you want to enable checksum validation to validate the integrity of the AVRO files that are transferred from the Warehouse Connector to the destinations, make sure that you do not set the passphrase.

Append the generated public key to the authorized keys list of the remote Warehouse host or Hadoop node, located at: ~/.ssh/authorized_keys
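The append step can be sketched as a small shell function that runs on the Warehouse host or Hadoop node after the .pub file has been copied there. This is an illustration only: append_public_key is a hypothetical helper, not a product command. It skips the append when the identical key line already exists, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file unless
# the identical line is already there.
#   $1 = public key file (for example, a copy of id_dsa.pub)
#   $2 = authorized_keys file (for example, ~/.ssh/authorized_keys)
append_public_key() {
    key=$(cat "$1") || return 1
    dir=$(dirname "$2")
    # With the default StrictModes setting, sshd rejects key files that other
    # users can write to, so restrict the directory and file permissions.
    mkdir -p "$dir" && chmod 700 "$dir" && touch "$2" && chmod 600 "$2" || return 1
    # -x: whole-line match, -F: fixed string, so the key is compared literally.
    grep -qxF -- "$key" "$2" || printf '%s\n' "$key" >> "$2"
}

# Example:
#   append_public_key /tmp/id_dsa.pub "$HOME/.ssh/authorized_keys"
```

Where it is available, the OpenSSH ssh-copy-id utility performs a similar append directly from the Warehouse Connector.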

Result

You can now securely communicate between the Warehouse Connector and the Warehouse nodes or Hadoop nodes.
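The notes on checksum validation require keys without a passphrase. One plausible reason is that an integrity check has to reach the destination without an interactive prompt. The sketch below illustrates that idea only. The source does not say which checksum algorithm or mechanism Warehouse Connector uses, so SHA-256 and the sha256sum utility are assumptions here, and all three function names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: SHA-256 is an assumption; the source does not state which
# checksum Warehouse Connector uses.

# Print the SHA-256 digest of a local file.
local_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# Print the SHA-256 digest of a file on a remote host.
# BatchMode=yes makes ssh fail rather than prompt, so this works only when
# no password or passphrase has to be typed.
#   $1 = user@host, $2 = remote path (must not contain single quotes)
remote_sha256() {
    ssh -o BatchMode=yes "$1" "sha256sum '$2'" | cut -d' ' -f1
}

# Succeed when the local and remote digests are non-empty and equal.
#   $1 = local file, $2 = user@host, $3 = remote path
transfer_is_intact() {
    l=$(local_sha256 "$1")
    r=$(remote_sha256 "$2" "$3")
    [ -n "$l" ] && [ "$l" = "$r" ]
}
```

A call such as transfer_is_intact with a local AVRO file, the SSH user and host, and the remote file path would then report whether the two copies match.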