Configuring Sqoop 2

This section explains how to configure the Sqoop 2 server.
Note: Sqoop 2 is being deprecated. Cloudera recommends using Sqoop 1.

Configuring which Hadoop Version to Use

The Sqoop 2 client does not interact directly with Hadoop MapReduce, and so it does not require any MapReduce configuration.

The Sqoop 2 server can work with either MRv1 or YARN, but not with both simultaneously. Set the MapReduce version the Sqoop 2 server works with by using the alternatives command (or update-alternatives, depending on your operating system):

To use YARN:

alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.dist

To use MRv1:

alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.mr1
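
The choice between the two commands above can be wrapped in a small helper. The following is a minimal sketch, not part of the Sqoop 2 packaging: the function names alternatives_cmd and sqoop2_tomcat_conf are hypothetical, and only the two configuration paths and the sqoop2-tomcat-conf name come from this section.

```shell
# Print the name of the alternatives tool found on the PATH
# ("alternatives" or "update-alternatives"); return 1 if neither exists.
alternatives_cmd() {
    for candidate in alternatives update-alternatives; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Map a MapReduce flavor ("yarn" or "mr1") to the Tomcat configuration
# directory listed in this section. Returns 2 for any other argument.
sqoop2_tomcat_conf() {
    case "$1" in
        yarn) echo /etc/sqoop2/tomcat-conf.dist ;;
        mr1)  echo /etc/sqoop2/tomcat-conf.mr1 ;;
        *)    echo "usage: sqoop2_tomcat_conf yarn|mr1" >&2; return 2 ;;
    esac
}

# Example (run as root):
#   "$(alternatives_cmd)" --set sqoop2-tomcat-conf "$(sqoop2_tomcat_conf yarn)"
```

To confirm which configuration is currently selected, both tools accept a --display option followed by the alternative name, for example alternatives --display sqoop2-tomcat-conf.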

Important: If you are upgrading from a release earlier than CDH 5 Beta 2

In earlier releases, the MapReduce version was set by means of the CATALINA_BASE variable in the /etc/defaults/sqoop2-server file. This mechanism does not work as of CDH 5 Beta 2 and could cause problems. Check your /etc/defaults/sqoop2-server file and make sure CATALINA_BASE is not set.
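
One way to perform this check is to search the file for an uncommented assignment. This is a minimal sketch under the assumption that the file uses ordinary shell syntax (NAME=value, optionally preceded by export); the function name catalina_base_is_set is hypothetical.

```shell
# Succeed (exit status 0) if the given file contains an uncommented
# CATALINA_BASE assignment. Fail (exit status 1) if it does not, or if
# the file does not exist.
catalina_base_is_set() {
    [ -f "$1" ] || return 1
    grep -Eq '^[[:space:]]*(export[[:space:]]+)?CATALINA_BASE=' "$1"
}

# Example:
#   if catalina_base_is_set /etc/defaults/sqoop2-server; then
#       echo "Remove or comment out CATALINA_BASE before starting the server" >&2
#   fi
```

Lines that begin with # are treated as comments and do not count as a setting.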

Configuring Sqoop 2 to Use PostgreSQL instead of Apache Derby

Deciding which Database to Use

Sqoop 2 has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL database instead, for the following reasons:

Derby runs in embedded mode and it is not possible to monitor its health.

It is not clear how to implement a live backup strategy for the embedded Derby database, though it may be possible.

Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not occur with server-based databases.

Restart the Sqoop 2 Server

$ sudo /sbin/service sqoop2-server start

Installing the JDBC Drivers

Sqoop 2 does not ship with third-party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop2/ directory on the server. The following sections show how to install the most common JDBC drivers. Once you have installed the JDBC drivers, restart the Sqoop 2 server so that the drivers are loaded.

Note:

The JDBC drivers need to be installed only on the machine where Sqoop is executed; you do not need to install them on all nodes in your Hadoop cluster.
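
The copy step is the same for every driver, so it can be sketched once. The helper below is illustrative only: the function name install_jdbc_driver is hypothetical, and the default destination is the /var/lib/sqoop2 directory named in this section.

```shell
# Copy a JDBC driver JAR into the Sqoop 2 driver directory.
#   $1: path to the downloaded driver JAR
#   $2: destination directory (optional, defaults to /var/lib/sqoop2)
# Returns non-zero if the JAR or the destination directory is missing.
install_jdbc_driver() {
    jar=$1
    dest=${2:-/var/lib/sqoop2}
    if [ ! -f "$jar" ]; then
        echo "driver not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "destination directory not found: $dest" >&2
        return 1
    fi
    cp "$jar" "$dest/"
}

# Example (from a root shell, because /var/lib/sqoop2 is typically
# not writable by ordinary users):
#   install_jdbc_driver ojdbc6.jar
```

The checks before the copy are there so that a mistyped path fails with a clear message instead of silently installing nothing. Restart the Sqoop 2 server afterwards so that the driver is loaded.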

Installing the MySQL JDBC Driver

Download the MySQL JDBC driver (Connector/J) from the MySQL website. You will need to sign up for an account if you don't already have one, and log in, before you can download it. Then copy the driver to the /var/lib/sqoop2/ directory.

At the time of publication, the current version was 5.1.31, but the version may have changed by the time you read this.
Important:

Make sure you have at least version 5.1.31. Some systems ship with an earlier version that may not work correctly with Sqoop.
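
A driver JAR that is already on the system can be checked against this minimum before it is copied. The sketch below assumes the JAR follows the usual Connector/J 5.1 file naming, mysql-connector-java-<version>-bin.jar or mysql-connector-java-<version>.jar. Both function names are hypothetical.

```shell
# Print the version embedded in a Connector/J file name, for example
# "5.1.31" for mysql-connector-java-5.1.31-bin.jar. Prints nothing if
# the name does not follow the assumed pattern.
mysql_driver_version() {
    basename "$1" |
        sed -n 's/^mysql-connector-java-\([0-9][0-9.]*[0-9]\).*\.jar$/\1/p'
}

# Succeed if dotted version $1 is greater than or equal to dotted
# version $2. Components are compared numerically, left to right, and
# missing components count as 0.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(want, w, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example:
#   v=$(mysql_driver_version /usr/share/java/mysql-connector-java-5.1.17.jar)
#   version_at_least "$v" 5.1.31 || echo "driver $v is older than 5.1.31" >&2
```

The comparison is numeric rather than textual, so 5.1.9 is correctly treated as older than 5.1.31. The path in the example is only an illustration of where an operating system package might place the file.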

Installing the Oracle JDBC Driver

You can download the JDBC driver from the Oracle website. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to the /var/lib/sqoop2/ directory:

$ sudo cp ojdbc6.jar /var/lib/sqoop2/

Installing the Microsoft SQL Server JDBC Driver

Download the Microsoft SQL Server JDBC driver from the Microsoft website and copy it to the /var/lib/sqoop2/ directory.
