The Apache Software FoundationBlogging in Action.

Apache Sqoop

How to Secure Apache
Sqoop Jobs with Oracle Wallet

ApacheSqoop is a tool that transfers data between
the Hadoop ecosystem and enterprise data stores. Sqoop does this by providing
methods to transfer data toHDFS orHive (usingHCatalog). Oracle Database is one of the
databases supported by Apache Sqoop.

With Oracle Database, the database connection
credentials are stored in Oracle Wallet. Oracle Wallet can act as the store of
keys and secrets such as authentication credentials. This post describes howOracleWalletManager adds a secure authentication layer for
Sqoop jobs.

In order to connect to an external database,
Sqoop users must provide a set of credentials specific to that data store.
These credentials are typically in the form of user name and password.

Some enterprises may already be standardized on
credential management tools provided by their enterprise DBMS vendor. These
companies may not be comfortable with any of Sqoop’s three authentication methods:

●Password provided on the command line

●Password read from the console during the
interactive execution of a Sqoop job.

●Password provided on a secure file system that
only the user can access.

With Oracle Database, the database connection
credentials can be securely stored in an Oracle Wallet. The use of Oracle
Wallet can allay the security concerns mentioned above because it provides a
secure client-side software container allowing secure storage of authentication
and signing credentials.

With support for Oracle Wallet, Sqoop jobs no
longer need to embed usernames and passwords. This reduces the risk of exposing
credentials and eases enforcement of authentication policies, since application
code and scripts need not change whenever usernames or passwords change.

Instructions for Using
Oracle Wallets for External Password Store

Please note that the following steps were tested
with Oracle 11gR2 database. We expect these instructions should be applicable
to other versions of Oracle DB, but they were not tested on versions other than
Oracle 11gR2.

At a high-level, these are the steps (which we
describe in detail below):

●Create an Oracle client-side wallet

●Create tnsnames.ora and sqlnet.ora files

●Add the database access credentials to the Oracle
wallet

●Test the Oracle wallet

●Use the Oracle wallet for Sqoop jobs

●Modify the JDBC URL for the connection to use the
wallet

●Run the Sqoop job with Oracle wallet

Create an Oracle Client-Side Wallet

The first step is to create an Oracle wallet to
hold the credentials used by Sqoop jobs. Oracle provides various tools to
manage the Oracle Wallet. This document describes one of those: the mkstore command line tool. The mkstore command can be used to create the
wallet:

mkstore -wrl <wallet_location>
-create

</o:p>

That command line creates a client side wallet at
the location <wallet_location>. Substitute <wallet_location> to your valid directory location.

It will be an auto-logon wallet. So before a
client process can use the wallet, the ownership of the wallet directory and
files (owner and group) must match the process uid and gid.

Make sure that the password matches the database
minimum requirements.

Create tnsnames.ora and
sqlnet.ora Files

The Oracle database uses a few network
configuration files. The files tnsnames.ora and sqlnet.ora are used for configuring client side network
access.

tnsnames.ora

The configuration file tnsnames.ora has client side local
naming parameters. The contents of tnsnames.ora are used to map the
network service name to an Oracle database connection descriptor.

Here is an example of a tnsnames.ora file:

W_ORCL
=

( DESCRIPTION =

(ADDRESS
= (PROTOCOL = TCP)(HOST = 192.168.64.51)(PORT = 1521))

(CONNECT_DATA
=

(SERVER
= DEDICATED)

(SERVICE_NAME
= orcl)

)

)

In the above tnsnames.ora file, the network
service name W_ORCL is mapped to a
connection descriptor accessing a database instance named ORCL running on a host with an IP
address 192.168.64.51.

sqlnet.ora

The configuration file sqlnet.ora is used to specify client side network
configuration properties that influence the connection profiles. This file
configures tracing, authentication, routing, and advanced security parameters.

Here is an example of a sqlnet.ora file:

WALLET_LOCATION
=

(SOURCE =

(METHOD = FILE)

(METHOD_DATA =

(DIRECTORY
= <full path to file>)

)

)

SQLNET.WALLET_OVERRIDE =
TRUE

SSL_CLIENT_AUTHENTICATION
= FALSE

The tnsnames.ora and sqlnet.ora files can be placed in separate directories or
they can both be placed in the wallet directory.

Add the Database Access
Credentials to the Oracle Wallet

Now that we’ve created an Oracle
Wallet, we need to populate it with the database access credentials. We can use the -createCredential
option to the mkstore command to do
this:

This will prompt for the wallet password, which
should be same as the one used when creating the wallet.

Test the Oracle Wallet

A quick way to test the password is to attempt a
DB session with SQLPLUS.

If the tnsnames.ora and sqlnet.ora settings are not in the default location ($ORACLE_HOME/network/admin), then you can provide the location of
these files using the TNS_ADMIN
environment variable.

Assuming the tnsnames.ora and sqlnet.ora are under $HOME/wallet_test, do the following:

</o:p>

export TNS_ADMIN=$HOME/wallet_test

sqlplus /@<svc_name> # in our case
w_orcl

</o:p>

This command should successfully establish the
connection. If it does not, check the tnsnames.ora contents to make sure the DB host, port or
service name are correct.

If the service name in tnsnames.ora changes, then the wallet entry has to be
created/modified.

Use the Oracle Wallet for Sqoop
Jobs

After successfully validating the wallet, it can
be used for Sqoop jobs. There are a few steps for providing the wallet with
Sqoop:

● Provide the wallet to the Sqoop launcher

● Provision the wallet to the mapper tasks

● Make command line changes for the Sqoop
launcher program

● Specify the location of the wallet and
Oracle configuration files to the mapper tasks

● Specify the option to localize the files
to the mapper tasks

● Modify the JDBC URL

Provide the Wallet to the Sqoop Launcher

For the Sqoop client program and the mappers launched
by the Sqoop job, we have to specify the location for the wallet and the tnsnames.ora and sqlnet.ora files.

Add the following files to the $SQOOP_HOME/lib directory (typically SQOOP_HOME is set to /usr/lib/sqoop):

●oraclepki.jar

●osdt_cert.jar

●osdt_core.jar

●ojdbc6.jar

These jars are available as part of an Oracle
Database installation.

Provision the Wallet to the Mapper Tasks

Copy the contents of the wallet directory from
the database host along with the tnsnames.ora and sqlnet.ora files to a folder, such as $HOME/wallet, as the operating system user who is launching the Sqoop command.

The wallet directory will have the following
files after this step:

●cwallet.sso

●ewallet.p12

●sqlnet.ora

●tnsnames.ora

The first two are the wallet files and the next
two are the Oracle client network configuration files that we saw before.

Make sure the wallet directory and the files in
that directory are owned by the operating system user and the group ownership
is the same as the Sqoop user group.

Make Command Line Changes for the Sqoop Launcher Program

Since the wallet location and the TNS locations
are different from the defaults, we have to override the location of the wallet
and the tnsnames.ora and sqlnames.ora files.

Oracle JDBC exposes two properties for this
purpose:

●oracle.net.tns_admin – Location of the
tnsnames.ora and sqlnet.ora files

●oracle.net.wallet_location – Location of the
wallet files, cwallet.sso and ewallet.p12

In this case we will set both these properties to
$HOME/wallet.

Setting these two system properties for Sqoop can
be achieved by setting the system environment variable HADOOP_OPTS which will be used for setting
additional Java options to the JVM.

Specify the Location of the Wallet and Oracle Configuration Files
to the Mapper Tasks

While the client program uses the DB for
retrieving metadata, the mapper tasks do the actual data transfer. To provide
the wallet files to the mapper tasks, we specify them as part of the –files
tool option to localize to each mapper. Also, we have to use the Hadoop
configuration properties to pass specific Java command line options (similar to
HADOOP_OPTS for the launcher).

Specify the Option to Localize the Files to the Mapper Tasks

This can be achieved by adding the –files option to the Sqoop command line.

Please note that Hadoop options have to come
before any Sqoop options. The option and option argument can be specified as
follows:

-files $HOME/wallet/cwallet.sso,\

$HOME/wallet/ewallet.p12,\

$HOME/wallet/sqlnet.ora,\

$HOME/wallet/tnsnames.ora

We also have to tell the Mapper tasks the
location of wallet and TNS files. We use the hadoop configuration parameter mapred.child.java.opts to provide additional Java options to
the mappers.

Note that we use the current directory as the
location (‘.’). This is because the files provided by the –files option will be localized to the current
directory for each mapper task.

If there are additional Java options that need to
be provided for the mappers, they should be added to the Java options given
below. For example, to add the system
property -Djava.security.egd=file:/dev/./urandom, the options would be
specified as:

Modify the JDBC URL

The JDBC URL provided should use the format jdbc:oracle:thin:@SVC where the SVC is the service name used for accessing the
database.

This is added as part of the createCredential
command above and should also be in the tnsnames.ora file.

Run the Sqoop Job with Oracle
Wallet

Please make sure that Apache
Sqoop v1.4.5 is used or (if using a prior version of Sqoop) install the OracleManager fix for wallet support.

When we use wallet-based authentication, we don’t
provide the username or password for establishing the JDBC connection.

Sqoop code has been enhanced to properly handle
this case (please seeSQOOP</spancolor:>-1224</spancolor:> for more information). Use Oracle
session user when the database user name is not explicitly provided.

This will be part of Sqoop 1.4.5 and later versions. For versions of Sqoop prior to 1.4.5, please
make sure that the Sqoop product has this patch applied.

Run the Sqoop Job

Now we are ready to use the wallet with the Sqoop
job.

Here is an excerpt from a script that can be used
after the wallet and Oracle client configuration setup are complete. This combines all of the steps previously
outlined in this post.