Configuring Encryption for Data Spills

Some CDH services can encrypt data that lives temporarily on the local filesystem outside HDFS. This usually includes data that spills to disk when operations are memory-intensive and the service exceeds its allotted memory limit on a host. You can enable on-disk spill encryption for the following services.

MapReduce v2 (YARN)

MapReduce v2 can encrypt intermediate files generated during encrypted shuffle, as well as data spilled to disk during the map and reduce stages. To enable this, set the following properties in mapred-site.xml.

mapreduce.job.encrypted-intermediate-data

Enable or disable encryption for intermediate MapReduce spills.

Default: false

mapreduce.job.encrypted-intermediate-data-key-size-bits

The key length, in bits, used to encrypt data spilled to disk.

Default: 128

mapreduce.job.encrypted-intermediate-data.buffer.kb

The buffer size, in KB, for the stream written to disk after encryption.

Default: 128

Note: Enabling encryption for intermediate data spills restricts the number of attempts for a job to 1.
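
As a rough illustration, the three properties above might appear in mapred-site.xml as shown below. This is a sketch only: the first property is set to true to turn encryption on, and the other two simply restate the defaults listed above, so they could be left out. How the fragment reaches the cluster (for example, through a Cloudera Manager configuration snippet rather than a hand-edited file) depends on the deployment and is an assumption here.

```xml
<!-- Fragment for mapred-site.xml; place inside the existing <configuration> element. -->
<property>
  <!-- Turn on encryption of intermediate MapReduce data (default: false). -->
  <name>mapreduce.job.encrypted-intermediate-data</name>
  <value>true</value>
</property>
<property>
  <!-- Key length in bits; 128 is the documented default. -->
  <name>mapreduce.job.encrypted-intermediate-data-key-size-bits</name>
  <value>128</value>
</property>
<property>
  <!-- Buffer size in KB for the encrypted stream; 128 is the documented default. -->
  <name>mapreduce.job.encrypted-intermediate-data.buffer.kb</name>
  <value>128</value>
</property>
```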

HBase

HBase does not write data outside HDFS, and does not require spill encryption.

Impala

Impala lets certain memory-intensive operations write temporary data to disk when they approach their memory limit on a host. For details, see SQL Operations that Spill to Disk. To enable disk spill encryption in Impala:

Go to the Cloudera Manager Admin Console.

Click the Configuration tab.

Select Scope > Impala Daemon.

Select Category > Security.

Select the checkbox for the Disk Spill Encryption property.

Click Save Changes to commit the changes.

Hive

Hive jobs occasionally write data temporarily to local directories. If you enable HDFS encryption, you must ensure that the following intermediate local directories are also
protected:

LOCALSCRATCHDIR: The MapJoin optimization in Hive writes HDFS tables to a local directory and then uploads them to the distributed cache. To ensure these files are encrypted, either disable MapJoin by setting hive.auto.convert.join to false, or encrypt the local Hive Scratch directory (hive.exec.local.scratchdir) using Cloudera Navigator Encrypt.

DOWNLOADED_RESOURCES_DIR: JARs that are added to a user session and stored in HDFS are downloaded to hive.downloaded.resources.dir on the HiveServer2 local filesystem. To encrypt these JAR files, configure Cloudera Navigator Encrypt to encrypt the directory specified by hive.downloaded.resources.dir.

NodeManager Local Directory List: Hive stores JARs and MapJoin files in the distributed cache. To use MapJoin or to encrypt JARs and other resource files, set the yarn.nodemanager.local-dirs YARN configuration property to a set of encrypted local directories on all nodes.
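
The Hive-related settings above can be sketched as configuration fragments. The first fragment shows the option of disabling MapJoin in hive-site.xml. The second shows yarn.nodemanager.local-dirs in yarn-site.xml pointing at encrypted directories, which applies when MapJoin stays enabled. The directory paths are hypothetical placeholders, not values from this document; substitute the mount points that are actually encrypted on your hosts.

```xml
<!-- Fragment for hive-site.xml: disable the MapJoin optimization so that
     tables are not written to the local scratch directory. -->
<property>
  <name>hive.auto.convert.join</name>
  <value>false</value>
</property>
```

```xml
<!-- Fragment for yarn-site.xml: point the NodeManager local directories at
     encrypted locations. Both paths below are hypothetical examples. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/encrypted/1/yarn/nm,/encrypted/2/yarn/nm</value>
</property>
```

Disabling MapJoin avoids the local copy altogether but may slow down joins that would otherwise be converted, so encrypting the directories is likely the better fit where join performance matters.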
