Add an HBase gateway role to all YARN worker hosts and to the edge host where you run spark-submit, spark-shell, or
pyspark, and deploy the HBase client configuration.

Limitation with Region Pruning for HBase Tables

When SparkSQL accesses an HBase table through the HiveContext, region pruning is not performed. As a result, some SparkSQL
queries against tables that use the HBase SerDe can be slower than the same queries run through Impala or Hive.

Limitations in Kerberized Environments

The following limitations apply to Spark applications that access HBase in a Kerberized cluster:

The application must be restarted every seven days.

If the cluster also has HA enabled, you must specify the keytab and principal parameters on the command line (rather
than relying on a ticket obtained with kinit). For example:
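The original example is not included here. A representative spark-submit invocation, using the standard --keytab and --principal options (the keytab path, principal, class, and jar names below are placeholders), might look like:

```shell
# Supply the keytab and principal directly to spark-submit so a long-running
# application can re-authenticate to Kerberos without a kinit ticket.
# The path, principal, class, and jar below are illustrative placeholders.
spark-submit --master yarn --deploy-mode cluster \
  --keytab /path/to/user.keytab \
  --principal user@EXAMPLE.COM \
  --class com.example.MyApp \
  myapp.jar
```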

Accessing Hive from Spark

The host from which the Spark application is submitted, or on which spark-shell or pyspark runs, must have a Hive gateway role defined in Cloudera Manager and the Hive client
configuration deployed.

When a Spark job accesses a Hive view, Spark must have privileges to read the data files in the underlying Hive tables. Currently, Spark cannot use fine-grained privileges based on the
columns or the WHERE clause in the view definition. If Spark does not have the required privileges on the underlying data files, a SparkSQL query against the view
returns an empty result set, rather than an error.

Running Spark Jobs from Oozie

For CDH 5.4 and higher, you can invoke Spark jobs from Oozie using the Spark action. For information on the Spark action, see Oozie Spark Action Extension.

In CDH 5.4, to enable dynamic allocation when running the action, specify the following in
the Oozie workflow:
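The original workflow snippet is not included here. A minimal sketch of a Spark action that enables dynamic allocation through spark-opts follows; the action name, master, class, jar path, and the exact set of configuration properties are assumptions and may vary by release:

```xml
<!-- Illustrative Oozie Spark action; names and paths are placeholders. -->
<action name="spark-job">
  <spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>MySparkJob</name>
    <class>com.example.MyApp</class>
    <jar>${nameNode}/user/apps/myapp.jar</jar>
    <!-- Dynamic allocation also requires the external shuffle service. -->
    <spark-opts>
      --conf spark.dynamicAllocation.enabled=true
      --conf spark.shuffle.service.enabled=true
      --conf spark.dynamicAllocation.minExecutors=1
    </spark-opts>
  </spark>
  <ok to="end"/>
  <error to="fail"/>
</action>
```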

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.