Issues Fixed in CDH 5.5.5

Apache Oozie

The Oozie Web Console returns a 500 error when the Oozie server runs on JDK 8u75 or higher. The Oozie server itself still functions, and you can use the Oozie command line, REST API,
Java API, or the Hue Oozie Dashboard to review job status.
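For example, job status can still be retrieved from the command line. The following is a sketch, assuming an Oozie server at http://oozie-host:11000/oozie; substitute your own URL and job ID:

```shell
# List the ten most recent workflow jobs via the Oozie CLI instead of the Web Console.
# The -oozie URL and the job ID below are placeholders for your environment.
oozie jobs -oozie http://oozie-host:11000/oozie -jobtype wf -len 10

# Show detailed status for a single job.
oozie job -oozie http://oozie-host:11000/oozie -info 0000001-151208123456789-oozie-oozi-W
```

Setting the OOZIE_URL environment variable avoids repeating the -oozie option on every command.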

Issues Fixed in CDH 5.5.4

Apache Hadoop

FSImage may get corrupted after deleting snapshot

When deleting a snapshot that contains the last record of a given INode, the fsimage may become corrupt because the created list of the snapshot diff in the previous snapshot and the
child list of the parent INodeDirectory are not cleaned.

Apache HBase

The ReplicationCleaner process can abort if its connection to ZooKeeper is inconsistent

If the connection with ZooKeeper is inconsistent, the ReplicationCleaner may abort, and the following event is logged by the HMaster:

WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Aborting ReplicationLogCleaner
because Failed to get list of replicators

Unprocessed WALs accumulate.

Workaround: Restart the HMaster occasionally. The ReplicationCleaner restarts if necessary and processes the unprocessed WALs.

Reverse scans do not work with HFile v2 or higher when Bloom or leaf-level index blocks are present

The seekBefore() method calculates the size of the previous data block by assuming that data blocks are contiguous, but HFile v2 and higher store Bloom
blocks and leaf-level index blocks with the data. As a result, reverse scans do not work when Bloom blocks or leaf-level index blocks are present and HFile v2 or higher is used.

SOLR-6820 - Make the number of version buckets used by the UpdateLog configurable, as increasing it beyond the default 256 has been shown to help with
high-volume indexing performance in SolrCloud. Increase the default number of buckets from 256 to 65536, and fix the numVersionBuckets name
attribute in configsets.

SOLR-7281 - Add an overseer action to publish an entire node as 'down'

SOLR-7332 - Initialize the highest value for all version buckets with the max value from the
index or recent updates to avoid unnecessary lookups to the index to check for reordered updates when processing new documents

HBASE-14533 - The connection idle time of 1 second is too short, and the connection is closed too
quickly by the ChoreService. Increase it to the default (10 minutes) for testAll(). The patch is not yet committed upstream.

HBASE-14541 - TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many
splits and too few retries

Issues Fixed in CDH 5.5.1

The following issues have been fixed in CDH 5.5.1:

Apache Commons Collections deserialization vulnerability

Cloudera has learned of a potential security vulnerability in a third-party library called the Apache Commons Collections. This library is used in products distributed and supported by Cloudera (“Cloudera Products”), including core Apache Hadoop. The Apache Commons Collections
library is also in widespread use beyond the Hadoop ecosystem. At this time, no specific attack vector for this vulnerability has been identified as present in Cloudera Products.

In an abundance of caution, we are currently in the process of incorporating a version of the Apache Commons Collections library with a fix into the Cloudera Products. In most cases,
this will require coordination with the projects in the Apache community. One example of this is tracked by HADOOP-12577.

The Apache Commons Collections potential security vulnerability is titled “Arbitrary remote code execution with InvokerTransformer” and is tracked by COLLECTIONS-580. MITRE has not issued a CVE, but related CVE-2015-4852 has been filed for the vulnerability. CERT has issued Vulnerability Note #576313 for this issue.

Apache Flume

Fix for Tail Directory Source FileNotFoundException.

Fix for Kafka Channel timeout property handling.

Apache Hadoop

YARN/MapReduce

Incorrect headroom leads to deadlock between mappers and reducers.

Blacklisting Support for Scheduling ApplicationMasters

When an ApplicationMaster fails, and the NodeManager on the same host has not yet been blacklisted, the framework should route the second ApplicationMaster attempt to a NodeManager on a
different host.

Apache Hive

Replication factor is not properly set in SparkHashTableSinkOperator

Hive LDAP Authenticator should allow users to set Domain without the base Distinguished Name

When the base distinguished name (DN) is not configured but only the Domain has been set in hive-site.xml, the LDAP authentication provider cannot locate the user in the directory.
Authentication fails in such cases.

Apache Spark

The shuffle service fails on NodeManager restarts and kills all running Spark applications

In CDH 5.4.0 through CDH 5.4.4, the shuffle service is on by default. Because it fails on NodeManager restarts, in CDH 5.4.5 and higher the shuffle service is off by default. Dynamic
allocation requires that the shuffle service be turned on.
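If dynamic allocation is needed on a release where the shuffle service is off by default, it can be re-enabled explicitly. The following is a sketch, assuming the properties are passed at submit time (they can equally go in spark-defaults.conf), with my_app.py as a placeholder application:

```shell
# Re-enable the external shuffle service and dynamic allocation for one job.
# The property names are standard Spark settings; the application is a placeholder.
spark-submit \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  my_app.py
```

Note that the NodeManagers must also be configured to run the Spark shuffle service as a YARN auxiliary service for this to take effect.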

Spark not automatically picking up hive-site.xml

Workaround: Do one of the following, depending on which deployment mode you are running in:

Client - set HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory where hive-site.xml
is located).

Cluster - add --files=/etc/hive/conf/hive-site.xml (or the path for hive-site.xml) to the spark-submit script.
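Concretely, the two variants above might look like the following. This is a sketch, assuming hive-site.xml lives in /etc/hive/conf/ and using a hypothetical application my_app.py:

```shell
# Client deploy mode: point Spark at the directory containing hive-site.xml.
export HADOOP_CONF_DIR=/etc/hive/conf/
spark-submit --deploy-mode client my_app.py

# Cluster deploy mode: ship hive-site.xml with the application instead.
spark-submit --deploy-mode cluster --files=/etc/hive/conf/hive-site.xml my_app.py
```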

Apache Sentry (incubating)

Synchronize calls in SentryClient and create Sentry client once per request in SimpleDBProvider

Adds proper locking to the SentryClient and reduces the number of SentryClients created within a single request in the SimpleDBProvider (used by Hive). This fixes issues that may have
caused transient permission failures and out-of-memory conditions.

Sentry-HDFS Sync was treating database and table names as case-sensitive. This led to incorrect or missing ACLs being applied as part of the sync operation if the DDL operations used a
different case for the catalog objects.

Cloudera Search

The GoLive Function Does not Support Running As a Configurable User

After using --go-live mode with the MapReduceIndexerTool and HBaseMapReduceIndexerTool, depending on group mappings and the configured HDFS umask, Solr may
not have been able to read the results of the indexing job.

With Search for CDH 5.5 and later, the MapReduceIndexerTool and HBaseMapReduceIndexerTool include updated --go-live functionality. The indexers now
automatically update HDFS ACLs for the specified output directory, giving Solr permission to read the indexer results.

Workaround: Do not use the --go-live mode with MapReduceIndexerTool and HBaseMapReduceIndexerTool, or use a less restrictive
umask.
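One way to apply a less restrictive umask for a single indexing run is to pass the Hadoop client umask property on the command line. This is a sketch, not a definitive invocation: it assumes your version of the tool honors generic Hadoop -D options, the jar name, ZooKeeper ensemble, and HDFS paths are placeholders, and other required options (such as a morphline file) are omitted:

```shell
# Run the indexer with a permissive umask (022) so Solr can read the output.
# All names and paths below are placeholders for your environment.
hadoop jar solr-map-reduce-*.jar \
  -D fs.permissions.umask-mode=022 \
  --zk-host zk01.example.com:2181/solr \
  --collection collection1 \
  --go-live \
  --output-dir hdfs://nameservice1/tmp/outdir \
  hdfs://nameservice1/tmp/indir
```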

MapReduceIndexerTool fails to Index Documents When Sentry Is Enabled

Prior to CDH 5.5, when Sentry was enabled, the MapReduceIndexerTool was unable to index data even if the user was authorized to write to the collection according to Sentry permissions.
This limitation occurred because, by default, the MapReduceIndexerTool used the underlying collection's solrconfig.xml from ZooKeeper to build the index using its
EmbeddedSolrServers. But the embedded servers are not properly configured to use Sentry, so this process failed.

With Search for CDH 5.5, the MapReduceIndexerTool uses a default solrconfig.xml that is appropriate for the vast majority of collection configurations.
With this configuration, the MapReduceIndexerTool is able to index data, even if Sentry is enabled. Note that this default configuration does not include any updateRequestProcessorChains; if your configuration requires an updateRequestProcessorChain, you can tell the MapReduceIndexerTool to use the
configuration from ZooKeeper by specifying --use-zk-solrconfig.xml or from local disk by specifying --solr-home-dir.
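For instance, a collection-specific configuration could still be selected explicitly. The following sketch uses the flag named above; the jar name, ZooKeeper ensemble, and paths are placeholders, and other required options are omitted:

```shell
# Build the index with the collection's own solrconfig.xml from ZooKeeper
# rather than the tool's built-in default configuration.
hadoop jar solr-map-reduce-*.jar \
  --use-zk-solrconfig.xml \
  --zk-host zk01.example.com:2181/solr \
  --collection collection1 \
  --output-dir hdfs://nameservice1/tmp/outdir \
  hdfs://nameservice1/tmp/indir
```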

Bug: None.

Workaround: To address this issue, configure the MapReduceIndexerTool to run without Sentry restrictions. This does not compromise security because this
only affects the "embedded" Solr Servers in the job that are used to build the offline index; Solr's Sentry permissions are still checked when the data is merged into the cluster via --go-live.

Here are two ways to enable indexing:

If your environment uses the default configuration files, use solrconfig.xml for indexing jobs, rather than solrconfig.xml.secure. Use the --solr-home-dir option to specify the directory containing solrconfig.xml, causing the
job to run with Sentry disabled.

Alternately, you can comment out the following line:

<str name="update.chain">updateIndexAuthorization</str>

This line must be commented out and the change saved in the solrconfig file used by the machine running the indexing job.
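After the change, the entry in that file would read, for example:

```xml
<!-- Disabled so that the embedded Solr servers used for offline index building
     do not attempt Sentry authorization:
<str name="update.chain">updateIndexAuthorization</str>
-->
```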