Procedural Language Support Matrix

The following table summarizes component version support for Procedural Languages available in Pivotal HDB 2.x. The versions listed have been tested with HDB. Higher versions may be compatible. Please test higher versions thoroughly in your non-production environments before deploying to production.

| Pivotal HDB Version | PL/Java Java Version | PL/R R Version | PL/Perl Perl Version | PL/Python Python Version |
|---------------------|----------------------|----------------|----------------------|--------------------------|
| 2.1.0.0             | 1.7                  | 3.3.1          | 5.10.1               | 2.6.2                    |
| 2.0.1.0             | 1.7                  | 3.3.1          | 5.10.1               | 2.6.2                    |
| 2.0.0.0             | 1.6, 1.7             | 3.1.0          | 5.10.1               | 2.6.2                    |

AWS Support Requirements

Pivotal HDB is supported on Amazon Web Services (AWS) servers using either Amazon block-level instance store (Amazon uses the volume names ephemeral[0-23]) or Amazon Elastic Block Store (Amazon EBS) storage. Use standard long-running EC2 instances for long-running HAWQ deployments, because Spot instances can be interrupted at any time. If you do use Spot instances, minimize the risk of data loss by loading data from, and exporting data to, external storage.

Pivotal HDB 2.1.0 Features and Changes

Pivotal HDB 2.1.0 is based on Apache HAWQ (Incubating), and includes the following new features and changes in behavior as compared to Pivotal HDB 2.0.1.0:

HDP 2.5.0 support

This release upgrades the stack from HDP 2.4.0 and HDP 2.4.2 to HDP 2.5.0. See the HDP 2.5.0 Release Notes for details. The main change is an upgrade from Apache Hadoop 2.7.1 to Apache Hadoop 2.7.3.

gporca upgrade from version 1.638 to 1.684

Many new features and bug fixes in the modular query optimizer are integrated with HAWQ. Refer to gporca releases for details.

PXF predicate pushdown and column projection

HAWQ now makes both the predicate (filter string) and the column projection information available to the PXF service. With this feature, PXF plug-in developers can implement predicate pushdown in their custom plug-ins.

PXF checksum verification

HAWQ now performs client-side checksum verification when reading blocks of data from HDFS.

Installing HDP and HDB with Ambari 2.4.1

If you are using Ambari 2.4.1 to install HDP and HAWQ at the same time, and you want to install the very latest version of the HDP stack instead of the default version, special care is required. Follow these steps:

1. After installing Ambari, start the Cluster Install Wizard and proceed until you reach the Select Version screen.

2. On the Select Version screen, select HDP-2.5 from the list of available stack versions.

3. While still on the Select Version screen, copy the Base URL values for the HDP-2.5 and HDP-UTILS-1.1.0.21 repositories listed for your operating system, and paste them into a temporary file; you will need to restore these Base URL values later.

4. Use the drop-down menu for HDP-2.5 to select the stack option HDP-2.5 (Default Version Definition). Verify that the hdb-2.1.0.0 and hdb-add-ons-2.1.0.0 repositories now appear in the list of Repositories for your operating system.

5. To install the very latest version of HDP, replace the Base URL values for the HDP-2.5 and HDP-UTILS-1.1.0.21 repositories with the values you saved to the temporary file in Step 3.

Known Issues and Limitations

MADlib Compression

Pivotal HDB 2.1.0 is compatible with MADlib 1.9 and 1.9.1. However, you must download and execute a script in order to remove the MADlib Quicklz compression, which is not supported in HDB 2.1.0. Run this script if you are upgrading to HDB 2.1.0, or if you are installing MADlib on HDB 2.1.0.

Note: If you do not include the --prefix option, the script uses the location ${GPHOME}/madlib.

Continue installing MADlib using the madpack install command as described in the MADlib Installation Guide. For example:

$ madpack -p hawq install

Operating System

Some Linux kernel versions between 2.6.32 and 4.3.3 (exclusive at both ends) contain a bug that can cause the getaddrinfo() function to hang. To avoid this issue, upgrade the kernel to version 4.3.3 or later.
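As a quick diagnostic, the following standalone POSIX shell sketch (not part of the HDB tooling; it relies on GNU `sort -V` for version comparison) checks whether a host's running kernel falls inside the affected range:

```shell
# Report whether the running kernel is strictly between 2.6.32 and 4.3.3,
# the range affected by the getaddrinfo() hang. Requires GNU sort (-V).
affected() {
    lo=2.6.32; hi=4.3.3
    [ "$1" = "$lo" ] && return 1   # range is exclusive at both ends
    [ "$1" = "$hi" ] && return 1
    # $1 is in range iff lo sorts first and hi sorts last among the three
    first=$(printf '%s\n%s\n%s\n' "$lo" "$1" "$hi" | sort -V | head -n1)
    last=$(printf '%s\n%s\n%s\n' "$lo" "$1" "$hi" | sort -V | tail -n1)
    [ "$first" = "$lo" ] && [ "$last" = "$hi" ]
}

kver=$(uname -r | cut -d- -f1)   # strip distro suffix, e.g. 3.10.0-327 -> 3.10.0
if affected "$kver"; then
    echo "kernel $kver may be affected by the getaddrinfo() hang"
else
    echo "kernel $kver is outside the affected range"
fi
```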

Command Line Tools

HAWQ-1213 - hawq register returns the following error when you attempt to use a YAML file to register a randomly-distributed table to a destination randomly-distributed table that you created with a non-default default_hash_table_bucket_number:

Bucket number of <table-name> is not consistent with previous bucket number.

If you wish to use this feature in HDB 2.1.0, set default_hash_table_bucket_number to 6 before creating the destination randomly-distributed table you wish to register to.
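A hedged sketch of this workaround follows; the database name, table definition, and YAML file path are illustrative placeholders, and changing the GUC cluster-wide requires a restart:

```shell
# Illustrative only -- requires a running HAWQ cluster.
hawq config -c default_hash_table_bucket_number -v 6   # set the GUC cluster-wide
hawq restart cluster -a                                # apply the changed value
psql -d testdb -c "CREATE TABLE dest_rand (id int) DISTRIBUTED RANDOMLY;"
hawq register -d testdb -c /path/to/table.yaml dest_rand
```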

PXF

PXF in a Kerberos-secured cluster requires YARN to be installed due to a dependency on YARN libraries.

HAWQ-974 - When using certain PXF profiles to query against larger files stored in HDFS, users may occasionally experience hanging or query timeout. This is a known issue that will be improved in a future HDB release. Refer to Addressing PXF Memory Issues for a discussion of the configuration options available to address these issues in your PXF deployment.

After upgrading from HDB version 2.0.0, HCatalog access through PXF may fail with the following error:

Note: Use the allow_system_table_mods server configuration parameter and identified SQL commands only in the context of this workaround. They are not otherwise supported.

PL/R

The HAWQ PL/R extension is provided as a separate RPM in the hdb-add-ons-2.1.0.0 repository. The files installed by this RPM are owned by root. If you installed HAWQ via Ambari, HAWQ files are owned by gpadmin. Perform the following steps on each node in your HAWQ cluster after PL/R RPM installation to align the ownership of PL/R files:

Ambari

Ambari-managed clusters should use only Ambari for setting system parameters. Parameters modified using the hawq config command are overwritten on Ambari startup or reconfiguration.

In certain configurations, the HAWQ Master may fail to start in Ambari versions prior to 2.4.2 when webhdfs is disabled. Refer to AMBARI-18837. To work around this issue, enable webhdfs by setting dfs.webhdfs.enabled to True in hdfs-site.xml, or contact Support.
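For reference, enabling WebHDFS corresponds to the following standard HDFS property in hdfs-site.xml (on Ambari-managed clusters, set it through Ambari's HDFS configuration screen rather than editing the file directly):

```xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```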

When installing HAWQ in a Kerberos-secured cluster, Ambari may report a warning or failure if the HAWQ resource management type is switched to YARN mode during installation. The warning occurs because HAWQ cannot register with YARN until the HDFS and YARN services are restarted with the new configurations resulting from the HAWQ installation.

The Ambari Re-Synchronize HAWQ Standby Master service action fails if there is an active connection to the HAWQ master node. The HAWQ task output shows the error, Active connections. Aborting shutdown... If this occurs, close all active connections and then try the re-synchronize action again.

The Ambari Run Service Check action for HAWQ and PXF may not work properly on a secure cluster if PXF is not co-located with the YARN component.

In a secured cluster, if you move the YARN Resource Manager to another host you must manually update hadoop.proxyuser.yarn.hosts in the HDFS core-site.xml file to match the new Resource Manager hostname. If you do not perform this step, HAWQ segments fail to get resources from the Resource Manager.
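As an illustrative fragment (the hostname is a placeholder for your new Resource Manager host), the updated core-site.xml entry looks like:

```xml
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>new-rm-host.example.com</value>
</property>
```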

The Ambari Stop HAWQ Server (Immediate Mode) service action or hawq stop -M immediate command may not stop all HAWQ master processes in some cases. Several postgres processes owned by the gpadmin user may remain active.
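To identify any leftover master processes before stopping them manually, a diagnostic sketch (gpadmin is the HAWQ administrative user named above):

```shell
# List postgres processes still owned by gpadmin on the master host.
# The [p] trick keeps grep from matching its own process entry.
ps -u gpadmin -o pid,args | grep '[p]ostgres'
```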

Ambari checks whether the hawq_rm_yarn_address and hawq_rm_yarn_scheduler_address values are valid when YARN HA is not enabled. In clusters that use YARN HA, these properties are not used and may get out of sync with the active Resource Manager. This can lead to false warnings from Ambari when you try to change the property values.

Ambari does not support Custom Configuration Groups with HAWQ.

Certain HAWQ server configuration parameters related to resource enforcement are not active. Modifying the parameters has no effect in HAWQ since the resource enforcement feature is not currently supported. These parameters include hawq_re_cgroup_hierarchy_name, hawq_re_cgroup_mount_point, and hawq_re_cpu_enable. These parameters appear in the Advanced hawq-site configuration section of the Ambari management interface.

Workaround Required after Moving Namenode

If you use the Ambari Move Namenode Wizard to move a Hadoop namenode, the Wizard does not automatically update the HAWQ configuration to reflect the change. This leaves HAWQ in a non-functional state and causes HAWQ service checks to fail with an error similar to: