PL/R is a procedural language. With the HAWQ PL/R extension, you can write database functions in the R programming language and use R packages that contain R functions and data sets.

Note: To use PL/R in HAWQ, R must be installed on each node in your HAWQ cluster. In addition, you must either install the PL/R package on an existing HAWQ deployment or specify PL/R as a build option when compiling HAWQ.

PL/R Examples

This section contains simple PL/R examples.

Example 1: Using PL/R for Single Row Operators

This function generates an array of numbers with a normal distribution using the R function rnorm().
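
The function definition is not reproduced here, so what follows is only a minimal sketch of what such a function could look like. The function name r_norm, its argument names, the file name r_norm.sql, and the database name mydb are assumptions made for illustration. The snippet writes the definition to a file so it can be loaded with psql; the heredoc delimiter is quoted so that the shell does not expand the $$ dollar quoting.

```shell
# Write a PL/R function definition to a file (all names here are illustrative).
cat > r_norm.sql <<'SQL'
-- Return n draws from a normal distribution as a float8 array.
CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, std_dev float8)
RETURNS float8[] AS
$$
    x <- rnorm(n, mean, std_dev)
    return(x)
$$
LANGUAGE 'plr';
SQL

# Load the file into a database where PL/R is enabled, then call the function:
#   psql -d mydb -f r_norm.sql
#   psql -d mydb -c "SELECT r_norm(10, 0, 1);"
```

Each call returns one array, so the function can be applied once per row of a query.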

Installing R Packages

To make an additional R package available to PL/R, install it on every node. Use the hawq scp utility and the hawq_hosts file to copy the package tar.gz files to the same directory on all nodes of the HAWQ cluster. The hawq_hosts file lists all of the HAWQ segment hosts. You might need root access to do this.

Use the hawq ssh utility in interactive mode to log in to each HAWQ segment host (hawq ssh -f hawq_hosts). At the command prompt, install the packages with the R CMD INSTALL command. This may require root access. For example, to install the arm package, run R CMD INSTALL on the tar.gz file for arm and for each package that arm depends on.
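
The copy and install steps can also be sketched non-interactively, as below. The tarball name and target directory are placeholders, not real release names. The =: destination prefix (standing for each host in the host file) is an assumption about hawq scp syntax that should be checked against the utility's reference page. The run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
# Placeholder values: substitute your own host file, tarball, and directory.
HOSTFILE="hawq_hosts"
TARBALL="arm_x.y-z.tar.gz"    # hypothetical file name, not a real release
DEST_DIR="/home/gpadmin"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the tarball to the same directory on every host in the host file.
run hawq scp -f "$HOSTFILE" "$TARBALL" "=:$DEST_DIR"

# Install the package on every host by passing the command to hawq ssh.
run hawq ssh -f "$HOSTFILE" "R CMD INSTALL $DEST_DIR/$TARBALL"
```

In dry-run mode the printed lines lose their shell quoting, so review them rather than pasting them back verbatim. Packages with dependencies need one R CMD INSTALL per tarball, dependencies first.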

Ensure that the R package was installed in the /usr/lib64/R/library directory on all the segments (hawq ssh can be used to install the package). For example, this hawq ssh command lists the contents of the R library directory.

$ hawq ssh -f hawq_hosts "ls /usr/lib64/R/library"

Verify the R package can be loaded.

This function performs a simple test to determine if an R package can be loaded:
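
A minimal sketch of such a function is shown here. The function name R_test_require matches the call used in this section, but the argument name fname, the file name r_test_require.sql, and the database name mydb are assumptions. As before, the quoted heredoc delimiter keeps the shell from touching the $BODY$ dollar quoting.

```shell
# Write the test function to a file (argument and file names are illustrative).
cat > r_test_require.sql <<'SQL'
-- Return true if the named R package can be loaded, false otherwise.
CREATE OR REPLACE FUNCTION R_test_require(fname text)
RETURNS boolean AS
$BODY$
    return(require(fname, character.only = TRUE))
$BODY$
LANGUAGE 'plr';
SQL

# Load the file into a database where PL/R is enabled:
#   psql -d mydb -f r_test_require.sql
```

The character.only = TRUE argument makes require() treat fname as a string holding the package name rather than as the package name itself.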

This SQL command calls the previous function to determine if the R package arm can be loaded:

SELECT R_test_require('arm');

Displaying R Library Information

You can use the R command line to display information about the installed libraries and functions on the HAWQ host. You can also add and remove libraries from the R installation. To start the R command line on the host, log in to the host as the gpadmin user and run the script R.

$ R

This R function lists the available R packages from the R command line:

> library()

Display the documentation for a particular R package:

> library(help="package_name")
> help(package="package_name")

Display the help file for an R function:

> help("function_name")
> ?function_name

To see which packages are installed, use the R command installed.packages(). It returns a matrix with one row for each installed package. The command below prints the whole matrix; to look at only the first 5 rows, use head(installed.packages(), 5).

> installed.packages()

Any package that does not appear in the installed packages matrix must be installed and loaded before its functions can be used.

References

https://github.com/pivotalsoftware/PivotalR - GitHub repository for PivotalR, a package that provides an R interface to operate on HAWQ tables and views that is similar to the R data.frame. PivotalR also supports using the machine learning package MADlib directly from R.