About Greenplum Database PL/R

PL/R is a procedural language. With the Greenplum Database PL/R extension you can write
database functions in the R programming language and use R packages that contain R functions
and data sets.

For information about supported PL/R versions, see the Greenplum Database Release
Notes.

Installing PL/R

The PL/R extension is available as a package. Download the package
from Pivotal Network and install it with the Greenplum Package
Manager (gppkg).

The gppkg utility installs Greenplum Database extensions, along with any
dependencies, on all hosts across a cluster. It also automatically installs extensions on
new hosts in the case of system expansion and segment recovery.

For information about gppkg, see the Greenplum Database Utility
Guide.

Installing the Extension Package

Before you install the PL/R extension, make sure that Greenplum Database is
running, that you have sourced greenplum_path.sh, and that the
$MASTER_DATA_DIRECTORY and $GPHOME variables are
set.
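The environment variable part of these prerequisites can be checked from the shell before running gppkg. The following is a minimal sketch, not a Greenplum utility: the function name check_plr_prereqs is hypothetical, and it only tests that the two variables are set. It does not verify that the database is running.

```shell
# Hypothetical helper (not part of Greenplum Database): report whether the
# environment variables needed before installing PL/R are set.
check_plr_prereqs() {
    status=0
    if [ -z "${GPHOME:-}" ]; then
        echo "GPHOME is not set; source greenplum_path.sh first" >&2
        status=1
    fi
    if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
        echo "MASTER_DATA_DIRECTORY is not set" >&2
        status=1
    fi
    # Return 0 only when both variables are set.
    return "$status"
}
```

The function could be called as a guard, for example check_plr_prereqs && gppkg -i plr-2.3.1-gp5-rhel6-x86_64.gppkg.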

Install the software extension package by running the
gppkg command. This example installs the PL/R extension on a Linux
system:

$ gppkg -i plr-2.3.1-gp5-rhel6-x86_64.gppkg

Restart the database.

$ gpstop -r

Source the file $GPHOME/greenplum_path.sh.

The extension and the R environment are installed in this directory:

$GPHOME/ext/R-3.3.3/

Note: The version of some shared libraries installed with the operating system
might not be compatible with the Greenplum Database PL/R extension. If a shared
library is not compatible, edit the file $GPHOME/greenplum_path.sh on
all Greenplum Database master and segment hosts and set the environment variable
LD_LIBRARY_PATH to specify the location of the libraries that are installed with the
PL/R extension.
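As a minimal sketch of such an edit, lines like the following could be added to greenplum_path.sh. The variable name PLR_LIB_DIR is hypothetical, and both the library path and the fallback value for GPHOME are assumptions for illustration. Check the PL/R installation on your hosts for the actual location of the shared libraries.

```shell
# Assumption: PLR_LIB_DIR is a hypothetical variable name and the path is
# illustrative. Replace it with the directory that holds the shared libraries
# installed with the PL/R extension. The GPHOME fallback is a placeholder.
PLR_LIB_DIR="${GPHOME:-/usr/local/greenplum-db}/ext/R-3.3.3/lib"

# Prepend the directory so that its libraries are found before the
# operating system copies, keeping any value that is already set.
LD_LIBRARY_PATH="${PLR_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH
```

Prepending rather than overwriting keeps the other library directories that greenplum_path.sh already sets.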

Uninstalling PL/R

When you remove PL/R language support from a database, the PL/R routines that you created
in the database will no longer work.

Remove PL/R Support for a Database

For a database that no longer requires the PL/R language, remove support for PL/R with
the SQL command DROP LANGUAGE or the Greenplum Database
droplang utility. Because PL/R is an untrusted language, only
superusers can remove support for the PL/R language from a database. For example,
running this command as the gpadmin user removes support for PL/R from the database
named testdb:

$ droplang plr -d testdb

Uninstall the Extension Package

If no databases have PL/R as a registered language, uninstall the Greenplum PL/R
extension with the gppkg utility. This example uninstalls PL/R package
version 2.3.1:

$ gppkg -r plr-2.3.1

You can run the gppkg utility with the options -q
--all to list the installed extensions and their versions.

Restart the database.

$ gpstop -r

Enabling PL/R Language Support

For each database that requires its use, register the PL/R language with the SQL command
CREATE LANGUAGE or the utility createlang. Because
PL/R is an untrusted language, only superusers can register PL/R with a database. For
example, running this command as the gpadmin system user registers the
language with the database named testdb:

$ createlang plr -d testdb

PL/R is registered as an untrusted language.

Examples

The following are simple PL/R examples.

Example 1: Using PL/R for single row operators

This function generates an array of numbers with a normal distribution using the R
function rnorm().
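The function definition is not reproduced here. A minimal sketch of what such a function might look like follows. The function name r_norm, its parameter names, and the file name r_norm.sql are assumptions for illustration. The shell snippet writes the definition to a file so that it can be loaded with psql.

```shell
# Write a PL/R function definition to a file. The function name r_norm,
# its parameters, and the file name are illustrative assumptions.
# The quoted 'EOF' keeps the shell from expanding $$ inside the SQL.
cat > r_norm.sql <<'EOF'
CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, std_dev float8)
RETURNS float8[] AS
$$
  # R code: draw n values from a normal distribution
  x <- rnorm(n, mean, std_dev)
  return(x)
$$
LANGUAGE 'plr';
EOF
```

In a database where PL/R is registered, such as testdb, the file could be loaded with psql -d testdb -f r_norm.sql and the function called with SELECT r_norm(5, 0, 1);. Because rnorm() draws random values, the returned array differs on each call.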

Downloading and Installing R Packages

R packages are modules that contain R functions and data sets. You can install R packages
to extend R and PL/R functionality in Greenplum Database.

Greenplum Database provides a collection of data science-related R
libraries that can be used with the Greenplum Database PL/R language. You can download
these libraries in .gppkg format from Pivotal Network. For information about the libraries, see R Data
Science Library Package.

Note: If you expand Greenplum Database and add segment hosts, you must install
the R packages in the R installation of the new hosts.

For an R package, identify all dependent R packages and the web URL of each
package. The information can be found by selecting the given package from the following
navigation page:

Use the gpscp utility and the
hosts_all file to copy the tar.gz files to the same
directory on all nodes of the Greenplum Database cluster. The hosts_all
file contains a list of all the Greenplum Database segment hosts. You might require root
access to do
this.

gpscp -f hosts_all Matrix_0.9996875-1.tar.gz =:/home/gpadmin

gpscp -f hosts_all arm_1.5-03.tar.gz =:/home/gpadmin
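The hosts_all file passed to gpscp with the -f option is a plain text file with one host per line. A minimal sketch that creates such a file follows. The host names sdw1, sdw2, and sdw3 are hypothetical; replace them with the hosts in your cluster.

```shell
# Create a host file with one host name per line.
# sdw1, sdw2, and sdw3 are hypothetical host names.
cat > hosts_all <<'EOF'
sdw1
sdw2
sdw3
EOF
```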

Use the gpssh utility in interactive mode to log in to
each Greenplum Database segment host (gpssh -f hosts_all). Install the
packages from the command prompt using the R CMD INSTALL command. Note
that this may require root access. For example, this R install command installs the
Matrix and arm packages.

$R_HOME/bin/R CMD INSTALL Matrix_0.9996875-1.tar.gz arm_1.5-03.tar.gz

Ensure that the package is installed in the
$R_HOME/library directory on all the segments (the
gpssh utility can be used to install the package). For example, this
gpssh command lists the contents of the R library
directory.

gpssh -s -f hosts_all "ls $R_HOME/library"

The
gpssh option -s sources the
greenplum_path.sh file before running commands on the remote hosts.
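Instead of reading the directory listing by eye, the check for a single package can be scripted. The following is a minimal sketch that runs on one host only. The function name r_pkg_installed is hypothetical, and it relies on the fact that an installed R package is a directory containing a DESCRIPTION file.

```shell
# Hypothetical helper: succeed if package $2 is present in the R library
# directory $1. An installed R package is a directory that contains a
# DESCRIPTION file.
r_pkg_installed() {
    [ -f "$1/$2/DESCRIPTION" ]
}
```

For example, r_pkg_installed "$R_HOME/library" arm && echo "arm is installed" reports whether the arm package is present on the local host.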

Test if the R package can be loaded.

This function performs a simple
test to determine whether an R package can be
loaded:
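The function definition is not reproduced here. A minimal sketch of such a test function follows. The function name r_test_require, its parameter name, and the file name are assumptions for illustration. The shell snippet writes the definition to a file so that it can be loaded with psql.

```shell
# Write a PL/R test function to a file. The function name r_test_require,
# its parameter name, and the file name are illustrative assumptions.
# The quoted 'EOF' keeps the shell from expanding $BODY inside the SQL.
cat > r_test_require.sql <<'EOF'
CREATE OR REPLACE FUNCTION r_test_require(fname text)
RETURNS boolean AS
$BODY$
  return(require(fname, character.only = TRUE))
$BODY$
LANGUAGE 'plr';
EOF
```

After loading the file with psql -d testdb -f r_test_require.sql, a query such as SELECT r_test_require('arm'); returns true if the arm package can be loaded and false otherwise.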

Displaying R Library Information

You can use the R command line to display information about the installed libraries and
functions on the Greenplum Database host. You can also add and remove libraries from the R
installation. To start the R command line on the host, log in to the host as the gpadmin
user and run the script R from the directory $GPHOME/ext/R-3.3.3/bin.

This R function lists the available R packages from the R command line:

> library()

Display the documentation for a particular R package:

> library(help="package_name")
> help(package="package_name")

Display the help file for an R function:

> help("function_name")
> ?function_name

To see what packages are installed, use the R command
installed.packages(). This returns a matrix with a row for each
package that has been installed. This example displays the first 5 rows of the matrix.

> head(installed.packages(), 5)

Any package that does not appear in the installed packages matrix must be installed and
loaded before its functions can be used.

References

https://cran.r-project.org/web/packages/PivotalR/ - The home
page for PivotalR, a package that provides an R interface to operate on Greenplum Database
tables and views that is similar to the R data.frame. PivotalR also
supports using the machine learning package MADlib directly from R.