Metadata support for the JDBC driver. Both
DatabaseMetaData and ResultSetMetaData are
included.

The ability to return a ResultSet from a query as an alternative to
building a ResultSet row by row.

Full support for savepoints and exception handling.

The ability to use IN, INOUT, and OUT parameters.

Two separate Greenplum Database languages:

pljava, TRUSTED PL/Java language

pljavau, UNTRUSTED PL/Java language

Transaction and Savepoint listeners enabling code execution when a
transaction or savepoint is committed or rolled back.

Integration with GNU GCJ on selected platforms.

The Greenplum Database PL/Java extension package ships with embedded Java Runtime 6u32 that
is installed in the directory $GPHOME/ext/jre-1.6.0_32 on each of the
Greenplum Database hosts.

A function in SQL will appoint a static method in a Java class. In order for the function
to execute, the appointed class must available on the class path specified by the Greenplum
Database sever configuration parameter pljava_classpath. The PL/Java
extension adds a set of functions that helps installing and maintaining the java classes.
Classes are stored in normal Java archives, JAR files. A JAR file can optionally contain a
deployment descriptor that in turn contains SQL commands to be executed when the JAR is
deployed or undeployed. The functions are modeled after the standards proposed for SQL
2003.

PL/Java implements a standardized way of passing parameters and return values. Complex
types and sets are passed using the standard JDBC ResultSet class.

A JDBC driver is included in PL/Java. This driver calls Greenplum Database internal SPI
routines. The driver is essential since it is common for functions to make calls back to the
database to fetch data. When PL/Java functions fetch data, they must use the same
transactional boundaries that are used by the main function that entered PL/Java execution
context.

PL/Java is optimized for performance. The Java virtual machine executes within the same
process as the backend to minimize call overhead. PL/Java is designed with the objective to
enable the power of Java to the database itself so that database intensive business logic
can execute as close to the actual data as possible.

The standard Java Native Interface (JNI) is used when bridging calls between the backend
and the Java VM.

Installing PL/Java

For Greenplum Database, the PL/Java extension is available as a package. Download the
package from Pivotal Network and then install it with the Greenplum
Package Manager (gppkg).

The gppkg utility installs Greenplum Database extensions, along with any
dependencies, on all hosts across a cluster. It also automatically installs extensions on
new hosts in the case of system expansion and segment recovery.

For information about gppkg,see the Greenplum Database Utility
Guide.

To install and use PL/Java:

Install the Greenplum Database PL/Java extension.

Optional. Change the Java version used by PL/Java.

Enable the language for each database where you intend to use
PL/Java.

Install user-created JAR files containing Java methods into the same
directory on all Greenplum Database hosts.

Add the name of the JAR file to the Greenplum Database server
configuration parameter pljava_classpath. The parameter lists the
installed JAR files. For information about the parameter, see the Greenplum Database
Reference Guide.

Installing the Greenplum PL/Java Extension

Before you install the PL/Java extension, make sure that your Greenplum database is
running, you have sourced greenplum_path.sh, and that the
$MASTER_DATA_DIRECTORY and $GPHOME variables are set.

Download the PL/Java extension package from Pivotal Network then copy it to the master host.

Install the software extension package by running the
gppkg command. This example installs the PL/Java extension package on
a Linux
system:

$ gppkg -i pljava-ossv1.4.0_pv1.2_gpdb4.3orca-rhel5-x86_64.gppkg

Reload
greenplum_path.sh.

$ source $GPHOME/greenplum_path.sh

Restart Greenplum Database.

$ gpstop -r

Changing Java version used by PL/Java (Optional)

The PL/Java extension package ships with Java JRE 6u32. When the package is
installed, the JRE is installed on each host of your Greenplum Database cluster. To use
newer version of Java with PL/Java, perform these steps to specify the location of the
Java version. For information about supported Java versions, see the Greenplum
Database Release Notes for your release.

Note: The newer Java must be installed in the same location on all Greenplum Database
hosts and must be accessible to the system user of the Greenplum Database administrator
(gpadmin).

In the $GPHOME/greenplum_path.sh file, modify the
JAVA_HOME and LD_LIBRARY_PATH environment
variables.

Set the JAVA_HOME variable to the directory where your Java
Runtime is installed. For example, for Oracle JRE this directory would be
/usr/java/latest. For OpenJDK, the directory is
/usr/lib/jvm/jre. This example changes the environment variable
to use
/usr/java/latest.

JAVA_HOME=/usr/java/latest

Set the LD_LIBRARY_PATH to include the directory with Java server
runtime libraries. PL/Java depends on libjvm.so and the shared
object should be in your LD_LIBRARY_PATH. By default,
libjvm.so is available in
$JAVA_HOME/lib/amd64/server. This example adds the directory to
the environment
variable.

Enabling PL/Java and Installing JAR Files

Perform the following steps as the Greenplum Database administrator
gpadmin.

Enable PL/Java by running the SQL script
$GPHOME/share/postgresql/pljava/install.sql in the databases that
will use PL/Java. For example, this example enables PL/Java on the database
mytestdb:

$ psql -d mytestdb
-f $GPHOME/share/postgresql/pljava/install.sql

The
script install.sql registers both the trusted and untrusted PL/Java
language.

Copy your Java archives (JAR files) to the same directory on all
Greenplum Database hosts. This example uses the Greenplum Database
gpscp utility to copy the file myclasses.jar to the
directory
$GPHOME/lib/postgresql/java/:

The
file gphosts_file contains a list of the Greenplum Database
hosts.

Set the pljava_classpath server configuration parameter
in the master postgresql.conf file. For this example, the parameter
value is a colon (:) separated list of the JAR files. For
example:

$ gpconfig -c pljava_classpath
-v \'examples.jar:myclasses.jar\'

The
file examples.jar is installed when you install the PL/Java extension
package with the gppkg utility.

Note: If you install JAR files in
a directory other than $GPHOME/lib/postgresql/java/, you must specify
the absolute path to the JAR file. Each JAR file must be in the same location on all
Greenplum Database hosts. For more information about specifying the location of JAR
files, see the information about the pljava_classpath server
configuration parameter in the Greenplum Database Reference
Guide.

Reload the postgresql.conf
file.

$ gpstop -u

(optional) Greenplum provides an examples.sql file
containing sample PL/Java functions that you can use for testing. Run the commands in
this file to create the test functions (which use the Java classes in
examples.jar).

$ psql -f $GPHOME/share/postgresql/pljava/examples.sql

Uninstalling PL/Java

Remove PL/Java Support for a Database

For a database that no long requires the PL/Java language, remove support for PL/Java.
Run the uninstall.sql file as the gpadmin user. For example, this command
disables the PL/Java language in the specified database.

$ psql -d mydatabase
-f $GPHOME/share/postgresql/pljava/uninstall.sql

Uninstall the Java JAR files and Software Package

If no databases have PL/Java as a registered language, remove the Java JAR files and
uninstall the Greenplum PL/Java extension with the gppkg utility.

Remove the pljava_classpath server configuration
parameter from the postgresql.conf file on all Greenplum Database
hosts. For example:

Use the Greenplum gppkg utility with the
-r option to uninstall the PL/Java extension. This example uninstalls
the PL/Java extension on a Linux
system:

$ gppkg -r pljava-ossv1.4.0_pv1.3_gpdb4.3orca

You can
run the gppkg utility with the options -q --all to
list the installed extensions and their versions.

Reload
greenplum_path.sh.

$ source $GPHOME/greenplum_path.sh

Restart the database.

$ gpstop -r

About Greenplum Database PL/Java

There are a few key differences between the implementation of PL/Java in standard
PostgreSQL and Greenplum Database.

Functions

The following functions are not supported in Greenplum Database. The classpath is handled
differently in a distributed Greenplum Database environment than in the PostgreSQL environment.

sqlj.install_jar

sqlj.install_jar

sqlj.replace_jar

sqlj.remove_jar

sqlj.get_classpath

sqlj.set_classpath

Greenplum Database uses the pljava_classpath server configuration
parameter in place of the sqlj.set_classpath function.

Server Configuration Parameters

The following server configuration parameters are used by PL/Java in Greenplum Database.
These parameters replace the pljava.* parameters that are used in the
standard PostgreSQL PL/Java implementation:

pljava_classpath

A colon (:) separated list of the jar files containing the Java
classes used in any PL/Java functions. The jar files must be installed in the same
locations on all Greenplum Database hosts. With the trusted PL/Java language handler,
jar file paths must be relative to the $GPHOME/lib/postgresql/java/
directory. With the untrusted language handler (javaU language tag), paths may be
relative to $GPHOME/lib/postgresql/java/ or absolute.

The
server configuration parameter pljava_classpath_insecure controls
whether the server configuration parameter pljava_classpath can be
set by a user without Greenplum Database superuser privileges. When
pljava_classpath_insecure is enabled, Greenplum Database developers
who are working on PL/Java functions do not have to be database superusers to change
pljava_classpath.

SQL Declaration

A Java function is declared with the name of a class and a static method on that class.
The class will be resolved using the classpath that has been defined for the schema where
the function is declared. If no classpath has been defined for that schema, the public
schema is used. If no classpath is found there either, the class is resolved using the
system classloader.

The following function can be declared to access the static method
getProperty on java.lang.System class:

Type Mapping

Scalar types are mapped in a straight forward way. This table lists the current mappings.

Table 1. PL/Java data type mapping

PostgreSQL

Java

bool

boolean

char

byte

int2

short

int4

int

int8

long

varchar

java.lang.String

text

java.lang.String

bytea

byte[ ]

date

java.sql.Date

time

java.sql.Time (stored value treated as local time)

timetz

java.sql.Time

timestamp

java.sql.Timestamp (stored value treated as local
time)

timestampz

java.sql.Timestamp

complex

java.sql.ResultSet

setof complex

java.sql.ResultSet

All other types are mapped to java.lang.String and will utilize the standard
textin/textout routines registered for respective
type.

NULL Handling

The scalar types that map to Java primitives can not be passed as NULL
values. To pass NULL values, those types can have an alternative mapping.
You enable this mapping by explicitly denoting it in the method reference.

In order to return NULL values from a Java method, you use the object
type that corresponds to the primitive (for example, you return
java.lang.Integer instead of int). The PL/Java resolve
mechanism finds the method regardless. Since Java cannot have different return types for
methods with the same name, this does not introduce any ambiguity.

Complex Types

A complex type will always be passed as a read-only java.sql.ResultSet
with exactly one row. The ResultSet is positioned on its row so a call to
next() should not be made. The values of the complex type are retrieved
using the standard getter methods of the ResultSet.

Returning Complex Types

Java does not stipulate any way to create a ResultSet. Hence, returning a ResultSet is
not an option. The SQL-2003 draft suggests that a complex return value should be handled
as an IN/OUT parameter. PL/Java implements a ResultSet that way. If you declare a function
that returns a complex type, you will need to use a Java method with boolean return type
with a last parameter of type java.sql.ResultSet. The parameter will be
initialized to an empty updateable ResultSet that contains exactly one row.

Assume that the complexTest type in previous section has been
created.

The return value denotes if the receiver should be considered as a valid tuple (true) or
NULL (false).

Functions That Return Sets

When returning result set, you should not build a result set before returning it, because
building a large result set would consume a large amount of resources. It is better to
produce one row at a time. Incidentally, that is what the Greenplum Database backend
expects a function with SETOF return to do. You can return a SETOF a scalar type such as
an int, float or varchar, or you can
return a SETOF a complex type.

Returning a SETOF <scalar type>

In order to return a set of a scalar type, you need create a Java method that returns
something that implements the java.util.Iterator interface. Here is an
example of a method that returns a SETOF varchar:

Returning a SETOF <complex type>

A method returning a SETOF <complex type> must use either the interface
org.postgresql.pljava.ResultSetProvider or
org.postgresql.pljava.ResultSetHandle. The reason for having two
interfaces is that they cater for optimal handling of two distinct use cases. The former
is for cases when you want to dynamically create each row that is to be returned from the
SETOF function. The latter makes is in cases where you want to return the result of an
executed query.

Using the ResultSetProvider Interface

This interface has two methods. The boolean assignRowValues(java.sql.ResultSet
tupleBuilder, int rowNumber) and the void close() method.
The Greenplum Database query evaluator will call the assignRowValues
repeatedly until it returns false or until the evaluator decides that it does not need
any more rows. Then it calls close.

The listComplextTests method is called once. It may return
NULL if no results are available or an instance of the
ResultSetProvider. Here the Java class Fum
implements this interface so it returns an instance of itself. The method
assignRowValues will then be called repeatedly until it returns
false. At that time, close will be called

Using the ResultSetHandle Interface

This interface is similar to the ResultSetProvider interface in that
it has a close() method that will be called at the end. But instead of
having the evaluator call a method that builds one row at a time, this method has a
method that returns a ResultSet. The query evaluator will iterate over this set and
deliver the RestulSet contents, one tuple at a time, to the caller until a call to
next() returns false or the evaluator decides that no more rows are
needed.

Here is an example that executes a query using a statement that it obtained using the
default connection. The SQL suitable for the deployment descriptor looks like this:

After obtaining a connection, you can prepare and execute statements similar to other JDBC
connections. These are limitations for the PL/Java JDBC driver:

The transaction cannot be managed in any way. Thus, you cannot use methods
on the connection such as:

commit()

rollback()

setAutoCommit()

setTransactionIsolation()

Savepoints are available with some restrictions. A savepoint cannot
outlive the function in which it was set and it must be rolled back or released by that
same function.

A ResultSet returned from executeQuery() are always
FETCH_FORWARD and CONCUR_READ_ONLY.

Meta-data is only available in PL/Java 1.1 or higher.

CallableStatement (for stored procedures) is not
implemented.

The types Clob or Blob are not
completely implemented, they need more work. The types byte[] and
String can be used for bytea and text
respectively.

Exception Handling

You can catch and handle an exception in the Greenplum Database backend just like any other
exception. The backend ErrorData structure is exposed as a property in a class called
org.postgresql.pljava.ServerException (derived from
java.sql.SQLException) and the Java try/catch mechanism is synchronized
with the backend mechanism.

Important: You will not be able to continue executing backend functions until your
function has returned and the error has been propagated when the backend has generated an
exception unless you have used a savepoint. When a savepoint is rolled back, the exceptional
condition is reset and you can continue your execution.

Savepoints

Greenplum Database savepoints are exposed using the java.sql.Connection interface. Two
restrictions apply.

A savepoint must be rolled back or released in the function where it was
set.

A savepoint must not outlive the function where it was set

Logging

At present, the logger uses a handler that maps the current state of the Greenplum Database
configuration setting log_min_messages to a valid Logger level and that
outputs all messages using the Greenplum Database backend function elog().

Note: The log_min_messages setting is read from the database the first time a
PL/Java function in a session is executed. On the Java side, the setting does not change
after the first PL/Java function execution in a specific session until the Greenplum
Database session that is working with PL/Java is restarted.

The following mapping apply between the Logger levels and the Greenplum Database backend
levels.

Table 2. PL/Java Logging Levels

java.util.logging.Level

Greenplum Database Level

SEVERE ERROR

ERROR

WARNING

WARNING

CONFIG

LOG

INFO

INFO

FINE

DEBUG1

FINER

DEBUG2

FINEST

DEBUG3

Security

Installation

Only a database super user can install PL/Java. The PL/Java utility functions are
installed using SECURITY DEFINER so that they execute with the access permissions that
where granted to the creator of the functions.

Trusted Language

PL/Java is a trusted language. The trusted PL/Java language has no access to the
file system as stipulated by PostgreSQL definition of a trusted language. Any database
user can create and access functions in a trusted language.

PL/Java also installs a language handler for the language javau. This
version is not trusted and only a superuser can create new functions that use it.
Any user can call the functions.

Some PL/Java Issues and Solutions

When writing the PL/Java, mapping the JVM into the same process-space as the Greenplum
Database backend code, some concerns have been raised regarding multiple threads, exception
handling, and memory management. Here are brief descriptions explaining how these issues
where resolved.

Multi-threading

Java is inherently multi-threaded. The Greenplum Database backend is not. There is
nothing stopping a developer from utilizing multiple Threads class in the Java code.
Finalizers that call out to the backend might have been spawned from a background Garbage
Collection thread. Several third party Java-packages that are likely to be used make use
of multiple threads. How can this model coexist with the Greenplum Database backend in the
same process?

Solution

The solution is simple. PL/Java defines a special object called the
Backend.THREADLOCK. When PL/Java is initialized, the backend
immediately grabs this objects monitor (i.e. it will synchronize on this object). When
the backend calls a Java function, the monitor is released and then immediately regained
when the call returns. All calls from Java out to backend code are synchronized on the
same lock. This ensures that only one thread at a time can call the backend from Java,
and only at a time when the backend is awaiting the return of a Java function call.

Exception Handling

Java makes frequent use of try/catch/finally blocks. Greenplum Database sometimes use an
exception mechanism that calls longjmp to transfer control to a known
state. Such a jump would normally effectively bypass the JVM.

Solution

The backend now allows errors to be caught using the macros
PG_TRY/PG_CATCH/PG_END_TRY and in the catch block,
the error can be examined using the ErrorData structure. PL/Java implements a
java.sql.SQLException subclass called
org.postgresql.pljava.ServerException. The ErrorData can be retrieved
and examined from that exception. A catch handler is allowed to issue a rollback to a
savepoint. After a successful rollback, execution can continue.

Java Garbage Collector Versus palloc() and Stack Allocation

Primitive types are always be passed by value. This includes the String
type (this is a must since Java uses double byte characters). Complex types are often
wrapped in Java objects and passed by reference. For example, a Java object can contain a
pointer to a palloc'ed or stack allocated memory and use native JNI calls to extract and
manipulate data. Such data will become stale once a call has ended. Further attempts to
access such data will at best give very unpredictable results but more likely cause a
memory fault and a crash.

Solution

The PL/Java contains code that ensures that stale pointers are cleared when the
MemoryContext or stack where they where allocated goes out of scope. The Java wrapper
objects might live on but any attempt to use them will result in a stale native handle
exception.

Example

The following simple Java example creates a JAR file that contains a single method and runs
the method.