In Greenplum Database, data is divided up across segments — each segment is a
distinct PostgreSQL database. To prevent inconsistent or unexpected results, do not
execute functions classified as VOLATILE at the segment level if
they contain SQL commands or modify the database in any way. For example, functions
such as setval() are not allowed to execute on distributed data in
Greenplum Database because they can cause inconsistent data between segment
instances.

To ensure data consistency, you can safely use VOLATILE and
STABLE functions in statements that are evaluated on and run
from the master. For example, the following statements run on the master (statements
without a FROM clause):

SELECT setval('myseq', 201);
SELECT foo();

If a statement has a FROM clause containing a distributed table
and the function in the FROM clause returns a set of
rows, the statement can run on the segments:

SELECT * from foo();

Greenplum Database does not support functions that return a table reference
(rangeFuncs) or functions that use the
refCursor datatype.

Function Volatility and Plan Caching

There is relatively little difference between the STABLE and
IMMUTABLE function volatility categories for simple
interactive queries that are planned and immediately executed. It does not
matter much whether a function is executed once during planning or once during
query execution startup. But there is a big difference when you save the plan
and reuse it later. If you mislabel a function IMMUTABLE,
Greenplum Database may prematurely fold it to a constant during planning,
possibly reusing a stale value during subsequent execution of the plan. You may
run into this hazard when using PREPAREd statements, or when
using languages such as PL/pgSQL that cache plans.
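A minimal sketch of this hazard (the table and function names here are hypothetical):

```sql
-- Hypothetical example: get_tax_rate() reads a table, so it should be
-- declared STABLE. Mislabeling it IMMUTABLE invites premature constant folding.
CREATE FUNCTION get_tax_rate() RETURNS numeric AS $$
  SELECT rate FROM tax_rates WHERE region = 'default';
$$ LANGUAGE sql IMMUTABLE;  -- mislabeled

PREPARE order_total(numeric) AS
  SELECT $1 * (1 + get_tax_rate());

-- If the saved plan folded get_tax_rate() to a constant, a later UPDATE to
-- tax_rates may not be reflected when the prepared statement runs again.
EXECUTE order_total(100.00);
```

Declaring the function STABLE instead tells the planner the value can change between statements, so it is re-evaluated rather than baked into the cached plan.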

User-Defined Functions

Greenplum Database supports user-defined functions. See Extending SQL in the PostgreSQL documentation for more information.

Use the CREATE FUNCTION statement to register user-defined functions
that are used as described in Using Functions in Greenplum Database. By
default, user-defined functions are declared as VOLATILE, so if
your user-defined function is IMMUTABLE or STABLE,
you must specify the correct volatility level when you register your function.
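For example, a function that depends only on its arguments can be registered with the correct volatility keyword (the function here is illustrative):

```sql
-- Hypothetical example: without the IMMUTABLE keyword, this function
-- would default to VOLATILE and lose optimization opportunities.
CREATE FUNCTION unit_price(total numeric, qty integer) RETURNS numeric AS $$
  SELECT total / qty;
$$ LANGUAGE sql IMMUTABLE;
```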

When you create user-defined functions, avoid using fatal errors or destructive
calls. Greenplum Database may respond to such errors with a sudden shutdown or
restart.

In Greenplum Database, the shared library files for user-created functions must
reside in the same library path location on every host in the Greenplum Database
array (masters, segments, and mirrors).

You can also create and execute anonymous code blocks that are written in a Greenplum
Database procedural language such as PL/pgSQL. The anonymous blocks run as transient
anonymous functions. For information about creating and executing anonymous blocks,
see the DO command.
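A minimal anonymous block looks like this; it runs once as a transient function and returns no result:

```sql
DO $$
BEGIN
  RAISE NOTICE 'current time: %', now();
END
$$ LANGUAGE plpgsql;
```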

Built-in Functions and Operators

The following table lists the categories of built-in functions and operators
supported by PostgreSQL. All functions and operators are supported in Greenplum
Database as in PostgreSQL with the exception of STABLE and
VOLATILE functions, which are subject to the restrictions noted
in Using Functions in Greenplum Database. See the Functions and Operators section of the PostgreSQL
documentation for more information about these built-in functions and operators.

Greenplum Database includes JSON processing functions that manipulate values of the
json data type. For information about JSON data, see Working with JSON Data.
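As a brief sketch, assuming the standard PostgreSQL json operators and functions are available:

```sql
SELECT '{"name": "Alice", "age": 30}'::json ->> 'name';  -- text value of a field
SELECT json_array_length('[1, 2, 3]'::json);             -- number of array elements
```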

Window Functions

The following built-in window functions are Greenplum extensions to the PostgreSQL
database. All window functions are immutable. For more information about
window functions, see Window Expressions.

Table 3. Window functions

cume_dist()
Return type: double precision
Full syntax: CUME_DIST() OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Calculates the cumulative distribution of a value in a group of
values. Rows with equal values always evaluate to the same cumulative
distribution value.

dense_rank()
Return type: bigint
Full syntax: DENSE_RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Computes the rank of a row in an ordered group of rows without
skipping rank values. Rows with equal values are given the same rank value.

first_value(expr)
Return type: same as input expr type
Full syntax: FIRST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )
Description: Returns the first value in an ordered set of values.

lag(expr [,offset] [,default])
Return type: same as input expr type
Full syntax: LAG(expr [,offset] [,default]) OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Provides access to more than one row of the same table without
doing a self join. Given a series of rows returned from a query and a position
of the cursor, LAG provides access to a row at a given physical offset prior
to that position. If offset is not specified, the default offset is 1.
default sets the value that is returned if the offset goes beyond the scope of
the window. If default is not specified, the default value is null.

last_value(expr)
Return type: same as input expr type
Full syntax: LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )
Description: Returns the last value in an ordered set of values.

lead(expr [,offset] [,default])
Return type: same as input expr type
Full syntax: LEAD(expr [,offset] [,default]) OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Provides access to more than one row of the same table without
doing a self join. Given a series of rows returned from a query and a position
of the cursor, LEAD provides access to a row at a given physical offset after
that position. If offset is not specified, the default offset is 1.
default sets the value that is returned if the offset goes beyond the scope of
the window. If default is not specified, the default value is null.

ntile(expr)
Return type: bigint
Full syntax: NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Divides an ordered data set into a number of buckets (as defined
by expr) and assigns a bucket number to each row.

percent_rank()
Return type: double precision
Full syntax: PERCENT_RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Calculates the rank of a hypothetical row R minus 1, divided by 1
less than the number of rows being evaluated (within a window partition).

rank()
Return type: bigint
Full syntax: RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Calculates the rank of a row in an ordered group of values. Rows
with equal values for the ranking criteria receive the same rank. The number
of tied rows is added to the rank number to calculate the next rank value;
ranks may not be consecutive numbers in this case.

row_number()
Return type: bigint
Full syntax: ROW_NUMBER() OVER ( [PARTITION BY expr] ORDER BY expr )
Description: Assigns a unique number to each row to which it is applied
(either each row in a window partition or each row of the query).
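For example, assuming a hypothetical employees table, the ranking functions can be compared side by side within each department:

```sql
-- Hypothetical employees table: rank within each department by salary.
-- rank() leaves gaps after ties; dense_rank() does not.
SELECT department_id, employee_id, salary,
       rank()       OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk,
       dense_rank() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rnk
FROM employees;
```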

Advanced Aggregate Functions

The following built-in advanced aggregate functions are Greenplum extensions to the
PostgreSQL database. These functions are immutable.

MEDIAN (expr)
Full syntax: MEDIAN (expression)
Example:

SELECT department_id, MEDIAN(salary)
FROM employees
GROUP BY department_id;

Description: Can take a two-dimensional array as input. Treats such arrays as
matrices.

PERCENTILE_CONT (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC])
Return type: timestamp, timestamptz, interval, float
Full syntax: PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY expression)
Example:

SELECT department_id,
PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY salary DESC) "Median_cont"
FROM employees GROUP BY department_id;

Description: Performs an inverse distribution function that assumes a
continuous distribution model. It takes a percentile value and a sort
specification, and returns the same datatype as the numeric datatype of the
argument. The returned value is computed by linear interpolation. Nulls are
ignored in this calculation.

PERCENTILE_DISC (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC])
Return type: timestamp, timestamptz, interval, float
Full syntax: PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY expression)
Example:

SELECT department_id,
PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY salary DESC) "Median_desc"
FROM employees GROUP BY department_id;

Description: Performs an inverse distribution function that assumes a discrete
distribution model. It takes a percentile value and a sort specification. The
returned value is an element from the set. Nulls are ignored in this
calculation.
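The difference between the two shows up with an even number of rows; again assuming a hypothetical employees table:

```sql
-- PERCENTILE_CONT interpolates between the two middle values, while
-- PERCENTILE_DISC returns an actual element from the set.
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_cont,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_disc
FROM employees;
```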