Greenplum Database Data Types

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 5.x documentation.

Greenplum Database Data Types

Greenplum Database has a rich set of native data types available to users. Users may also
define new data types using the CREATE TYPE command. This reference shows all
of the built-in data types. In addition to the types listed here, there are also some
internally used data types, such as oid (object identifier), but those are not
documented in this guide.

Optional modules in the contrib directory may also install new data
types. The hstore module, for example, introduces a new data type and
associated functions for working with key-value pairs. See hstore Functions. The citext module
adds a case-insensitive text data type. See citext Data Type.

The following data types are specified by SQL: bit, bit varying,
boolean, character varying, varchar, character, char, date,
double precision, integer, interval, numeric, decimal,
real, smallint, time (with or without time zone), and timestamp
(with or without time zone).

Each data type has an external representation determined by its input and output functions.
Many of the built-in types have obvious external formats. However, several types are either
unique to PostgreSQL (and Greenplum Database), such as geometric paths, or have several
possibilities for formats, such as the date and time types. Some of the input and output
functions are not invertible. That is, the result of an output function may lose accuracy when
compared to the original input.

Pseudo-Types

Greenplum Database supports special-purpose data type entries that are collectively called
pseudo-types. A pseudo-type cannot be used as a column data type, but it can be
used to declare a function's argument or result type. Each of the available pseudo-types is
useful in situations where a function's behavior does not correspond to simply taking or
returning a value of a specific SQL data type.

Functions coded in procedural languages can use pseudo-types only as allowed by their
implementation languages. The procedural languages all forbid use of a pseudo-type as an
argument type, and allow only void and record as a result type.

A function with the pseudo-type record as a return data type returns an unspecified
row type. The record represents an array of possibly-anonymous composite types. Since
composite datums carry their own type identification, no extra knowledge is needed at the
array level.

The pseudo-type void indicates that a function returns no value.

Note: Greenplum Database does not support triggers and the pseudo-type trigger.

The types anyelement, anyarray, anynonarray, and anyenum are
pseudo-types called polymorphic types. Some procedural languages also support polymorphic
functions using the types anyarray, anyelement, anyenum, and
anynonarray.

The pseudo-type anytable is a Greenplum Database type that specifies a table
expression—an expression that computes a table. Greenplum Database allows this type only as
an argument to a user-defined function. See Table Value Expressions for
more about the anytable pseudo-type.

For more information about pseudo-types, see the Postgres documentation about Pseudo-Types.

Polymorphic Types

Four pseudo-types of special interest are anyelement, anyarray,
anynonarray, and anyenum, which are collectively called polymorphic
types. Any function declared using these types is said to be a polymorphic function. A
polymorphic function can operate on many different data types, with the specific data types
being determined by the data types actually passed to it at runtime.

Polymorphic arguments and results are tied to each other and are resolved to a specific
data type when a query calling a polymorphic function is parsed. Each position (either
argument or return value) declared as anyelement is allowed to have any specific
actual data type, but in any given call they must all be the same actual type. Each position
declared as anyarray can have any array data type, but similarly they must all be the
same type. If there are positions declared anyarray and others declared
anyelement, the actual array type in the anyarray positions must be an array
whose elements are the same type appearing in the anyelement positions.
anynonarray is treated exactly the same as anyelement, but adds the
additional constraint that the actual type must not be an array type. anyenum is
treated exactly the same as anyelement, but adds the additional constraint that the
actual type must be an enum type.

When more than one argument position is declared with a polymorphic type, the net effect is
that only certain combinations of actual argument types are allowed. For example, a function
declared as equal(anyelement, anyelement) takes any two input
values, so long as they are of the same data type.

When the return value of a function is declared as a polymorphic type, there must be at
least one argument position that is also polymorphic, and the actual data type supplied as
the argument determines the actual result type for that call. For example, if there were not
already an array subscripting mechanism, one could define a function that implements
subscripting as subscript(anyarray, integer) returns
anyelement. This declaration constrains the actual first argument to be an
array type, and allows the parser to infer the correct result type from the actual first
argument's type. Another example is that a function declared as
myfunc(anyarray) returns anyenum will only accept arrays
of enum types.

Note that anynonarray and anyenum do not represent separate type variables;
they are the same type as anyelement, just with an additional constraint. For
example, declaring a function as myfunc(anyelement, anyenum)
is equivalent to declaring it as myfunc(anyenum, anyenum):
both actual arguments must be the same enum type.

A variadic function (one taking a variable number of arguments) is polymorphic when its
last parameter is declared as VARIADIC anyarray. For purposes of
argument matching and determining the actual result type, such a function behaves the
same as if you had declared the appropriate number of anynonarray parameters.

Table Value Expressions

The anytable pseudo-type declares a function argument that is a table value
expression. The notation for a table value expression is a SELECT statement
enclosed in a TABLE() function. You can specify a distribution policy for
the table by adding SCATTER RANDOMLY, or a SCATTER BY
clause with a column list to specify the distribution key.

The SELECT statement is executed when the function is called and the
result rows are distributed to segments so that each segment executes the function with a
subset of the result table.

For example, this table expression selects three columns from a table named
customer and sets the distribution key to the first column:

TABLE(SELECT cust_key, name, address FROM customer SCATTER BY 1)

The SELECT statement may include joins on multiple base tables,
WHERE clauses, aggregates, and any other valid query syntax.

The anytable type is only permitted in functions implemented in the C or C++
languages. The body of the function can access the table using the Greenplum Database Server
Programming Interface (SPI) or the Greenplum Partner Connector (GPPC) API.

The anytable type is used in some user-defined functions in the
Pivotal GPText API. The following GPText example uses the TABLE function
with the SCATTER BY clause in the GPText function
gptext.index() to populate the index
mydb.mytest.articles with data from the messages
table: