As described in Section
35.2, PostgreSQL can be
extended to support new data types. This section describes how to
define new base types, which are data types defined below the
level of the SQL language.
Creating a new base type requires implementing functions to
operate on the type in a low-level language, usually C.

The examples in this section can be found in complex.sql and complex.c in the src/tutorial directory of the source
distribution. See the README file in
that directory for instructions about running the examples.

A user-defined type must always have input and output
functions. These functions determine how the type appears in
strings (for input by the user and output to the user) and how
the type is organized in memory. The input function takes a
null-terminated character string as its argument and returns the
internal (in memory) representation of the type. The output
function takes the internal representation of the type as
argument and returns a null-terminated character string. If we
want to do anything more with the type than merely store it, we
must provide additional functions to implement whatever
operations we'd like to have for the type.

Suppose we want to define a type complex
that represents complex numbers. A natural way to represent a
complex number in memory would be the following C structure:

typedef struct Complex {
    double      x;      /* real part */
    double      y;      /* imaginary part */
} Complex;

We will need to make this a pass-by-reference type, since it's
too large to fit into a single Datum
value.
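The size argument can be checked at compile time. The following standalone C sketch models Datum as uintptr_t, a pointer-sized unsigned integer (treat this as an assumption about your PostgreSQL version). The two helper functions are hypothetical stand-ins for the real PointerGetDatum and DatumGetPointer macros: only the address of the Complex value is carried in the Datum.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the Complex struct above. */
typedef struct Complex {
    double x;   /* real part */
    double y;   /* imaginary part */
} Complex;

/* Assumption: a Datum is a pointer-sized unsigned integer. */
typedef uintptr_t Datum;

/* Two doubles (16 bytes on common platforms) do not fit in a
 * pointer-sized Datum, so the value cannot be passed by value. */
_Static_assert(sizeof(Complex) > sizeof(Datum),
               "Complex must be passed by reference");

/* Hypothetical stand-in for PointerGetDatum: store the address. */
static Datum complex_to_datum(Complex *c)
{
    return (Datum) (void *) c;
}

/* Hypothetical stand-in for DatumGetPointer: recover the address. */
static Complex *datum_to_complex(Datum d)
{
    assert(d != 0);             /* a by-reference Datum must not be NULL here */
    return (Complex *) (void *) d;
}
```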

As the external string representation of the type, we choose a
string of the form (x,y).

The input and output functions are usually not hard to write,
especially the output function. But when defining the external
string representation of the type, remember that you must
eventually write a complete and robust parser for that
representation as your input function.
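To illustrate what "complete and robust" means for the (x,y) form, here is a standalone C sketch that parses with strtod and rejects every malformed string, including a missing closing parenthesis or trailing garbage. A bare sscanf call that checks only the conversion count would silently accept both. The function name parse_complex and its boolean return convention are hypothetical; a real input function receives its argument through PostgreSQL's function-call interface and reports bad input with ereport instead of a return code.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

/* Skip any whitespace starting at p. */
static const char *skip_space(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/* Consume the literal character c, allowing whitespace before it. */
static bool expect_char(const char **pp, char c)
{
    const char *p = skip_space(*pp);

    if (*p != c)
        return false;
    *pp = p + 1;
    return true;
}

/* Consume one number; fail if none is present or it is out of range. */
static bool parse_double(const char **pp, double *out)
{
    char *end;

    errno = 0;
    *out = strtod(*pp, &end);
    if (end == *pp || errno == ERANGE)
        return false;
    *pp = end;
    return true;
}

/* Hypothetical parser for "(x,y)": succeeds only if the whole string is
 * consumed (surrounding whitespace allowed). */
static bool parse_complex(const char *str, Complex *result)
{
    const char *p = str;
    double x, y;

    assert(str != NULL && result != NULL);
    if (!expect_char(&p, '(') ||
        !parse_double(&p, &x) ||
        !expect_char(&p, ',') ||
        !parse_double(&p, &y) ||
        !expect_char(&p, ')'))
        return false;
    if (*skip_space(p) != '\0')
        return false;           /* trailing garbage */
    result->x = x;
    result->y = y;
    return true;
}
```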

You should be careful to make the input and output functions
inverses of each other. If you do not, you will have severe
problems when you need to dump your data into a file and then
read it back in. This is a particularly common problem when
floating-point numbers are involved.
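The floating-point pitfall is easy to reproduce. This standalone C sketch (function names hypothetical) formats a Complex as (x,y) with a chosen number of significant digits, parses the text back, and checks whether the original value is recovered exactly. With the default %g precision of 6 significant digits a value such as 1/3 does not survive the round trip; 17 significant digits are enough to reproduce any IEEE 754 double exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

/* Output side: format c as "(x,y)" using `digits` significant digits. */
static void format_complex(const Complex *c, int digits, char *buf, size_t len)
{
    int n = snprintf(buf, len, "(%.*g,%.*g)", digits, c->x, digits, c->y);

    assert(n >= 0 && (size_t) n < len);    /* buffer was large enough */
}

/* Output followed by input: does the text reproduce the original value? */
static bool round_trips(const Complex *c, int digits)
{
    char buf[64];
    char *end;
    Complex back;

    format_complex(c, digits, buf, sizeof buf);
    back.x = strtod(buf + 1, &end);        /* skip the leading '(' */
    assert(*end == ',');
    back.y = strtod(end + 1, &end);
    assert(*end == ')');
    return back.x == c->x && back.y == c->y;
}
```

Calling round_trips on {1.0 / 3.0, 0.1} returns false with 6 digits and true with 17. NaN components would need separate handling, since NaN never compares equal to itself.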

Optionally, a user-defined type can provide binary input and
output routines. Binary I/O is normally faster but less portable
than textual I/O. As with textual I/O, it is up to you to define
exactly what the external binary representation is. Most of the
built-in data types try to provide a machine-independent binary
representation. For complex, we will
piggy-back on the binary I/O converters for type float8.
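As a sketch of what piggy-backing on float8 amounts to, assume (as PostgreSQL's float8 converters do) that a double travels as the 8 bytes of its IEEE 754 bit pattern in network (big-endian) byte order. A complex value is then just two such fields back to back. The helper names below are hypothetical; inside the server you would call pq_sendfloat8 and pq_getmsgfloat8 on a StringInfo buffer rather than hand-rolling the byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

_Static_assert(sizeof(double) == 8, "sketch assumes 8-byte IEEE 754 doubles");

#define COMPLEX_WIRE_SIZE 16    /* two 8-byte doubles */

/* Write one double as 8 big-endian bytes of its bit pattern. */
static void put_float8(unsigned char *out, double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);     /* reinterpret without aliasing issues */
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char) (bits & 0xFF);
        bits >>= 8;
    }
}

/* Read 8 big-endian bytes back into a double. */
static double get_float8(const unsigned char *in)
{
    uint64_t bits = 0;
    double d;

    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical send routine: x first, then y. */
static void complex_encode(const Complex *c, unsigned char out[COMPLEX_WIRE_SIZE])
{
    put_float8(out, c->x);
    put_float8(out + 8, c->y);
}

/* Hypothetical receive routine: reads the fields in the same order. */
static Complex complex_decode(const unsigned char in[COMPLEX_WIRE_SIZE])
{
    Complex c;

    c.x = get_float8(in);
    c.y = get_float8(in + 8);
    return c;
}
```

Because the byte order is fixed rather than taken from the host, the encoding is machine-independent in the same sense as the built-in float8 format.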

When you define a new base type, PostgreSQL automatically provides support
for arrays of that type. The array type typically has the same
name as the base type with the underscore character (_) prepended.

Once the data type exists, we can declare additional functions
to provide useful operations on the data type. Operators can then
be defined atop the functions, and if needed, operator classes
can be created to support indexing of the data type. These
additional layers are discussed in the following sections.

If the values of your data type vary in size (in internal
form), you should make the data type TOAST-able (see Section 58.2). You should do this even
if the data are always too small to be compressed or stored
externally, because TOAST can
save space on small data too, by reducing header overhead.

To do this, the internal representation must follow the
standard layout for variable-length data: the first four bytes
must be a char[4] field which is never
accessed directly (customarily named vl_len_). You must use SET_VARSIZE() to store the size of the datum in
this field and VARSIZE() to
retrieve it. The C functions operating on the data type must
always be careful to unpack any toasted values they are handed,
by using PG_DETOAST_DATUM. (This
detail is customarily hidden by defining type-specific
GETARG_DATATYPE_P macros.) Then,
when running the CREATE TYPE command,
specify the internal length as variable
and select the appropriate storage option.
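The layout rule can be modeled in plain C. In this standalone sketch the struct name, the helper names, and the plain-integer encoding of the length word are hypothetical simplifications: PostgreSQL's real SET_VARSIZE and VARSIZE macros also pack flag bits into those four bytes, which is exactly why the field must never be accessed directly. What carries over is the shape: a 4-byte length word whose value counts the header itself, followed by the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Variable-length value: 4-byte length word, then payload bytes. */
typedef struct my_varlena {
    char vl_len_[4];            /* touched only through the helpers below */
    char data[];                /* flexible array member */
} my_varlena;

#define MY_VARHDRSZ ((uint32_t) sizeof(uint32_t))

/* Simplified stand-in for SET_VARSIZE: stores the TOTAL size, header
 * included. memcpy avoids assuming the field is aligned. */
static void my_set_varsize(my_varlena *v, uint32_t total)
{
    memcpy(v->vl_len_, &total, sizeof total);
}

/* Simplified stand-in for VARSIZE. */
static uint32_t my_varsize(const my_varlena *v)
{
    uint32_t total;

    memcpy(&total, v->vl_len_, sizeof total);
    assert(total >= MY_VARHDRSZ);   /* size always covers the header */
    return total;
}

/* Build a value holding a copy of len payload bytes. */
static my_varlena *my_varlena_make(const void *payload, uint32_t len)
{
    my_varlena *v = malloc(MY_VARHDRSZ + len);

    if (v == NULL)
        return NULL;
    my_set_varsize(v, MY_VARHDRSZ + len);
    memcpy(v->data, payload, len);
    return v;
}
```

A value built from the 3 bytes "abc" reports a size of 7: four header bytes plus three payload bytes.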

If the alignment is unimportant (either just for a specific
function or because the data type specifies byte alignment
anyway) then it's possible to avoid some of the overhead of
PG_DETOAST_DATUM. You can use
PG_DETOAST_DATUM_PACKED instead
(customarily hidden by defining a GETARG_DATATYPE_PP macro) and use the macros
VARSIZE_ANY_EXHDR and VARDATA_ANY to access a potentially-packed
datum. Note that the data returned by these macros is not aligned,
even if the data type definition specifies an alignment. If the
alignment is important, you must go through the regular
PG_DETOAST_DATUM interface.
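The alignment caveat matters because the payload after a short, packed header can start at any address. This standalone sketch (layout and names hypothetical) places a double one byte into a buffer, the way a 1-byte header would leave it, and reads it back with memcpy, which is safe at any address. Dereferencing a cast double pointer at that offset would be undefined behavior in C and can fault on hardware that is strict about alignment.

```c
#include <assert.h>
#include <string.h>

/* Read a double stored at an arbitrary (possibly unaligned) address. */
static double read_unaligned_double(const unsigned char *p)
{
    double d;

    assert(p != NULL);
    memcpy(&d, p, sizeof d);    /* byte copy: no alignment assumption */
    return d;
}

/* Hypothetical packed layout: 1-byte header, then the payload. */
static void pack_with_short_header(unsigned char *buf, unsigned char header,
                                   double value)
{
    buf[0] = header;
    memcpy(buf + 1, &value, sizeof value);  /* payload lands at offset 1 */
}

/* Unsafe alternative, shown only for contrast (do not do this):
 *     double d = *(const double *) (buf + 1);
 */
```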

Note: Older code frequently declares vl_len_ as an int32
field instead of char[4]. This is OK as
long as the struct definition has other fields that have at
least int32 alignment. But it is
dangerous to use such a struct definition when working with a
potentially unaligned datum; the compiler may take it as
license to assume the datum actually is aligned, leading to
core dumps on architectures that are strict about
alignment.
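The difference between the two declarations shows up in the alignment requirement the compiler attaches to each struct. In this standalone sketch (struct names hypothetical), the char[4] version is byte-aligned, so the compiler can assume nothing about where an instance starts, while the int32 version inherits the alignment of int32_t, which is what lets the compiler emit loads that presume an aligned address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Recommended: the length word imposes no alignment of its own. */
typedef struct safe_header {
    char    vl_len_[4];
    char    data[];
} safe_header;

/* Older style: the whole struct now requires int32_t alignment. */
typedef struct legacy_header {
    int32_t vl_len_;
    char    data[];
} legacy_header;

/* Alignment each definition promises to the compiler. */
static const size_t safe_align = alignof(safe_header);
static const size_t legacy_align = alignof(legacy_header);
```

On typical platforms safe_align is 1 and legacy_align is 4.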