Functions written in C can be compiled into dynamically
loadable objects (also called shared libraries), and used to
implement user-defined SQL functions. The first time a
user-defined function in a particular loadable object file is
called in a backend session, the dynamic loader loads that object
file into memory so that the function can be called. The
CREATE FUNCTION for a user-defined
function must therefore specify two pieces of information for the
function: the name of the loadable object file, and the C name
(link symbol) of the specific function to call within that object
file. If the C name is not explicitly specified then it is
assumed to be the same as the SQL function name.

Note: After it is used for the first time, a
dynamically loaded user function is retained in memory, and
future calls to the function in the same session will only
incur the small overhead of a symbol table lookup.

The string that specifies the object file (the first string in
the AS clause) should be the full path of
the object code file for the function, bracketed by single quote
marks. If a link symbol is given in the AS clause, the link
symbol should also be bracketed by single quote marks, and should
be exactly the same as the name of the function in the C source
code. On Unix systems the command nm will
print all of the link symbols in a dynamically loadable
object.

Note:Postgres
will not compile a function automatically; it must be
compiled before it is used in a CREATE FUNCTION command. See
below for additional information.

Two different calling conventions are currently used for C
functions. The newer "version 1" calling convention is indicated
by writing a PG_FUNCTION_INFO_V1() macro
call for the function, as illustrated below. Lack of such a macro
indicates an old-style ("version 0") function. The language name
specified in CREATE FUNCTION is 'C' in either case. Old-style
functions are now deprecated because of portability problems and
lack of functionality, but they are still supported for
compatibility reasons.

The following table gives the C type required for parameters
in the C functions that will be loaded into Postgres. The
"Defined In" column gives the actual header file (in the
.../src/backend/ directory) that the
equivalent C type is defined. Note that you should always
include postgres.h first, and that in
turn includes c.h.

Table 13-1. Equivalent C Types for Built-In
Postgres Types

Built-In Type

C Type

Defined In

abstime

AbsoluteTime

utils/nabstime.h

bool

bool

include/c.h

box

(BOX *)

utils/geo-decls.h

bytea

(bytea *)

include/postgres.h

"char"

char

N/A

cid

CID

include/postgres.h

datetime

(DateTime *)

include/c.h or
include/postgres.h

int2

int2 or int16

include/postgres.h

int2vector

(int2vector *)

include/postgres.h

int4

int4 or int32

include/postgres.h

float4

(float4 *)

include/c.h or
include/postgres.h

float8

(float8 *)

include/c.h or
include/postgres.h

lseg

(LSEG *)

include/geo-decls.h

name

(Name)

include/postgres.h

oid

oid

include/postgres.h

oidvector

(oidvector *)

include/postgres.h

path

(PATH *)

utils/geo-decls.h

point

(POINT *)

utils/geo-decls.h

regproc

regproc or REGPROC

include/postgres.h

reltime

RelativeTime

utils/nabstime.h

text

(text *)

include/postgres.h

tid

ItemPointer

storage/itemptr.h

timespan

(TimeSpan *)

include/c.h or
include/postgres.h

tinterval

TimeInterval

utils/nabstime.h

xid

(XID *)

include/postgres.h

Internally, Postgres
regards a base type as a "blob of memory." The user-defined
functions that you define over a type in turn define the way
that Postgres can operate on
it. That is, Postgres will
only store and retrieve the data from disk and use your
user-defined functions to input, process, and output the data.
Base types can have one of three internal formats:

pass by value, fixed-length

pass by reference, fixed-length

pass by reference, variable-length

By-value types can only be 1, 2 or 4 bytes in length (also 8
bytes, if sizeof(Datum) is 8 on your machine). You should be
careful to define your types such that they will be the same
size (in bytes) on all architectures. For example, the
long type is dangerous because it is 4
bytes on some machines and 8 bytes on others, whereas
int type is 4 bytes on most Unix
machines (though not on most personal computers). A reasonable
implementation of the int4 type on
Unix machines might be:

/* 4-byte integer, passed by value */
typedef int int4;

On the other hand, fixed-length types of any size may be
passed by-reference. For example, here is a sample
implementation of a Postgres
type:

Only pointers to such types can be used when passing them in
and out of Postgres functions.
To return a value of such a type, allocate the right amount of
memory with palloc(), fill in the
allocated memory, and return a pointer to it. (Alternatively,
you can return an input value of the same type by returning its
pointer. Never modify the contents of a
pass-by-reference input value, however.)

Finally, all variable-length types must also be passed by
reference. All variable-length types must begin with a length
field of exactly 4 bytes, and all data to be stored within that
type must be located in the memory immediately following that
length field. The length field is the total length of the
structure (i.e., it includes the size of the length field
itself). We can define the text type as follows:

typedef struct {
int4 length;
char data[1];
} text;

Obviously, the data field shown here is not long enough to
hold all possible strings; it's impossible to declare such a
structure in C. When manipulating
variable-length types, we must be careful to allocate the
correct amount of memory and initialize the length field. For
example, if we wanted to store 40 bytes in a text structure, we
might use a code fragment like this:

We present the "old style"
calling convention first --- although this approach is now
deprecated, it's easier to get a handle on initially. In the
version-0 method, the arguments and result of the C function
are just declared in normal C style, but being careful to use
the C representation of each SQL data type as shown above.

Here PGROOT stands for the
full path to the Postgres
source tree. Note that depending on your system, the filename
for a shared object might not end in .so, but in .sl or
something else; adapt accordingly.

Notice that we have specified the functions as "strict",
meaning that the system should automatically assume a NULL
result if any input value is NULL. By doing this, we avoid
having to check for NULL inputs in the function code. Without
this, we'd have to check for NULLs explicitly, for example by
checking for a null pointer for each pass-by-reference
argument. (For pass-by-value arguments, we don't even have a
way to check!)

Although this calling convention is simple to use, it is not
very portable; on some architectures there are problems with
passing smaller-than-int data types this way. Also, there is no
simple way to return a NULL result, nor to cope with NULL
arguments in any way other than making the function strict. The
version-1 convention, presented next, overcomes these
objections.

The version-1 calling convention relies on macros to
suppress most of the complexity of passing arguments and
results. The C declaration of a version-1 function is
always

Datum funcname(PG_FUNCTION_ARGS)

In addition, the macro call

PG_FUNCTION_INFO_V1(funcname);

must appear in the same source file (conventionally it's
written just before the function itself). This macro call is not
needed for "internal"-language functions, since Postgres currently
assumes all internal functions are version-1. However, it is
required for dynamically-loaded functions.

In a version-1 function, each actual argument is fetched
using a PG_GETARG_xxx() macro that corresponds to the
argument's datatype, and the result is returned using a
PG_RETURN_xxx() macro for the return type.

The CREATE FUNCTION commands are the
same as for the version-0 equivalents.

At first glance, the version-1 coding conventions may appear
to be just pointless obscurantism. However, they do offer a
number of improvements, because the macros can hide unnecessary
detail. An example is that in coding add_one_float8, we no
longer need to be aware that float8 is a pass-by-reference
type. Another example is that the GETARG macros for
variable-length types hide the need to deal with fetching
"toasted" (compressed or out-of-line) values. The old-style
copytext and concat_text functions shown above are actually
wrong in the presence of toasted values, because they don't
call pg_detoast_datum() on their
inputs. (The handler for old-style dynamically-loaded functions
currently takes care of this detail, but it does so less
efficiently than is possible for a version-1 function.)

One big improvement in version-1 functions is better
handling of NULL inputs and results. The macro PG_ARGISNULL(n) allows a function to test
whether each input is NULL (of course, doing this is only
necessary in functions not declared "strict"). As with the PG_GETARG_xxx()
macros, the input arguments are counted beginning at zero. To
return a NULL result, execute PG_RETURN_NULL(); this works in both strict and
non-strict functions.

The version-1 function call conventions make it possible to
return "set" results and implement
trigger functions and procedural-language call handlers.
Version-1 code is also more portable than version-0, because it
does not break ANSI C restrictions on function call protocol.
For more details see src/backend/utils/fmgr/README in the source
distribution.

Composite types do not have a fixed layout like C
structures. Instances of a composite type may contain null
fields. In addition, composite types that are part of an
inheritance hierarchy may have different fields than other
members of the same inheritance hierarchy. Therefore,
Postgres provides a procedural
interface for accessing fields of composite types from C. As
Postgres processes a set of
rows, each row will be passed into your function as an opaque
structure of type TUPLE. Suppose we
want to write a function to answer the query

SELECT name, c_overpaid(emp, 1500) AS overpaid
FROM emp
WHERE name = 'Bill' OR name = 'Sam';

GetAttributeByName is the
Postgres system function that
returns attributes out of the current row. It has three
arguments: the argument of type TupleTableSlot* passed into the function, the name
of the desired attribute, and a return parameter that tells
whether the attribute is null. GetAttributeByName returns a Datum value that
you can convert to the proper datatype by using the appropriate
DatumGetXXX() macro.

We now turn to the more difficult task of writing
programming language functions. Be warned: this section of the
manual will not make you a programmer. You must have a good
understanding of C (including the
use of pointers and the malloc memory manager) before trying to
write C functions for use with
Postgres. While it may be
possible to load functions written in languages other than
C into Postgres, this is often difficult (when it
is possible at all) because other languages, such as
FORTRAN and Pascal often do not follow the same calling convention as C. That is, other languages do not pass
argument and return values between functions in the same way.
For this reason, we will assume that your programming language
functions are written in C.

The basic rules for building C
functions are as follows:

The relevant header (include) files are installed under
/usr/local/pgsql/include or
equivalent. You can use pg_config
--includedir to find out where it is on your system
(or the system that your users will be running on). For
very low-level work you might need to have a complete
PostgreSQL source tree
available.

When allocating memory, use the Postgres routines palloc and pfree
instead of the corresponding C
library routines malloc and
free. The memory allocated by
palloc will be freed
automatically at the end of each transaction, preventing
memory leaks.

Always zero the bytes of your structures using
memset or bzero. Several routines (such as the hash
access method, hash join and the sort algorithm) compute
functions of the raw bits contained in your structure. Even
if you initialize all fields of your structure, there may
be several bytes of alignment padding (holes in the
structure) that may contain garbage values.

Most of the internal Postgres types are declared in
postgres.h, while the function
manager interfaces (PG_FUNCTION_ARGS, etc.) are in fmgr.h, so you will need to include at
least these two files. For portability reasons it's best to
include postgres.hfirst, before any other system or user
header files. Including postgres.h will also include c.h, elog.h and
palloc.h for you.

Symbol names defined within object files must not
conflict with each other or with symbols defined in the
PostgreSQL server
executable. You will have to rename your functions or
variables if you get error messages to this effect.

Compiling and linking your object code so that it can be
dynamically loaded into Postgres always requires special
flags. See Section 13.4.6
for a detailed explanation of how to do it for your
particular operating system.

Before you are able to use your PostgreSQL extension function written in C
they need to be compiled and linked in a special way in order
to allow it to be dynamically loaded as needed by the server.
To be precise, a shared library needs
to be created.

For more information you should read the documentation of
your operating system, in particular the manual pages for the C
compiler, cc, and the link editor,
ld. In addition, the PostgreSQL source code contains several
working examples in the contrib
directory. If you rely on these examples you will make your
modules dependent on the availability of the PostgreSQL source code, however.

Creating shared libraries is generally analoguous to linking
executables: first the source files are compiled into object
files, then the object files are linked together. The object
files need to be created as position-independent code (PIC), which conceptually means that they can
be placed at an arbitrary location in memory when they are
loaded by the executable. (Object files intended for
executables are not compiled that way.) The command to link a
shared library contains special flags to distinguish it from
linking an executable. --- At least this is the theory. On some
systems the practice is much uglier.

In the following examples we assume that your source code is
in a file foo.c and we will create an
shared library foo.so. The
intermediate object file will be called foo.o unless otherwise noted. A shared library
can contain more than one object file, but we only use one
here.

BSD/OS

The compiler flag to create PIC is -fpic.
The linker flag to create shared libraries is -shared.

gcc -fpic -c foo.c
ld -shared -o foo.so foo.o

This is applicable as of version 4.0 of
BSD/OS.

FreeBSD

The compiler flag to create PIC is -fpic. To
create shared libraries the compiler flag is -shared.

gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o

This is applicable as of version 3.0 of
FreeBSD.

HP-UX

The compiler flag of the system compiler to create
PIC is +z. When using GCC it's -fpic. The linker flag for shared libraries
is -b. So

cc +z -c foo.c

or

gcc -fpic -c foo.c

and then

ld -b -o foo.sl foo.o

HP-UX uses the extension
.sl for shared libraries, unlike most
other systems.

Irix

PIC is the default, no
special compiler options are necessary. The linker option
to produce shared libraries is -shared.

cc -c foo.c
ld -shared -o foo.so foo.o

Linux

The compiler flag to create PIC is -fpic. On
some platforms in some situations -fPIC must be used if -fpic does not work. Refer to the GCC
manual for more information. The compiler flag to create
a shared library is -shared. A
complete example looks like this:

cc -fpic -c foo.c
cc -shared -o foo.so foo.o

NetBSD

The compiler flag to create PIC is -fpic.
For ELF systems, the
compiler with the flag -shared is
used to link shared libraries. On the older non-ELF
systems, ld -Bshareable is
used.

gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o

OpenBSD

The compiler flag to create PIC is -fpic.
ld -Bshareable is used to link
shared libraries.

gcc -fpic -c foo.c
ld -Bshareable -o foo.so foo.o

Digital Unix/Tru64 UNIX

PIC is the default, so
the compilation command is the usual one. ld with special options is used to do the
linking:

cc -c foo.c
ld -shared -expect_unresolved '*' -o foo.so foo.o

The same procedure is used with GCC instead of the system
compiler; no special options are required.

Solaris

The compiler flag to create PIC is -KPIC
with the Sun compiler and -fpic
with GCC. To link shared
libraries, the compiler option is -G with either compiler or alternatively
-shared with GCC.

cc -KPIC -c foo.c
cc -G -o foo.so foo.o

or

gcc -fpic -c foo.c
gcc -G -o foo.so foo.o

Unixware

The compiler flag to create PIC is -K PIC
with the SCO compiler and -fpic
with GCC. To link shared
libraries, the compiler option is -G with the SCO compiler and -shared with GCC.

cc -K PIC -c foo.c
cc -G -o foo.so foo.o

or

gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o

Tip: If you want to package your extension
modules for wide distribution you should consider using
GNU Libtool for
building shared libraries. It encapsulates the platform
differences into a general and powerful interface. Serious
packaging also requires considerations about library
versioning, symbol resolution methods, and other
issues.

The resulting shared library file can then be loaded into
Postgres. When specifying the
file name to the CREATE FUNCTION
command, one must give it the name of the shared library file
(ending in .so) rather than the
simple object file.

Note: Actually, Postgres does not care what you name
the file as long as it is a shared library file.

Paths given to the CREATE FUNCTION
command must be absolute paths (i.e., start with /) that refer to directories visible on the
machine on which the Postgres
server is running. Relative paths do in fact work, but are
relative to the directory where the database resides (which is
generally invisible to the frontend application). Obviously, it
makes no sense to make the path relative to the directory in
which the user started the frontend application, since the
server could be running on a completely different machine! The
user id the Postgres server
runs as must be able to traverse the path given to the
CREATE FUNCTION command and be able to
read the shared library file. (Making the file or a
higher-level directory not readable and/or not executable by
the "postgres" user is a common
mistake.)