Writing Postgres Extensions - the Basics

Postgres has a ton of features and offers a wide range of data types,
functions, operators, and aggregates. But sometimes it’s just not enough for your
use case. Luckily, it’s easy to extend Postgres’ functionality through extension.
So why not write your own?

This is the first in a series of articles about extending Postgres through extensions.
You can follow the code examples here on branch part_i

base36

You might already know the trick used by url shorteners. Use some unique random characters
such as http://goo.gl/EAZSKW to point to something else. You have to remember what points to where, of course, so you need to store it in a database.
But instead of
saving 6 characters using varchar(6) (and thus wasting 7 bytes) why not use an integer
with 4 bytes and represent it as base36?

The Extension Skeleton

To be able to run the CREATE EXTENSION command in your database, your extension needs at least two files: a control file in the format extension_name.control, which tells Postgres some basics about your extension, and a extension’s SQL script file in the format extension--version.sql.
So let’s add them into our project directory.

As of now, our extension has no functionality. Let’s add some in an SQL
script file:

base36–0.0.1.sql

123456789101112131415161718192021222324252627

-- complain if script is sourced in psql, rather than via CREATE EXTENSION\echoUse"CREATE EXTENSION base36"toloadthisfile.\quitCREATEFUNCTIONbase36_encode(digitsint)RETURNStextLANGUAGEplpgsqlIMMUTABLESTRICTAS$$DECLAREcharschar[];retvarchar;valint;BEGINchars:=ARRAY['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'];val:=digits;ret:='';WHILEval!=0LOOPret:=chars[(val%36)+1]||ret;val:=val/36;ENDLOOP;RETURN(ret);END;$$;

The second line ensures that the file won’t be loaded into the database directly,
but only via CREATE EXTENSION.

The simple plpgsql function allows us to encode any integer into its base36
representation.
If we copied these two files into postgres SHAREDIR/extension directory, then we could start using the extension with CREATE EXTENSION.
But we won’t bother users with figuring out where to put these files and how to
copy them manually – that’s what Makefiles are made for. So, let’s add one to our project.

Makefile

Every PostgreSQL installation from 9.1 onwards provides a build infrastructure for extensions
called PGXS, allowing extensions to be easily built against an already-installed
server. Most of the environment variables needed to build an extension are
setup in pg_config and can simply be reused.

Write tests

These days, every serious developer writes tests. And as database developer who
deals with data (probably the most valuable thing in your company) you should as well.

You can easily add some regression tests to your project that can be invoked
by make installcheck after doing make install. For this to work you can put
test script files in a subdirectory named sql/. For each test file there should
also be a file containing the expected output in a subdirectory named expected/
with the same name and the extension .out. The make installcheck command executes
each test script with psql, and compares the resulting output to the matching expected file.
Any differences will be written to the file regression.diffs.
Let’s do so:

If we now run make install && make installcheck, then our tests would fail. This is because we didn’t specify the expected output. However, we’d find the new directory results, which would contain base36_test.out and base36_test.out.diff.
The former contains the actual output from our test script file. Let’s move
it into the desired directory.

You’ll notice that in line 6, base36_encode(0) returns an empty string where we’d expect 0. If we fix our
expectation, our test would fail again.

123456789101112

============== running regression test queries==============test base36_test ... FAILED====================== 1 of 1 tests failed.
======================The differences that caused some tests to fail can be viewed in the
file "regression.diffs". A copy of the test summary that you see
above is saved in the file "regression.out".
make: *** [installcheck] Error 1

And we can easily inspect the failing test by looking at the mentioned regression.diffs

Now let’s implement the fix in the encoding function to make the tests pass again (Line 12-14):

base36–0.0.1.sql

123456789101112131415161718192021222324252627282930

-- complain if script is sourced in psql, rather than via CREATE EXTENSION\echoUse"CREATE EXTENSION base36"toloadthisfile.\quitCREATEFUNCTIONbase36_encode(digitsint)RETURNScharactervaryingLANGUAGEplpgsqlIMMUTABLESTRICTAS$$DECLAREcharschar[];retvarchar;valint;BEGINIFdigits=0THENRETURN('0');ENDIF;chars:=ARRAY['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'];val:=digits;ret:='';WHILEval!=0LOOPret:=chars[(val%36)+1]||ret;val:=val/36;ENDLOOP;RETURN(ret);END;$$;

Optimize for speed, write some C

While shipping related functionality in an extension is a convenient way to share
code, the real fun starts when you implement stuff in C.
Let’s get the first 1M base36 numbers.

PG_MODULE_MAGIC is the “magic block” needed as of PostgreSQL 8.2 in one (and only one) of the module source files after including the header fmgr.h.

PG_FUNCTION_INFO_V1(base36_encode); introduces the function to Postges as Version 1 Calling Convention, and is only needed if
you want the function to interface with Postgres.

Datum is the return type of every C-language Postgres function and can be any data type. You can think of it as something similar to a void *.

base36_encode(PG_FUNCTION_ARGS) our function is named base36_encodePG_FUNCTION_ARGS and can take any number and any type of arguments.

int32 arg = PG_GETARG_INT32(0); get the first argument. The arguments are numbered starting from 0. You must use the PG_GETARG_XXX macros defined in fmgr.h to get the actual argument value.

char *buffer = palloc(7 * sizeof(char)); to prevent memory leaks when allocating memory, always use the PostgreSQL functions palloc and pfree instead of the corresponding C library functions malloc and free.
Memory allocated by palloc will be freed automatically at the end of each transaction. You can also use palloc0 to ensure the bytes are zeroed.

PG_RETURN_TEXT_P(cstring_to_text(&buffer[offset])); to return a value to Postgres you always have to use
one of the PG_RETURN_XXX macros. cstring_to_text converts the cstring to Postgres text type before.

Once we’re finished with the C-part, we need to modify our SQL function.

base36–0.0.1.sql

12345

-- complain if script is sourced in psql, rather than via CREATE EXTENSION\echoUse"CREATE EXTENSION base36"toloadthisfile.\quitCREATEFUNCTIONbase36_encode(integer)RETURNStextAS'$libdir/base36'LANGUAGECIMMUTABLESTRICT;

To be able to use the function we also need to modify the Makefile (Line 4)

Returning errors

You might have noticed that our simple implementation would not work with negative
numbers. Just as it did before with 0, it would return an empty string. We might want to add
a - sign for negative values or simply error out. Let’s go for the latter. (Line 12-20)