PGQ Tutorial

From PostgreSQL wiki

PGQ is the queueing solution from Skytools. The
Londiste replication solution is a consumer daemon
built on top of PGQ, and the API is accessible for you to create any
queue-based asynchronous processing facility.

What problems is PGQ a solution for?

PGQ solves asynchronous batch processing of live transactions.

That means you're doing some INSERT/UPDATE/DELETE of rows in your live
environment, and you want those to trigger some actions, not at COMMIT time
but later on. Not in a too distant future either, just asynchronously:
without blocking the live transaction.

Every application of a certain size needs to postpone some processing until
a later time, and PGQ is a generic high-performance solution built for
PostgreSQL that lets you implement just that: batch processing. PGQ takes
care of the asynchronous consuming of events, the error management, the
queueing behaviour, etc., and it comes with a plain SQL API to do this.

The ticker

The ticker produces ticks, which serve as boundaries between batches of
events. Batches are produced on demand, each time a consumer asks for a new
batch, and are formed so that any batch contains either ticker_max_lag
seconds' worth of events or ticker_max_count events, whichever comes first.

The daemon is also responsible for queue maintenance, which is done by
rotating event INSERTs over 3 tables and TRUNCATEing a table as soon as it
no longer contains any unprocessed events.

Producing and Consuming Events

In order not to lose any events, you'll want to make sure there's at least
one consumer registered before producing any event, as events nobody is
interested in are simply dropped.

You can have as many consumers as you want on the same event queue, but
they will all see the same events rather than share the workload. That makes
sense if you consider that the main application is replication: consumers
are slaves, and when you have more than one you want them all to process the
same events so that every replica ends up with the same data set.

There is work in progress to provide PGQ cooperative consumers; it is currently stalled on this [1] mailing list thread.

More Info

Producing Events

The documentation is quite clear about this: you produce events either with
a function-based API or with a trigger-based one. The latter is preferred,
as it provides easy SQL data type validation and a well-known interface for
inserting events, the classic INSERT INTO.
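As a minimal sketch of both producer styles, assuming a psycopg2-style cursor; the queue name "myqueue" and the table "mytable" are hypothetical, while pgq.insert_event() and pgq.logutriga() are the documented Skytools entry points:

```python
# Function-based producer API: call pgq.insert_event() directly.
def produce_event(cur, queue_name, ev_type, ev_data):
    """Insert one event into a PGQ queue via the SQL function API."""
    cur.execute("SELECT pgq.insert_event(%s, %s, %s);",
                (queue_name, ev_type, ev_data))

# Trigger-based alternative: attach pgq.logutriga to a table, and a plain
# INSERT INTO mytable then produces the event automatically.
LOGUTRIGA_DDL = """
CREATE TRIGGER myqueue_trigger
AFTER INSERT OR UPDATE OR DELETE ON mytable
FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('myqueue');
"""
```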

Writing a PGQ consumer

Your best option is to write the consumer code in Python; the next best option is PHP. If you want to reuse existing code written in another language, you'll have to write the operating system daemon support code yourself, plus the loop around the API entry points of PGQ.

All the PGQ logic is written as a PostgreSQL extension and server-side code, so all your language needs to provide is a way to connect to your PostgreSQL server and call functions.

How to use the PGQ SQL API

You will first need your consumer code to be able to register with and unregister from an existing queue.
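Registration is a pair of SQL function calls; here is a sketch assuming a psycopg2-style cursor, with hypothetical queue and consumer names, using the documented pgq.register_consumer() and pgq.unregister_consumer() entry points:

```python
def register(cur, queue_name, consumer_name):
    """Subscribe this consumer to the queue; events are kept from now on."""
    cur.execute("SELECT pgq.register_consumer(%s, %s);",
                (queue_name, consumer_name))

def unregister(cur, queue_name, consumer_name):
    """Drop the subscription; pending events for this consumer are released."""
    cur.execute("SELECT pgq.unregister_consumer(%s, %s);",
                (queue_name, consumer_name))
```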

Then the main idea is pretty simple: the consumer loops over pgq.next_batch() calls as fast as possible, and only takes a rest when it receives NULL.

Please note that the ticker daemon produces a tick every ticker_idle_period even when your system is not producing events, so it's perfectly normal (and expected) to get empty batches in low-activity periods, such as when debugging your first consumer. Empty batches need to be consumed as fast as possible; only have your consumer rest when pgq.next_batch() returns NULL.

When pgq.next_batch() returns an id (bigint), you get to process the batch
events, which you obtain with pgq.get_batch_events(batch_id). What to do
with each event is specific to the consumer code and doesn't belong in a
common PGQ consuming abstraction (interface, library, package, whatever
your programming environment facility is named).

The event processing function should be able to tag events as processed
(OK), failed, or to retry later. In the retry case, the event will be
introduced again in a later batch; which one depends on how much later
you want it retried.
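A sketch of event tagging, assuming a psycopg2-style cursor and the pgq.event_retry() and pgq.event_failed() SQL functions from Skytools; the 60-second retry delay is an arbitrary example value:

```python
def retry_event(cur, batch_id, event_id, retry_seconds=60):
    """Ask PGQ to deliver this event again in a later batch."""
    cur.execute("SELECT pgq.event_retry(%s, %s, %s);",
                (batch_id, event_id, retry_seconds))

def fail_event(cur, batch_id, event_id, reason):
    """Tag the event as failed, recording a reason for later inspection."""
    cur.execute("SELECT pgq.event_failed(%s, %s, %s);",
                (batch_id, event_id, reason))
```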

Once all events of a batch are processed, you have to call
pgq.finish_batch(batch_id) then COMMIT your batch processing transaction.

In pseudo language this gives:

loop
    batch_id = pgq.next_batch()
    if batch_id is null
    then
        sleep for a while;  -- nothing to consume, take a rest
    else
        BEGIN;
        events = pgq.get_batch_events(batch_id);
        -- this could be a function pointer or a virtual method or a delegate, e.g.
        user_defined_process_event(events);
        pgq.finish_batch(batch_id);
        COMMIT;
    end if
end loop
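The same loop in Python, as a minimal sketch: the db object stands in for a real DB-API connection (e.g. psycopg2), the queue and consumer names are hypothetical, and the max_loops parameter exists only to make the sketch stoppable:

```python
import time

def consume_forever(db, queue_name, consumer_name, process_event,
                    idle_sleep=1.0, max_loops=None):
    """Minimal PGQ consumer loop. `process_event` is the user-defined
    per-event callback; rest only when pgq.next_batch() returns NULL."""
    loops = 0
    while max_loops is None or loops < max_loops:
        cur = db.cursor()
        cur.execute("SELECT pgq.next_batch(%s, %s);",
                    (queue_name, consumer_name))
        batch_id = cur.fetchone()[0]
        if batch_id is None:
            time.sleep(idle_sleep)      # no events: take a rest
        else:
            cur.execute("SELECT * FROM pgq.get_batch_events(%s);",
                        (batch_id,))
            for event in cur.fetchall():
                process_event(event)    # consumer-specific logic
            cur.execute("SELECT pgq.finish_batch(%s);", (batch_id,))
            db.commit()                 # finish_batch, then COMMIT
        loops += 1
```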

Remote consuming

This is the situation where the processing of the events happens on a
database other than the one where the events were produced. Londiste is a
good example of remote consuming.

In this case, you will need to implement a way to avoid processing a batch
again when the processing COMMIT succeeded but pgq.finish_batch() (which
has to be done on the producer database rather than the consumer one)
failed. Why the 3 other scenarios are not a problem is left to the reader.

Skytools provides pgq_ext (extension) as a way to ensure that you won't
process the same batch more than once, and this is discussed here: [2].

The pgq_ext idea is to record, on the consumer database, the last batch_id
processed, and to have that UPDATE done in the processing transaction. This
can be refined down to the last event id processed, for cases where you
don't have an easy way to roll back the whole batch processing in case of
failure at the remote site.
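The idea can be sketched like this, assuming a DB-API cursor and a hypothetical tracking table (CREATE TABLE batch_track (consumer text PRIMARY KEY, last_batch_id bigint)); the real pgq_ext schema differs, but the key point is the same: the "already done?" check and the progress UPDATE live inside the processing transaction on the consumer database:

```python
def is_batch_done(cur, consumer, batch_id):
    """Check on the consumer database whether this batch was already
    processed, so a failed pgq.finish_batch() can't cause a replay."""
    cur.execute("SELECT last_batch_id >= %s FROM batch_track"
                " WHERE consumer = %s;", (batch_id, consumer))
    row = cur.fetchone()
    return bool(row and row[0])

def mark_batch_done(cur, consumer, batch_id):
    # Runs in the same transaction as the event processing, so the events
    # and the progress record commit (or roll back) together.
    cur.execute("UPDATE batch_track SET last_batch_id = %s"
                " WHERE consumer = %s;", (batch_id, consumer))
```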

As a side note, you might want to know that londiste uses pgq_ext but
moves its SQL objects (functions and tables) into the londiste schema.

Non transactional processing

Your consumer code could be sending mail rather than changing database
state. In that case the processing isn't transactional (you can't roll back
the batch processing) and you'll have to solve the reliability issues
yourself; PGQ won't provide the magic.

Using the python API

Skytools is written mainly in Python and provides everything you need to
write your own consumers.