Queries returning aggregate, summary, and computed data are frequently used in application development. Sometimes these queries are not fast enough. Caching query results using Memcached or Redis is a common approach for resolving these performance issues. However, these bring their own challenges. Before reaching for an external tool it is worth examining what techniques PostgreSQL offers for caching query results.

We will examine different approaches using the sample domain of a simplified account system. Accounts can have many transactions. Transactions can be recorded ahead of time and only take effect at post time. For example, a debit effective on March 9 can be entered on March 1. The summary data we need is account balance.

Our query that we will optimize for is finding the balance of accounts. To start we will create a view that finds balances for all accounts. A PostgreSQL view is a saved query. Once created, selecting from a view is exactly the same as selecting from the original query, i.e. it reruns the query each time.
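A minimal sketch of the underlying schema, the view, and a materialized variant of it. The table and column names here are illustrative assumptions, not taken verbatim from the text:

```sql
-- Illustrative schema: accounts have many transactions, and a
-- transaction only counts toward the balance once its post_time passes.
create table accounts (
  id serial primary key,
  name text not null
);

create table transactions (
  id serial primary key,
  account_id integer not null references accounts,
  amount numeric(9,2) not null,
  post_time timestamptz not null
);

create schema plain;

-- A plain view: this aggregate query reruns on every select.
create view plain.account_balances as
select
  a.id as account_id,
  a.name,
  coalesce(sum(t.amount), 0) as balance
from accounts a
  left join transactions t
    on t.account_id = a.id
    and t.post_time <= current_timestamp
group by a.id, a.name;

-- A materialized view stores the result set instead of rerunning
-- the query, at the cost of going stale until refreshed.
create schema matview;

create materialized view matview.account_balances as
select * from plain.account_balances;

create index on matview.account_balances (balance);
```

The index on balance lets the negative-balance query below avoid scanning the whole materialized result.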

A materialized view goes a step further than a plain view: it stores the query's results, so selects read the stored data instead of rerunning the query. To retrieve the balance for each account we simply select from the materialized view.

select * from matview.account_balances where balance < 0;

The performance impact is impressive. It now only takes 13ms to retrieve all the accounts with negative balances -- 453x faster! Unfortunately, these materialized views have two substantial limitations. First, they are only updated on demand. Second, the whole materialized view must be updated; there is no way to only update a single stale row.

-- refresh all rows
refresh materialized view matview.account_balances;

In the case where possibly stale data is acceptable, they are an excellent solution. But if data must always be fresh they are not a solution.

Our next approach is to materialize the query into a table that is eagerly updated whenever a change occurs that would invalidate a row. We can do that with triggers. A trigger is a bit of code that runs when some event such as an insert or update happens.
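A sketch of the eagerly maintained table, assuming the illustrative accounts and transactions schema (the table layout is an assumption):

```sql
create schema eager;

-- The materialization target: one row per account, kept current by triggers.
create table eager.account_balances (
  account_id integer primary key references accounts,
  balance numeric(9,2) not null default 0
);

-- Seed it from the current data.
insert into eager.account_balances (account_id, balance)
select a.id, coalesce(sum(t.amount), 0)
from accounts a
  left join transactions t
    on t.account_id = a.id and t.post_time <= current_timestamp
group by a.id;

create index on eager.account_balances (balance);
```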

The syntax for create function and create trigger is quite extensive. Refer to the documentation for details. But the summary explanation is this: We create the function eager.account_insert as a trigger function that will run with the permissions of the user who created it (security definer). Inside an insert trigger function, new is a variable that holds the new record.
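A sketch of that trigger function and its trigger (the function name follows the text; the column list is assumed):

```sql
-- When an account is created, seed its balance row with zero.
create function eager.account_insert() returns trigger
  security definer
  language plpgsql
as $$
begin
  insert into eager.account_balances (account_id, balance)
  values (new.id, 0);
  return new;
end;
$$;

create trigger account_insert
  after insert on accounts
  for each row execute procedure eager.account_insert();
```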

For the update of a transaction, we have to account for the possibility that the account the transaction belongs to was changed. We use the old and new values of the row to determine which account balances are invalidated and need to be refreshed.
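A sketch of the transaction update trigger, assuming a helper that recomputes one account's stored balance (eager.refresh_account_balance is an assumed name):

```sql
-- Recompute the stored balance for a single account.
create function eager.refresh_account_balance(_account_id integer)
  returns void
  security definer
  language sql
as $$
  update eager.account_balances
  set balance = (
    select coalesce(sum(amount), 0)
    from transactions
    where account_id = _account_id
      and post_time <= current_timestamp
  )
  where account_id = _account_id;
$$;

-- On update, both the old and new account rows may be invalidated.
create function eager.transaction_update() returns trigger
  security definer
  language plpgsql
as $$
begin
  perform eager.refresh_account_balance(old.account_id);
  if new.account_id <> old.account_id then
    perform eager.refresh_account_balance(new.account_id);
  end if;
  return new;
end;
$$;

create trigger transaction_update
  after update on transactions
  for each row execute procedure eager.transaction_update();
```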

To query the negative account balances we simply select from the account_balances table.

select * from eager.account_balances where balance < 0;

This is really fast (13ms / 453x faster), just like the materialized view. But it has the advantage that it stays fresh even when transactions change. Unfortunately, this strategy doesn't account for one key requirement -- row invalidation by the passage of time.

The same data changes that could invalidate materialized rows in the eager strategy must be handled with the lazy strategy. The difference is that the triggers will only update expiration_time -- they will not actually recalculate the data.

As with the eager strategy, on account insertion we need to create an account_balances_mat record with a zero balance for the new account. But we also need to provide an expiration_time. The balance for an account with no transactions will be valid forever, so we provide the special PostgreSQL value Infinity as the expiration_time. Infinity is defined as greater than any other value.
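A sketch of the lazy materialization table and the account insert trigger (the table layout is assumed from the text):

```sql
create schema lazy;

-- Cached balances plus the time at which each row becomes stale.
create table lazy.account_balances_mat (
  account_id integer primary key references accounts,
  balance numeric(9,2) not null default 0,
  expiration_time timestamptz not null
);

-- A new account has no transactions, so its zero balance never expires.
create function lazy.account_insert() returns trigger
  security definer
  language plpgsql
as $$
begin
  insert into lazy.account_balances_mat (account_id, balance, expiration_time)
  values (new.id, 0, 'Infinity');
  return new;
end;
$$;

create trigger account_insert
  after insert on accounts
  for each row execute procedure lazy.account_insert();
```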

For the insert of a transaction, we update the expiration_time if the post_time of the transaction is less than the current expiration_time. This means the update only happens when absolutely necessary. If the account will already be considered stale at the post_time of the new record we avoid the IO cost of the write.
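A sketch of that transaction insert trigger; note the where clause skips the write entirely when the cached row would already be stale by the new transaction's post_time:

```sql
-- Pull expiration_time earlier only if the new transaction posts sooner.
create function lazy.transaction_insert() returns trigger
  security definer
  language plpgsql
as $$
begin
  update lazy.account_balances_mat
  set expiration_time = new.post_time
  where account_id = new.account_id
    and new.post_time < expiration_time;
  return new;
end;
$$;

create trigger transaction_insert
  after insert on transactions
  for each row execute procedure lazy.transaction_insert();
```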

Unlike when a transaction is inserted, when a transaction is updated, it is not possible to compute the new account expiration_time without reading the account's transactions. This makes it cheaper to simply invalidate the account balance. We will simply set expiration_time to -Infinity, a special value defined as being less than all other values. This ensures that the row will be considered stale.
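A sketch covering the invalidation trigger and a read-side view that refreshes stale rows on demand (lazy.refresh_account_balance is an assumed helper name):

```sql
-- Updates may change amount, post_time, or account, so just invalidate
-- both the old and new account rows.
create function lazy.transaction_update() returns trigger
  security definer
  language plpgsql
as $$
begin
  update lazy.account_balances_mat
  set expiration_time = '-Infinity'
  where account_id in (old.account_id, new.account_id)
    and expiration_time <> '-Infinity';
  return new;
end;
$$;

create trigger transaction_update
  after update on transactions
  for each row execute procedure lazy.transaction_update();

-- Recompute one account's balance and its next expiration time
-- (the earliest future post_time), returning the refreshed row.
create function lazy.refresh_account_balance(_account_id integer)
  returns lazy.account_balances_mat
  security definer
  language sql
as $$
  update lazy.account_balances_mat
  set balance = coalesce(
        (select sum(amount)
         from transactions
         where account_id = _account_id
           and post_time <= current_timestamp), 0),
      expiration_time = coalesce(
        (select min(post_time)
         from transactions
         where account_id = _account_id
           and post_time > current_timestamp), 'Infinity')
  where account_id = _account_id
  returning *;
$$;

-- Serve fresh rows directly; refresh stale rows on the fly.
create view lazy.account_balances as
select abm.account_id, abm.balance
from lazy.account_balances_mat abm
where abm.expiration_time > current_timestamp
union all
select r.account_id, r.balance
from lazy.account_balances_mat abm
  cross join lateral lazy.refresh_account_balance(abm.account_id) r
where abm.expiration_time <= current_timestamp;
```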

To retrieve all accounts with negative balances we simply select from the account_balances view.

select * from lazy.account_balances where balance < 0;

The first time the query is run it takes about 5900ms because it is caching the balance for all accounts. Subsequent runs only take about 16ms (368x faster). In general, the query run time should not be nearly so variable because only a small fraction of the rows will be refreshed in any one query.

PostgreSQL's built-in materialized views offer the best performance improvement for the least work, but only if stale data is acceptable. Eager materialized views offer the absolute best read performance, but can only guarantee freshness if rows do not go stale due to the passage of time. Lazy materialized views offer almost as good read performance as eager materialized views, but they can guarantee freshness under all circumstances.

One additional consideration is read-heavy vs. write-heavy workloads. Most systems are read-heavy. But for a write-heavy load you should consider leaning toward lazy and away from eager materialized views. The reason is that eager materialized views do the refresh calculation on every write, whereas lazy materialized views only pay that cost on read.

PostgreSQL materialization strategies can improve performance by a factor of hundreds or more. In contrast to caching in Memcached or Redis, PostgreSQL materialization provides ACID guarantees. This eliminates an entire category of consistency issues that must be handled at the application layer. In addition, the infrastructure for the system as a whole is simpler with one less moving part.

The increased performance and system simplicity are well worth the cost of more advanced SQL.