News

Welcome to End Point’s blog

WAL-based Estimates For When a Record Was Changed

I originally titled this: Inferring Record Timestamps by Analyzing PITR Streams for Transaction Commits and Cross-Referencing Tuple xmin Values. But that seemed a little long, though it does sum up the technique.

In other words, it's a way to approximate an updated_at timestamp column for your tables when you didn't have one in the first place.

PostgreSQL stores the timestamp of a transaction's commit into the transaction log. If you have a hot standby server, you can see the value for the most-recently-applied transaction as the output of the pg_last_xact_replay_timestamp() function. That's useful for estimating replication lag. But I hadn't seen any other uses for it, at least until I came up with the hypothesis that all the available values could be extracted wholesale, and matched with the transaction ID's stored along with every record.

If you're on 9.5, there's track_commit_timestamps in postgresql.conf, and combined with the pg_xact_commit_timestamp(xid) function has a similar result. But it can't be turned on retroactively.

This can -- sort of. So long as you have those transaction logs, at least. If you're doing Point-In-Time Recovery you're likely to at least have some of them, especially more recent ones.

I tested this technique on a pgbench database on stock PostgreSQL 9.4, apart from the following postgresql.conf settings that (sort of) turn on WAL archival -- or at least make sure the WAL segments are kept around:

wal_level = archive
archive_mode = on
archive_command = '/bin/false'

We'll be using the pg_xlogdump binary to parse those WAL segments, available from 9.3 on. If you're on an earlier version, the older xlogdump code will work.

Once pgbench has generated some traffic, then it's time to see what's contained in the WAL segments we have available. Since I have them all I went all the way back to the beginning.

The last line just indicates that we've hit the end of the transaction log records, and it's written to stderr, so it can be ignored. Otherwise, that output contains everything we need, we just need to shift around the components so we can read it back into Postgres. Something like this did the trick for me, and let me import it directly: