When I want to know if my application scales, I need to understand the work done by my queries. No need to run a huge amount of data from many concurrent threads. If I can get the relevant statistics behind a single unit-test, then I can infer how it will scale. For example, reading millions of pages to fetch a few rows will cause shared buffer contention. Or generating dozens of megabytes of WAL for a small update will wait on disk, and penalize the backup RTO, or the replication gap.

I’ll show some examples. From pgsql I’ll collect the statistics (which are cumulative from the start if the instance) before:

In my opinion, the volume of logging (aka redo log, aka xlog, aka WAL) is the most important factor for OLTP performance, availability and scalability, for several reasons:

This is the only structure where disk latency is a mandatory component of response time

This is a big part of the total volume of backups

This is sequential by nature, and very difficult to scale by parallelizing

In this post, I look at the volume of logging generated by some DML in Postgres and Oracle. I know Oracle quite well and just start to look at Postgres. The comparison here is not a contest but a way to better understand. For example, the default behavior of Postgres, with full_page_writes=on, is very similar to Oracle ‘begin backup’ mode. The comparison makes no sense for most of Postgres DBAs, but probably helps Oracle DBAs to understand it.