There is a good diagram of how all these levels of caching fit together at [http://momjian.us/main/writings/pgsql/hw_selection.pdf Database Hardware Selection Guidelines] (PDF, Bruce Momjian)

There is a good diagram of how all these levels of caching fit together at [http://momjian.us/main/writings/pgsql/hw_selection.pdf Database Hardware Selection Guidelines] (PDF, Bruce Momjian)

+

+

The recommended way to test reliability of all the caches on a server is with [http://highperfpostgres.com/disk-plug-pull-testing/ Disk Plug Pull Testing]. A plug pull test that fails is always accurate, but passing such a test cannot guarantee unsafe writes are impossible.

== References ==

== References ==

Latest revision as of 02:45, 9 June 2013

Databases have many levels of read and write caches they use. There's the database's own dedicated memory (shared_buffers), the operating system cache, potentially a disk controller cache (typical on a RAID card), and caches on each individual hard drive. PostgreSQL expects some of these caches to work effectively in order to obtain good performance. But periodically, the database issues a synchronous write request: requesting that something must be committed to disk before the database can proceed. If any of those writes say they are successful to the database, but the data is not really safe, database corruption can be expected if there is a server crash for any reason. Software bug, power outage, UPS failure, accidentally unplugging the server, any of these things can lead to a corrupted database that will not start normally if your system has unsafe writes.

The recommended way to test reliability of all the caches on a server is with Disk Plug Pull Testing. A plug pull test that fails is always accurate, but passing such a test cannot guarantee unsafe writes are impossible.

Server Configuration Tuning: lists things to be aware of when setting wal_sync_method, as well as warnings about settings changes that can make your server unsafe--fsync, full_page_writes, and synchronous_commit (Greg Smith)

The first and second generation Intel SSD drives such as the Intel X25-E will write unsafely, and this behavior is impossible to fix without serious impact to both the speed and longevity of the drive. See SSD, XFS, LVM, fsync, write cache, barrier and lost transactions (Vadim Tkachenko) for an introduction. Their consumer drives such as the X-25-M G2 have been reported to be even worse (Evan Jones). The 3rd generation Intel 320 and 710 series drives do not have any of these problems, having added a capacitor to provide battery-backup for flushing the drive cache when power drops.