Reliability is an important property of any serious database
system, and PostgreSQL does
everything possible to guarantee reliable operation. One aspect
of reliable operation is that all data recorded by a committed
transaction should be stored in a nonvolatile area that is safe
from power loss, operating system failure, and hardware failure
(except failure of the nonvolatile area itself, of course).
Successfully writing the data to the computer's permanent storage
(disk drive or equivalent) ordinarily meets this requirement. In
fact, even if a computer is fatally damaged, if the disk drives
survive they can be moved to another computer with similar
hardware and all committed transactions will remain intact.

While forcing data to the disk platters periodically might
seem like a simple operation, it is not. Because disk drives are
dramatically slower than main memory and CPUs, several layers of
caching exist between the computer's main memory and the disk
platters. First, there is the operating system's buffer cache,
which caches frequently requested disk blocks and combines disk
writes. Fortunately, all operating systems give applications a
way to force writes from the buffer cache to disk, and
PostgreSQL uses those features.
(See the wal_sync_method
parameter to adjust how this is done.)
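
As a minimal sketch of what forcing a write out of the operating system's buffer cache looks like from an application, the Python fragment below writes a file and calls os.fsync before returning. The function name durable_write and the file path are hypothetical. The fragment shows the general system-call pattern, not PostgreSQL's own implementation.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then ask the OS to flush it to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit in the buffer cache only.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical demo file in a fresh temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
    durable_write(demo_path, b"committed record\n")
```

The fsync call only hands the data to the storage hardware; whether a controller or drive cache then holds it in volatile memory is outside the application's control.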

Next, there might be a cache in the disk drive controller;
this is particularly common on RAID controller cards. Some of these caches
are write-through, meaning writes are
sent to the drive as soon as they arrive. Others are write-back, meaning data is sent to the drive at
some later time. Such caches can be a reliability hazard because
the memory in the disk controller cache is volatile, and will
lose its contents in a power failure. Better controller cards
have battery-backup units
(BBUs), meaning the card has a
battery that maintains power to the cache in case of system power
loss. After power is restored the data will be written to the
disk drives.

And finally, most disk drives have caches. Some are
write-through while some are write-back, and the same concerns
about data loss exist for write-back drive caches as for disk
controller caches. Consumer-grade IDE and SATA drives are
particularly likely to have write-back caches that will not
survive a power failure. Many solid-state drives (SSDs) also have
volatile write-back caches.

These caches can typically be disabled; however, the method
for doing this varies by operating system and drive type:

On Linux, IDE and SATA drives can
be queried using hdparm -I; write
caching is enabled if there is a *
next to Write cache. hdparm -W 0 can be used to turn off write
caching. SCSI drives can be queried using sdparm. Use
sdparm --get=WCE to check whether
the write cache is enabled and sdparm
--clear=WCE to disable it.

On FreeBSD, IDE drives
can be queried using atacontrol and
write caching turned off using hw.ata.wc=0 in /boot/loader.conf; SCSI drives can be queried
using camcontrol identify, and the
write cache both queried and changed using sdparm when available.

On Solaris, the disk
write cache is controlled by format -e. (The
Solaris ZFS file system is
safe with disk write-cache enabled because it issues its own
disk cache flush commands.)

On Windows, if wal_sync_method is open_datasync (the default), write caching can
be disabled by unchecking My Computer\Open\disk drive\Properties\Hardware\Properties\Policies\Enable write caching on the disk.
Alternatively, set wal_sync_method to fsync or fsync_writethrough, either of which prevents write
caching.

On Mac OS X, write
caching can be prevented by setting wal_sync_method to fsync_writethrough.
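
As an illustration of the Linux check described above, the following Python sketch looks for the * marker next to Write cache in text produced by hdparm -I. The function name write_cache_enabled is hypothetical, and the sample text is an illustrative excerpt written for this example, not output captured from a real drive. Actual output may be laid out differently.

```python
def write_cache_enabled(hdparm_output):
    """Return True or False for the 'Write cache' line, or None if it is absent."""
    for line in hdparm_output.splitlines():
        stripped = line.strip()
        if stripped.endswith("Write cache"):
            # An enabled feature is listed with a leading '*'.
            return stripped.startswith("*")
    return None


# Illustrative excerpt only (assumed layout of the feature list).
SAMPLE = """\
Commands/features:
    Enabled Supported:
       *    SMART feature set
       *    Write cache
            Look-ahead
"""

if __name__ == "__main__":
    print(write_cache_enabled(SAMPLE))
```

In practice the text would come from running hdparm -I against a device, which normally requires root privileges; that step is deliberately left out of the sketch.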

Recent SATA drives (those following ATAPI-6 or later) offer a drive cache flush
command (FLUSH CACHE EXT), while SCSI
drives have long supported a similar command, SYNCHRONIZE CACHE. These commands are not directly
accessible to PostgreSQL, but
some file systems (e.g., ZFS,
ext4) can use them to flush
data to the platters on write-back-enabled drives. Unfortunately,
such file systems behave suboptimally when combined with
battery-backup unit (BBU) disk
controllers. In such setups, the synchronize command forces all
data from the controller cache to the disks, eliminating much of
the benefit of the BBU. You can run the pg_test_fsync module to see if you are
affected. If you are affected, the performance benefits of the
BBU can be regained by turning off write barriers in the file
system or reconfiguring the disk controller, if that is an
option. If write barriers are turned off, make sure the battery
remains functional; a faulty battery can potentially lead to data
loss. Hopefully file system and disk controller designers will
eventually address this suboptimal behavior.
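
To show the kind of measurement involved, here is a rough Python sketch that times repeated write-plus-fsync cycles on a scratch file. It is not a substitute for pg_test_fsync. The function name fsyncs_per_second and its default values are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, count=50, size=8192):
    """Time `count` write-plus-fsync cycles of `size` bytes in `directory`."""
    block = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            # Rewrite the same block each time so the file does not grow.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(fsyncs_per_second(tempfile.gettempdir()))
```

Running it on the file system that holds the WAL, before and after a change to write barriers or controller settings, gives a coarse indication of whether the change affected flush cost.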

When the operating system sends a write request to the storage
hardware, there is little it can do to make sure the data has
arrived at a truly nonvolatile storage area. Rather, it is the
administrator's responsibility to make certain that all storage
components ensure data integrity. Avoid disk controllers that
have non-battery-backed write caches. At the drive level, disable
write-back caching if the drive cannot guarantee the data will be
written before shutdown. If you use SSDs, be aware that many of
these do not honor cache flush commands by default. You can test
for reliable I/O subsystem behavior using diskchecker.pl.

Another risk of data loss is posed by the disk platter write
operations themselves. Disk platters are divided into sectors,
commonly 512 bytes each. Every physical read or write operation
processes a whole sector. When a write request arrives at the
drive, it might be for some multiple of 512 bytes (PostgreSQL typically writes 8192 bytes, or
16 sectors, at a time), and the process of writing could fail due
to power loss at any time, meaning some of the 512-byte sectors
were written while others were not. To guard against such
failures, PostgreSQL
periodically writes full page images to permanent WAL storage
before modifying the
actual page on disk. By doing this, during crash recovery
PostgreSQL can restore
partially-written pages from WAL. If you have file-system
software that prevents partial page writes (e.g., ZFS), you can
turn off this page imaging by turning off the full_page_writes
parameter. Battery-backup unit (BBU) disk controllers do not
prevent partial page writes unless they guarantee that data is
written to the BBU as full (8kB) pages.
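
The partial-write scenario can be sketched as a small Python model. It treats a page as 16 sectors of 512 bytes, simulates power loss after only some sectors have been written, and then restores the page from a saved full page image. All names are hypothetical. The model also assumes that the logged image equals the intended page contents, and it ignores the replay of later WAL records.

```python
SECTOR_SIZE = 512
PAGE_SIZE = 8192
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE  # 16 sectors per page


def torn_write(old_page, new_page, sectors_written):
    """Return the on-disk page if power fails after `sectors_written` sectors."""
    cut = sectors_written * SECTOR_SIZE
    # Sectors before the cut hold new data; the rest still hold old data.
    return new_page[:cut] + old_page[cut:]


def recover(on_disk_page, full_page_image):
    """Model of recovery: the logged image replaces the page, torn or not."""
    return full_page_image


old = b"A" * PAGE_SIZE
new = b"B" * PAGE_SIZE
torn = torn_write(old, new, 5)  # power lost after 5 of 16 sectors

if __name__ == "__main__":
    print(torn == old, torn == new, recover(torn, new) == new)
```

The torn page matches neither the old nor the new contents, which is why recovery cannot patch it incrementally and needs a complete image to start from.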