Tuning ZFS for Database Products

Review the following considerations when using ZFS with a database product.

If the database uses a fixed disk block or record size for I/O, set the ZFS recordsize property to match it. You can do this on a per-file system basis, even though multiple file systems might share a single pool.

With ZFS's copy-on-write design, tuning down the recordsize is a way to improve OLTP performance at the expense of batch reporting queries.

ZFS checksums every block stored on disk. This alleviates the need for the database layer to checksum data an additional time. If checksums are computed by ZFS instead of at the database layer, any discrepancy can be caught and fixed before the data is returned to the application.

UFS direct I/O is used to overcome some of the design deficiencies of UFS and to eliminate double buffering of data. In ZFS, the UFS design deficiencies do not exist and ZFS uses the primarycache and secondarycache properties to manage buffering data in the ARC. Note that using the secondarycache (L2ARC) property to improve random reads also requires the primarycache property to be enabled.

Keep pool space under 80% utilization to maintain pool performance.

Tuning ZFS for an Oracle Database

ZFS is recommended for any Oracle database version in single instance mode. ZFS
can be used with an Oracle RAC database when it is available
as a NFS-shared file system.

Review the following recommendations below for tuning ZFS for an Oracle database:

Verify that you are running the latest Oracle Solaris release

Start with the latest Oracle Solaris 10 or Oracle Solaris 11 release, with the Solaris 10 9/10 release as a minimum starting point.

For databases with high redo log activity, such as a typical OLTP database with many commits, use a separate LUN for a separate log device.

Create a storage pool for the archivelog

If available, a system's internal disk can handle this type of load. The archivelog file system can also be a file system in the dbpool.

# zpool create archivepool c0t5000C500335E106Bd0

Create the ZFS file systems and set the specific file system properties by using the following guidelines

Create separate file systems for redo, archive, undo, and temp database components using the default record size of 128 KB. The general rule is to set the file system recordsize = db_block_size for the file systems that contains Oracle data files. For table data and index components, create a file system with an 8 KB record size. Also consider providing metadata caching hints for your database file systems by using the primarycache property. For more information about ZFS file system properties, see Introducing ZFS Properties in Oracle Solaris ZFS Administration Guide.

Create file systems for the table data files and index data files with an 8 KB recordsize. Use the default value for primarycache.

ZFS aggregates read and write I/O and manages the priority of I/O before sending it to the driver level, which handles the device. The zfs_vdev_max_pending parameter defines the maximum number of I/Os that ZFS sends to any storage pool device.

In a legacy storage environment, the ssd_max_throttle and sd_max_throttle parameters define the maximum number of concurrent I/Os that the driver can send to the storage. By setting the zfs_vdev_max_pending default value equal to the value of the [s]sd_max_throttle parameter, we prevent ZFS from queuing I/O to yet another unnecessary SD layer.

If you have ssd:ssd_max_throttle or sd:sd_max_throttle in the /etc/system file in your existing environment, then set zfs:zfs_vdev_max_pending at the same value. For example, if the storage array administrator asked for the following setting:

set ssd:ssd_max_throttle=20

Then, also set this parameter as follows:

set ssd:ssd_max_throttle=20
set zfs:zfs_vdev_max_pending=20

Setting this parameter allows ZFS to control each LUN queue. This means that the total number of pending I/Os in the storage can grow as follows:

number of LUNs * ZFS_VDEV_MAX_PENDING

Allocate sufficient memory and swap resources

You can reduce ZFS memory consumption by tuning the zfs_arc_max parameter to a low value, but we still recommend provisioning enough memory to cache metadata for the actively used portion of the database, which is estimated at 1.5% with an 8 KB ZFS record size and proportionately less or more with larger or smaller records. The file system that hold index files is the one that has the largest benefit from file system caching because it is the last one to invalidate in case of lack of memory. The zfs_arc_max parameter is in bytes and accepts decimal or hexadecimal values. The following example sets this parameter to 2 GB: