19.6. Advanced Topics

19.6.1. Tuning

There are a number of tunables that can be adjusted to
make ZFS perform best for different
workloads.

vfs.zfs.arc_max
- Maximum size of the ARC.
The default is all RAM less 1 GB,
or one half of RAM, whichever is more.
However, a lower value should be used if the system will
be running any other daemons or processes that may require
memory. This value can be adjusted at runtime with
sysctl(8) and can be set in
/boot/loader.conf or
/etc/sysctl.conf.
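As a sketch, a system that also runs other memory-hungry services might cap the ARC like this (the 4 GB figure is purely illustrative, not a recommendation):

# In /boot/loader.conf — limit the ARC to 4 GB (value in bytes)
vfs.zfs.arc_max="4294967296"

The same limit can be applied immediately at runtime with
sysctl vfs.zfs.arc_max=4294967296.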

vfs.zfs.arc_meta_limit
- Limit the portion of the
ARC
that can be used to store metadata. The default is one
fourth of vfs.zfs.arc_max. Increasing
this value will improve performance if the workload
involves operations on a large number of files and
directories, or frequent metadata operations, at the cost
of less file data fitting in the ARC.
This value can be adjusted at runtime with sysctl(8)
and can be set in
/boot/loader.conf or
/etc/sysctl.conf.

vfs.zfs.arc_min
- Minimum size of the ARC.
The default is one half of
vfs.zfs.arc_meta_limit. Adjust this
value to prevent other applications from pressuring out
the entire ARC.
This value can be adjusted at runtime with sysctl(8)
and can be set in
/boot/loader.conf or
/etc/sysctl.conf.

vfs.zfs.vdev.cache.size
- A preallocated amount of memory reserved as a cache for
each device in the pool. The total amount of memory used
will be this value multiplied by the number of devices.
This value can only be adjusted at boot time, and is set
in /boot/loader.conf.

vfs.zfs.min_auto_ashift
- Minimum ashift (sector size) that
will be used automatically at pool creation time. The
value is a power of two. The default value of
9 represents
2^9 = 512, a sector size of 512 bytes.
To avoid write amplification and get
the best performance, set this value to the largest sector
size used by a device in the pool.

Many drives have 4 KB sectors. Using the default
ashift of 9 with
these drives results in write amplification on these
devices. Data that could be contained in a single
4 KB write must instead be written in eight 512-byte
writes. ZFS tries to read the native
sector size from all devices when creating a pool, but
many drives with 4 KB sectors report that their
sectors are 512 bytes for compatibility. Setting
vfs.zfs.min_auto_ashift to
12 (2^12 = 4096)
before creating a pool forces ZFS to
use 4 KB blocks for best performance on these
drives.
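For example, to create a pool that uses 4 KB blocks regardless of what the drives report, the sysctl can be set immediately before pool creation (the pool name and device here are hypothetical):

sysctl vfs.zfs.min_auto_ashift=12
zpool create mypool /dev/ada0
zdb -C mypool | grep ashift

The zdb(8) output should then show ashift: 12, confirming that the pool was created with 4 KB blocks.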

Forcing 4 KB blocks is also useful on pools where
disk upgrades are planned. Future disks are likely to use
4 KB sectors, and ashift values
cannot be changed after a pool is created.

In some specific cases, the smaller 512-byte block
size might be preferable. When used with 512-byte disks
for databases, or as storage for virtual machines, less
data is transferred during small random reads. This can
provide better performance, especially when using a
smaller ZFS record size.

vfs.zfs.prefetch_disable
- Disable prefetch. A value of 0 is
enabled and 1 is disabled. The default
is 0, unless the system has less than
4 GB of RAM. Prefetch works by
reading larger blocks than were requested into the
ARC
in hopes that the data will be needed soon. If the
workload has a large number of random reads, disabling
prefetch may actually improve performance by reducing
unnecessary reads. This value can be adjusted at any time
with sysctl(8).

vfs.zfs.vdev.trim_on_init
- Control whether new devices added to the pool have the
TRIM command run on them. This ensures
the best performance and longevity for
SSDs, but takes extra time. If the
device has already been secure erased, disabling this
setting will make the addition of the new device faster.
This value can be adjusted at any time with
sysctl(8).
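As an illustrative sequence, TRIM-on-init could be disabled just before adding a drive that has already been secure erased, then re-enabled (pool and device names are hypothetical):

sysctl vfs.zfs.vdev.trim_on_init=0
zpool add mypool /dev/ada2
sysctl vfs.zfs.vdev.trim_on_init=1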

vfs.zfs.vdev.max_pending
- Limit the number of pending I/O requests per device.
A higher value will keep the device command queue full
and may give higher throughput. A lower value will reduce
latency. This value can be adjusted at any time with
sysctl(8).

vfs.zfs.top_maxinflight
- Maximum number of outstanding I/Os per top-level
vdev. Limits the
depth of the command queue to prevent high latency. The
limit is per top-level vdev, meaning the limit applies to
each mirror,
RAID-Z, or
other vdev independently. This value can be adjusted at
any time with sysctl(8).

vfs.zfs.l2arc_write_max
- Limit the amount of data written to the L2ARC
per second. This tunable is designed to extend the
longevity of SSDs by limiting the
amount of data written to the device. This value can be
adjusted at any time with sysctl(8).
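As an illustrative loader.conf entry, the L2ARC write rate could be capped at 16 MB/s to preserve SSD endurance (the figure is an example, not a recommendation):

# In /boot/loader.conf — limit L2ARC writes to 16 MB/s (value in bytes)
vfs.zfs.l2arc_write_max="16777216"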

vfs.zfs.l2arc_write_boost
- The value of this tunable is added to vfs.zfs.l2arc_write_max
and increases the write speed to the
SSD until the first block is evicted
from the L2ARC.
This “Turbo Warmup Phase” is designed to
reduce the performance loss from an empty L2ARC
after a reboot. This value can be adjusted at any time
with sysctl(8).

vfs.zfs.scrub_delay
- Number of ticks to delay between each I/O during a
scrub.
To ensure that a scrub does not
interfere with the normal operation of the pool, if any
other I/O is happening the
scrub will delay between each command.
This value controls the limit on the total
IOPS (I/Os Per Second) generated by the
scrub. The granularity of the setting
is determined by the value of kern.hz
which defaults to 1000 ticks per second. This setting may
be changed, resulting in a different effective
IOPS limit. The default value is
4, resulting in a limit of:
1000 ticks/sec / 4 =
250 IOPS. Using a value of
20 would give a limit of:
1000 ticks/sec / 20 =
50 IOPS. The speed of
scrub is only limited when there has
been recent activity on the pool, as determined by vfs.zfs.scan_idle.
This value can be adjusted at any time with
sysctl(8).
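The relationship between kern.hz, the delay, and the resulting IOPS cap can be checked with simple integer arithmetic (a sketch, assuming the default kern.hz of 1000):

```shell
# Effective scrub IOPS limit = kern.hz / vfs.zfs.scrub_delay
hz=1000

echo $(( hz / 4 ))    # default delay of 4 ticks -> 250 IOPS
echo $(( hz / 20 ))   # delay of 20 ticks        -> 50 IOPS
```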

vfs.zfs.resilver_delay
- Number of milliseconds of delay inserted between
each I/O during a
resilver. To
ensure that a resilver does not interfere with the normal
operation of the pool, if any other I/O is happening the
resilver will delay between each command. This value
controls the limit of total IOPS (I/Os
Per Second) generated by the resilver. The granularity of
the setting is determined by the value of
kern.hz which defaults to 1000 ticks
per second. This setting may be changed, resulting in a
different effective IOPS limit. The
default value is 2, resulting in a limit of:
1000 ticks/sec / 2 =
500 IOPS. Returning the pool to
an Online state may
be more important if another device failing could
Fault the pool,
causing data loss. A value of 0 will give the resilver
operation the same priority as other operations, speeding
the healing process. The speed of resilver is only
limited when there has been other recent activity on the
pool, as determined by vfs.zfs.scan_idle.
This value can be adjusted at any time with
sysctl(8).

vfs.zfs.scan_idle
- Number of milliseconds since the last operation before
the pool is considered idle. When the pool is idle the
rate limiting for scrub
and
resilver are
disabled. This value can be adjusted at any time with
sysctl(8).

vfs.zfs.txg.timeout
- Maximum number of seconds between
transaction groups.
The current transaction group will be written to the pool
and a fresh transaction group started if this amount of
time has elapsed since the previous transaction group. A
transaction group may be triggered earlier if enough data
is written. The default value is 5 seconds. A larger
value may improve read performance by delaying
asynchronous writes, but this may cause uneven performance
when the transaction group is written. This value can be
adjusted at any time with sysctl(8).
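For example, to write transaction groups less frequently, the timeout could be raised at runtime (10 seconds is an arbitrary example value):

sysctl vfs.zfs.txg.timeout=10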

19.6.2. ZFS on i386

Some of the features provided by ZFS
are memory intensive, and may require tuning for maximum
efficiency on systems with limited
RAM.

19.6.2.1. Memory

As a bare minimum, the total system memory should be at
least one gigabyte. The amount of recommended
RAM depends upon the size of the pool and
which ZFS features are used. A general
rule of thumb is 1 GB of RAM for every 1 TB of
storage. If the deduplication feature is used, a general
rule of thumb is 5 GB of RAM per TB of storage to be
deduplicated. While some users successfully use
ZFS with less RAM,
systems under heavy load may panic due to memory exhaustion.
Further tuning may be required for systems with less than
the recommended amount of RAM.

19.6.2.2. Kernel Configuration

Due to the address space limitations of the
i386™ platform, ZFS users on the
i386™ architecture must add this option to a
custom kernel configuration file, rebuild the kernel, and
reboot:

options KVA_PAGES=512

This expands the kernel address space, allowing
the vm.kvm_size tunable to be pushed
beyond the currently imposed limit of 1 GB, or the
limit of 2 GB for PAE. To find the
most suitable value for this option, divide the desired
address space in megabytes by four. In this example, it
is 512 for 2 GB.
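The calculation can be confirmed with shell arithmetic (a 2 GB target address space, expressed in megabytes):

```shell
# KVA_PAGES = desired kernel address space in MB, divided by four
echo $(( 2048 / 4 ))   # 2 GB = 2048 MB -> 512
```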

19.6.2.3. Loader Tunables

The kmem address space can be
increased on all FreeBSD architectures. On a test system with
1 GB of physical memory, success was achieved with
these options added to
/boot/loader.conf, and the system
restarted: