Introduction

Solid State Drives (SSDs) are not plug-and-play devices. Special considerations such as partition alignment, choice of file system, and TRIM support are needed to set up SSDs for optimal performance. This article attempts to capture referenced key lessons that enable users to get the most out of SSDs under Linux. Users are encouraged to read this article in its entirety before acting on its recommendations, as the content is organized by topic rather than in any systematic or chronological order.

Note: This article is targeted at users running Linux, but much of the content is also relevant to our friends using Windows and Mac OS X.

Cells wear out. Consumer MLC cells at mature 50nm processes can handle 10000 writes each; 35nm generally handles 5000 writes, and 25nm 3000 (smaller being higher density and cheaper). If writes are properly spread out, are not too small, and align well with cells, this translates into a lifetime write volume for the SSD that is a multiple of its capacity. Daily write volumes have to be balanced against life expectancy.

Firmware and controllers are complex, and they occasionally have bugs. Modern ones consume power comparable with HDDs. They implement the equivalent of a log-structured filesystem with garbage collection. They translate SATA commands traditionally intended for rotating media. Some of them do on-the-fly compression. They spread repeated writes across the entire area of the flash to prevent wearing out some cells prematurely. They also coalesce writes so that small writes are not amplified into as many erase cycles of large cells. Finally, they move cells containing data so that the cell does not lose its contents over time.

Tips for Maximizing SSD Performance

Partition Alignment

High-level Overview

Proper partition alignment is essential for optimal performance and longevity. Key to alignment is partitioning to (at least) the EBS (erase block size) of the SSD.

Note: The EBS is largely vendor specific; a Google search on the model of interest is a good idea. The Intel X25-M, for example, is thought to have an EBS of 512 KiB, but Intel has yet to publish anything official to this end.

Note: If you do not know the EBS of your SSD, you can still use a size of 512 KiB (or 1024 KiB if you want to be sure and do not mind losing the first MiB of your disk). These sizes are greater than or equal to every current EBS, and aligning partitions to such a size also aligns them for all smaller power-of-two sizes. This is how Windows 7 and Ubuntu "optimise" partitions to work with SSDs.

If the partitions are not aligned to begin at multiples of the EBS (512 KiB for example), aligning the file system is a pointless exercise because everything is skewed by the start offset of the partition.

Traditionally, hard drives were addressed by indicating the cylinder, the head, and the sector at which data was to be read or written. These represented the radial position, the drive head (= platter and side) and the angular position of the data respectively. With LBA (logical block addressing), this is no longer the case. Instead, the entire hard drive is addressed as one continuous stream of data.
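
To make the alignment rule concrete, here is a minimal shell sketch that checks whether a partition's starting sector falls on a multiple of a given erase block size. The function name is_aligned is hypothetical, and 512-byte sectors are assumed; on a running system the start sector of an existing partition can usually be read from /sys/block/sdX/sdXN/start.

```shell
# is_aligned START_SECTOR EBS_KIB
# Succeeds (exit status 0) if START_SECTOR, counted in 512-byte sectors,
# lies on a multiple of EBS_KIB kibibytes.
# Hypothetical helper for illustration; not part of any partitioning tool.
is_aligned() {
    start_bytes=$(( $1 * 512 ))
    ebs_bytes=$(( $2 * 1024 ))
    [ $(( start_bytes % ebs_bytes )) -eq 0 ]
}

# Sector 2048 starts at exactly 1 MiB, so it is aligned for a 512 KiB EBS.
# Sector 63, the traditional MS-DOS first-partition start, is not.
for sector in 2048 63; do
    if is_aligned "$sector" 512; then
        echo "sector $sector: aligned to 512 KiB"
    else
        echo "sector $sector: NOT aligned to 512 KiB"
    fi
done
```

The EBS value passed in is whatever you believe applies to your drive; the 512 KiB used here is just the example figure from the notes above.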

Using GPT - RECOMMENDED METHOD

GPT is an alternative, contemporary partitioning style. gdisk, the GPT-capable equivalent of fdisk, can automatically align partitions on a 2048-sector (1024 KiB) boundary, which should be compatible with the vast majority of SSDs, if not all. GNU parted also supports GPT, but is less user-friendly for aligning partitions.

Gdisk Usage Summary:

Install gdisk from the extra repository.

Simply start gdisk against your SSD.

If the SSD is brand new or if you want to start over, create a new empty GUID partition table (aka GPT) with the 'o' command.

Create a new partition with the 'n' command (primary type/1st partition).

Assuming your partition is new, gdisk will pick the highest possible alignment. Otherwise, it will pick the largest power of two that divides all partition offsets.

If you choose to start on a sector before the 2048th, gdisk will automatically shift your partition start to the 2048th disk sector. This ensures a 2048-sector alignment (as a sector is 512 B, this is a 1024 KiB alignment, which should fit any SSD NAND erase block).

Use the +x{M,G} format to extend the partition by x megabytes or gigabytes. If you choose a size that is not a multiple of the alignment size (1024 KiB), gdisk will shrink the partition to the nearest lower multiple.

Select the partition's type id. The default, 'Linux/Windows data' (code 0700), should be fine for most uses. Press L to show the list of codes.

Assign other partitions in a like fashion.

Write the table to disk and exit via the 'w' command.

Create the filesystems as usual.
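
The rounding behaviour described in the steps above can be sketched with a little shell arithmetic. This is only an illustration of the rule as stated here, not gdisk's actual code; the names ALIGN, align_start and align_size are hypothetical, and 512-byte sectors are assumed.

```shell
# Alignment unit in 512-byte sectors: 2048 sectors = 1024 KiB.
ALIGN=2048

# align_start SECTOR
# Round a requested start sector UP to the next multiple of ALIGN.
align_start() {
    echo $(( ($1 + ALIGN - 1) / ALIGN * ALIGN ))
}

# align_size SECTORS
# Round a requested size in sectors DOWN to a multiple of ALIGN.
align_size() {
    echo $(( $1 / ALIGN * ALIGN ))
}

# A requested start below sector 2048 is moved up to the first boundary.
echo "start 34 becomes sector $(align_start 34)"

# A size that is not a multiple of 2048 sectors is shrunk to the one below.
echo "size 20973000 becomes $(align_size 20973000) sectors"
```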

Warning: If you plan to use the disk as a boot disk on a BIOS-based system (most systems except Apple computers and some very rare motherboard models with Intel chipsets), you may have to create, preferably at the beginning of the disk, a 1 MiB partition of type BIOS boot partition (code ef02). This is necessary if you intend to use GRUB2, but for Syslinux it is enough to make a separate /boot partition at this point. See GPT for more information.

Warning: GRUB legacy does not support the GUID partitioning scheme; you have to use burg, GRUB2 or Syslinux.

Warning: If you plan to dual boot with Windows (XP, Vista or 7), do NOT use GPT, since they do NOT support booting from a GPT disk! You will need to use the deprecated MBR method described below! This limitation does not apply if you run an EFI-powered machine with Windows Vista (64-bit) or Windows 7 (both 32- and 64-bit).

Detailed Usage Example

Note: The following section illustrates the process of partitioning a Crucial C300 Real SSD with a single 10 GiB Linux partition. It should work on any SSD, but adapt it to your own partition scheme. Again, do not forget to add a 1 MiB BIOS boot partition as the first partition if needed.

Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed, possibly destroying your data? (Y/N): y
OK; writing new GUID partition table (GPT).
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[root@archlinux ~]#

Now create the file system as usual:

# mkfs.ext4 /dev/sda1

Using MBR - DEPRECATED METHOD - Using GPT is Recommended

The Linux utility fdisk, however, still uses a virtual C-H-S system where users can define any number of heads and sectors (the cylinders are calculated automatically from the drive's capacity), with partitions always starting and ending on cylinder boundaries, that is, at intervals of heads x sectors. Thus, one needs to choose a number of heads and sectors such that the cylinder size (heads x sectors x 512 bytes) is a multiple of the SSD's erase block size. This is accomplished by setting the number of heads (or tracks per cylinder) and sectors (per track) to coincide with the EBS.

Ted Tso recommends using a setting of 224*56 (=2^8*49) which results in (2^8*512=) 128 KiB alignment:

# fdisk -H 224 -S 56 /dev/sdX

While others advocate a setting of 32*32 (=2^10) which results in (2^10*512=) 512 KiB alignment:

# fdisk -H 32 -S 32 /dev/sdX

How does the math work out? The alignment number is the largest power-of-two divisor of the cylinder boundary positions on the disk. The size in bytes of the cylinders is H*S*512 = (tracks per cylinder) * (sectors per track) * (sector size). Factorize H, S and sector size (512=2^9) into prime factors, and take all the 2s. In the first case above we have to ignore the non-power-of-two factor of 7^2=49.
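
The factorization can also be checked mechanically. The sketch below computes the largest power-of-two divisor of the cylinder size for a given geometry; the function name cyl_align_kib is hypothetical, and a 512-byte sector size is assumed as in the text above.

```shell
# cyl_align_kib HEADS SECTORS
# Prints, in KiB, the largest power of two that divides the cylinder size
# HEADS * SECTORS * 512 bytes. Hypothetical helper name.
cyl_align_kib() {
    bytes=$(( $1 * $2 * 512 ))
    # n & -n isolates the lowest set bit of n, which is exactly the
    # largest power of two dividing n.
    echo $(( ($bytes & -$bytes) / 1024 ))
}

echo "-H 224 -S 56: $(cyl_align_kib 224 56) KiB"
echo "-H 32 -S 32:  $(cyl_align_kib 32 32) KiB"
```

Passing 1 for HEADS gives the alignment of a single track, which is the figure that matters when a partition is offset by one track.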

Note: In order to be compatible with MS-DOS, a partition starting on the first cylinder would skip one track, reducing its alignment to track level (4k for -S 56 and 16k for -S 32). The easiest way to maximally align the first partition is to start it at cylinder 2 rather than the default of cylinder 1 as shown in the example below.

Fdisk Usage Summary

Start fdisk using the correct values for H and S specific to your SSD as described above.

If the SSD is brand new, create a new empty DOS partition table with the 'o' command.

Create a new partition with the 'n' command (primary type/1st partition).

Start on cylinder 2 rather than on cylinder 1 (see the note on MS-DOS compatibility above) to keep the first partition maximally aligned.

Use the +xG format to extend the partition x gigabytes.

Change the partition's system id from the default type of Linux (type 83) to the desired type via the 't' command. This is an optional step should the user wish to create another type of partition for example, swap, NTFS, etc. Note that a complete listing of all valid partition types is available via the 'l' command.

Assign other partitions in a like fashion.

Write the table to disk and exit via the 'w' command.

When finished, users may format their newly created partitions with 'mkfs.x /dev/sdXN', where x is the filesystem, X is the drive letter, and N is the partition number.
The following example formats the first partition on the first disk as ext4 using the default settings:

# mkfs.ext4 /dev/sda1

Warning: Using the mkfs command can be dangerous as a simple mistake can result in formatting the WRONG partition and in data loss! TRIPLE check the target of this command before hitting the Enter key!

Detailed Usage Example

Note: The following section is meant to be illustrative of the process of partitioning an Intel X25-M SSD with a single 12 Gig, Linux partition. It is in no way the definitive method for doing so, nor are the switches used to start fdisk in this specific example necessarily the correct values for other brands/models of SSDs!

# fdisk -H 32 -S 32 /dev/sdb
The number of cylinders for this disk is set to 15711.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): o
Building a new DOS disklabel with disk identifier 0x8cb3d286.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
The number of cylinders for this disk is set to 15711.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15711, default 1): 2
Last cylinder, +cylinders or +size{K,M,G} (2-15711, default 15711): +12G
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.

The rest of the SSD was partitioned in a like fashion, giving two partitions in total. Here is the output of an fdisk list command:

Warning: Pay attention to this last sanity check step. If the heads and sectors/track do not remain 32/32, whether because of a bug in fdisk or for some unknown reason, the partitions are not aligned! See the [using cfdisk/post process observation] wiki page for a workaround.

Special Considerations for RAID0 Setups with Multiple SSDs

Encrypted partition

Note that the DISCARD/TRIM feature is NOT SUPPORTED by device-mapper, although support is being worked on.

Mount Flags

There are several key mount flags to use in one's fstab entries for SSD partitions.

noatime - Read accesses to the file system will no longer result in an update to the atime information associated with the file. The importance of the noatime setting is that it eliminates the need for the system to write to the file system for files which are simply being read. Since writes can be somewhat expensive, as mentioned in the previous section, this can result in measurable performance gains. Note that the modification time of a file will continue to be updated any time the file is written to with this option enabled.

discard - The discard flag will enable the benefits of the TRIM command so long as one is using kernel version >=2.6.33. It does not work with ext3; using the discard flag for an ext3 root partition will result in it being mounted read-only.

Note: You can also set the discard flag with tune2fs: tune2fs -o discard /dev/sda1.
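
As a quick way to confirm which of these flags are actually in effect, the following sketch checks a comma-separated option string for a given flag and applies the check to the running system's mount table. The helper name has_mount_opt is hypothetical, and reading /proc/mounts assumes a Linux system. The fstab line in the comment is only an illustration with an assumed device name and mount point; include discard only if the warnings in this section are satisfied.

```shell
# Illustrative fstab entry (device name and mount point are assumptions):
#   /dev/sda1  /  ext4  defaults,noatime,discard  0  1

# has_mount_opt OPTIONS FLAG
# Succeeds if the comma-separated OPTIONS string contains FLAG as a whole word.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the mounted filesystems that currently use noatime or discard.
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        for flag in noatime discard; do
            if has_mount_opt "$opts" "$flag"; then
                echo "$mnt ($fstype on $dev): $flag"
            fi
        done
    done < /proc/mounts
fi
```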

Warning: It is critically important that users switch the controller driving the SSD to AHCI mode (not IDE mode) to ensure that the kernel is able to use the TRIM command.

Warning: Users need to be certain that kernel version 2.6.33 or above is being used AND that their SSD supports TRIM before attempting to mount a partition with the discard flag. Data loss can occur otherwise!
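
The kernel half of this check is easy to script. The sketch below compares the first three numeric fields of the running kernel version against 2.6.33; the function name version_ge is hypothetical, and any suffix after the numeric part (such as "-ARCH") is simply ignored.

```shell
# version_ge A B
# Succeeds if dotted version A is greater than or equal to B.
# Only the first three numeric fields are compared. Hypothetical helper.
version_ge() {
    # Keep the leading digits-and-dots part, e.g. "2.6.37-ARCH" -> "2.6.37",
    # then pad so that three fields always exist.
    a="$(printf '%s' "$1" | sed 's/[^0-9.].*//').0.0"
    b="$(printf '%s' "$2" | sed 's/[^0-9.].*//').0.0"
    i=1
    while [ "$i" -le 3 ]; do
        fa=$(printf '%s' "$a" | cut -d. -f"$i"); fa=${fa:-0}
        fb=$(printf '%s' "$b" | cut -d. -f"$i"); fb=${fb:-0}
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
        i=$(( i + 1 ))
    done
    return 0
}

if version_ge "$(uname -r)" 2.6.33; then
    echo "kernel $(uname -r) meets the 2.6.33 requirement for discard"
else
    echo "kernel $(uname -r) is too old for discard"
fi
```

This says nothing about the drive itself. Whether the SSD advertises TRIM can typically be checked as root with hdparm -I /dev/sdX and a search for "TRIM" in the output.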

Warning: Using an OCZ Vertex II SSD (which supports TRIM) and kernel 2.6.37.4 on Arch, some people experienced trouble with the discard option: all changes made to the filesystem would disappear after a reboot! Reportedly, discard is disabled by default because it is not stable enough.

Special considerations for Mac computers

By default, Apple's firmware switches SATA drives into IDE mode (not AHCI mode) when booting any OS besides Mac OS. It is easy to switch back to AHCI if you are using GRUB2 with an Intel SATA controller.

First determine the PCI identifier of your SATA controller. Run the command

# lspci -nn

and find the line that says "SATA AHCI Controller". The PCI identifier is in square brackets and should look like 8086:27c4 (but the last digits may be different).

Now edit /boot/grub/grub.cfg and add the line

setpci -d 8086:27c4 90.b=40

right above the "set root" line of each OS you want to enable AHCI for. Be sure to substitute the appropriate PCI identifier.

I/O Scheduler

Note: This should not be necessary since the cfq scheduler checks if the drive is non-rotational and then behaves correctly for an SSD.

Consider switching from the default scheduler, which under Arch is cfq (completely fair queuing), to the noop or deadline scheduler for an SSD. The noop scheduler, for example, simply processes requests in the order they are received, without giving any consideration to where the data physically resides on the disk. This option is thought to be advantageous for SSDs since seek times are identical for all sectors on the SSD.

However, for some SSDs, particularly earlier, JMicron-based ones, you may experience better performance sticking with the default scheduler (at least one published benchmark shows this); on these, while seek times are similar for all sectors, random access throughput is bad enough to offset any advantage. If your SSD was manufactured within the last year or so, or is made by Intel, this probably does not apply to you.
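
To see which scheduler a drive is currently using, the kernel exposes a per-device file in sysfs where the active scheduler is shown in square brackets. The sketch below extracts that name; the helper active_scheduler is hypothetical, and the device name sda is an assumption to be adjusted to your SSD.

```shell
# active_scheduler "noop deadline [cfq]"
# Prints the scheduler name enclosed in square brackets.
# If the input has no brackets, it is printed unchanged.
active_scheduler() {
    printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)\].*/\1/'
}

dev=sda   # assumption: adjust to the device name of your SSD
sched_file=/sys/block/$dev/queue/scheduler

if [ -r "$sched_file" ]; then
    echo "active scheduler for $dev: $(active_scheduler "$(cat "$sched_file")")"
    # 0 means the kernel treats the device as non-rotational (an SSD).
    echo "rotational flag: $(cat "/sys/block/$dev/queue/rotational")"
fi

# To switch the running system to noop (as root, not persistent across reboots):
#   echo noop > /sys/block/sda/queue/scheduler
```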

For more on schedulers, see this Linux Magazine article (needs registration).

Swap Space on SSDs

One can place a swap partition on an SSD. Note that most modern desktops with more than 2 GB of memory rarely use swap at all. The notable exception is systems which make use of the hibernate feature. A recommended tweak for SSDs using a swap partition is to reduce the "swappiness" of the system, thus avoiding writes to swap.
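
A minimal sketch of such a tweak, assuming the standard vm.swappiness sysctl is what is meant. The value 1 and the use of /etc/sysctl.conf are illustrative assumptions, not figures taken from this article.

```shell
# Show the current value (Linux only; this file is readable by any user).
if [ -r /proc/sys/vm/swappiness ]; then
    echo "current vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi

# To lower it for the running system (as root; the value 1 is an assumption):
#   sysctl -w vm.swappiness=1
#
# To make the setting persistent, add this line to /etc/sysctl.conf:
#   vm.swappiness=1
```

Lower values make the kernel less inclined to swap; the right value depends on the workload and available memory.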

SSD Memory Cell Clearing

On occasion, users may wish to completely reset an SSD's cells to the same virgin state they were in when the device was installed, thus restoring it to its factory default write performance. Write performance is known to degrade over time even on SSDs with native TRIM support. TRIM only safeguards against file deletes, not replacements such as an incremental save.

Tips for Minimizing SSD Read/Writes

An overarching theme for SSD usage should be 'simplicity' in terms of locating high-read/write operations either in RAM (Random Access Memory) or on a physical HDD rather than on an SSD. Doing so will add longevity to an SSD. This is primarily due to the large erase block size (512 KiB in some cases); a lot of small writes result in huge effective writes.

Note: A 32 GB SSD with a mediocre 10x write amplification factor, a standard 10000 write/erase cycles, and 10 GB of data written per day would have a life expectancy of about 8 years (32 GB x 10000 cycles / 10 / 10 GB per day = 3200 days). It gets better with bigger SSDs and modern controllers with less write amplification.
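
The estimate in the note above is simple enough to recompute for your own drive. The following sketch uses integer shell arithmetic; the function name ssd_life_days is hypothetical, and the model deliberately ignores everything except capacity, cycle count, write amplification and daily write volume.

```shell
# ssd_life_days CAPACITY_GB CYCLES WRITE_AMPLIFICATION DAILY_WRITES_GB
# Prints the estimated lifetime in days (integer division).
# Hypothetical helper; a rough model, not a vendor endurance rating.
ssd_life_days() {
    echo $(( $1 * $2 / ($3 * $4) ))
}

# The figures from the note: 32 GB, 10000 cycles, 10x amplification, 10 GB/day.
days=$(ssd_life_days 32 10000 10 10)
echo "$days days, roughly $(( days / 365 )) years"
```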

Use "iotop -oPa" and sort by disk writes to see how much your programs are writing to disk.

Intelligent Partition Scheme

Consider relocating the /var partition to a hard disk on the system rather than keeping it on the SSD itself, to avoid read/write wear. Many users elect to keep only / and /home on the SSD (/boot is okay too), locating /var and /tmp on a physical HDD.

If the SSD is the only storage device on the system (i.e. no HDDs), consider allocating a separate partition for /var to allow for better crash recovery, for example in the event of a broken program wasting all the space on / or a runaway log file filling up the disk.

Another intelligent option is to locate /tmp in RAM, provided the system has enough to spare. See the Locate /tmp in RAM section below for more on this procedure.

The noatime Mount Flag

Assign the noatime flag to partitions residing on SSDs. See the Mount Flags section above for more.

Locate /tmp in RAM

For systems with >=2 GB of memory, locating /tmp in RAM is desirable and easily achieved by first clearing the physical /tmp partition and then mounting it as tmpfs (RAM) in fstab. The following line gives an example:

none /tmp tmpfs nodev,nosuid,noatime,size=1000M,mode=1777 0 0

Locate Browser Profiles in RAM

One can easily mount browser profile(s) such as firefox, chromium, etc. into RAM via tmpfs and also use rsync to keep them synced with HDD-based backups. For more on this procedure, see the Speed-up Firefox Using tmpfs article. In addition to the obvious speed enhancements, users will also save read/write cycles on their SSD by doing so.

Compiling in /dev/shm

Intentionally compiling in /dev/shm is a great way to keep the many small writes of a build off the SSD. For systems with >4 GB of memory, the shm line in fstab can be tweaked to use more than 1/2 the physical memory on the system via the size flag.

Example of a machine with 8 GB of physical memory:

shm /dev/shm tmpfs nodev,nosuid,size=6G 0 0

Disabling Journaling on the Filesystem?

Using a journaling filesystem such as ext3 or ext4 on an SSD WITHOUT a journal is an option to decrease read/writes. The obvious drawback of using a filesystem with journaling disabled is data loss as a result of an ungraceful dismount (i.e. post power failure, kernel lockup, etc.). With modern SSDs, Ted Tso advocates that journaling can be enabled with minimal extraneous read/write cycles under most circumstances:

Amount of data written (in megabytes) on an ext4 file system mounted with noatime.

operation     journal   w/o journal   percent change
git clone     367.0     353.0         3.81 %
make          207.6     199.4         3.95 %
make clean    6.45      3.73          42.17 %

"What the results show is that metadata-heavy workloads, such as make clean, do result in almost twice the amount data written to disk. This is to be expected, since all changes to metadata blocks are first written to the journal and the journal transaction committed before the metadata is written to their final location on disk. However, for more common workloads where we are writing data as well as modifying filesystem metadata blocks, the difference is much smaller."

Note: The make clean example from the table above illustrates the importance of intentionally compiling in /dev/shm, as recommended in the preceding section of this article!
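
The percent change figures in the table above are the reduction in data written relative to the journaled figure. A small sketch, assuming awk is available, reproduces them from the two measured columns; the function name pct_saved is hypothetical.

```shell
# pct_saved WITH_JOURNAL WITHOUT_JOURNAL
# Prints the percentage of writes saved by disabling the journal,
# relative to the journaled amount, with two decimals.
pct_saved() {
    awk -v j="$1" -v n="$2" 'BEGIN { printf "%.2f\n", (j - n) / j * 100 }'
}

echo "git clone:  $(pct_saved 367.0 353.0) %"
echo "make:       $(pct_saved 207.6 199.4) %"
echo "make clean: $(pct_saved 6.45 3.73) %"
```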

Choice of Filesystem

Btrfs

Btrfs support has been included with the mainline 2.6.29 release of the Linux kernel. Some feel that it is not mature enough for production use while there are also early adopters of this potential successor to ext4. It should be noted that at the time this article was originally written (27-June-2010), a stable version of btrfs did not exist. See this blog entry for more on btrfs. Be sure to read the btrfs wiki as well.

Warning: At the time this entry was written (21-Nov-2010) there is NO fsck utility to fix/diagnose errors on btrfs partitions. While Btrfs is stable on a stable machine, it is currently possible to corrupt a filesystem irrecoverably in the event of a crash or power loss on disks that don't handle flush requests correctly.

Ext4

Ext4 is another filesystem that has support for SSDs. It has been considered stable since 2.6.28 and is mature enough for daily use. Unlike Btrfs, ext4 does not automatically detect the nature of the disk; you have to explicitly enable TRIM support using the discard mount option in your fstab (or with tune2fs -o discard /dev/sdaX).
See the official in kernel tree documentation for further information on ext4.

SSD Benchmarking

See the SSD Benchmarking article for a general process of benchmarking your SSD or to see some of the SSDs in the database.