Lab: Introduction to Oracle Solaris 11 ZFS File System

Hands-On Labs of the System Admin and Developer Community of OTN

Oracle Solaris ZFS is a revolutionary file system that changes the way we look at storage allocation for open systems. This session is a hands-on tutorial on the basics of Oracle Solaris ZFS. It describes how devices are used in storage pools and considers performance and availability. It also looks at the various types of Oracle Solaris ZFS datasets that can be created and when to use each type. Participants will learn about file system snapshots, cloning data, allocation limits and recovering from common errors.

In the following exercises, we will create some zpools and explore different types of virtual devices (vdevs). We will also create two different types of ZFS datasets, file systems and volumes. We will customize some properties, snapshot and clone them, and finally perform some upgrades. In the advanced section, we will look at how some of the other Oracle Solaris services, like NFS (Network File System) and FMA (Fault Management Architecture) are tied into ZFS.

These exercises are meant to explore some of the possibilities. Armed with the manual pages and a group of willing assistants, you are encouraged to explore any other features of ZFS.

Lab activities

Activity                      Estimated Time   Exercises
Lab Overview                  5 min            N/A
Working with Pools            15 min           Yes
Working with File Systems     20 min           Yes
Error handling and recovery   10 min           Yes

Lab Overview

In this lab we will be using a VirtualBox guest for all of the exercises. We will use a combination of flat files and virtual disks for different parts of the lab. Here is a quick overview of the configuration before we get started.

Lab Setup

We need to add the two 8 GB virtual disks used throughout this lab to our VirtualBox guest.

If the Solaris virtual machine is running, shut it down.

In VirtualBox, select the settings for the OracleSolaris11_11-11 machine and select the Storage category on the left. Then click the Add Controller icon to add a SAS Controller:

Then click the icon to add a new disk to the SAS Controller:

Create a new disk:

This will launch the Create New Virtual Disk Wizard:

Click Next. We'll use the default Dynamically expanding storage type:

Click Next. Set the Disk's name to 8GDisk1.vdi. Set its size to 8 GB.

Click Next.

Then Create.

Repeat those steps, naming the second disk 8GDisk2.

Start the Solaris VM.

This will be an interactive lab run in a GNOME terminal window. Once logged in, bring up a terminal window and become the root user. The root password is the password you defined when you imported the Oracle Solaris 11 VM appliance into Oracle VM VirtualBox.

Exercise 1: Working with Pools

In the ZFS file system, storage devices are grouped into pools, called zpools. These pools provide all of the storage used by the file systems and volumes that are allocated from the pool. Let's begin by creating a simple zpool, called datapool.

First we need some storage devices. We will create 4 files and use them for our first pool.
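As a sketch (file sizes and locations are illustrative; placing the files under /dev/dsk lets zpool find them by short name):

# mkfile 200m /dev/dsk/disk1 /dev/dsk/disk2 /dev/dsk/disk3 /dev/dsk/disk4
# zpool create datapool raidz disk1 disk2 disk3 disk4
# zpool status datapool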

What we can see from this output is that our new pool, datapool, has a single ZFS virtual device (vdev) called raidz1-0. That vdev is comprised of the four disk files we created in the previous step.

This type of vdev provides single-device parity protection, meaning that if one device develops an error, no data is lost because it can be reconstructed using the remaining devices. This organization is commonly called a 3+1: three data disks plus one parity disk.

ZFS provides additional levels of availability: raidz2 (two-device protection), raidz3 (three-device protection), mirroring, and none. We will look at some of these in later exercises.

Before continuing, let's take a look at the currently mounted file systems.

NOTE: In Oracle Solaris 11, the zfs list command shows how much space ZFS file systems consume. If you need to see how much space is available on a non-ZFS file system, such as one mounted over the network via NFS or another protocol, the traditional df(1) command still exists in Oracle Solaris 11 and can be used. System administrators familiar with df(1) can continue to use it, but zfs list is encouraged for ZFS file systems.
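For example (output will vary by system):

# zfs list
# df -h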

Notice that when we created the pool, ZFS also created the first file system and mounted it. The default mountpoint is derived from the name of the pool, but can be changed if necessary. With ZFS there is no need to create a file system or make a directory to mount it on. It is also unnecessary to add entries to /etc/vfstab. All of this is done when the pool is created, making ZFS much easier to use than traditional file systems.

Before looking at some other types of vdevs, let's destroy the datapool, and see what happens.

The usage error indicates that /dev/dsk/disk1 has been identified as being part of an existing pool called datapool. The -f flag to the zpool create command can override the failsafe in case datapool is no longer being used, but use that option with caution.
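One possible sequence (illustrative; the exact error text varies): first try to build a second pool from a device that still belongs to datapool, then destroy datapool and re-create it as a two-way mirror:

# zpool create pool2 mirror disk1 disk2
# zpool destroy datapool
# zpool create datapool mirror disk1 disk2

The first command should fail with the usage error described above; after the destroy, the devices are free to reuse.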

Adding capacity to a pool

Since we have two additional disk devices (disk3 and disk4), let's see how easy it is to grow a ZFS pool.
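Assuming datapool is now a two-way mirror of disk1 and disk2, another mirrored pair can be added (a sketch):

# zpool add datapool mirror disk3 disk4
# zpool list datapool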

Notice that you don't have to grow file systems when the pool capacity increases. File systems can use whatever space is available in the pool, subject to quota limitations, which we will see in a later exercise.

Importing and exporting pools

ZFS zpools can also be exported, allowing all of the data and associated configuration information to be moved from one system to another. For this example, let's use two of our SAS disks (c4t0d0 and c4t1d0).
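A sketch of the round trip (the pool name mpool is hypothetical; device names may differ on your system). Running zpool import with no argument lists the pools that are available to import:

# zpool create mpool mirror c4t0d0 c4t1d0
# zpool export mpool
# zpool import
# zpool import mpool

(Remember to destroy this example pool before moving on if you want to reuse the disks in later exercises.)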

As before, we have created a simple mirrored pool of two disks. This time the devices are real disks, not files, and we have told ZFS to use the entire disk (no slice number was included). If a disk is not labeled, ZFS will write a default label.

Notice that we didn't have to tell ZFS where the disks were located. All we told ZFS was the name of the pool. ZFS looked through all of the available disk devices and reassembled the pool, even if the device names had been changed.

Without an argument, zpool import will look at all of the disks attached to the system and provide a list of pool names that it can import. If it finds two pools with the same name, the unique pool identifier can be used to select which pool you want to import.

Pool Properties

There are many pool properties that you may want to customize for your environment. To see a list of these properties, use zpool get.

These properties are all described in the zpool(1M) man page. Type man zpool to get more information. To set a pool property, use zpool set. Note that not all properties can be changed (e.g., version, free, allocated).
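For example (property values shown will vary; autoreplace is just one writable property):

# zpool get all datapool
# zpool set autoreplace=on datapool
# zpool get autoreplace datapool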

When you patch or upgrade Oracle Solaris, a new version of the zpool may be available. It is simple to upgrade an existing pool, adding the new functionality. In order to do that, let's create a pool using an older version number (yes, you can do that too), and then upgrade the pool.
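A sketch, using version 17 as an arbitrary older version and assuming a free device (names are illustrative):

# zpool create -o version=17 pool2 c4t1d0
# zpool upgrade -v
# zpool upgrade pool2

zpool upgrade -v lists the available versions and the features each one adds.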

And that's it - nothing more complicated than zpool upgrade. Now you can use features provided in the newer zpool version, like log device removal (19), snapshot user holds (18), etc.

One word of warning - this pool can no longer be imported on a system running a zpool version lower than 33.

We're done working with zpools. There are many more things you can do. If you want to explore, see the man page for zpool (man zpool) and ask a lab assistant if you need help.

Let's now clean up before proceeding.

# zpool destroy pool2
# zpool destroy datapool

Exercise 2: Working with Datasets (File Systems, Volumes)

Now that we understand how to manage ZFS zpools, the next topic is file systems. We will use the term datasets because a zpool can provide many different types of access, not just through traditional file systems.

As we saw in the earlier exercise, a default dataset (file system) is automatically created when creating a zpool. Unlike other file system and volume managers, ZFS provides hierarchical datasets (peer, parents, children), allowing a single pool to provide many storage choices.

ZFS datasets are created, destroyed and managed using the zfs(1M) command. If you want to learn more, read the associated manual page by typing man zfs.

To begin working with datasets, let's create a simple pool, again called datapool and 4 additional datasets called bob joe fred and pat.
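A sketch (device names are illustrative; each dataset is created with its own zfs create command):

# zpool create datapool mirror c4t0d0 c4t1d0
# zfs create datapool/bob
# zfs create datapool/joe
# zfs create datapool/fred
# zfs create datapool/pat
# zfs list -r datapool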

By using zfs list -r datapool, we are listing all of the datasets in the pool named datapool. As in the earlier exercise, all of these datasets (file systems) have been automatically mounted.

If this was a traditional file system, you might think there was 39.05 GB (7.81 GB x 5) available for datapool and its 4 datasets, but the 8GB in the pool is shared across all of the datasets. Let's see how that works.
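To see the sharing in action, place a 1 GB file in one of the datasets (the filename is arbitrary) and list the pool again:

# mkfile 1g /datapool/bob/bigfile
# zfs list -r datapool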

Notice that in the USED column, datapool/bob shows 1 GB in use. The other datasets show just the metadata overhead (21K), but their available space has been reduced to 6.81 GB, the amount of free space left after datapool/bob consumed its 1 GB.

Hierarchical Datasets

A dataset can have children, just as a directory can have subdirectories. For datapool/fred, let's create a dataset for documents, and then underneath that, additional datasets for pictures, video and audio.
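A sketch of the hierarchy described above:

# zfs create datapool/fred/documents
# zfs create datapool/fred/documents/pictures
# zfs create datapool/fred/documents/video
# zfs create datapool/fred/documents/audio
# zfs list -r datapool/fred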

ZFS Dataset Properties

A common question is: when do you create a subdirectory, and when would you use a dataset? The simple answer is that if you want to change any ZFS dataset properties between a parent and a child, you should create a new dataset. Since properties are applied to a dataset, all directories in that dataset have the same properties. If you want to change one, like quota, you have to create a child dataset.
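To see every property on a dataset:

# zfs get all datapool/fred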

That's a lot of properties. Each of these is described in the zfs(1M) manual page (man zfs).

Let's look at a few examples.

Quotas and Reservations

ZFS dataset quotas are used to limit the amount of space consumed by a dataset and all of its children. Reservations are used to guarantee that a dataset can consume a specified amount of storage by removing that amount from the free space that the other datasets can use.
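For example, to cap datapool/fred and all of its children at 2 GB:

# zfs set quota=2g datapool/fred
# zfs list -r datapool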

The first thing to notice is that the available space for datapool/fred and all of its children is now 2GB, which was the quota we set with the command above. Also notice that the quota is inherited by all of the children.

The reservation is a bit harder to see.

Original pool size             7.81 GB
In use by datapool/bob         1.0 GB
Reservation by datapool/fred   1.5 GB
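The reservation in the table can be set like this (a sketch); afterwards the other datasets should see roughly 7.81 - 1.0 - 1.5 = 5.31 GB available:

# zfs set reservation=1.5g datapool/fred
# zfs list -r datapool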

ZFS Volumes (zvols)

So far we have only looked at one type of dataset: the file system. Now let's take a look at zvols and what they do.

Volumes provide a block level (raw and cooked) interface into the zpool. Instead of creating a file system where you place files and directories, a single object is created and then accessed as if it were a real disk device. This would be used for things like raw database files, virtual machine disk images and legacy file systems. Oracle Solaris also uses this for the swap and dump devices when installed into a zpool.

In this example, rpool/dump is the dump device for Solaris and is 516 MB. rpool/swap is the swap device and is 1 GB. As you can see, you can mix file systems and volumes within the same pool.

Use zfs create -V to create a volume. Unlike a file system dataset, you must specify the size of the device when you create it, but you can change it later if needed. It's just another dataset property.

# zfs create -V 2g datapool/vol1

This creates two device nodes: /dev/zvol/dsk/datapool/vol1 (cooked) and /dev/zvol/rdsk/datapool/vol1 (raw). These can be used like any other raw or cooked device. We can even put a UFS file system on it.
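For example, using newfs(1M) on the raw device and mounting the cooked device (the mount point /mnt is arbitrary; newfs will ask for confirmation):

# newfs /dev/zvol/rdsk/datapool/vol1
# mount /dev/zvol/dsk/datapool/vol1 /mnt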

Expanding a volume is just a matter of setting the dataset property volsize to a new value. Be careful when lowering the value as this will truncate the volume and you could lose data. In this next example, let's grow our volume from 2GB to 4GB. Since there is a UFS file system on it, we'll use growfs to make the file system use the new space.
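A sketch, assuming the UFS file system is mounted at /mnt:

# zfs set volsize=4g datapool/vol1
# growfs -M /mnt /dev/zvol/rdsk/datapool/vol1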

Snapshots and Clones

A ZFS snapshot is a read-only, point-in-time copy of a dataset, created with zfs snapshot. Now that we can create these point-in-time snapshots, we can use them to create new datasets, called clones. They are datasets just like any other, but they start off with the contents of the snapshot. Even more interesting, clones only require space for the data that differs from the snapshot. That means that if 5 clones are created from a single snapshot, only one copy of the common data is required.

Remember that datapool/bob has a 1GB file in it? Let's snapshot it, and then clone it a few times to see this.
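A sketch (snapshot and clone names are arbitrary):

# zfs snapshot datapool/bob@snap1
# zfs clone datapool/bob@snap1 datapool/bobclone1
# zfs clone datapool/bob@snap1 datapool/bobclone2
# zfs list -r -t all datapool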

Notice that the 1GB has not been freed (avail space is still 3.28G), but the USEDSNAP value for datapool/bob has gone from 0 to 1GB, indicating that the snapshot is now holding that 1GB of data. To free that space you will have to delete the snapshot. In this case you would also have to delete any clones that are derived from it.

Now the 1GB that we deleted has been freed because the last snapshot holding it has been deleted.

One last example and we'll leave snapshots. You can also take a snapshot of a dataset and all of its children. A recursive snapshot is atomic, meaning that it is a consistent point-in-time picture of the contents of all of the datasets. Use -r for a recursive snapshot.
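For example (@today is an arbitrary snapshot name):

# zfs snapshot -r datapool@today
# zfs list -t snapshot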

Compression

Compression is an interesting feature of ZFS file systems. ZFS allows compressed and uncompressed data to coexist: when the compression property is turned on, all newly written blocks are compressed, while existing blocks remain in their original state.
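A sketch, assuming a 1 GB file already exists in datapool/bob (the second filename is arbitrary):

# zfs set compression=on datapool/bob
# mkfile 1g /datapool/bob/bigfile2
# zfs get compressratio datapool/bob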

There are now 2 different 1GB files in /datapool/bob, but df only says 1GB is used. It turns out that mkfile creates a file filled with zeroes. Those compress extremely well - too well, as they take up no space at all. To make things even more fun, copy the compressed file back on top of the original and they will both be compressed, and you'll get an extra 1GB of free space back in the pool.

Exercise 3: ZFS Integration with other parts of Solaris

In this section, we will explore a few examples of Solaris services that are integrated with ZFS. The first of these is the NFS server.

NFS

Each file system dataset has a property called sharenfs. This can be set to the values that you would typically place in /etc/dfs/dfstab. See the manual page for share_nfs for details on specific settings.

Create a simple pool called datapool with 3 datasets, fred, barney and dino.
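A sketch (the device name is illustrative):

# zpool create datapool c4t0d0
# zfs create datapool/fred
# zfs create datapool/barney
# zfs create datapool/dino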

NOTE: Setting NFS properties has changed in Oracle Solaris 11 compared to earlier Oracle Solaris versions; see man zfs_share for details.
Now let's set a few properties to enable sharing the filesystem over NFS:
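On Oracle Solaris 11 11/11 this is typically a two-step process, a sketch (share name and path are illustrative; see man zfs_share for the exact syntax on your release):

# zfs set share=name=fred,path=/datapool/fred,prot=nfs datapool/fred
# zfs set sharenfs=on datapool/fred
# share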

Error recovery and FMA

This is the most fun part of the ZFS hands on lab. In this part of the lab we are going to create a mirrored pool and place some data in it. We will then force some data corruption by doing some really dangerous things to the underlying storage. Once we've done this, we will watch ZFS correct all of the errors.
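A sketch of the setup (pool and device names are hypothetical; the dd step is deliberately destructive, so run it only against disposable lab devices, and only if a third device is available to act as the hot spare):

# zpool create testpool mirror c4t0d0 c4t1d0 spare c4t2d0
# cp -r /etc /testpool
# dd if=/dev/urandom of=/dev/rdsk/c4t0d0s0 bs=128k count=100 oseek=1024
# zpool scrub testpool
# zpool status -v testpool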

Since the data errors were injected silently, we had to tell ZFS to compare all of the replicas; zpool scrub does exactly that. When it finds an error, it generates an FMA error report and then tries to correct the error by rewriting the block and reading it again. If too many errors occur, or the rewrite/reread cycle still fails, a hot spare is requested, if available. Notice that the hot spare is automatically resilvered and the pool is returned to the desired availability.
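The FMA activity can be inspected with the following commands (output varies):

# fmstat
# fmadm faulty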

The zfs-diagnosis module was invoked each time a ZFS error was discovered. Once an unsatisfactory error threshold was reached, the zfs-retire agent was called to record the fault and start the hot sparing process. An error log message was written (syslog-msgs > 0).