Is there a way in which ZFS can be prompted to redistribute a given filesystem over the all of the disks in its zpool?

I'm thinking of a scenario where I have a fixed size ZFS volume that's exported as a LUN over FC. The current zpool is small, just two 1TB mirrored disks, and the zvol is 750GB in total. If I were to suddenly expand the size of the zpool to, say, 12 1TB disks, I believe the zvol would still effectively be 'housed' on the first two spindles only.

Given that more spindles = more IOPS, what method could I use to 'redistribute' the zvol over all 12 spindles to take advantage of them?

There is no reason for the zvol to stay on the initial devices only. If you enlarge the pool, ZFS will spread newly written data across all of the available underlying devices. There is no fixed partitioning with ZFS.
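For reference, growing a mirrored pool means adding further mirror vdevs to it. A minimal sketch, assuming a pool called tank and device names that are purely illustrative:

```shell
# Add a second mirror vdev to an existing pool named "tank".
# After this, ZFS stripes *new* writes across both vdevs,
# but previously written blocks stay where they are.
zpool add tank mirror c0t2d0 c0t3d0

# Show per-vdev capacity and allocation to see the imbalance:
zpool list -v tank
```

These commands require root and a real pool, so treat them as a template rather than something to paste verbatim.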

In my experience, this isn't true. While there's no 'fixed partitioning', ZFS won't move data around of its own accord outside of client IO requests. If you create the scenario I described, add more disks and then do some heavy IO on the original LUN, you'll only see activity on the first two disks in the array, because that's where the data is. ewwhite points out that it gets balanced over time, but I'm curious to know if there's a faster way of doing this.
– growse, Dec 16 '11 at 10:23

Sorry if I was unclear. Of course, the existing data won't move magically. Only updated data will be relocated evenly. That's what I meant by "new IOs". As far as existing static data is concerned, caching will also improve performance as long as blocks are read more than once.
– jlliagre, Dec 16 '11 at 16:47

One just needs to install the PHP CLI tool with sudo apt-get install php5-cli and run the script, passing the path to your pool's data as the first argument, e.g.

php main.php /path/to/my/files

Ideally you should run the script twice across all of the data in the pool. The first run will balance the drive utilization, but individual files will be overly allocated to the drives that were added last. The second run will ensure that each file is "fairly" distributed across drives. I say fairly instead of evenly because it will only be evenly distributed if you aren't mixing drive capacities, as I am with my RAID 10 of different-size pairs (4TB mirror + 3TB mirror + 3TB mirror).
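The core operation such a script performs is simple: rewriting a file forces ZFS to allocate fresh blocks, and the allocator spreads those over all current vdevs. A minimal per-file sketch (the function name and the .balance suffix are illustrative, not taken from the actual script):

```shell
#!/bin/sh
# rebalance_file: rewrite a file in place so that ZFS allocates
# fresh blocks for it across all vdevs currently in the pool.
# (Function name and the ".balance" suffix are illustrative.)
rebalance_file() {
    f="$1"
    cp -p "$f" "$f.balance" &&   # copy: new blocks, spread over all vdevs
    rm "$f" &&                   # free the old, unbalanced blocks
    mv "$f.balance" "$f"         # restore the original name
}
```

Usage would be e.g. rebalance_file /path/to/pool/somefile, run over every file in the pool. Note this needs enough free space for a second copy of the largest file, and it loses nothing only if the file isn't being written to at the time.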

Reasons for Using a Script

I have to fix the problem "in-place". E.g. I cannot write the data out to another system, delete it here and write it all back again.

I filled my pool over 50%, so I could not just copy the entire filesystem at once before deleting the original.

If there are only certain files that need to perform well, then one could just run the script twice over those files. However, the second run is only effective if the first run managed to succeed in balancing the drives utilization.

I have a lot of data and want to be able to see an indication of progress being made.

How Can I Tell if Even Drive Utilization is Achieved?

Use the iostat tool over a period of time (e.g. iostat -m 5) and check the writes. If they are the same, then you have achieved an even spread. They are not perfectly even in the screenshot below because I am running a pair of 4TB drives with two pairs of 3TB drives in RAID 10, so the two 4TB drives will be written to slightly more.

If your drive utilization is "unbalanced", then iostat will show something more like the screenshot below where the new drives are being written to disproportionately. You can also tell that they are the new drives because the reads are at 0 since they have no data on them.
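ZFS can also report this itself, which avoids mapping device names back to vdevs by hand. A sketch, assuming a pool named tank:

```shell
# Per-vdev I/O statistics, refreshed every 5 seconds; uneven
# write columns indicate unbalanced allocation ("tank" is a
# placeholder pool name).
zpool iostat -v tank 5

# One-off view of how much space each vdev has allocated,
# useful before and after a rebalancing run:
zpool list -v tank
```

These require an actual pool to run against, so the output shown by iostat in the text is the portable fallback.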

The script is not perfect, only a workaround, but it works for me in the meantime until ZFS one day implements a rebalancing feature like BTRFS has (fingers crossed).

Well, this is a bit of a hack, but given that you have stopped the machine using the zvol, you could zfs send the volume to a local file on localhost called bar.zvol, and then zfs receive it back into the pool again. That should rebalance the data for you.
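With the LUN offline, the send/receive round trip could look roughly like this. The pool, zvol, and snapshot names are placeholders, and you need enough free space to hold both the stream file and the received copy:

```shell
# Placeholder names: pool "tank", zvol "tank/lun0".
# The LUN must not be in use while this runs.
zfs snapshot tank/lun0@rebalance

# Stream the zvol to a local file...
zfs send tank/lun0@rebalance > /var/tmp/bar.zvol

# ...destroy the original (freeing the old, unbalanced blocks)...
zfs destroy -r tank/lun0

# ...and receive it back; the rewritten blocks land on all vdevs.
zfs receive tank/lun0 < /var/tmp/bar.zvol
rm /var/tmp/bar.zvol
```

Since destroying the original before the receive completes is the risky step, keeping the stream file on separate storage (or piping send straight into receive under a temporary name) would be safer variants.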

@reco: zvols aren't file systems, so you cannot delete or duplicate data on them. You might overwrite data, but that would corrupt it unless you do it with the same content, which would effectively spread the data over the underlying devices; but this is what ewwhite already suggested one year ago.
– jlliagre, Nov 24 '12 at 0:16

Yes, you are right. I was looking around and researching the same topic. What I realized is that with ZFS, redistributing data over the vdevs is not needed. But if you still want to for any reason, duplicating data and deleting the originals will accelerate what ZFS would do over time.
– reco, Nov 24 '12 at 0:55

Redistributing data over the vdevs is a legitimate request. I'm afraid you are still missing that the question is about zvols, not file systems. You cannot duplicate or delete data on a volume; that doesn't make sense.
– jlliagre, Nov 24 '12 at 1:55