I have a very large Maildir I am copying to a new machine (over 100BASE-T) with rsync. The progress is slow. VERY SLOW. Like 1 MB/s slow. I think this is because it is a lot of small files that are being read in an order that essentially is random with respect to where the blocks are stored on disk, causing a massive seek storm. I get similar results when trying to tar the directory. Is there a way to get rsync/tar to read in disk block order, or otherwise overcome this problem?

Edit: I tried tar cf /dev/zero Maildir/ and on the old system, this took 30 minutes! On the new system when the rsync finally finished, the same test took 18 minutes. Dumping the same directory on the old system took 8 minutes, and on the new system, dump -0f /dev/zero -b 1024 /home/psusi/Maildir/ finished in only 30 seconds.

3 Answers

I ended up writing a little python script to calculate the correlation between directory names and inodes, inodes and data blocks, and directory names to data blocks. It turns out that ext4 tends to have rather poor correlation between the order the file names appear in the directory, and where they are stored on disk. After discussing it on the ext4 mailing list, it turns out that this is the result of the hashed directory indexes used to speed up lookups in large directories. The names are stored in hash order, which effectively scrambles their order relative to anything else.
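This is not the original script, only a minimal sketch of the first of those measurements: how well the order in which a directory returns its entries tracks their inode numbers. The function names are made up for illustration, and it uses a plain Spearman rank correlation that assumes distinct values (hard links to the same inode inside one directory would skew it slightly). Measuring inode-to-block correlation would additionally need the physical block addresses (for example via the FIEMAP ioctl or filefrag), which is left out here.

```python
import os


def rank_correlation(xs, ys):
    """Spearman rank correlation of two equal-length sequences (no tie handling)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")

    def ranks(values):
        # Position of each element in sorted order.
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order):
            result[index] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def readdir_vs_inode(directory):
    """Correlate the order os.scandir yields files with their inode numbers."""
    with os.scandir(directory) as it:
        inodes = [e.inode() for e in it if e.is_file(follow_symlinks=False)]
    positions = list(range(len(inodes)))
    return rank_correlation(positions, inodes)
```

A value near 1 means the directory hands back files in roughly inode order; a value near 0 means the listing order tells you almost nothing about inode order, which is what a hashed directory index would be expected to produce.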

It seems to me and at least one other commenter that this is a deficiency in the fs that should be fixed. Ted Ts'o (the ext maintainer) feels that it would be too difficult to do in the fs, and that good tools (like rsync and tar) should have an option to sort the directory by inode number before reading the files.
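To illustrate what sorting by inode number before reading would look like in a tool, here is a rough sketch. It is not something rsync or tar do today, the function names are hypothetical, and it handles a single directory only, so for a Maildir it would have to be applied to each of cur, new and tmp in every folder.

```python
import os


def files_in_inode_order(directory):
    """Return paths of regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as it:
        entries = [
            (entry.inode(), entry.path)
            for entry in it
            if entry.is_file(follow_symlinks=False)
        ]
    entries.sort()  # sorts by inode number first
    return [path for _, path in entries]


def read_all(directory):
    """Read every file in inode order and return the total bytes read."""
    total = 0
    for path in files_in_inode_order(directory):
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large messages do not sit in memory.
            while chunk := f.read(1 << 20):
                total += len(chunk)
    return total
```

Whether this helps depends on how strongly inode numbers correlate with data block locations on the particular filesystem; on a freshly written partition that correlation is likely to be high, on an old, churned one less so.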

So it looks like feature enhancement requests need to be filed for rsync and tar.

I have to agree with Ted Ts'o that the performance for this use case needs to be fixed at the application level. There's no reason to assume that file data should be stored in alphabetical order on the storage device. If another application wants to read files in order of last modification time, the fs cannot make both orderings fast anyway.
–
Mikko Rantalainen Apr 11 '14 at 8:51

@MikkoRantalainen, this isn't about what arbitrary order the application "wants", but what the best order is based on how the filesystem works internally. Applications cannot really be expected to know that, so the fs should try to list the files in the best order to read them, which may not always be inode order.
–
psusi Apr 11 '14 at 13:48

@psusi, how is the fs supposed to handle the case where you have two applications that require files in different order? The fs cannot optimize the physical storage order for both! Any application interested in performance should request files in storage order from the fs. If POSIX does not allow for such ordering (other than by inode order which may or may not match the actual physical storage order), that's a shortcoming of POSIX, not the fs.
–
Mikko Rantalainen Apr 15 '14 at 5:43

@MikkoRantalainen, the ordering is not a requirement of the application, it is a requirement of the filesystem, which is why the filesystem should order them however is best.
–
psusi Apr 15 '14 at 16:40

How many files are we talking about? find /path/to/your/maildir/ | wc -l should give you a rough indication. Hundreds of thousands should be okay. Hundreds of millions might suggest you need to prune, archive and generally clean up.

Is the disk slow? There are many benchmarks available, from the comprehensive bonnie++ through to the quick and simple Disk Utility benchmarker. Run one and see if you're suffering.

That may point to hardware issues: replace the disk with something faster.

Or it may point to filesystem issues: are you using something known to be very slow at high random read IOPS?

But ultimately, tarring and then transferring should give you the best overall throughput at the cost of you needing to be there to set up the transfer once you've generated the tar.

Maybe a hundred thousand files, but not millions. The disk on the old system does somewhere around 50-60 MB/s, and the new system is a RAID5 that does around 160 MB/s. Both greatly exceed the 11 or so MB/s that Fast Ethernet can handle. The problem seems to be the random access pattern.
–
psusi Mar 11 '11 at 1:36

Try disabling atime tracking, or using relative atime, on the new disk partition. This will limit overhead. Changing from a non-journaling file system like ext2 to a journaling file system like ext3 or ext4 will cost some performance.

When I moved Maildirs, I did a preparatory rsync to get all the directories in place ahead of time. Then, there were only updates to do.

When you are ready to do the real move you may want to ensure the directories are stable.

place the SMTP daemon in queue only mode,

disable queue runs by the SMTP daemon, and

disable access by the user.

Reactivate after the file move is done.

EDIT: I think you have identified the problem. Tar and rsync will both walk the directories. Due to normal file changes in the Maildir, the files in each directory will end up scattered around the disk. A tool like dump would read the partition in block order, but would replicate the problem to the new partition. A second rsync should run much faster than the first.

Tar bypasses atime updates, and I think rsync does too. This is with ext4.
–
psusi Mar 11 '11 at 14:29

@psusi: The atime change is a general fix for heavily read partitions. On second thought, it will not help when writing files from tar or rsync. The directories will be written anyway.
–
BillThor Mar 11 '11 at 14:50

Dump doesn't replicate the problem to the new partition. While dump reads the raw block device, restore does not write to the raw block device; it goes through the normal file I/O. Also, I believe that dump reads in inode order. This is why it was so fast on the new disk, where there is likely a very strong correlation between inode and block order. On the old disk this correlation was weaker, but still better than the correlation between file names and blocks, which is why dump did much better than tar.
–
psusi Mar 11 '11 at 15:57

@psusi: It may compact any free space, but the inodes in an older Maildir directory will be relatively random, as will the block locations of the files. Files may move, but the randomness of location will likely remain. It may be somewhat better, but could be worse. rsync and tar should make the inodes and space allocation relatively sequential, especially on a new partition. The second rsync I suggested will start the randomization process.
–
BillThor Mar 11 '11 at 16:03

@BillThor yes, whether they get to the new partition via rsync, tar, or dump, they generally will start out in pretty good order. The question is how to fix the old Maildir so that reading it with tar or rsync isn't so slow? Or maybe fix tar and rsync so they read in a more optimal order.
–
psusi Mar 11 '11 at 17:06