I've been looking at our backups recently and noticed that tape throughput is much lower when writing lots of small files, so I was thinking of tarring those small files up into one big tar file and then writing that to tape instead of the small files directly. (Much like Tar: avoid archiving of files larger than certain size)

However, when I then write this tar file to tape, am I going to have problems if there is a tape error during the write? I mean, will I lose the whole (large) file containing many smaller files, or will I just lose a particular block of that tar file and be able to recover the rest of the files?

Also, how do backup programs like Amanda or Bacula cope with lots of small files? Do they just write the files individually to tape, or do they do something like this pre-tarring into larger files, which will write faster?

Note: it might just be that our staging disks are too slow, but I'm assuming that small files cause this kind of backup performance problem for most people.


First: backing up tar files instead of single files is highly recommended to avoid the shoe-shining effect, which is what you are experiencing. The computer can't deliver data fast enough, so the tape drive has to stop, and before it can start writing again it must wind back a little to find the exact point where the stream ended. This is not only much slower but also puts a lot of wear on both the drive and the tape. (Modern drives, e.g. LTO-4, are said to be better at preventing or reducing this effect, as they slow down when their input buffer runs empty instead of having to rewind.)
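A minimal Python sketch of the pre-tarring step, assuming the small files live under one staging directory: it bundles every file at or below a size cut-off into a single uncompressed tar file, which can then be streamed to tape in one continuous write. The function name `pack_small_files` and the 1 MiB cut-off are hypothetical choices for illustration, not taken from Amanda, Bacula, or any other tool.

```python
import os
import tarfile


def pack_small_files(src_dir: str, archive_path: str, max_size: int = 1 << 20) -> int:
    """Bundle every regular file under src_dir that is at most max_size bytes
    into one uncompressed tar archive and return how many files were added.

    max_size (1 MiB by default) is a hypothetical cut-off; bigger files would
    be sent to tape directly. Keep archive_path outside src_dir, otherwise the
    archive could end up trying to include itself.
    """
    count = 0
    # mode "w" writes a plain, uncompressed tar so damaged blocks stay skippable
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.isfile(path) and os.path.getsize(path) <= max_size:
                    # store member paths relative to the staging directory
                    tar.add(path, arcname=os.path.relpath(path, src_dir))
                    count += 1
    return count
```

Files above the cut-off can go to tape as they are; they already stream well on their own.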

Second: it is possible to skip damaged sections of a tar file and recover the members that follow them, at least for uncompressed archives, because every member starts with its own 512-byte header.
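Why uncompressed tar is recoverable: each member begins with a 512-byte header that carries its own checksum, so a reader can step through the archive one block at a time and resynchronize on the next block whose checksum verifies. The Python sketch below does only that scan, over an archive image already read into memory. The function names are hypothetical; it checks the standard unsigned checksum only (not the old signed variant), and it can report a false positive if file data happens to look like a valid header.

```python
from typing import List, Tuple

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def header_checksum_ok(block: bytes) -> bool:
    """True if this 512-byte block looks like a tar header with a valid checksum."""
    if len(block) != BLOCK or not any(block):
        return False  # short block, or all-zero end-of-archive block
    # chksum field: bytes 148..155, octal ASCII padded with NUL and/or space
    digits = block[148:156].replace(b"\0", b" ").strip()
    if not digits:
        return False
    try:
        stored = int(digits, 8)
    except ValueError:
        return False  # not octal, so not a header (or a damaged one)
    # checksum = sum of all header bytes, counting the chksum field as 8 spaces
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed


def find_headers(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, member name) for every block that passes the checksum test."""
    found = []
    for off in range(0, len(data) - BLOCK + 1, BLOCK):
        block = bytes(data[off:off + BLOCK])
        if header_checksum_ok(block):
            # name field: first 100 bytes, NUL-terminated
            name = block[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
            found.append((off, name))
    return found
```

Once a valid header past the damaged region has been found, the rest of the archive from that offset can usually be handed to an ordinary tar reader; only the member whose blocks were hit is lost.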

Third: Bacula can indeed (and should) be configured to write data to a spool file first, which is then written to tape. Unfortunately, it cannot spool to one spool file while writing out another to tape at the same time, which effectively reduces backup speed by about 50%.

I am avoiding compressed archives for the very reason you mention: a damaged compressed archive tends to leave the rest of the archive unusable. I guess I will lose slightly more if part of a tape becomes unreadable, but it will be fast enough that I could probably do the whole backup twice to make up for this!
–
David Gardner, Oct 20 '11 at 9:39

I hadn't heard of shoe-shining, but it makes sense given how polished the rollers in tape drives get as they wear... :)
–
David Gardner, Oct 20 '11 at 9:41

Piping through dd should reduce the impact of this.
–
symcbean, Oct 20 '11 at 13:18

@symcbean Could you be more specific: does dd need a particular block size or anything like that to add buffering and reduce the impact of this problem?
–
David Gardner, Nov 3 '11 at 14:29

Yes, the block size defines the size of a buffer that has to be full before it is written out, so while you may still get overruns, they will be less frequent with a big block size.
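To make the block-size point concrete, here is a Python sketch of re-blocking, roughly what `dd` does when the output block size is set separately with `obs=`: input that arrives in small, irregular chunks is collected until a full block is ready, and only then is one large write issued. The name `reblock` and the 256 KiB default are assumptions for illustration. This makes stalls less frequent, but it cannot prevent them if the source is slower than the drive on average.

```python
from typing import BinaryIO


def reblock(src: BinaryIO, dst: BinaryIO, block_size: int = 256 * 1024) -> int:
    """Copy src to dst, issuing writes only in full block_size chunks
    (plus one final short block). Returns the number of writes made.

    Assumes dst is a buffered writer whose write() consumes the whole buffer.
    """
    buf = bytearray()
    writes = 0
    while True:
        # ask only for what is missing from the current block; pipes may return less
        chunk = src.read(block_size - len(buf))
        if not chunk:
            break  # EOF
        buf += chunk
        if len(buf) == block_size:
            dst.write(bytes(buf))  # one large write per full block
            writes += 1
            buf.clear()
    if buf:
        # trailing partial block; a real tape job may need to pad it
        dst.write(bytes(buf))
        writes += 1
    return writes
```

Usage would look like `reblock(sys.stdin.buffer, open("/dev/nst0", "wb"))` at the end of a `tar cf - ... |` pipeline, where the device path is only an example.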
–
symcbean, Nov 3 '11 at 16:58