9.4 Blocking

Block and record terminology is rather confused, and it
is also confusing to the expert reader. On the other hand, readers
who are new to the field have a fresh mind, and they may safely skip
the next two paragraphs, as the remainder of this manual uses those
two terms in a quite consistent way.

John Gilmore, the writer of the public domain tar from which
GNU tar was originally derived, wrote (June 1995):

The nomenclature of tape drives comes from IBM, where I believe
they were invented for the IBM 650 or so. On IBM mainframes, what
is recorded on tape are tape blocks. The logical organization of
data is into records. There are various ways of putting records into
blocks, including F (fixed sized records), V (variable
sized records), FB (fixed blocked: fixed size records, n
to a block), VB (variable size records, n to a block),
VSB (variable spanned blocked: variable sized records that can
occupy more than one block), etc. The JCL `DD RECFORM='
parameter specified this to the operating system.

The Unix man page on tar was totally confused about this.
When I wrote PD TAR, I used the historically correct terminology
(tar writes data records, which are grouped into blocks).
It appears that the bogus terminology made it into POSIX (no surprise
here), and now François has migrated that terminology back
into the source code too.

The term physical block means the basic transfer chunk from or
to a device, after which reading or writing may stop without anything
being lost. In this manual, the term block usually refers to
a disk physical block, assuming that each disk block is 512
bytes in length. It is true that some disk devices have different
physical blocks, but tar ignores these differences in its own
format, which is meant to be portable, so a tar block is always
512 bytes in length, and block always means a tar block.
The term logical block often represents the basic chunk of
allocation of many disk blocks as a single entity, which the operating
system treats somewhat atomically; this concept is only barely used
in GNU tar.

The term physical record is another way to speak of a physical
block; the two terms are somewhat interchangeable. In this manual,
the term record usually refers to a tape physical block,
assuming that the tar archive is kept on magnetic tape.
It is true that archives may be put on disk or used with pipes,
but nevertheless, tar tries to read and write the archive one
record at a time, whatever the medium in use. One record is made
up of an integral number of blocks, and this operation of putting many
disk blocks into a single tape block is called reblocking, or
more simply, blocking. The term logical record refers to
the logical organization of many characters into something meaningful
to the application. The term unit record describes a small set
of characters which are transmitted whole to or by the application,
and often refers to a line of text. Those last two terms are unrelated
to what we call a record in GNU tar.

When writing to tapes, tar writes the contents of the archive
in chunks known as records. To change the default blocking
factor, use the `--blocking-factor=512-size' (`-b
512-size') option. Each record will then be composed of
512-size blocks. (Each tar block is 512 bytes.
See section Basic Tar Format.) Each file written to the archive uses at least one
full record. As a result, using a larger record size can result in
more wasted space for small files. On the other hand, a larger record
size can often be read and written much more efficiently.

Further complicating the problem is that some tape drives ignore the
blocking entirely. For these, a larger record size can still improve
performance (because the software layers above the tape drive still
honor the blocking), but not as dramatically as on tape drives that
honor blocking.

When reading an archive, tar can usually figure out the
record size by itself. When this is the case, and a non-standard
record size was used when the archive was created, tar will
print a message about a non-standard blocking factor, and then operate
normally(24). On some tape
devices, however, tar cannot figure out the record size
itself. On most of those, you can specify a blocking factor (with
`--blocking-factor') larger than the actual blocking factor,
and then use the `--read-full-records' (`-B') option.
(If you specify a blocking factor with `--blocking-factor' and
don't use the `--read-full-records' option, then tar
will not attempt to figure out the record size itself.) On some
devices, you must always specify the record size exactly with
`--blocking-factor' when reading, because tar cannot
figure it out. In any case, use `--list' (`-t') before
doing any extractions to see whether tar is reading the archive
correctly.
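
The idea behind reading full records can be sketched in Python. This is
only an illustration of the concept, not how tar itself is implemented:
the names read_full_record and TrickleStream are hypothetical, and the
4096-byte limit per read is an assumption standing in for a pipe or
device that returns less data than was requested.

```python
import io


def read_full_record(stream, record_size):
    """Collect up to record_size bytes, tolerating short reads."""
    chunks = []
    remaining = record_size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of input: return whatever was gathered
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class TrickleStream:
    """Hypothetical input that returns at most `limit` bytes per read."""

    def __init__(self, data, limit):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._buf.read(min(n, self._limit))


# One 10240-byte record delivered in pieces of at most 4096 bytes.
stream = TrickleStream(bytes(10240), 4096)
record = read_full_record(stream, 10240)
print(len(record))
```

A reader that issued a single read call here would see only part of the
record; looping until the record is complete is what lets a larger
requested size work on such inputs.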

tar blocks are all fixed size (512 bytes), and its scheme for
putting them into records is to put a whole number of them (one or
more) into each record. tar records are all the same size;
at the end of the file there's a block containing all zeros, which
is how you tell that the remainder of the last record(s) is garbage.
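
As a rough illustration of this layout, the following Python sketch
builds a tiny archive in memory with the standard tarfile module and
then scans it block by block for the first all-zero block. The helper
names first_zero_block and make_small_archive are hypothetical, and the
scan is deliberately naive: a real reader follows the headers rather
than scanning, since file data may itself contain a block of zeros.

```python
import io
import tarfile

BLOCK_SIZE = 512  # tar block size in bytes


def first_zero_block(archive_bytes):
    """Return the index of the first all-zero block, or None if absent."""
    zero = bytes(BLOCK_SIZE)
    for offset in range(0, len(archive_bytes), BLOCK_SIZE):
        if archive_bytes[offset:offset + BLOCK_SIZE] == zero:
            return offset // BLOCK_SIZE
    return None


def make_small_archive():
    """Build an in-memory archive holding one five-byte file."""
    payload = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = make_small_archive()
# The archive length is a whole number of default-sized records.
print(len(archive), len(archive) % tarfile.RECORDSIZE)
# One header block and one data block come before the zero block.
print(first_zero_block(archive))
```

Everything after the zero block in this archive is padding that fills
out the final record.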

In a standard tar file (no options), the block size is 512
and the record size is 10240, for a blocking factor of 20. The
`--blocking-factor' option sets the blocking factor,
changing the record size while leaving the block size at 512 bytes.
20 was fine for ancient 800 or 1600 bpi reel-to-reel tape drives;
most tape drives these days prefer much bigger records in order to
stream and not waste tape. When writing tapes for themselves, some
people tend to use a factor on the order of 2048, say, giving a record
size of around one megabyte.
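
The arithmetic above is simple enough to check with a short Python
sketch. The function name record_size is hypothetical; the only fact it
relies on is the fixed 512-byte tar block described in this section.

```python
BLOCK_SIZE = 512  # size of one tar block in bytes


def record_size(blocking_factor):
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return BLOCK_SIZE * blocking_factor


# The default factor and the larger factor mentioned in the text.
for factor in (20, 2048):
    print(factor, record_size(factor))
```

A factor of 20 gives 10240 bytes, and a factor of 2048 gives 1048576
bytes, which is exactly one mebibyte.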

If you use a blocking factor larger than 20, older tar
programs might not be able to read the archive, so we recommend this
as a limit to use in practice. GNU tar, however,
will support arbitrarily large record sizes, limited only by the
amount of virtual memory or the physical characteristics of the tape
device.