Largefile Support Problems

The Unix98 standard requires largefile support, and many of the latest
operating systems provide it. However, some systems still choose not to
make it the default, resulting in two models: Some parts of the system
use the traditional 32bit off_t, while others are compiled with a
largefile 64bit off_t. Mixing libraries and plugins that were compiled
under different models is not a good idea.

64on32

While systems like FreeBSD and Darwin simply use a 64bit off_t as the
default, there are also systems like Linux and Solaris that do
not. Instead, they implement what's called the "transitional API" in
the largefile specifications; many calls are given a 64 cousin, so
that there are both "open" and "open64" in the C library, as well as
"lseek" and "lseek64".

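Using a define like -D_FILE_OFFSET_BITS=64 will bring about some magic
so traditional calls are remapped to the transitional API: your source
code might read "open(...)" and "lseek(...)", but it will really be
linked to the symbols "open64" and "lseek64", and off_t becomes a
64bit entity. (On glibc, -D_LARGEFILE_SOURCE alone only makes fseeko
and ftello available, and -D_LARGEFILE64_SOURCE exposes the explicit
64 names without remapping anything.)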
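
As a minimal sketch of that remapping, the fragment below assumes a
glibc-style toolchain where _FILE_OFFSET_BITS is the macro that selects
the 64bit off_t. The helper name off_t_bits is made up for this
illustration. The typedef makes the build fail outright if the
translation unit still ends up in "small" mode.

```c
/* Must come before any system header; on 64on32 systems such as
   32bit Linux/glibc this selects the 64bit off_t model. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Compile-time guard: the array size becomes -1 (an error) if off_t
   is narrower than 64 bits in this translation unit. */
typedef char off_t_must_be_64bit[(sizeof(off_t) * CHAR_BIT >= 64) ? 1 : -1];

/* Hypothetical helper: width of off_t, in bits, as seen by this file. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

Removing the #define on a 32bit Linux system would be expected to make
the typedef fail to compile, which is the point of the guard.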

Headers

As a result, however, it is highly dangerous to use off_t in header
files on largefile-sensitive systems. Most software writers do not
expect that an integral type like off_t can change its size; it is
simply used in the exported interface, as in declaring a new call
"off_t my_lseek(int,off_t,int)".

In reality, the situation is similar to the old DOS modes, with a
"small" mode and a "large" mode for compiling source code. The
library code might be compiled with a 64bit off_t, while the
application code using the library is compiled with 32bit off_t,
possibly ending with a callframe mismatch.

A similar problem arises when using off_t in exported structures, as
these can have different sizes and offsets for the member variables
therein. A library maker should take measures to defend against
improper off_t sizes, possibly making dualmode func/func64, as the C
library does. Unfortunately, many software writers have not been
aware of the problem.
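
One possible defensive measure, sketched here under assumptions, is to
keep off_t out of the exported interface and use a fixed-width type
instead. The name my_lseek64 is hypothetical, modelled on the my_lseek
example above. The library itself is assumed to be built in largefile
mode.

```c
/* The library is built as largefile, so its own off_t is 64 bits. */
#define _FILE_OFFSET_BITS 64

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Exported prototype (would live in the public header). int64_t has
   the same size for every caller, whichever off_t model the caller
   was compiled with, so the call frame always matches. */
int64_t my_lseek64(int fd, int64_t offset, int whence);

int64_t my_lseek64(int fd, int64_t offset, int whence)
{
    /* Inside the library off_t is 64 bits wide, so the casts do not
       truncate. Returns -1 on error, like lseek itself. */
    return (int64_t)lseek(fd, (off_t)offset, whence);
}
```

A "small"file caller can then pass its 32bit offsets through ordinary
integer promotion, while a "large"file caller loses nothing.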

The seek problem

Another problem is described in the section of the largefile documents
that deals with holes in the protection system. It stems from the
fact that some file descriptors might be opened in largefile mode
while others are not, and they can even be transferred from a
non-largefile application into largefile libraries, and vice versa.

The 64on32 transitional API is trying to support this scheme, mostly
by introducing a new error code EOVERFLOW that will be returned when a
"small"file application accesses a file that has grown beyond the two
gigabyte limit due to calls from other software parts compiled as
"large"file.

However, most "small"file software does not expect this error code,
and many software writers do not check the return value of lseek. This
can easily lead to data corruption when the file pointer is not
actually moved.
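
A minimal sketch of checking the result of lseek is shown below. The
wrapper name checked_seek and its calling convention are assumptions
made for this illustration. It reports the errno value (EOVERFLOW
among others) instead of letting the caller continue with a file
pointer that has not moved.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around lseek.
   Returns 0 on success and stores the new position in *result
   (if result is non-null). On failure returns the errno value,
   for example EOVERFLOW or EBADF, and leaves *result untouched. */
int checked_seek(int fd, off_t offset, int whence, off_t *result)
{
    off_t pos = lseek(fd, offset, whence);

    if (pos == (off_t)-1)
        return errno;       /* the file pointer was NOT moved */

    if (result)
        *result = pos;
    return 0;
}
```

A caller would stop writing whenever the return value is nonzero,
rather than assuming the seek succeeded.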

Mixing it up

Most of the software problems arise on the side of "small"file
applications. Generally, one should compile all software as largefile
as soon as the system provides these interfaces. This is pretty easy;
AC_SYS_LARGEFILE in autoconfed software can do it, or
-D_FILE_OFFSET_BITS=64 can simply be added to the compiler flags.

A lot of software, however, is not aware of a need to enable largefile
mode somewhere. Hundreds of Open Source applications are compiled
with 32bit off_t by default. It's simply been forgotten, and it would
take a lot of work and publicity to make everyone aware, with the only
result that the next new developer would miss it again.

Because of this, we should use technical support tools to track the
problem area of mixing compiled code from sides which support
largefile and those which do not yet do so.

Checking mismatches

A Perl script to do this can be fetched from http://ac-archive.sf.net/largefile/.
It tries to classify binaries and libraries according to whether they
are using "-32-" or "-64-" modes by looking for fopen() vs. fopen64()
in the list of dynamic symbols. Each argument binary is checked, along
with the dynamic dependencies it has. If there are mismatches, a list
of them is printed.
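
The classification idea can be sketched in C as follows. This is not
the Perl script itself; the enum, the function name and the handling
of the "both seen" case are assumptions for illustration. The list of
dynamic symbol names is assumed to have been extracted already, for
instance from the output of a tool such as nm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum lf_mode {
    LF_UNKNOWN,  /* neither fopen nor fopen64 is referenced */
    LF_32,       /* only fopen seen: "-32-" */
    LF_64,       /* only fopen64 seen: "-64-" */
    LF_BOTH      /* both seen; the text does not say how the real
                    script treats this, so it is reported separately */
};

/* Classify one binary from the names of its dynamic symbols. */
enum lf_mode classify_symbols(const char *const *symbols, size_t count)
{
    int has32 = 0, has64 = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(symbols[i], "fopen") == 0)
            has32 = 1;
        else if (strcmp(symbols[i], "fopen64") == 0)
            has64 = 1;
    }

    if (has32 && has64)
        return LF_BOTH;
    if (has64)
        return LF_64;
    if (has32)
        return LF_32;
    return LF_UNKNOWN;
}
```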

Furthermore, the script can detect when a library is trying to exhibit
itself as dualmode, exporting both func() and func64() calls
(libgstreamer is an example of a library which does this). For these,
it is okay that software may be in either -32- or -64- mode when
linking to them, so actually, only three combinations are rejected:
-64- which depends on -32-, -32- which depends on -64-, and 3264
dualmode libraries which depend on simple -32- libraries.
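
The three rejected combinations can be written down as a small
predicate. Again this is only a sketch of the rule as described here,
with made-up names, not code taken from the script.

```c
/* Hypothetical mode labels for this sketch. */
enum lf_kind { KIND_32, KIND_64, KIND_3264 };

/* Returns 1 if an object of kind `user` depending on a library of
   kind `dep` is one of the three rejected combinations, else 0. */
int lf_is_mismatch(enum lf_kind user, enum lf_kind dep)
{
    if (user == KIND_64 && dep == KIND_32)
        return 1;   /* -64- depends on -32- */
    if (user == KIND_32 && dep == KIND_64)
        return 1;   /* -32- depends on -64- */
    if (user == KIND_3264 && dep == KIND_32)
        return 1;   /* dualmode depends on plain -32- */
    return 0;       /* everything else is accepted */
}
```

Note that linking either a -32- or a -64- binary against a dualmode
library is accepted by this rule.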

The distro problem

When the script is run on /usr/lib/*.so (or just /usr/bin) on a
contemporary Linux system, it detects a lot of largefile mismatches.
The common user will not experience any problems with that, so long as
no file being handled is larger than two gigabytes. (Note that Unix98
mandates that base utilities like "cat" and "cp" be compiled with
largefile support.)

Open Source OS distributions, however, carry a lot of code from many
different sources. In particular, there are several graphical
frontends of the filemanager type which are not compiled in largefile
mode. Sooner or later, the problem will come up. It would be best if
no rpm/deb/whatever binary package has a largefile mismatch in the
first place.

This can be done if packagers and distro makers check binary packages
while making them. It would be easy to integrate the checking routine
into the set of post-%files tools (as they are called in RPM), which
need to check the libraries and binaries anyway for dependent
libraries (and do a "chrpath" on them, since they have been relinked
in the DESTDIR staging area).

The future

The future should see all packages compiled in largefile mode,
eliminating any problems with mixing libraries from different sides. A
distro maker can ensure that, and if it means a few patches, that's
good, since it makes the software more portable to FreeBSD/Darwin.

At some point, one should really think about dropping the 32bit off_t
default altogether, as was done with FreeBSD. Linux 2.4 and glibc 2.2
should be ready for this step, leaving the days of "small"files
behind.

> % Take a look at ugly self-promo with say
> % /lib/ld-lsb.so. Why change things that
> % aren't broken? :(
> Choosing a different name
> means that if glibc changes in an incompatible
> way, distros can distribute a separate dynamic
> linker for LSB apps, that will load up LSB compliant
> shared libraries from another location. Would you
> prefer if the LSB stood in the way of Linux improving?

Hmm... somehow it seemed to me not to be "compat" option but rather "mainstream". Then it makes sense, thanks for explanation of what I've not finished reading in anger.

Improving: well but _not_ collecting crap. Still 3rd party software has its price to pay, this way too.

Well, you need to choose some name for the dynamic linker. Since I started using linux, it has changed from /lib/ld.so (a.out) to /lib/ld-linux.so.1 (libc5 elf) to /lib/ld-linux.so.2 (glibc). Choosing a different name means that if glibc changes in an incompatible way, distros can distribute a separate dynamic linker for LSB apps, that will load up LSB compliant shared libraries from another location. Would you prefer if the LSB stood in the way of Linux improving?