Automating DESTDIR for Packaging

2009-02-19 (updated 2011-07-29)

It is unnecessarily hard to create native
packages (like deb and RPM), and unnecessarily hard to
directly install source code packages under the control of
programs like GNU stow, because many software source packages fail
to support the
DESTDIR convention. The
DESTDIR convention makes it easy to compile a program so that it
will run in some directory X, but be installed in directory $DESTDIR/X instead.
There are a vast number of source packages that do not support DESTDIR, and
it’s often difficult to add DESTDIR
support to complex makefiles.
This paper discusses what could be done to
automatically support DESTDIR, instead of requiring every source
package in the universe to be changed to support DESTDIR.
This paper shows
that there are practical ways to automate support for DESTDIR, and
points to tools like
auto-DESTDIR
and
user-union that implement some of
these solutions.

Introduction

Today’s users of Linux and Unix systems don’t want to follow
complicated instructions to install programs
— they want to click on one button, and have
everything installed as necessary. Ideally, they should be able
to install programs using their native package format and download
tools, such as deb
(used by Debian and Ubuntu) and RPM
(used by Fedora, Red Hat, and SuSE/Novell). But that means that
someone has to create those
packages. Alternatively, they should be able to use a program
that can automatically download, compile, and install from source code,
perhaps with some program like
GNU stow
that can place each package in its own separate directory while appearing
to all be in one place.
Whether you are
creating native packages, or automatically installing source packages,
it’s often vital to be able to compile
the program so that it will run in some directory X, but install the program in some other
directory $DESTDIR/Y. There is a standard way to do this: Have the
source package support the
DESTDIR convention.
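The convention itself is tiny. Here is a minimal sketch in shell (the paths and program name are illustrative) of what a DESTDIR-aware install step does:

```shell
#!/bin/sh
# Sketch of the DESTDIR convention: the program is configured to run
# from $PREFIX, but the install step writes to $DESTDIR$PREFIX instead.
set -e
PREFIX=/usr/local
DESTDIR=$PWD/pkgroot                           # intermediate staging directory

printf '#!/bin/sh\necho hello\n' > myprogram   # stand-in for a built binary
mkdir -p "$DESTDIR$PREFIX/bin"
cp myprogram "$DESTDIR$PREFIX/bin/myprogram"
# The file now sits at ./pkgroot/usr/local/bin/myprogram, ready to be
# archived into a package that installs it at /usr/local/bin/myprogram.
```

A packaging tool then archives everything under the staging directory, recording the paths relative to it as the final install locations.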

Unfortunately, many
software source packages fail to support the
DESTDIR convention, and it’s sometimes a real pain to add DESTDIR
support.
The build programs (e.g., Makefiles) can be large, complex, and
multilayered...
so it can be painful for a packager to modify the build scripts to
add DESTDIR support.
A packager can send DESTDIR patches upstream, but they
may be ignored or improperly maintained... which means that the packager
may need to keep re-modifying the build system, every time the program
is updated.
Ugh.
All too often, packaging can be completely automated except for
the lack of DESTDIR support.

Why is DESTDIR important?
There are many reasons, but let's look at two examples:

DESTDIR helps create native packages.
The tools for creating native packages in deb and rpm formats
(the two most popular Linux distribution formats)
require that “installed” files be specially-placed
in a subdirectory
by a “make install” during package creation.
This is something DESTDIR enables.
For example, Debian’s documentation explains that
during the packaging process, you must
“install the program into a temporary subdirectory from which
the maintainer tools will build a working... package.
Everything that is contained in this directory will be installed on a
user’s system when they install your package...”.
Fedora’s
Creating Package HowTo has similar requirements for Fedora.
The packaging software then copies files from that intermediate
location
into an archive for later installation in the “right” place.
It’s easy to specially place files-to-be-installed
if the program already supports the “DESTDIR” variable, because DESTDIR
tells the installer the intermediate location to install
software. Otherwise, it can be difficult to do.

DESTDIR helps install local packages from source.
Many people use programs like GNU stow (or similar conventions)
to help manage locally-installed packages from source code.
For example, GNU stow is designed to let you store a program
into some directory like /usr/local/stow/MYPROGRAM and
have binaries in /usr/local/stow/MYPROGRAM/bin/myprogram, yet have the
program be invoked as /usr/local/bin/myprogram. That way,
plug-ins and extensions will automatically work correctly. GNU
stow’s documentation specifically notes that you need to do this,
and suggests
using “make prefix=Y install” as a work-around; but as it notes, many
programs (including emacs!) automatically force
a recompilation when the prefix is changed, making this moot.
It can also cause subtle problems when installing; it makes more sense to
have a separate prefix and DESTDIR value, so that each can be used
where appropriate.

It’d be much better if the equivalent of DESTDIR could be automated,
without requiring application programs to
add DESTDIR support to their installers. After all,
almost any “real” Unix/Linux program with source code available
supports “make install” (or equivalent)
for installation. A “make install” process presumes
that it is writing the files to the “real” filesystem. Ideally,
there’d be a way to reroute writes to the “real” filesystem to some
other directory tree, so that they could be packaged or used with
programs like GNU stow.
This shouldn’t be that hard - “make install” may invoke many programs
and
do a lot of recursive directory descending to
figure out what to install, but the commands that actually do
the installing are usually simple ones like “install” and “cp”.

Ideally this re-routing process would work without
requiring the program running “make install” to run as root.
That way, a non-root user could do
“make-redir DESTDIR=newdir install”
(or equivalent) and have all the “installed” files
show up inside newdir.
Ideally, it would be efficient, and could track other information (such
as permissions and owners). Also, the re-routing should not affect
programs that don’t descend from “make install”;
often final packaging is done on a shared machine that is packaging
multiple programs simultaneously. A lot of tools don’t quite do
this;
they primarily just ‘track’ what’s
changed, require special privileges, and so on.
Some tools that you would think do this can’t do anything
remotely
like it; for example, “fakeroot” (widely used by Debian)
can record owners, but it can’t redirect writes to files (because it
doesn’t wrap the system call “open”).

This turns out to be harder than I thought, and in particular,
some of the “obvious” ways to do this turn out to be
more complicated than you’d like.
So here are
various technical approaches, along with some related tools that
implement each approach (and might be possible to use as a baseline to
implement automated DESTDIR).

After looking at the alternatives, I’ve decided
that the
“wrappers” approach is
especially promising.
See the Auto-DESTDIR
software, which implements this wrappers approach.
The wrappers approach may at first seem like an odd solution,
but its advantages become compelling
once you consider
the problems of the alternatives (as described below).

Not covered: General issues in program-specific directories or
simplified source/package installation

First, let me clarify that this paper
is not about the general idea of (1) creating
separate directories for each different program or program
installation,
nor is it about (2) simplifying source/package installation in its
entirety.
Instead, I am focusing on a specific step, copying files into one place
that will be run from another, that turns out to be important in both
of these general issues (and probably others as well).
The following subsections point to other programs/papers about
those issues in general,
and explain how automatically supporting DESTDIR can simplify these
general issues.

Program-specific directories

Creating separate directories for each different program or program
installation is a widely-implemented idea.
For example, using the tool GNU stow,
all files that implement perl might be stored in “/usr/local/stow/perl”
while all files that implement emacs might be stored in
“/usr/local/stow/emacs”,
and the executable of emacs might be “/usr/local/stow/emacs/bin/emacs”.
Many of these tools (including GNU stow)
run your installation script (or have you run them) with a special
setting of “prefix” (so that each program is
installed in a special program-specific location).
Then, they set up symbolic links to point to the “real” files
(e.g., so you don’t have to have a massive constantly-changing PATH).

If this is all you’re doing,
and you have all necessary rights to install to the stowed directories,
you might think you don’t need DESTDIR at all... just set up the prefix
and store in these special directories.
But it turns out that you still often want to install files to one
place,
yet have them run in another, which means you want to automate DESTDIR:

As noted in the GNU
stow manual section 6.1 (“Compile-time and install-time”),
“Software whose installation is managed with Stow needs to be installed
in one place (the package directory, e.g. ‘/usr/local/stow/perl’) but
needs to appear to run in another place (the target tree, e.g.,
‘/usr/local’). Why is this important? What’s wrong with Perl, for
instance, looking for its files in ‘/usr/local/stow/perl’ instead of in
‘/usr/local’?
The answer is that there may be another package, e.g.,
‘/usr/local/stow/perl-extras’, stowed under ‘/usr/local’. If Perl is
configured to find its files in ‘/usr/local/stow/perl’, it will never
find the extra files in the ‘perl-extras’ package, even though they’re
intended to be found by Perl. On the other hand, if Perl looks for its
files in ‘/usr/local’, then it will find the intermingled Perl and
‘perl-extras’ files.
This means that when you compile a package, you must tell it the
location of the run-time, or target tree; but when you install it, you
must place it in the stow tree.”

If you are trying to set up files so that they will
eventually run in a “stowed” location, but you cannot currently
write to that stowed location, then you may want to use DESTDIR so that
you can “install” files to an intermediate location which is not the
final location for execution.

If the program you’re dealing with doesn’t properly support
“--prefix” or “make prefix=value install”, you need something
that can automatically redirect files to another location
(so that these tools can manage them).

Simplified installation from source code

This paper does not cover the entire problem of automatically
installing packages directly from source code,
though it does potentially cover a piece of the problem.
The idea of making it easier to install from source tarballs is
nothing new; this has been raised by
Francesco
Montorsi
and
myself.
There are several existing tools that try to automatically install
programs
from source tarballs, though most of them do not do a good job of
automatically determining what is to be done, and few understand
dependencies
or integrate well with an existing package management system.
Here are some related papers/projects:

The tool Spkgtool
can act as a GUI front-end to various “symbolic link package systems”
(currently supporting stow, graft, and encap/epkg), and it
can automatically build and install source tarballs
if they comply with the basic GNU standards
(e.g., ./configure, make, and make install, with support for the
make variable “prefix”).

Dan’s autospec
automatically creates RPM .spec files from tarballs.
"It uses the information it can determine (from a Makefile, manual pages, an LSM file, etc.) to fill in the proper spec file fields. This allows a human packager to use the generated spec file as an almost complete template to quickly create an RPM package from a typical source or binary archive."

toast (GPLv3+) is a
“simple source-and-symlinks package manager for root and non-root
users”.
It is a “simple, self-contained tool for downloading, building,
installing, uninstalling and managing software packages. Unlike
traditional package-management systems, toast is primarily intended to
work directly with software distributed as source code, rather than in
some precompiled or specialized binary format, such as RPM. Binary
packages are also supported.”
It includes some of the capabilities of GNU stow, etc., but it also
includes heuristics so that it can compile straight from
source code.
(Which means that toast does not fit my categories well —
it includes stow-like capabilities and source installation capabilities.
It also has lots of heuristics to try to automatically implement DESTDIR
when the underlying system fails to do so.)
The toast man page has links to other interesting programs.

GNU Source
Installer
is a “source package manager for Unix-likes.
It provides configuration, compilation, installation, upgrade, tracking
and removal of packages built from source code following the GNU coding
standards.”

Bulldozer
works
with the Nautilus file manager of GNOME and supports
make, Ant, NAnt, and several other formats, letting you automatically
invoke build targets.

Urpkg (GPL) tries to
install software
in a safe way, especially from source code; it does this by creating
a new user for each program that it installs, as well as using some
sticky bit trickery, so that programs are protected from each other.

Luau: The Lib
Update/AutoUpdate Suite
enables people to download and install programs on their local systems,
but it requires that software developers encode information for it
(in an XML file).

Autopackage
“makes software installation on Linux easy. Software distributed using
Autopackage can be installed on multiple Linux distributions and
integrate well into the desktop environment.”
“An autopackage (a .package file) contains all the files needed for
the package in a distribution neutral format with special control
files inside, wrapped in a tarball with a stub script appended to the
beginning. In order to install a .package file, you run it, and the
scripts then check your system for the autopackage tools and offers to
download them if they’re not present.”
It’s essentially a special package format, designed for
interoperability.
The format is an API-based approach, which is different from many
others.

Paco (discussed below)
tries to install from source code automatically, using LD_PRELOAD
(see below).
But, like many other programs (like checkinstall), it simply watches
what
a program tries to do when it installs... it doesn’t intercept what is
done, to make it right.

van.pydeb
makes "egg metadata information available for Debian packaging".
It is a collection of "Tools for introspecting Python package metadata and translating the resulting information into Debian metadata", including version numbers, package names, and dependencies.

pkg-config (GPLv2+)
is a "helper tool used when compiling applications and libraries. It helps you insert the correct compiler options on the command line so an application can use gcc -o test test.c `pkg-config --libs --cflags glib-2.0` for instance, rather than hard-coding values on where to find glib (or other libraries)."
It's run out of the freedesktop.org site.
The information it uses is stored in ".pc" files.
See the
Pkg-config Wikipedia page,
the pkg-config man page,
and this
pkg-config guide.

CPANPLUS::Dist::RPM is "a distribution class to create RPM packages from CPAN modules, and all its dependencies. This allows you to have the most recent copies of CPAN modules installed, using your package manager of choice, but without having to wait for central repositories to be updated."

And of course, this paper is not about package management in general,
e.g.,
programs that support .rpm and .deb formats.
However, to create .rpm and .deb files, it is important to support
DESTDIR.
This paper is about how to easily support DESTDIR,
without twiddling makefiles.

Why not just support DESTDIR or make prefix=X install?

There’s no need for a special tool to support DESTDIR
for programs that already support DESTDIR.
In some programs that don’t support DESTDIR,
you can have the effect by setting
the “prefix” variable when running make install, that is,
make prefix=MY_DESTDIR_VALUE install.
It would be far better if source code releases followed the
normal
good practices for releasing FLOSS software source packages,
including support for DESTDIR.
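The difference between the two invocations can be demonstrated with a toy Makefile (hypothetical; it bakes $(prefix) into the installed file at install time, standing in for the way real programs bake in data directories at build time):

```shell
#!/bin/sh
# Generate a toy Makefile with printf (recipe lines need tab indentation).
set -e
printf 'prefix = /usr/local\ninstall:\n\tmkdir -p $(DESTDIR)$(prefix)/bin\n\techo "datadir=$(prefix)/share" > $(DESTDIR)$(prefix)/bin/prog\n' > Makefile

make install DESTDIR="$PWD/a"   # staged, but still configured for /usr/local
make install prefix="$PWD/b"    # workaround: records the staging path!
cat a/usr/local/bin/prog        # datadir=/usr/local/share  (correct)
cat b/bin/prog                  # datadir=.../b/share       (wrong at run time)
```

With DESTDIR, the staged file still records the real run-time prefix; with the prefix workaround, the staging path leaks into the installed file.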

But this does not always work, for a variety of reasons.
Many makefiles do not support DESTDIR at all.
Many makefiles also don’t support “prefix”, or if they do,
they forcibly re-build the program when the prefix value is changed
for
make install (making the workaround useless).
There are so many programs that do not follow
normal
good practices
that we must deal with the world as it is,
not as we wish it would be.
We could modify tiny makefiles, but large multi-directory makefiles
can be hideously hard to modify correctly, and then there is the
problem of getting those changes accepted upstream.
Since so many programs don’t support DESTDIR, it’d be nice to
be able to automatically support DESTDIR without having to
constantly muck around in complicated makefiles or other
build/installation systems for program after program.
Then, instead of having programmers around the world constantly
changing their makefiles, it will just work.

Kernel modification

The Linux kernel sees all read and write requests, so re-routing at
the
kernel level would be great - in theory, the re-routing
would be perfect, and should have good performance.
The big problem is that it requires basic changes in low-level
infrastructure,
where any mistakes could create a massive security hole... making it
understandably difficult to get people to accept changes at
this level.

Union mounts

Union mounts can merge multiple directories (e.g., one is “read
only” and
the other written to).
Generally, these require root privileges, though that’s not a killer -
a setuid program could use them, for example.

There are several kernel modules that implement union mounts, but
they’re not widely available on Linux distributions (as of early 2009).
The best-known union mount implementation is
UnionFS,
and another implementation is aufs; both implement union mounts as
a new filesystem.
The kernel’s “union mounts” patches
implement unioning inside Linux, but at the VFS layer instead of as
a new
filesystem; at this moment that work is very immature and not ready for
normal use.
Many Linux distributions do NOT have unionfs, aufs, or
“union mounts” since they are not in the default Linux kernel.

A FUSE-based implementation of a union file system can be used today,
and
doesn’t require changes to the Linux kernel (as Unionfs, etc., require).
FUSE is already part of the usual Linux kernel, and it allows file
requests
to be redirected out to user programs.
In particular,
funionfs
implements a union filesystem using FUSE, and
is included in Fedora, Debian, and Ubuntu.
PlasticFS version 1.12
uses FUSE as well.
By design a FUSE-based approach requires more work than unionfs (due to
extra
context switches), but for only a “make install” this isn’t so bad.
One implementer of a
unionfs-on-FUSE
reports that the I/O processing completely buries this
overhead anyway. Unfortunately, funionfs (at least) is also
global (instead of per-process) - again, a problem for a shared
packaging system if used in the “obvious” way.
I should note that instead of reusing existing union file systems
to redirect DESTDIR,
FUSE
could be used to directly implement this approach.

By themselves, the kinds of union mounts described above
are global to the
whole system, so if you directly did a union mount of directories
like “/usr” you would have trouble using a shared packaging system.
Such a global approach to redirecting could easily cause
problems administering the system, and it raises a number of
security problems as well.
So this should really be done for a set of processes rather than the
whole system, as discussed next.

Process Group Unique Root

A union mount can be made unique to a process group through a variety
of mechanisms.
The “obvious” way is to recreate a new filesystem tree in a
subdirectory,
using mount --bind and union mounts (as above, say using
funionfs and FUSE) to create a new
filesystem that looks like the old one but is not visible to all.
You can then
use chroot (or pivot_root) to set the process group to the new
filesystem.
A variant of this approach would be to use
mount
namespaces, which again create filesystems that are
specific to a process group (instead of being global to all processes).
Again, the point would be to redirect writes to /usr, /bin, /lib, /etc.
All of this could be implemented with a small suid program.

Ideally, it’d be rigged so that the process group isn’t
root, but it can still write to the new local /usr (etc.). Bonus
points if it pretends to be
root and records the parameters (a la fakeroot) - which could fool even
complex “make install” routines.

For security, the key problem is that the process
running “make install” should never be privileged.
In particular, the process should not have root privileges, nor should
it be
allowed to raise its privileges by running set-uid programs that
actually
setuid.
Otherwise, it could use its root privileges to get out of the jail, or
run an suid program that wouldn’t realize that the filesystem is rigged
(and then get exploited).
Traditionally, “make install” is given total privileges, but we want to
not do that.
If “make install” is started with normal user privileges, that at least
gets us started, but we need to make sure that privileges can’t be
added later
via setuid programs.
We could do this by making sure all mounts disable setuid/setgid;
mount already has this ability.
Alternatively, we could forbid running executables with that setting
(I believe SELinux and capabilities can do this).

This approach - having a FUSE-based union mount approach
that is local to a process group (e.g., chroot) -
is the most robust technically, since it can redirect
any non-setuid command used in the “make install”. It also has
low
overhead.
But the effort of making sure it’s secure may make it difficult
for distributors to accept it.

LD_PRELOAD

Many programs, like installwatch, use LD_PRELOAD to intercept
library functions.
There are various positives: LD_PRELOAD already exists, and it works
per-process (so it doesn’t interfere with other programs).
Unfortunately, LD_PRELOAD has many technical downsides.

LD_PRELOAD based approaches can’t redirect statically linked
executables.
Unfortunately, the programs most used by most install scripts
are also the ones most likely to be statically linked
(to increase reliability and enable recovery from serious
library management problems).
I know that
SuSE’s “ln” is statically linked, and that
FreeBSD and OpenBSD’s key routines used in installation
are statically linked, and this is true for many other systems as well.

You might think that once you override open(), all calls to open() would be
overridden, but this isn't true by default if the caller
is inside the C library itself.
It turns out that the standard GNU C library uses names prefixed by "__"
whenever it calls internal functions.
For example, the C library implements fopen(), but the fopen() implementation
internally calls __open(), not open().
In addition - and this is the kicker - by default the GNU C library
will not let you override these __functions using LD_PRELOAD.
So if you override just open(), an application that calls open() directly
will be overridden... but an application that uses fopen() will skip
right past it.
You can recompile the GNU C library so that the redirection will occur by
using the poorly-documented "--disable-hidden-plt" option.
But in practice, this means that you have to recompile the C library.
This is generally not well-received by distributions;
Debian specifically rejected doing this, and I suspect
others will do the same.
Few will want to change the default, because doing things this way speeds
up normal use.
An alternative is wrapping all the C library calls,
but that's more work.

I found no program meeting my needs for automating DESTDIR using LD_PRELOAD,
so I've started writing such a program:
user-union.
User-union creates union mounts, without requiring special privileges,
using LD_PRELOAD, and it can integrate with
auto-destdir.

Here are some existing related programs that already
use LD_PRELOAD (though most just watch what files are changed,
and do not let us change where the files go):

toast uses LD_PRELOAD
as one of its tricks for changing where things install;
unlike many of the other items noted below, it actually changes
where files are placed instead of just watching them.

gnashley
(src2pkg developer) believes
that checkinstall is no longer properly maintained, and has instead
developed a
‘trackinstall’ program (a drop-in replacement for ‘checkinstall’)
as part of src2pkg (this is built on “libsentry”).
These let you run “make install” and track what changed, as part of
larger tools to auto-create packages from source code.
But
src2pkg’s approaches don’t seem quite right; it supports
(1) “real root” which doesn’t redirect, overwrites, and requires root,
(2) DESTDIR method, which requires that DESTDIR work (it often
doesn’t, and that’s the problem we’re trying to solve), and
(3) JAIL, which redirects writes but
doesn’t seem to correctly redirect reads to the right place (ugh) - so
it doesn’t work well on many scripts.

PlasticFS up to version 1.11 used LD_PRELOAD to create a filesystem. It intentionally did not redirect many calls, and instead asked users to first recompile the GNU C library with "--disable-hidden-plt". That was completely impractical; rather than covering more functions (such as fopen), version 1.12 switched to FUSE.

FL-COW ("Copy on write") copies files if they're being opened for writing and they are hard linked to somewhere else, using LD_PRELOAD. Not exactly what I was looking for, but some similar ideas. GPLv2. It only covers a few functions (open, openat, fopen, freopen, and their *64 versions).

paco (Package Organizer)
(GPL)
is a “source code package organizer for Unix/Linux systems...
When installing a package from sources, paco wraps the ‘make install’
command (or whatever is needed to install the files into the system),
and generates a log containing the list of all installed files.
Technically, this is done by preloading a shared library before
installation using the environment variable LD_PRELOAD. During
installation this library catches the system calls that cause
filesystem alterations, logging the created files...
Gpaco is the graphic interface of paco.”
The Paco home page specifically notes that
“Paco does not work on systems in which the executables involved in the
installation of the packages (mv, cp, install...) are statically linked
against libc, like FreeBSD and OpenBSD.”
Paco, like many other tools, can only log what a program tries to do...
it cannot redirect files elsewhere.
But in a number of cases, we want to control where the files go, not
just
watch them go to the wrong place.

Brent Baccala's Preload libraries can redirect file accesses using LD_PRELOAD. Unfortunately, they don't seem to be OSS (no license found).
They require a glibc patch to be installed; this appears to have the same effect as recompiling glibc with the "--disable-hidden-plt" option.

EPOR (GPL)
is an “extensible package organiser for Unix like systems.
It’s written to trace filesystem changes (something being installed)
and save those information in a simple textual db (but this as any
other provided feature is customisable by embedded guile interpreter
see chapter Customise epor).
So, when a package is installed using epor to trace it, an entry is
created in a local db. This entry contains informations supplied by
command line (package name, version, ...) and traced by filesystem
changes (new directories, files ...). This is achieved using the
“LD_PRELOAD method”.
Using informations stored, epor let you remove files installed by a
particular package or view them in different ways.”

Fakeroot tracks
permissions, but doesn’t redirect at all.
Fakeroot is heavily used in Debian, and has a server daemon to help it,
but it intentionally doesn’t wrap open() and creat().
That's because
previous experience with the libtricks package suggested that trying
to wrap these would lead to endless problems.

libtricks
can redirect open-for-read (a mega-VPATH), but it’s not clear
that it can redirect writes... and it doesn’t seem to be maintained
anyway.
The creator of the fakeroot and libtricks packages found that trying to wrap
open() and create
“creates other problems, as demonstrated by the libtricks package.
This package wrapped many more functions, and tried to do a
lot more than fakeroot.
It turned out that a minor upgrade of libc (from one where the BR
stat() function didn’t use open() to one [that sometimes did])
would cause unexplainable segfaults...
once fixed, it was just a matter of time before another function
started to use open()...
Thus I decided to keep the number of functions
wrapped by fakeroot as small as possible, to limit the likelihood
[of] collisions...
I choose not to wrap open(), as open() is used by many
other functions in libc (also those that are already wrapped), thus
creating loops (or possible future loops, when the implementation of
various libc functions slightly change).”
An
April 3 1999 posting by Joel Klecker
noted that, in order to make this work,
"libtricks grovels deep within glibc internals (to the point that it
has its own copies of internal glibc headers), I am not entirely sure
if it can ever work with glibc 2.1, since an external lib and programs
cannot access internal libc symbols.
fakeroot grovels at a much higher level and still works."
In short, previous experience trying to wrap open() with LD_PRELOAD
suggests that this is a bad idea.

Ptrace

Ptrace() is a kernel-level call for “process tracing” (e.g., to
watch/change system calls made by a process). It’s intended for
debugging, but since it can watch another process, it can be used for
this purpose. This has serious advantages: it can handle
statically linked executables (which LD_PRELOAD-based approaches
can’t), so SuSE’s “ln” is a non-issue. Since it tracks at the system
call level, this approach is immune to the
races and other problems of fakeroot and libtricks.
Unfortunately, while ptrace() is great for watching what a program
does,
using it to change what a program does is far more
complicated.
Perhaps the premier example of a related program that uses
the ptrace() approach is TrackFS:

TrackFS.
“trackfs runs the child program(s) with tracing enabled
and tracks the system calls they make.”

Such a program could be implemented using ptrace and semantics like
this:

On open for read (not write), look at DESTDIR first - if it’s
there, use it. Otherwise, try to open un-redirected. This saves
disk space, as well as saving time by not copying files. In
short, if a file is only used for reading/executing, and never written
to, then just use it as-is.

On open for read AND write, look at DESTDIR first, and use it if
there. Otherwise, copy any existing file to DESTDIR, then use it.

On open for write (not read), create any prefixed directories
that exist on the original side. Then, open for write under DESTDIR.
(It should fail if the original would have failed.)

“chmod” is essentially a write operation; redirect as above, but
copy the file if it doesn’t exist.

“unlink”: If in DESTDIR, remove it. Bonus points: remember what
you “removed”, so that later queries about it will claim it’s not
there. Note that unionfs, funionfs, and so on have to handle this
too (e.g., using “whiteouts”). So you can “rm /bin/sh”, and it’s
not there... and there’s no harm to the “real” filesystem.

I may have missed a corner case, but it should work in principle.
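The read/write rules above can be sketched as shell helpers (illustrative only; a real implementation would apply these rules inside the tracing process, and the file names are hypothetical):

```shell
#!/bin/sh
set -e
DESTDIR=$PWD/stage

resolve_read() {    # open for read: prefer the DESTDIR copy if present,
                    # otherwise fall through to the real filesystem
    if [ -e "$DESTDIR$1" ]; then printf '%s\n' "$DESTDIR$1"
    else printf '%s\n' "$1"; fi
}

resolve_write() {   # open for write: always land under DESTDIR,
                    # creating any prefix directories that are needed
    mkdir -p "$DESTDIR$(dirname "$1")"
    printf '%s\n' "$DESTDIR$1"
}

resolve_read /etc/hosts                           # no staged copy: falls through
echo "demo" > "$(resolve_write /etc/demo.conf)"   # writes ./stage/etc/demo.conf
resolve_read /etc/demo.conf                       # now returns the staged copy
```

The open-for-read-and-write case would add a copy step: if only the original exists, copy it into $DESTDIR before returning the staged path.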

I’ve had a very interesting email exchange with TrackFS’s creator,
Michael Riepe, about this. He pointed out that this requires that
the controlling process actually change
the name of the file being opened, which requires using memory space
somewhere. I suggested that it’d be possible to patch the stack
with the new filename, use it, then halt and restore things back... so
that you don’t have to try to do memory allocations and such.
Michael isn’t sure that the stack grows correctly at all times (what
about stack overflow?), but it’s plausible. Another issue he
noted is that (absolute) symbolic links might not work correctly.
I’m not sure about that, but that would cause complications to the
rules above.

Originally, I thought the ptrace approach would be best, but the
rules kept getting more complex, and the stack twiddling was more than
I was hoping for.
In short, implementing DESTDIR this way is quite complicated,
involving a lot of architecture-dependent tweaking.
Is there a better, simpler way?
For this particular problem, I think there is.

In most software, the “make install” command
only uses a few simple commands to actually install the software.
In my experience, the most common command by far is “install”, which
is hardly surprising.
Other common commands used in “make install” that might need
redirecting from privileged directories
(like /bin, /usr, and /etc) include
cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir.
It might also be useful to redirect “test”,
though this is also a bash built-in (making its
replacement more complicated) and I haven’t found any Makefiles where
redirection of “test” is needed for “make install”.
Programs that use
libtool
usually support DESTDIR directly, but even if they didn’t,
the point is the same:
“make install” tends to use only a very few programs.

So given that “make install” tends to use only a few commands,
one “obvious” approach would be to modify just these basic
commands so that they will redirect their writes
(e.g., if an environment variable is set).
Then the packager can just set an environment variable and run
“make install”.
This seems completely appropriate for “install”: its
whole purpose is to perform installs, so adding functionality so it
can do installs in a common case (creating packages) seems appropriate.

It’d be best if setting this up were easy.
I would suggest REDIR_DESTDIR as the environment variable name.
If it’s set, then writes are redirected to use it as the root of the
filesystem.
For the “install” command,
I think any use should be redirected (since by definition, all
invocations
are installs).
With one exception: If install is invoked to install to inside
REDIR_DESTDIR,
then don’t re-prefix it;
this avoids some awkward loops, and makes it easy for packagers to
“automate” installation by always setting REDIR_DESTDIR.
Other commands are only sometimes used for installation, but I
suspect a simple detection method would be sufficient.
For example, perhaps all
attempts to write to a directory which only a
privileged user/group (e.g., root) can write to would be redirected so
that
“/” becomes REDIR_DESTDIR.
This way, “install xyz /bin” immediately becomes
“install xyz ${REDIR_DESTDIR}/bin”, and similarly for “cp”,
but
“cp xyz .” doesn’t get redirected if the directory is local
and
writable by an ordinary user (such as a user creating the package).
(Don’t use “INSTALL_DESTDIR” as the environment variable name; it turns
out that name is already used by many installation makefiles, and would
cause trouble instead of helping.)
This way, you don’t have to list which directories get redirected -
temporary and local files aren’t redirected, while files getting
installed
will get redirected.
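As a sketch, the detection heuristic might look like this. The helper
name is illustrative; a real wrapped command would run something like it
on each destination argument before acting:

```shell
# Hypothetical sketch of the detection heuristic: redirect a target
# only when REDIR_DESTDIR is set and the target's directory is not
# writable by the current (unprivileged) user.
maybe_redirect() {
  target="$1"
  # No REDIR_DESTDIR: leave everything alone.
  if [ -z "$REDIR_DESTDIR" ]; then printf '%s\n' "$target"; return; fi
  # Already inside REDIR_DESTDIR: don't re-prefix (avoids loops).
  case "$target" in
    "$REDIR_DESTDIR"/*) printf '%s\n' "$target"; return ;;
  esac
  dir=$(dirname "$target")
  if [ -d "$dir" ] && [ ! -w "$dir" ]; then
    printf '%s\n' "$REDIR_DESTDIR$target"   # privileged dir: redirect
  else
    printf '%s\n' "$target"                 # local/writable: leave as-is
  fi
}
```

For an ordinary (non-root) user, “maybe_redirect /bin/xyz” would print
${REDIR_DESTDIR}/bin/xyz, while paths in the user’s own writable
directories pass through unchanged.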

Obviously, if an attacker can control the environment variables of a
root
user, then the root user’s commands will get redirected.
But if an attacker
controls root’s environment, the system is already compromised
(environment variables can already control the system in lethal ways
in such cases).
Never transition to root (from non-root) without removing all
environment
variables, and then adding in only the ones that you are certain are
okay.
This would be no different.

This approach is easy to apply (once the commands are changed),
executes quickly, clearly works in a shared
environment, and has no security issues.
So there’s a lot going for it.

This approach only redirects those particular commands, and that is
its fundamental weakness. However, although a lot of “make
install” routines recurse deeply and do complicated
things in their source directories, I’ve found in a quick scan that
most only use a few limited commands
(like cp, install, and mkdir) to actually do the installing where this
would matter, so I think this approach would be remarkably
successful.
Unlike the LD_PRELOAD approach, this even works if programs like
/bin/ln are
statically linked.

Unfortunately, this requires changing some really low-level
key programs like “cp” and “mkdir”.
This is probably easily justified for install,
since the whole purpose of “install” is to install programs, and
packages are very common in today’s world.
But changing “cp” and “mkdir” is no small matter;
even if all agreed to it (and such agreement is rare), it’d
take a long time to widely deploy (think of not only the many
Linux distros, but also the *BSDs, Cygwin, etc.).
So while this could be a long-term strategy, it’s not so great
for the short term.
Is there any way we can make things simpler?
I believe there is, as discussed next.

As noted above, typically only a few commands in “make install”
actually need to be redirected.
We could simply modify the PATH environment variable so that
its first directory is a “wrapper” directory.
The wrapper directory would contain specialized “wrapped”
versions of common commands that are used to install software
when running in “make install”.
These wrapped versions would then redirect the file-writing.
As noted earlier, such commands might include:
install, cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir.
(As noted above, programs that use
libtool
usually support DESTDIR directly and thus don’t need help.)
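A minimal sketch of this idea follows. The wrapper below handles only
“mkdir” and only simple absolute-path arguments; the real auto-DESTDIR
wrappers are more thorough, and the names here are illustrative:

```shell
# Build a wrapper directory and put it first in PATH, so that
# "make install" picks up the wrapped commands. The DESTDIR default
# and the wrapper shown are illustrative, not the actual auto-DESTDIR code.
DESTDIR=${DESTDIR:-/tmp/pkgroot}
WRAPDIR=$(mktemp -d)

cat > "$WRAPDIR/mkdir" <<'EOF'
#!/bin/sh
# Rewrite absolute path arguments to live under $DESTDIR,
# then hand off to the real mkdir.
i=0 n=$#
while [ "$i" -lt "$n" ]; do
  a=$1; shift
  case "$a" in
    /*) a="$DESTDIR$a" ;;   # absolute path: redirect under DESTDIR
  esac
  set -- "$@" "$a"
  i=$((i + 1))
done
exec /bin/mkdir "$@"
EOF
chmod +x "$WRAPDIR/mkdir"

PATH="$WRAPDIR:$PATH"
export PATH DESTDIR
```

After this, a makefile’s “mkdir -p /usr/share/foo” quietly creates
$DESTDIR/usr/share/foo instead, with no privileges required.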

This approach has most of the same pluses and minuses
as the previous approach; in particular, it
only redirects those few commands.
However, since those are the commands
actually used by “make install”, in many cases this approach should be
fine.
One big additional positive, however, is that this can be done right
away;
no changing of fundamental programs is required.
It requires no special privileges (and thus has no security
impact), and running the wrappers can be quick
(so there is practically no performance impact).
The wrappers can be written in portable shell, which means that the
wrappers can be really small and have no extra dependencies
(so there would be no reason to avoid using them for
installation).

One weakness of this approach
is that a “make install” that invokes one of these
commands using its full path name (e.g., /bin/cp or /bin/install instead
of “cp” or “install”) will not use the new redirecting
command. I’ve found that some makefiles set INSTALL to
/bin/install, and it’s possible a few other programs are specified that
way too.
However, in many cases these are trivially overridden by invoking
make as
“make INSTALL=install CP=cp MV=mv install”,
so this problem is typically easy to overcome.

This is a limited approach: It only redirects a few commands.
But as long as “make install” routines use only a few commands
to install programs — which seems to be the norm —
then this approach is remarkably simple and effective.
You could do worse than something that’s simple and effective.
If you really need lots of programs redirected, you might be able to
combine this with LD_PRELOAD based approaches; LD_PRELOAD works with
many
programs but tends to fail on the few that most matter (e.g., cp,
mkdir, ln),
so you can wrap the programs that LD_PRELOAD fails on, and let it
pick up the rest.

I’ve implemented this approach (with a small dose of the
“make using special SHELL” approach described next),
using the bash shell.
Most installations already have bash installed;
if I’d used perl or Python, a user would have to install a
much bigger program just to run it.
C is terrible for string processing, so C would not be a great way
to implement it either.
If you’re interested, please go take a look at
Auto-DESTDIR.

Many make programs, such as GNU make, include the ability to set
which shell runs the commands.
For example, “make SHELL=...” lets you override which shell is used.
A special shell could then be used to override where the files go.
By itself, this could be easily fooled; a number of install scripts
call
sub-scripts which then do the work.
However, this might integrate very nicely with other approaches;
it would make it possible to “catch” file redirections, for example,
and override calls to “/usr/bin/install” and such.
One problem is that
this can easily lead to completely re-implementing the shell,
which is a terrible idea.
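For instance, a tiny stand-in shell could force a wrapper directory
onto PATH before running each recipe line. The file name and wrapper
directory below are hypothetical:

```shell
# Sketch: a replacement SHELL for make. make invokes it as
#   $(SHELL) -c 'recipe line'
# so we adjust PATH and hand the recipe to the real shell.
WRAP=$(mktemp -d)
cat > "$WRAP/destdir-sh" <<'EOF'
#!/bin/sh
PATH="/usr/local/lib/auto-destdir:$PATH"   # assumed wrapper directory
export PATH
exec /bin/sh "$@"
EOF
chmod +x "$WRAP/destdir-sh"
# Then:  make SHELL="$WRAP/destdir-sh" install
```

This keeps the real shell doing all the actual interpretation, so
nothing needs to be re-implemented; the stand-in only changes the
environment recipes run in.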

The “chroot” call is available everywhere.
A traditional chroot jail could
be created so that “written” files aren’t
written to the “real” system.
This was one of the first approaches I thought of.
Unfortunately, on most Linux-based systems it can be rather complicated
to set up proper chroot environments.
Calling chroot() is easy, but setting up the right environment to use
it may
involve either a large number of shared mounts that can’t be written to
(a security concern) or a vast amount of file copying.
Calling chroot() requires root privileges, which distributions are
loath
to give, and root privileges must later be dropped after the
environment is
set up (since root privileges can be used to escape chroot() jails);
this can make it difficult to integrate with other tools (such as
many package recompilation tools).

FreeBSD
automatically implements DESTDIR using chroot,
but it isn’t clear that the way FreeBSD handles this is entirely
desirable.
FreeBSD’s approach stores package information in $DESTDIR/var/db/pkg,
which in many cases is not where you want that information.
I suspect that
FreeBSD’s approach depends on
features that other kernels (including Linux) do not have, e.g.,
it depends on mount_nullfs(1), which appears to make their approach
(implemented in file bsd.destdir.mk) hard to
move to more-popular systems.
It’s interesting to note that implementing this was not easy;
it
took two tries to get this functionality working.
(Here’s
a web-accessible copy of bsd.destdir.mk implementing DESTDIR.)
In any case, it is not at all clear
that various Linux distributors are willing to use chroot
to automate DESTDIR.
See the kernel material (above) for some of the negatives of this.

RUST
is “a toolkit for creating RPM packages to distribute software...
[it] is both a drag & drop RPM creation GUI and a ‘sandboxing’
toolkit that allows you to do software installations within a
chrooted environment and automatically generate RPMs from arbitrary
source code, without ever seeing a spec file.”
But since it doesn’t actually create
.spec files, its results cannot be submitted to typical Linux
repositories (like Fedora’s).

There are other sandboxing approaches beyond chroot,
such as User Mode Linux, that
could be used. But these appear really heavyweight for the
purpose. Examples include:

Plash is
an approach for sandboxing. As explained on the Plash website, it
performs sandboxing by using a chroot to prevent all file access, and
then
modifying “library calls (such as open()) so that they make
remote procedure calls (RPCs) to another process instead of
making the usual Linux system calls.”
I think this is too heavyweight for this task.

I’ve implemented the
“Wrappers for basic install commands and a special
PATH”
approach above to automate DESTDIR.
To get it,
get the
auto-DESTDIR package.
As long as “make install” only
uses a limited set of commands —
which seems to be true in practice — this approach seems to
solve the problem without requiring security issues or complicated
reconfiguration of low-level infrastructure.
It’s also very, very portable.

You might also find the
user-union program useful, if you are
trying to automate DESTDIR for existing programs.
User-union creates union mounts, without requiring special privileges,
and it can work with
auto-DESTDIR.