Modifying The TIFF Library

This chapter provides information about the internal structure of
the library, how to control the configuration when building it, and
how to add new support to the library.

Information on compiling the library is given
elsewhere in this documentation.
This section describes the low-level mechanisms used to control
the optional parts of the library that are configured at build
time. Control is based on
a collection of C defines that are specified either on the compiler
command line or in a configuration file such as port.h
(as generated by the configure script for UNIX systems)
or tiffconf.h.

Configuration defines are split into three areas:

those that control which compression schemes are
configured as part of the builtin codecs,

those that control support for groups of tags that
are considered optional, and

those that control operating system or machine-specific support.

If the define COMPRESSION_SUPPORT is not defined
then a default set of compression schemes is automatically
configured:

two experimental schemes intended for images with high dynamic range
(compression 34676 and 34677).

The Lempel-Ziv & Welch (LZW) algorithm (compression 5) is no longer supported by default, due to Unisys patent enforcement (cf. Burn All GIFs). To enable LZW compression, you must obtain the libtiff-lzw-compression-kit from ftp://ftp.remotesensing.org/libtiff/. N.B. to use this kit legally, you must live in a country where the patent doesn't apply or you must obtain a license from Unisys.

To override the default compression behaviour define
COMPRESSION_SUPPORT and then one or more additional defines
to enable configuration of the appropriate codecs (see the table
below); e.g.
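
To make the mechanism concrete, here is a minimal sketch of how such defines can gate which codecs end up in a build. Only COMPRESSION_SUPPORT is named in the text above; PACKBITS_SUPPORT, CCITT_SUPPORT and LZW_SUPPORT are assumed here as examples of per-codec defines, and demo_codec_enabled is a hypothetical helper, not a libtiff function.

```c
#include <stddef.h>
#include <string.h>

/* Normally given on the compiler command line (-D...) or in a
 * configuration header. The per-codec names are assumptions. */
#define COMPRESSION_SUPPORT
#define PACKBITS_SUPPORT
#define CCITT_SUPPORT

/* Each entry is compiled in only if its define is present. */
static const char *const demo_enabled_codecs[] = {
#ifdef PACKBITS_SUPPORT
    "packbits",
#endif
#ifdef CCITT_SUPPORT
    "ccitt",
#endif
#ifdef LZW_SUPPORT
    "lzw",
#endif
    NULL
};

/* Hypothetical helper: returns 1 if the named codec was configured. */
int demo_codec_enabled(const char *name)
{
    size_t i;
    for (i = 0; demo_enabled_codecs[i] != NULL; i++)
        if (strcmp(demo_enabled_codecs[i], name) == 0)
            return 1;
    return 0;
}
```

Because LZW_SUPPORT is not defined in this sketch, the "lzw" entry is never compiled into the table.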

Several other compression schemes are configured separately from
the default set because they depend on ancillary software
packages that are not distributed with libtiff.

Support for JPEG compression is controlled by JPEG_SUPPORT.
The JPEG codec that comes with libtiff is designed for
use with release 5 or later of the Independent JPEG Group's freely
available software distribution.
This software can be retrieved from the directory
ftp.uu.net:/graphics/jpeg/.

Enabling JPEG support automatically enables support for
the TIFF 6.0 colorimetry and YCbCr-related tags.

The deflate algorithm is experimental. Do not expect
to exchange files using this compression scheme;
it is included only because the similar, and more common,
LZW algorithm is claimed to be governed by licensing restrictions.

This software is developed on Silicon Graphics UNIX
systems (big-endian, MIPS CPU, 32-bit ints,
IEEE floating point).
The configure shell script generates the appropriate
include files and make files for UNIX systems.
Makefiles exist for non-UNIX platforms that the
code runs on -- this work has mostly been done by other people.

In general, the code is guaranteed to work only on SGI machines.
In practice it is highly portable to any 32-bit or 64-bit system and much
work has been done to ensure portability to 16-bit systems.
If you encounter portability problems please return fixes so
that future distributions can be improved.

The software is written to assume an ANSI C compilation environment.
If your compiler does not support ANSI function prototypes, const,
and <stdarg.h> then you will have to make modifications to the
software. In the past I have tried to support compilers without const
and systems without <stdarg.h>, but I am
no longer interested in these
antiquated environments. With the general availability of
the freely available GCC compiler, I
see no reason to incorporate modifications to the software for these
purposes.

An effort has been made to isolate as many of the
operating system-dependencies
as possible in two files: tiffcomp.h and
libtiff/tif_<os>.c. The latter file contains
operating system-specific routines to do I/O and I/O-related operations.
The UNIX (tif_unix.c),
Macintosh (tif_apple.c),
and VMS (tif_vms.c)
code has had the most use;
the MS/DOS support (tif_msdos.c) assumes
some level of UNIX system call emulation (i.e.
open,
read,
write,
fstat,
malloc,
free).

Native CPU byte order is determined on the fly by
the library and does not need to be specified.
The HOST_FILLORDER and HOST_BIGENDIAN
definitions are not currently used, but may be employed by
codecs for optimization purposes.
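
Detecting byte order "on the fly" can be done with a few lines of C. The following is a sketch of one common technique, not the library's actual code; the demo_* names are hypothetical, and demo_uint16 is assumed to be 16 bits wide.

```c
typedef unsigned short demo_uint16;   /* assumed to be 16 bits wide */

/* Returns 1 on a big-endian host, 0 on a little-endian host.
 * The least significant byte of 1 is stored first only on
 * little-endian machines. */
int demo_host_is_bigendian(void)
{
    union { unsigned int i; unsigned char c[sizeof(unsigned int)]; } u;
    u.i = 1;
    return u.c[0] == 0;
}

/* Swap the two bytes of a 16-bit value. */
demo_uint16 demo_swab16(demo_uint16 v)
{
    return (demo_uint16)(((v & 0xff) << 8) | ((v >> 8) & 0xff));
}

/* Convert a 16-bit value as read from a file to host order;
 * a swap is needed only when the two byte orders differ. */
demo_uint16 demo_file_to_host16(demo_uint16 v, int file_is_bigendian)
{
    int differs = (file_is_bigendian != 0) != demo_host_is_bigendian();
    return differs ? demo_swab16(v) : v;
}
```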

The following defines control general portability:

BSDTYPES

Define this if your system does NOT define the
usual BSD typedefs: u_char,
u_short, u_int, u_long.

HAVE_IEEEFP

Define this as 0 or 1 according to the floating point
format supported by the machine. If your machine does
not support IEEE floating point then you will need to
add support to tif_machdep.c to convert between the
native format and IEEE format.

HAVE_MMAP

Define this if there is mmap-style support for
mapping files into memory (used only to read data).

HOST_FILLORDER

Define the native CPU bit order: one of FILLORDER_MSB2LSB
or FILLORDER_LSB2MSB.

HOST_BIGENDIAN

Define the native CPU byte order: 1 if big-endian (Motorola)
or 0 if little-endian (Intel); this may be used
in codecs to optimize code.

On UNIX systems HAVE_MMAP is defined through the running of
the configure script; otherwise support for memory-mapped
files is disabled.
Note that tiffcomp.h defines HAVE_IEEEFP to be
1 (BSDTYPES is not defined).

The software makes extensive use of C typedefs to promote portability.
Two sets of typedefs are used, one for communication with clients
of the library and one for internal data structures and parsing of the
TIFF format. There are interactions between these two to be careful
of, but for the most part you should be able to deal with portability
purely by fiddling with the following machine-dependent typedefs:

Typedef       Description                 Defined in
uint8         8-bit unsigned integer      tiff.h
int8          8-bit signed integer        tiff.h
uint16        16-bit unsigned integer     tiff.h
int16         16-bit signed integer       tiff.h
uint32        32-bit unsigned integer     tiff.h
int32         32-bit signed integer       tiff.h
dblparam_t    promoted type for floats    tiffcomp.h

(to clarify dblparam_t, it is the type that float parameters are
promoted to when passed by value in a function call.)
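
When porting to a new machine it can help to verify that these typedefs really have the widths the library expects. The sketch below builds stand-ins for them on top of <stdint.h>, which is an assumption made for illustration (tiff.h may define them differently); demo_check_type_sizes is a hypothetical helper.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-ins for the machine-dependent typedefs, built on <stdint.h>
 * for the purposes of this sketch only. */
typedef uint8_t  uint8;
typedef int8_t   int8;
typedef uint16_t uint16;
typedef int16_t  int16;
typedef uint32_t uint32;
typedef int32_t  int32;
typedef double   dblparam_t;   /* what a float is promoted to */

/* Returns 1 if every typedef has the expected size, 0 otherwise. */
int demo_check_type_sizes(void)
{
    return CHAR_BIT == 8
        && sizeof(uint8)  == 1 && sizeof(int8)  == 1
        && sizeof(uint16) == 2 && sizeof(int16) == 2
        && sizeof(uint32) == 4 && sizeof(int32) == 4
        && sizeof(dblparam_t) >= sizeof(float);
}
```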

The following typedefs are used throughout the library and interfaces
to refer to certain objects whose size is dependent on the TIFF image
structure:

typedef unsigned int ttag_t;        directory tag
typedef uint16 tdir_t;              directory index
typedef uint16 tsample_t;           sample number
typedef uint32 tstrip_t;            strip number
typedef uint32 ttile_t;             tile number
typedef int32 tsize_t;              i/o size in bytes
typedef void* tdata_t;              image data ref
typedef void* thandle_t;            client data handle
typedef int32 toff_t;               file offset (should be off_t)
typedef unsigned char* tidata_t;    internal image data

Note that tstrip_t, ttile_t, and tsize_t
are constrained to be
no more than 32-bit quantities by the 32-bit fields in which they are
stored in the TIFF image. Likewise tsample_t is limited by the 16-bit
field used to store the SamplesPerPixel tag. tdir_t
constrains
the maximum number of IFDs that may appear in an image and may
be an arbitrary size (without penalty). ttag_t must be either
int, unsigned int, pointer, or double
because the library uses a varargs
interface and ANSI C restricts the type of the parameter before an
ellipsis to be a promoted type. toff_t is defined as
int32 because
TIFF file offsets are (unsigned) 32-bit quantities. A signed
value is used because some interfaces return -1 on error (sigh).
Finally, note that tidata_t is used internally to the library to
manipulate internal data. User-specified data references are
passed as opaque handles and only cast at the lowest layers where
their type is presumed.
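
The effect of argument promotion on a varargs interface can be seen in a small, self-contained sketch. The ttag_t and dblparam_t typedefs follow the text; the tags, the demo_values structure and demo_set_field are hypothetical and are not the library's own set-field routine.

```c
#include <stdarg.h>

typedef unsigned int ttag_t;      /* already a promoted type */
typedef double       dblparam_t;  /* float is passed as double through "..." */

/* Hypothetical tags and storage, for illustration only. */
enum { DEMO_TAG_SHORT = 1, DEMO_TAG_FLOAT = 2 };

struct demo_values {
    unsigned short s;
    float          f;
};

/* Returns 1 if the tag was recognized and stored, 0 otherwise. */
int demo_set_field(struct demo_values *d, ttag_t tag, ...)
{
    va_list ap;
    int ok = 1;

    va_start(ap, tag);
    switch (tag) {
    case DEMO_TAG_SHORT:
        /* a 16-bit value arrives as int, so fetch it as int */
        d->s = (unsigned short) va_arg(ap, int);
        break;
    case DEMO_TAG_FLOAT:
        /* a float arrives as double, so fetch it as dblparam_t */
        d->f = (float) va_arg(ap, dblparam_t);
        break;
    default:
        ok = 0;
        break;
    }
    va_end(ap);
    return ok;
}
```

Fetching the arguments as unsigned short or float instead would be undefined behaviour, which is why the promoted types matter.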

General Comments

The library is designed to hide as many of the details of TIFF from
applications as
possible. In particular, TIFF directories are read in their entirety
into an internal format. Only the tags known by the library are
available to a user and certain tag data may be maintained that a user
does not care about (e.g. transfer function tables).

To add support for a new directory tag you have three options. If your
tag is specific to a compression algorithm, see below. If you have a lot
of tags you may want to try using Niles Ritter's runtime tag-extension
scheme in the "contrib/tags" directory, which makes the changes
orthogonal to the main libtiff code. Otherwise use
the following guidelines to add support to the ``core library''.

Define the tag in tiff.h.

Add a field to the directory structure in tif_dir.h
and define a FIELD_* bit (also update the definition of
FIELD_CODEC to reflect your addition).

Add an entry in the TIFFFieldInfo array defined at the top of
tif_dirinfo.c.
Note that you must keep this array sorted by tag
number and that the widest variant entry for a tag should come
first (e.g. LONG before SHORT).

Add entries in _TIFFVSetField() and _TIFFVGetField()
for the new tag.

(optional) If the value associated with the tag is not a scalar value
(e.g. the array for TransferFunction) and requires
special processing,
then add the appropriate code to TIFFReadDirectory() and
TIFFWriteDirectory(). You're best off finding a similar tag and
cribbing code.

Add support to TIFFPrintDirectory() in tif_print.c
to print the tag's value.
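
The ordering rule for the field table (sorted by tag number, widest variant first) is easy to get wrong, so a small check can be useful. This is a sketch only: demo_field_info is a simplified, hypothetical stand-in for TIFFFieldInfo, and tag 65000 is an invented example. The ImageWidth (256) and ImageLength (257) tag numbers and the SHORT (3) and LONG (4) type codes are taken from the TIFF specification.

```c
#include <stddef.h>

enum demo_type { DEMO_SHORT = 3, DEMO_LONG = 4 };

struct demo_field_info {
    unsigned int   tag;
    enum demo_type type;
    const char    *name;
};

static const struct demo_field_info demo_fields[] = {
    { 256,   DEMO_LONG,  "ImageWidth"  },
    { 256,   DEMO_SHORT, "ImageWidth"  },
    { 257,   DEMO_LONG,  "ImageLength" },
    { 257,   DEMO_SHORT, "ImageLength" },
    { 65000, DEMO_SHORT, "DemoNewTag"  },   /* hypothetical new tag */
};
#define DEMO_N(a) (sizeof(a) / sizeof((a)[0]))

/* Size in bytes of one value of the given type. */
static size_t demo_width(enum demo_type t)
{
    return t == DEMO_LONG ? 4 : 2;
}

/* Returns 1 if the table is sorted by tag and, for equal tags,
 * lists the widest variant first. */
int demo_table_is_ordered(const struct demo_field_info *f, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (f[i].tag < f[i - 1].tag)
            return 0;
        if (f[i].tag == f[i - 1].tag &&
            demo_width(f[i].type) > demo_width(f[i - 1].type))
            return 0;
    }
    return 1;
}

/* Returns the first (widest) entry for a tag, or NULL if unknown. */
const struct demo_field_info *demo_find_field(unsigned int tag)
{
    size_t i;
    for (i = 0; i < DEMO_N(demo_fields); i++)
        if (demo_fields[i].tag == tag)
            return &demo_fields[i];
    return NULL;
}
```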

If you want to maintain portability, beware of making assumptions
about data types. Use the typedefs (uint16, etc. when dealing with
data on disk and t*_t when stuff is in memory) and be careful about
passing items through printf or similar vararg interfaces.

To add builtin support for a new compression algorithm, you can either
use the "tag-extension" trick to override the handling of the
TIFF Compression tag (see Adding New Tags, above),
or do the following to add support directly to the core library:

Define the tag value in tiff.h.

Edit the file tif_codec.c to add an entry to the
_TIFFBuiltinCODECS array (see how other algorithms are handled).

Add the appropriate function prototype declaration to
tiffiop.h (close to the bottom).

Create a file with the compression scheme code; by convention files
are named tif_*.c (except perhaps on some systems where the
tif_ prefix pushes some filenames over 14 chars).

Edit Makefile.in (and any other Makefiles)
to include the new source file.
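
The idea behind a table of builtin codecs can be sketched as follows. The structure layout, the demo_* names and the compression value 65000 are all hypothetical; consult tif_codec.c for the real _TIFFBuiltinCODECS entries. The value 1 for "no compression" comes from the TIFF specification.

```c
#include <stddef.h>

struct demo_handle;   /* opaque stand-in for the library's TIFF handle */

typedef int (*demo_init_fn)(struct demo_handle *tif, int scheme);

struct demo_codec {
    const char  *name;     /* human-readable name */
    int          scheme;   /* value of the Compression tag */
    demo_init_fn init;     /* installs the codec's entry points */
};

#define DEMO_COMPRESSION_NONE 1
#define DEMO_COMPRESSION_FOO  65000   /* hypothetical private value */

static int demo_init_none(struct demo_handle *tif, int scheme)
{
    (void) tif; (void) scheme;
    return 1;
}

static int demo_init_foo(struct demo_handle *tif, int scheme)
{
    (void) tif; (void) scheme;
    return 1;
}

/* New codecs are added here; the table ends with a NULL name. */
static const struct demo_codec demo_builtin_codecs[] = {
    { "None", DEMO_COMPRESSION_NONE, demo_init_none },
    { "Foo",  DEMO_COMPRESSION_FOO,  demo_init_foo  },
    { NULL,   0,                     NULL           }
};

/* Returns the codec registered for a scheme, or NULL if none is. */
const struct demo_codec *demo_find_codec(int scheme)
{
    const struct demo_codec *c;
    for (c = demo_builtin_codecs; c->name != NULL; c++)
        if (c->scheme == scheme)
            return c;
    return NULL;
}
```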

A codec, say foo, can have many different entry points:

TIFFInitfoo(tif, scheme)          /* initialize scheme and set up entry points in tif */
fooSetupDecode(tif)               /* called once per IFD after tags have been frozen */
fooPreDecode(tif, sample)         /* called once per strip/tile, after data is read,
                                     but before the first row is decoded */
fooDecode*(tif, bp, cc, sample)   /* decode cc bytes of data into the buffer */
    fooDecodeRow(...)             /* called to decode a single scanline */
    fooDecodeStrip(...)           /* called to decode an entire strip */
    fooDecodeTile(...)            /* called to decode an entire tile */
fooSetupEncode(tif)               /* called once per IFD after tags have been frozen */
fooPreEncode(tif, sample)         /* called once per strip/tile, before the first row in
                                     a strip/tile is encoded */
fooEncode*(tif, bp, cc, sample)   /* encode cc bytes of user data (bp) */
    fooEncodeRow(...)             /* called to encode a single scanline */
    fooEncodeStrip(...)           /* called to encode an entire strip */
    fooEncodeTile(...)            /* called to encode an entire tile */
fooPostEncode(tif)                /* called once per strip/tile, just before data is written */
fooSeek(tif, row)                 /* seek forwards row scanlines from the beginning
                                     of a strip (row will always be >0 and <rows/strip) */
fooCleanup(tif)                   /* called when compression scheme is replaced by user */

Note that the encoding and decoding variants are only needed when
a compression algorithm is dependent on the structure of the data.
For example, Group 3 2D encoding and decoding maintains a reference
scanline. The sample parameter identifies which sample is to be
encoded or decoded if the image is organized with PlanarConfig=2
(separate planes). This is important for algorithms such as JPEG.
If PlanarConfig=1 (interleaved), then sample will always be 0.
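
How an initialization routine installs entry points can be illustrated with a toy codec whose "decoding" is a plain copy. The demo_tiff structure and its members are hypothetical simplifications of the library's TIFF structure, and demoInitfoo stands in for the TIFFInitfoo routine described above.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the library's TIFF structure. */
struct demo_tiff {
    const unsigned char *rawcp;   /* next unread byte of raw data */
    size_t               rawcc;   /* raw bytes remaining */
    int (*decoderow)(struct demo_tiff *tif, unsigned char *bp,
                     size_t cc, unsigned sample);
};

/* Decode cc bytes into bp. The cc argument, not the scanline or
 * tile size, decides when to stop. */
static int fooDecodeRow(struct demo_tiff *tif, unsigned char *bp,
                        size_t cc, unsigned sample)
{
    (void) sample;                /* always 0 for interleaved data */
    if (tif->rawcc < cc)
        return 0;                 /* not enough raw data */
    memcpy(bp, tif->rawcp, cc);
    tif->rawcp += cc;
    tif->rawcc -= cc;
    return 1;
}

/* Install the codec's entry points in the handle. */
int demoInitfoo(struct demo_tiff *tif, int scheme)
{
    (void) scheme;
    tif->decoderow = fooDecodeRow;
    return 1;
}
```

After initialization, callers reach the codec only through the function pointers in the handle, which is what lets a scheme be replaced at run time.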

(Actually you may decide not to override the
tif_printdir method, but rather just specify it).

Create a private TIFFFieldInfo array for your tags and
merge them into the core tags at initialization time using
_TIFFMergeFieldInfo; e.g.

_TIFFMergeFieldInfo(tif, fooFieldInfo, N(fooFieldInfo));

(where N is a macro used liberally throughout the distributed code).

Fill in the get and set routines. Be sure to call the parent method
for tags that are not handled directly. Also be sure to set the
FIELD_* bits for tags that are to be written to the file. Note that
you can create ``pseudo-tags'' by defining tags that are processed
exclusively in the get/set routines and never written to file (see
the handling of TIFFTAG_FAXMODE in tif_fax3.c
for an example of this).

Fill in the print routine, if appropriate.

Note that space has been allocated in the FIELD_* bit space for
codec-private tags. Define your bits as FIELD_CODEC+<offset> to
keep them away from the core tags. If you need more tags than there
is room for, just increase FIELD_SETLONGS at the top of
tiffiop.h.
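
The bit-space arrangement can be pictured with a small sketch. All names and numbers here are hypothetical (in particular the value 34 chosen for the first codec-private bit); the real FIELD_CODEC and FIELD_SETLONGS definitions should be taken from the library headers.

```c
#define DEMO_FIELD_SETLONGS 2    /* number of 32-bit words in the bit set */
#define DEMO_FIELD_CODEC    34   /* hypothetical first codec-private bit */

/* Codec-private bits are defined relative to DEMO_FIELD_CODEC. */
#define DEMO_FIELD_FOOMODE  (DEMO_FIELD_CODEC + 0)
#define DEMO_FIELD_FOOLEVEL (DEMO_FIELD_CODEC + 1)

struct demo_fieldbits {
    unsigned long fieldsset[DEMO_FIELD_SETLONGS];  /* 32 bits used per word */
};

/* Returns 1 if the bit fits in the set; raising
 * DEMO_FIELD_SETLONGS makes room for more bits. */
int demo_field_in_range(unsigned bit)
{
    return bit < 32u * DEMO_FIELD_SETLONGS;
}

/* Mark a field as set (out-of-range bits are ignored). */
void demo_set_field_bit(struct demo_fieldbits *d, unsigned bit)
{
    if (demo_field_in_range(bit))
        d->fieldsset[bit / 32] |= 1UL << (bit & 31);
}

/* Returns 1 if the field is marked as set, 0 otherwise. */
int demo_field_is_set(const struct demo_fieldbits *d, unsigned bit)
{
    if (!demo_field_in_range(bit))
        return 0;
    return (d->fieldsset[bit / 32] >> (bit & 31)) & 1UL ? 1 : 0;
}
```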

The library handles most I/O buffering. There are two data buffers
when decoding data: a raw data buffer that holds all the data in a
strip, and a user-supplied scanline buffer that compression schemes
place decoded data into. When encoding data the data in the
user-supplied scanline buffer is encoded into the raw data buffer (from
where it is written). Decoding routines should never have to explicitly
read data -- a full strip/tile's worth of raw data is read and scanlines
never cross strip boundaries. Encoding routines must be cognizant of
the raw data buffer size and call TIFFFlushData1() when necessary.
Note that any pending data is automatically flushed when a new strip/tile is
started, so there's no need to do that in the tif_postencode routine (if
one exists). Bit order is automatically handled by the library when
a raw strip or tile is filled. If the decoded samples are interpreted
by the decoding routine before they are passed back to the user, then
the decoding logic must handle byte-swapping by overriding the
tif_postdecode
routine (set it to TIFFNoPostDecode) and doing the required work
internally. For an example of doing this look at the horizontal
differencing code in the routines in tif_predict.c.

The variables tif_rawcc, tif_rawdata, and
tif_rawcp in a TIFF structure
are associated with the raw data buffer. tif_rawcc must be non-zero
for the library to automatically flush data. The variable
tif_scanlinesize is the size a user's scanline buffer should be. The
variable tif_tilesize is the size of a tile for tiled images. This
should not normally be used by compression routines, except where it
relates to the compression algorithm. That is, the cc parameter to the
tif_decode* and tif_encode*
routines should be used in terminating
decompression/compression. This ensures these routines can be used,
for example, to decode/encode entire strips of data.
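
The interplay between an encoder and the raw data buffer might look roughly like the sketch below. The member names echo tif_rawdata, tif_rawcp and tif_rawcc from the text, but demo_rawbuf, the tiny 4-byte buffer, and demo_flush (a stand-in for TIFFFlushData1() that only counts bytes instead of writing them) are assumptions made for illustration.

```c
#include <stddef.h>

#define DEMO_RAWSIZE 4   /* deliberately tiny so that flushes happen */

struct demo_rawbuf {
    unsigned char  rawdata[DEMO_RAWSIZE];  /* raw data buffer */
    unsigned char *rawcp;                  /* next free byte */
    size_t         rawcc;                  /* bytes currently buffered */
    size_t         flushed;                /* total bytes flushed so far */
    int            flushes;                /* number of non-empty flushes */
};

void demo_begin(struct demo_rawbuf *tif)
{
    tif->rawcp = tif->rawdata;
    tif->rawcc = 0;
    tif->flushed = 0;
    tif->flushes = 0;
}

/* Stand-in for the flush routine: "writes" pending data and resets
 * the buffer. Nothing happens when rawcc is zero. */
int demo_flush(struct demo_rawbuf *tif)
{
    if (tif->rawcc > 0) {
        tif->flushed += tif->rawcc;
        tif->flushes++;
        tif->rawcc = 0;
        tif->rawcp = tif->rawdata;
    }
    return 1;
}

/* Trivial "encoder": copies cc bytes of user data into the raw
 * buffer, flushing whenever the buffer is full. */
int demo_encode(struct demo_rawbuf *tif, const unsigned char *bp, size_t cc)
{
    while (cc > 0) {
        if (tif->rawcc >= DEMO_RAWSIZE && !demo_flush(tif))
            return 0;
        *tif->rawcp++ = *bp++;
        tif->rawcc++;
        cc--;
    }
    return 1;
}
```

Encoding 10 bytes through the 4-byte buffer triggers two flushes inside demo_encode and leaves 2 bytes pending for the final flush at the end of the strip or tile.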

In general, if you have a new compression algorithm to add, work from
the code for an existing routine. In particular,
tif_dumpmode.c
has the trivial code for the "nil" compression scheme,
tif_packbits.c is a
simple byte-oriented scheme that has to watch out for buffer
boundaries, and tif_lzw.c has the LZW scheme that has the most
complexity -- it tracks the buffer boundary at a bit level.
Of course, using a private compression scheme (or private tags) limits
the portability of your TIFF files.
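
As an illustration of what "watching out for buffer boundaries" means for a byte-oriented scheme, here is a standalone sketch of PackBits decoding as described in the TIFF specification. It is not the code in tif_packbits.c; demo_packbits_decode and its return convention are hypothetical.

```c
#include <stddef.h>

/* Decode PackBits data from in[0..inlen) into out[0..outlen).
 * Returns the number of bytes written, or -1 if a run or literal
 * would read past the input or write past the output. */
long demo_packbits_decode(const unsigned char *in, size_t inlen,
                          unsigned char *out, size_t outlen)
{
    size_t ip = 0, op = 0;

    while (ip < inlen && op < outlen) {
        int n = in[ip++];
        if (n >= 128)
            n -= 256;                 /* interpret as a signed byte */

        if (n == -128)
            continue;                 /* no-op code */

        if (n < 0) {                  /* replicate next byte 1-n times */
            size_t run = (size_t)(1 - n);
            unsigned char b;
            if (ip >= inlen || run > outlen - op)
                return -1;
            b = in[ip++];
            while (run-- > 0)
                out[op++] = b;
        } else {                      /* copy the next n+1 bytes literally */
            size_t lit = (size_t) n + 1;
            if (lit > inlen - ip || lit > outlen - op)
                return -1;
            while (lit-- > 0)
                out[op++] = in[ip++];
        }
    }
    return (long) op;
}
```

Both bounds checks are made before any byte is copied, so a corrupt or truncated strip cannot overrun either buffer.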