Well, thanks for all of the responses. I got several new ideas, and got
reminded of several things I had forgotten to include. It would be a
little impractical for me to "summarize" the entire report here, since it
is 26 pages long. However, there is a fairly complete draft available (at
26 pages it ought to be fairly complete :-), and if anyone is interested
in reading it, it is available in compressed postscript by anonymous ftp
from cs.arizona.edu in the file "/usr/ftp/janus/jc/tags.ps.Z". If anyone
has trouble getting this or needs an uncompressed file, or wants the latex
source, let me know. I haven't had time to discuss all of the ideas I got
from the net, so I've included draft notes in the text to give a hint
about these (and I've tried to credit the people who wrote me). A few of
the draft notes are essentially the remainders of my outline. Since I
have to go on to other work, the report is probably going to remain in
this unfinished state for a while.

For those who are curious, but not curious enough to read a 26-page
report, here is a rough outline of the methods (this does not include all
of the tricks and special cases)

tagged-words (type information is in the machine word)
tag fields (word is broken up into tag and data fields)
tag field in high end (most-significant bits) of word
use tag of all zeros for one type to avoid tagging cost
negative ints get a tag of all ones, non-negative ints
get a tag of all zeros
use sign bit for one type
use sign-bit = 0 for one type and optimize another type by
giving it the tag of all ones in the
high end and tagging by negation.
tag field in low end of word
use n-bit tags to represent pointers data aligned on n-byte boundaries
this allows tagging without shifting
use a tag of all zeros to avoid tagging and untagging
use tags that contain only one non-zero bit to make testing faster
use all zeros to optimize integer arithmetic
optimize integer arithmetic by adding/subtracting tag
after subtraction/addition
tag field in both ends of word
various combinations of tricks from the other two options
partitioning by pattern (type is encoded in the representation of the value,
no boxing/unboxing is done, usually words are
partitioned into ranges of numbers they rep.)
simple range tests to identify types
segments with statically determined types (a segment is a range of
numbers that can be identified by an
initial field in the bit-patterns used to
represent them).
segments with dynamically determined types
use BIBOP (table indexed by segments) to identify types
identify type in the segment itself
first word of segment
last word of segment
on a segmented architecture like the 80x86, one location has
many addresses so the (machine) segment
part of the address can be set to the
(representation) segment of the data type
being represented, and the offset can be
set in such a way that any object can be in
any machine segment.
on machines that ignore the high bits of pointer, these bits can
be used for free boxing/unboxing
make everything a legal IEEE float, using the NaN bit patterns to
encode all other types of data.

object-pointers (untagged pointers refering to self-identifying blocks
on the heap)
combinations of this scheme with the tagged-word scheme

descriptors (two-word data elements divided into a type word and a
value word)
encoding a qualifier (address+length representation of a sequence)
in a descriptor
encoding a cons cell in a descriptor
direct representation of floats