6.2 Equality and comparison

Comparing two objects seems trivial, but if you look into deeper,
there are lots of subtleties hidden in the corners. What should it
mean if two procedures are equal to each other? How to order two
complex numbers? It all depends on your purpose; there’s no single
generic answer. So Scheme (and Gauche) provides several options,
as well as the way to make your own.

6.2.1 Equality

Scheme has three different general equality test predicates.
Other than these, some types have their own comparison predicates.

Function: eq?obj1 obj2

[R7RS]
This is the fastest and finest predicate.
Returns #t if obj1 and obj2 are identical
objects—that is, if they represents the same object on memory
or in a register. Notably, you can compare two symbols with
eq? to check if they are the same or not.
You can think eq? as a pointer comparison for
any heap-allocated objects.

Booleans can be compared with eq?, but you can’t
compare characters and numbers reliably—objects with the same
numerical value may or may not eq? to each other.
If you identity comparison needs to include those objects,
use eqv? below.

[R7RS]
When obj1 and obj2 are both exact or both inexact numbers
(except NaN),
eqv? returns #t iff (= obj1obj2) is true.
When obj1 and obj2 are both characters,
eqv? returns #t iff (char=? obj1obj2) is true.
Otherwise, eqv? is the same as eq? on Gauche.

Note that comparison of NaNs has some peculiarity.
Any numeric comparison fails if there’s at least one NaN in its argument.
Therefore, (= +nan.0 +nan.0) is always #f. However,
Gauche may return #t for (eq? +nan.0 +nan.0)
or (eqv? +nan.0 +nan.0).

Function: equal?obj1 obj2

[R7RS+]
If obj1 and obj2 are both aggregate types,
equal? compares its elements recursively.
Otherwise, equal? behaves the same as eqv?.

If obj1 and obj2 are not eqv? to each other,
not of builtin types, and the class of both
objects are the same, equal? calls the generic
function object-equal?.
By defining the method, users can extend the behavior of equal?
for user-defined classes.

Note: This procedure correctly handles the case when
both obj1 and obj2 have cycles through pairs and vectors,
as required by R6RS and R7RS. However, if the cycle involves
user-defined classes, equal? may fail to terminate.

Generic Function: object-equal?obj1 obj2

This generic function is called when equal? is called on the objects
it doesn’t know about. You can define this method on your class
so that equal? can check equivalence. This method is supposed
to return #t if obj1 is equal to obj2, #f
otherwise. If you want to check equivalence of elements recursively,
do not call object-equal? directly; call equal? on each element.

This method catches equal? between two objects of a user-defined
classe, in case the user doesn’t define a specialized method for the class.

When called, it scans the registered default comparators that can handle
both obj1 and obj2,
and if it finds one, use the comparator’s equality
predicate to see if two arguments are equal to each other. When no
matching comparators are found, it just returns #f.
See Predefined comparators, about the default comparators:
Look for the entries of
default-comparator and comparator-register-default!.

Note: If you define object-equal? with exactly the same
specializers of this method,
you’ll replace it and that breaks
default-comparator operation. Future versions of Gauche
will prohibit such redefinition. For now, be careful not to redefine
it accidentally.

Sometimes you want to test if two aggregate structures
are topologically equal, i.e., if one has a shared substructure,
the other has a shared substructure in the same way.
Equal? can’t handle it; module util.isomorph
provides a procedure isomorphic? which does the job
(see Determine isomorphism).

6.2.2 Comparison

Equality only concern about whether two objects are equivalent or not.
However, sometimes we want to see the order among objects. Again,
there’s no single “universal order”. It doesn’t make mathematical
sense to ask if one complex number is greater than another, but
having some artificial order is useful when you want a consistent
result of sorting a list of objects including numbers.

Function: compareobj1 obj2

A general comparison procedure. Returns -1 if obj1 is less than
obj2, 0 if obj1 is equal to obj2, and 1 if obj1
is greater than obj2.

If obj1 and obj2 are incomparable, an error is signalled.
However, compare defines total order between most Scheme objects,
so that you can use it on wide variety of objects. The definition is
upper-compatible to the order defined in srfi-114.

Some built-in types are handled by this procedure reflecting
“natural” order of comparison if any (e.g. real numbers are compared
by numeric values, characters are compared by char< etc.)
For convenience,
it also defines superficial order between objects that doesn’t have
natural order; complex numbers are ordered first by their real part,
then their imaginary part, for example. That is,
1+i comes before 2-i,
which comes before 2, which comes before 2+i.

Boolean false comes before boolean true.

Lists are ordered by dictionary order: Take the common prefix.
If either one is () and the other is not, () comes first.
If both tails are not empty, compare the heads of the tails.
(This makes empty list the “smallest” of all lists).

Vectors (including uniform vectors) are compared first by their
lengths, and if they are the same, elements are compared from left to right.
Note that it’s different from lists and strings.

This method catches compare between two objects of a
user-defined class, in case the user doesn’t define a specialized
method for the class.

When called, it scans the registered default comparators that can
handle both obj1 and obj2,
and if it finds one, use the comparator’s compare procedure
to determine the order of obj1 and obj2.
When no matching comparators are found, it returns #f, meaning
two objects can’t be ordered.
See Predefined comparators, about the default comparators:
Look for the entries of
default-comparator and comparator-register-default!.

Note: If you define object-compare with exactly the same
specializers of this method,
you’ll replace it and that breaks
default-comparator operation. Future versions of Gauche
will prohibit such redefinition. For now, be careful not to redefine
it accidentally.

Function: eq-compareobj1 obj2

Returns -1 (less than), 0 (equal to) or 1 (greater than)
according to a certain total ordering of
obj1 and obj2. Both arguments can be any Scheme
objects, and can be different type of objects.
The following properties are guaranteed.

(eq-compare x y) is 0 iff (eq? x y) is #t.

The result is consistent within a single run of the process
(but may differ between runs).

Other than these, no actual semantics are given to the ordering.

This procedure is useful when you need to order arbitrary Scheme
objects, but you don’t care the actual order as far as it’s consistent.

6.2.3 Hashing

Hash functions have close relationship with equality predicate,
so we list them here.

Function: eq-hashobj

Function: eqv-hashobj

These are hash functions suitable to be used
with eq? and eqv?, respectively.
The returned hash value is system- and process-dependent,
and can’t be carried over the boundary of the running process.

Note: don’t hash numbers by eq-hash. Two numbers
are not guaranteed to be eq? even if they are numerically equal.

Function: default-hashobj

[SRFI-128+]
This is a hash function suitable to be used with equal?.

If obj is either a number, a boolean, a character,
a symbol, a keyword, a string, a list or a vector,
internal hash function is used to calculate the hash value.
If obj is other than that,
a generic function object-hash is called to calculate the hash value
(see below).

The hash value also depends on hash-salt, which differs
for every run of the process.

Function: portable-hashobj salt

Sometimes you need to calculate a hash value that’s “portable”,
in a sense that the value won’t change across multiple runs of
the process, nor between different platforms. Such hash value can
be used with storing objects externally to share among processes.

This procedure calculates a hash value of obj with such
characteristics; the hash value is the same for the same
object and the same salt value.
Here “same object” roughly means having
the same external representation. Objects equal? to each
other are same. If you write out an object with write,
and read it back, they are also the same objects in this sense.

This means objects without read/write invariance, such as
ports, can’t be handled with portable-hash. It is caller’s
responsibility that obj won’t contain such objects.

The salt argument is a nonnegative fixnum and gives variations
in the hash function. You have to use the same salt to get consistent
results.

If obj is other than a number, a boolean, a character,
a symbol, a keyword, a string, a list or a vector,
this procedure calls
a generic function object-hash is called to calculate the hash value
(see below).

Function: legacy-hashobj

Up to 0.9.4, Gauche had a hash function called hash
that was used in both equal?-hashtable and
for the portable hash function. It had a problem, though.

There was no way to salt the hash function, which makes the
hashtables storing externally provided data vulnerable
to collision attack.

The hash function behaves poorly, especially on flonums.

There are bugs in bignum and flonum hashing code that
have produced different results on different architectures.

Since there are existing hash values calculated with
the old hash function, we preserve the behavior of the
original hash function as legacy-hash.
Use this when you need to access old data.
(The hash function also behaves as legacy-hash
by default, but it has tweaks; see below.)

The new code that needs portable hash value should use
portable-hash instead.

Generic Function: object-hashobj rec-hash

By defining a method for this generic function, objects of
user-defined types can have a hash value and can be used
in a equal? hash table.

The method has to return an exact non-negative integer,
and must return the same value for two object which are equal?.
Furthermore, the returned value must not rely on the platform
or state of the process, if obj is a portable object
(see portable-hash above for what is portable.)

If the method needs to get hash value of obj’s elements,
it has to call rec-hash on them. It guarantees that
the proper hash function is called recursively. So you can
count on rec-hash to calculate a portable hash value
when object-hash itself is called from portable-hash.

If obj has several elements,
you can call combine-hash-value on the elements’
hash values.

These two methods are defined by the system and ensures the
backward compatibility and the behavior of default-comparator.
Be careful not to replace these methods by defining the exactly same
specializers. In future versions of Gauche, attempts to replace
these methods will raise an error.

Function: combine-hash-valueha hb

Returns a hash value which is a combination of two hash
values, ha and hb. The guaranteed invariance
is that if (= ha1 ha2) and (= hb1 hb2)
then (= (combine-hash-value ha1 hb1) (combine-hash-value ha2 hb2)).
This is useful to write user-defined object-hash method.

Function: hashobj

This function is deprecated.

Calculate a hash value of obj suitable for equal? hash.
By default, it returns the same value as legacy-hash. However,
if this is called from default-hash or portable-hash
(via object-hash method), it recurses to the calling hash
function.

The behavior is to keep the legacy code work. Until 0.9.5,
hash is the only hash function to be used for both portable
hash and equal?-hash, and object-hash method takes single
argument (an object to hash) and calls hash recursively whenever it
needs to get a hash value of other objects pointed from the argument.

As of 0.9.5 we have more than one hash functions that calls
object-hash, so the method takes the hash function as
the second argument to recurse. However, we can’t just break
the legacy code; so there’s a default method defined in
object-hash which is invoked when no two-arg method
is defined for the given object, and dispatches to one-arg method.
As far as the legacy object-hash
code calls hash, it calls proper function.
The new code shouldn’t rely on this behavior, and must use
the second argument of object-hash instead.

Function: boolean-hashbool

Function: char-hashchar

Function: char-ci-hashchar

Function: string-hashstr

Function: string-ci-hashstr

Function: symbol-hashsym

Function: number-hashnum

[SRFI-128]
These are hash functions for specific type of objects.
In Gauche, these procedures are just a wrapper of default-hash
with type checks (and case folding when relevant). These are
mainly provided to conform srfi-128; in your code you might
just want to use default-hash (or eq-hash/eqv-hash,
depending on the equality predicate).

The case-folding versions, char-ci-hash and
string-ci-hash, calls char-foldcase and
string-foldcase respectively, on the argument before
passing it to hash.
(See Characters, for char-foldcase.
See Full string case conversion, for string-foldcase).

Function: hash-bound

Function: hash-salt

[SRFI-128]
Both evaluates to an exact nonnegative integers.

(Note: SRFI-128 defines these as macros, in order to allow
implementatins optimize runtime overhead. In Gauche we use
procedures but the overhead is negligible.)

User-defined hash functions can limit the range of the result between
0 and (hash-bound), respectively, without worrying to lose
quality of hash function. (User-defined hash functions don’t need
to honor (hash-bound) at all; hashtables takes modulo when
necessary.)

User-defined hash function can also take into account of the
value (hash-salt) into hash calculation; the salt value may
differ between runs of the Scheme processes, or even between
hash table instances. It is to avoid collision attack.
Built-in hash functions already takes the salt value into account,
so if your hash function is combining the hash values of
primitive types, you don’t need to worry about salt values.

6.2.4 Basic comparators

Equality and comparison procedures are parameters in various
data structures. A treemap needs to order its keys;
a hashtable needs to see if the keys are the same or not,
and it also need a hash function consistent with the equality predicate.

If we want to work on generic data structures, we need to abstract those
variations of comparison schemes. So here comes the comparator,
a record that bundles closely-related comparison procedures together.

There are two SRFIs that define comparators. The current one
is SRFI-128, and we recommend new code to use it. Gauche has all of
SRFI-128 procedures built-in. The older, and rather complex one is SRFI-114;
Gauche also supports it mainly for the backward compatibility.
Importantly, Gauche’s native <comparator> object is compatible
to both SRFI-128 and SRFI-114 comparators.

6.2.4.1 Comparator class and constructors

See if given two objects
are equal to each other; returns a boolean value.

Ordering predicate

Compare given two objects, and returns true iff the first one is
strictly precedes the second one. That is, this is a less-than predicate.

Comparison procedure

Compare given two objects, and returns either -1 (the first one is less
than the second), 0 (they are equal), or 1 (the first one is greater
than the second).

Hash function

Returns a hash value of the given object.

SRFI-128 comparators use the ordering predicate,
while SRFI-114 comparators use the comparison procedure.
Gauche’s <comparator> supports both by automatically
generating the missing one; that is, if you create a comparator
with SRFI-128 interface, by giving an ordering predicate,
Gauche automatically fills the comparison procedure, and
if you create one with SRFI-114 interface by giving a comparison
procedure, Gauche generates the ordering predicate.

A comparator may not have an ordering predicate / comparison procedure,
and/or a hash function. You can check if the comparator
can be used for ordering or hashing by
comparator-ordered? and
comparator-hashable?, respectively.

Some built-in data types such as hashtables (see Hashtables)
and treemaps (see Treemaps), take a comparator in their constructors.
The sort and merge procedures also accept comparators
(see Sorting and merging).

Function: make-comparatortype-test equal order hash :optional name

[SRFI-128+]
Creates a new comparator form the given type-test,
equal, order and hash functions, and returns it.

See the description of <comparator> above for the role of
those procedures.

Note: Both SRFI-128 and SRFI-114 defines make-comparator,
but where SRFI-128 takes order argument, SRFI-114 takes
compare argument. Since SRFI-128 is preferable, we adopt
it for the built-in interface, and give a different name
(make-comparator/compare) for SRFI-114 constructor.

Actually, some arguments can be non-procedures, to use predefined
procedures, for the convenience. Even if non-procedure arguments
are passed, the corresponding accessors (e.g.
comparator-type-test-procedure for the type-test
procedure) always return a procedure—either the given one
or the predefined one.

The type-test argument must be either #t or
a predicate taking one argument to test suitability of
the object for comparing by the resulting comparator.
If it is #t,
a procedure that always return #t is used.

The equal argument must
a predicate taking two arguments to test equality.

the order argument must be either #f or
a procedure taking two arguments and returning a boolean value.
It must return #t iff the first argument strictly precedes
the second one. If #f is passed, the comparator can not be
used for ordering.

The hash argument must be either #f, or
a procedure taking one argument and returning nonnegative exact
integer. If #f is given, it indicates the comparator
can’t hash objects; the predefined procedure just throws an error.

The fifth, optional argument name, is Gauche’s extension.
It can be any object but usually a symbol; it is only used when
printing the comparator, to help debugging.

This is SRFI-114 comparator constructor. In SRFI-114, this is called
make-comparator. Avoiding name conflict, we renamed it.
If you (use srfi-114) you get the original name make-comparator
(and the built-in make-comparator is shadowed).
This is provided for the backward
compatibility, and new code should use make-comparator as defined
in SRFI-128.

It’s mostly the same as make-constructor above, except
the following:

The third argument (compare) is a comparison procedure
instead of an ordering predicate.
It must be either #f, or a procedure taking two arguments
and returning either -1, 0, or 1, depending on whether the
first argument is less than, equal to,
or greater than the second argument.
If it is #f, it indicates the comparator can’t order objects.

You can pass #t to the equal argument when you give
a comparison procedure. In that case, equality is determined
by calling the comparison procedure and see if the result is 0.

Returns a symbol ordering if cmpr is created with
SRFI-128 constructor, and returns comparison if cmpr
is created with SRFI-114 constructor.

Usually applications don’t need to distinguish these two kinds of
comparators, for either kind of comparators can behave just as another
kind. This procedure is for some particular cases when one
wants to optimize for the underlying comparator implementation.

Function: comparator-ordered?cmpr

Function: comparator-hashable?cmpr

[SRFI-128]
Returns true iff a comparator cmpr can be used to order
objects, or to hash them, respectively.

These accessors always return procedures; if you give #f
to the order or hash argument of the constructor,
comparator-ordering-predicate and
comparator-hash-function still return a procedure,
which will just raise an error.

Function: comparator-comparison-procedurecmpr

[SRFI-114]
This is a SRFI-114 procedure, but sometimes handy with
SRFI-128 comparators. Returns a procedure that takes two objects
that satisfy the type predicates of cmpr. The procedure
returns either -1, 0 or 1, depending on whether the first object
is less than, equal to, or greater than the second. The comparator
must be ordered, that is, it must have an ordering predicate
(or a comparison procedure, if it is created by SRFI-114 consturctor).

Function: comparator-test-typecmpr obj

Function: comparator-check-typecmpr obj

[SRFI-128]
Test whether obj can be handled by a comparator cmpr,
by applying cmpr’s type test predicate.
The former (comparator-test-type) returns a boolean
values, while the latter (comparator-check-type)
signals an error when obj can’t be handled.

Function: =?cmpr obj obj2 obj3 …

Function: <?cmpr obj obj2 obj3 …

Function: <=?cmpr obj obj2 obj3 …

Function: >?cmpr obj obj2 obj3 …

Function: >=?cmpr obj obj2 obj3 …

[SRFI-128]
Compare objects using a comparator cmpr. All of
obj, obj2, obj3 … must satisfy
the type predicate of cmpr. When more than two objects
are given, the order of comparison
is undefined.

In order to use <?, <=?, >? and >=?,
comparator must be ordered.

Function: comparator-hashcmpr obj

[SRFI-128]
Returns a hash value of obj with the hash function of
a comparator cmpr. The comparator must be hashable,
and obj must satisfy comparator’s type test predicate.

Function: comparator-comparecmpr a b

[SRFI-114]
Order two objects a and b
using cmpr, and returns either one of
-1 (a is less than b),
0 (a equals to b), or
1 (a is greater than b).
Objects must satisfy cmpr’s type test predicate.

A simple comparison can be done by <? etc, but sometimes
three-way comparison comes handy. So we adopt this procedure
from srfi-114.

As you see in the definition, equality, ordering and hashing are
handled by equal?, compare and default-hash,
respectively. They takes care of builtin objects, and
also equal? and compare handle the case
when two objects () are of different types.

For objects of user-defined classes,
those procedures calls generic functions
object-equal?, object-compare and object-hash,
respectively. Defining methods for them automatically extended
the domain of default-comparator.

Srfi-128 defines another way to extend default-comparator.
See comparator-register-default! below for the details.

Function: comparator-register-default!comparator

[SRFI-128]
This is the SRFI-128 way for user programs to extend the behavior of
the default-comparator
(which is what make-default-comparator returns).

Note that, in Gauche, you can also extend default comparator’s behavior
by defining specialized methods for object-equal?,
object-compare and object-hash.
See the description of default-comparator above, for the details.

In fact, Gauche uses those generic functions to
handle the registered comparators; methods specialized for
<top> are defined for these generic functions, which
catches the case when default-comparator is applied
on object(s) of user-defined classes that don’t have specialized
methods defined for those generic functions.
The catching method examines registered comparators to find
one that can handle passed argument(s), and if it finds one,
use it.

You might frown at this procedure having a global side-effect.
Well, SRFI-128 explicitly prohibits comparators registered
by this procedure alters the behavior of the default comparator
in the existing domain—it is only allowed to handle objects that
aren’t already handled by the system’s original default comparator
and other already registered comparators. So, the only effect of
adding new comparator should make the default comparator
work on objects that had been previously raised an error.

In reality, it is impossible to enforce the condition. If you
register a comparator whose domain overlaps overlaps the domain
the default comparator (and its extensions via Gauche’s methods),
the program becomes non-portable at that moment. In the current version,
the comparators registered by comparator-register-default! has
the lowest precedence on the dispatch mechanism, but you shouldn’t count
on that.

Variable: eq-comparator

Variable: eqv-comparator

Variable: equal-comparator

[SRFI-114]
Built-in comparators that uses eq?, eqv? and equal?
for the equality predicate, respectively. They accept any kind of
Scheme objects. Each has corresponding hash functions
(i.e. eq-hash for eq-comparator,
eqv-hash for eqv-comparator and
default-hash for equal-comparator).
Only eq-comparator is ordered, using eq-compare to
order the objects
(see Comparison, for eq-compare).

Note that eq-comparator and eqv-comparator are not
equivalent from what make-eq-comparator and make-eqv-comparator
return, respectively. The latter two are defined in SRFI-128 and
specified to use default-hash for the hash function. It is
heavier than eq-hash/eqv-hash, and it can’t be used
for circular objects, nor for the mutable objects with which you want
to hash them by identity. We provide eq-comparator and
eqv-comparator in case you want to avoid limitations of
default-hash.

[SRFI-114]
Compare exact integers, integers, rational numbers, real numbers,
complex numbers and general numbers, respectively.
In Gauche number-comparator is the same as complex-comparator.

The equality are determined by =.
For exact integer, integer, rational and real comparators,
the order is the numerical order. Two complex numbers are
compared first by their real components, and then their imaginary
components only if the real components are the same.

Note that those comparator rejects NaN.
You need make-inexact-real-comparator in srfi-114 module
to compare NaNs with your own discretion.
See Comparators, for the details.

Variable: pair-comparator

Variable: list-comparator

Variable: vector-comparator

Variable: uvector-comparator

Variable: bytevector-comparator

[SRFI-114]
The default comparators to compare pairs, lists, vectors, uniform
vectors and bytevectors (which is synonym to u8vector).
Their respective elements are compared with the default comparators.

Note that lists are compared by dictionary order ((1 2 3) comes
before (1 3)), while in vector-families
shorter ones are ordered first (
#(1 3) comes before #(1 2 3)).

6.2.4.4 Combining comparators

Function: make-default-comparator

[SRFI-128]
Returns a default comparator. In Gauche, this returns the
default-comparator object.

Function: make-eq-comparator

Function: make-eqv-comparator

[SRFI-128]
Returns comparators that use eq? and eqv? for its
equality predicate, respectively. Note that they use default-hash
for hash functions, as specified by SRFI-128, which has a few drawbacks:
You can’t use it if you want to hash based on identity of mutable objects,
it diverges on circular objects, and it is slow if applied on a large
structures. We recommend to use eq-comparator or
eqv-comparator if possible (see Predefined comparators).

Function: make-reverse-comparatorcmpr

[SRFI-114]
Returns a comparator with the same type test predicate,
equality procedure, and hash function as the given comparator,
but the comparison procedure is flipped.

Function: make-key-comparatorcmpr test key

Suppose you have some kind of structure, but you only need to look
at one part of it to compare them.

Returns a new comparator that uses test as type test predicate.
Its equality predicate, comparison procedure and hash function are
constructed by applying key to the argument(s) then passing
the result to the corresponding procedure of cmpr. If cmpr
lacks comparison procedure and/or hash function, so does the returned
comparator.

In the following example, the tree-map users compares
the given user records only by the username slots:

Creates a comparator that compares lists of the form (x1 x2 …),
where each element is compared with the corresponding comparator.
For example, (make-tuple-comparator c1 c2 c3) will compare
three-element list, whose first elements are compared by c1,
second elements by c2 and third elements by c3.