As far as I can tell, an equality index is an index of hashes of the
normalized values, not of the entire normalized values. Also, I think
an equality index can have hash collisions with substring indices.

Yes. Pretty sure I raised this issue a long time ago, when 2.1 was still
in Alpha. I've definitely seen these collisions occur.

If so:

If a value's equality hash collides with a substring hash which matches
a lot of values (e.g. the last 4 characters of an e-mail address), it
will be impossible or very slow to look up that value by an equality
filter. I expect the same can happen with presence indices, though I
haven't checked the code for that. It will be rare, but when or if it
happens, it will be experienced as a _very_ mysterious bug.

The key for presence indexing is not generated by the frontend. In
either back-bdb or back-ldbm the likelihood of a collision is
vanishingly small. Nor is there any consequence if any collision with
the presence key does occur.

So I suggest that 2 bits of the hash value are reserved to indicate
whether it is an equality, presence, substring or approx hash.

Sounds like a good idea. Right now an indicator is fed into the hash
input, and that obviously is still prone to collisions. Explicitly
masking it onto the hash output would cure that issue. Also sounds like
something that will have to wait for RE23 since it will cause an
incompatible format change.