If two pointers to object or incomplete types both point to the same object,
or both point one past the last element of the same array object,
they compare equal.
If the objects pointed to are members of the same aggregate object,
pointers to structure members declared later
compare greater than pointers to members declared earlier in the structure,
and pointers to array elements with larger subscript
values compare greater than pointers to elements
of the same array with lower subscript values.
All pointers to members of the same union object compare equal.
If the expression P points to an element of an array object
and the expression Q points to the last element of the same array object,
the pointer expression Q+1 compares greater than P.
In all other cases, the behavior is undefined.

Now remember that the C language was defined to cover a large
range of computer architectures,
including many which would be considered museum relics today.
It therefore takes a very conservative view of what is permitted,
so that it remains possible to write C programs for those ancient systems.
(Which weren't quite so ancient at the time.)

Bearing that in mind,
it is still possible for an allocation to generate a pointer
that satisfies the condition
despite the pointer not pointing into the region.
This will happen, for example, on an 80286 in protected mode,
which is used by Windows 3.x in Standard mode and OS/2 1.x.
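The test in question is presumably something along these lines (a reconstruction; only the names regionStart and regionSize come from the discussion, everything else is assumed):

```c
#include <stddef.h>
#include <stdbool.h>

/* Reconstructed sketch of the guarded-region test under discussion:
 * check whether a pointer p falls inside [regionStart, regionStart +
 * regionSize). The relational comparisons below are exactly what the
 * quoted standard text leaves undefined when p does not point into
 * the region. */
static bool is_in_region(char *p, char *regionStart, size_t regionSize)
{
    return p >= regionStart && p < regionStart + regionSize;
}
```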

In this system,
pointers are 32-bit values,
split into two 16-bit parts,
traditionally written as XXXX:YYYY.
The first 16-bit part (XXXX)
is the "selector", which chooses a bank of
64KB.
The second 16-bit part (YYYY)
is the "offset",
which chooses a byte within that 64KB bank.
(It's more complicated than this,
but let's just leave it at that for the purpose of this discussion.)

Memory blocks larger than 64KB are broken up into 64KB chunks.
To move from one chunk to the next, you add 8 to the selector.
For example, the byte after 0101:FFFF
is 0109:0000.
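If you were to model this far-pointer arithmetic in C (purely as an illustration; the packed 16:16 layout and the helper below are assumptions, not real 80286 code), it might look like this:

```c
#include <stdint.h>

/* Sketch of 16:16 far-pointer arithmetic on the 80286, with the
 * selector packed in the high 16 bits and the offset in the low 16
 * bits. Consecutive 64KB chunks of a large allocation use selectors
 * 8 apart, so advancing past offset FFFF adds 8 to the selector. */
static uint32_t next_byte(uint32_t farptr)
{
    uint16_t sel = (uint16_t)(farptr >> 16);
    uint16_t off = (uint16_t)farptr;
    if (off == 0xFFFF) {
        sel += 8;       /* step to the next 64KB chunk's selector */
        off = 0;
    } else {
        off += 1;
    }
    return ((uint32_t)sel << 16) | off;
}
```

With this encoding, next_byte(0x0101FFFF) produces 0x01090000, matching the example above.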

But why do you add 8 to move to the next selector?
Why not just increment the selector?
Because the bottom three bits of the selector are used for other things.
In particular,
the bottom bit of the selector is used to choose the selector table.
Let's ignore bits 1 and 2 since they are not relevant to the discussion.
Assume for convenience that they are always zero.²

There are two tables
which describe how selectors correspond to physical memory,
the Global Descriptor Table
(for memory shared across all processes)
and the Local Descriptor Table
(for memory private to a single process).
Therefore, the selectors available for process private memory
are 0001,
0009,
0011,
0019,
etc.
Meanwhile, the selectors available for global memory are
0008,
0010,
0018,
0020,
etc.
(Selector 0000 is reserved.)
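As a quick sketch of that rule (with bits 1 and 2 assumed zero, as above; the function name is made up):

```c
#include <stdint.h>
#include <stdbool.h>

/* The table-selection rule above as a predicate: odd selectors refer
 * to the Local Descriptor Table (process-private memory), and even
 * nonzero selectors refer to the Global Descriptor Table. */
static bool is_local_selector(uint16_t sel)
{
    return (sel & 1) != 0;
}
```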

Okay, now we can set up our counter-example.
Suppose regionStart = 0101:0000
and
regionSize = 0x00020000.
This means that the guarded addresses are
0101:0000
through
0101:FFFF
and
0109:0000
through
0109:FFFF.
Furthermore, regionStart + regionSize = 0111:0000.

Meanwhile,
suppose there is some global memory that happens to be
allocated at 0108:0000.
This is a global memory allocation because the selector is an even number.

Observe that the global memory allocation
is not part of the guarded region,
but its pointer value does satisfy the numeric inequality
0101:0000 ≤ 0108:0000 < 0111:0000.
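You can sanity-check the arithmetic by treating the far pointers as plain 32-bit values, selector in the high word (a sketch; the helper name is made up, the constants come from the example):

```c
#include <stdint.h>
#include <stdbool.h>

/* Numeric demonstration of the counter-example, with the 16:16 far
 * pointers viewed as 32-bit values. The region covers only the
 * chunks with selectors 0101 and 0109, but the numeric inequality
 * is satisfied by anything in between. */
static bool numeric_test(uint32_t p)
{
    uint32_t regionStart = 0x01010000u;  /* 0101:0000 */
    uint32_t regionEnd   = 0x01110000u;  /* 0111:0000 = regionStart + regionSize */
    return regionStart <= p && p < regionEnd;
}
```

numeric_test(0x01080000) comes back true, even though 0108:0000 is a global allocation that is not part of the guarded region.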

Bonus chatter:
Even on CPU architectures with a flat memory model,
the test can fail.
Modern compilers take advantage of undefined behavior
and optimize accordingly.
If they see a relational comparison between pointers,
they are permitted to assume that the pointers point
into the same aggregate or array
(or one past the last element of that array),
because any other type of relational comparison is undefined.
Specifically,
if regionStart points to the start of an array or aggregate,
then
the only pointers that can legally be relationally compared with
regionStart
are the ones of the form
regionStart,
regionStart + 1,
regionStart + 2,
…,
regionStart + regionSize.
For all of these pointers,
the condition p >= regionStart
is true and can therefore be optimized out,
reducing the test to

if (p < regionStart + regionSize)

which will now be satisfied for pointers
that are numerically less than regionStart.

(You might run into this scenario if, as in the original question
that inspired this answer,
you allocated the region with
regionStart = malloc(n),
or if your region is a "quick access" pool of preallocated items
and you want to decide whether you need to free the pointer.)

Moral of the story:
This code is not safe, not even on flat architectures.

But all is not lost:
The pointer-to-integer conversion is implementation-defined,
which means that your implementation must document how it works.
If your implementation defines the pointer-to-integer conversion
as producing the numeric value of the
linear address of the object referenced by the pointer,
and you know that you are on a flat architecture,
then what you can do is compare
integers rather than pointers.
Integer comparisons are not constrained
in the same way that pointer comparisons are.
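As a sketch of that approach (assuming an implementation that documents uintptr_t as holding the linear address; the function name is made up):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Range check done on integers rather than pointers. Valid only on
 * implementations that define pointer-to-integer conversion as
 * yielding the linear address, on a flat architecture. */
static bool in_region(const void *p, const void *regionStart, size_t regionSize)
{
    uintptr_t ip    = (uintptr_t)p;
    uintptr_t start = (uintptr_t)regionStart;
    /* Unsigned subtraction wraps, so one comparison covers both bounds. */
    return ip - start < regionSize;
}
```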

You may say vintage architectures are broken from our modern point of view, now that non-flat models are little more than relics. But good luck taking a time machine and convincing the designers of those architectures, who had to contend with compatibility, performance, and memory economy. The original 8086, for example, introduced the segmented model so it could both run 8-bit software with little modification (with the “tiny memory model”, where all segments point to the same 64 KB area, you get something pretty similar to a “virtual 8080” mode) and allow new software to address up to 1 MB of memory.

Of course, the 8086 is not binary compatible with the 8080. But Intel specifically designed its architecture and instruction set so that each 8080 opcode had an 8086 equivalent. That way, you could use a special “translation assembler” to generate 8086 binaries from 8080 assembly files, which would run fine in the tiny memory model. This made it easy to port CP/M software to MS-DOS, which was very important back in the day.

There’s a very inefficient method that is 100% portable and doesn’t rely on implementation-defined behavior: Use == rather than <. Walk the region one byte at a time and compare each byte’s address to p with ==. In theory the compiler could convert this into the efficient form in assembly, but I’ve never seen this optimization occur in practice.
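A sketch of what that might look like (the function name is an assumption):

```c
#include <stdbool.h>
#include <stddef.h>

/* Fully portable (and fully inefficient) membership test: equality
 * comparison between valid pointers is always defined, so walk the
 * region one byte at a time and compare each address to p with ==.
 * O(regionSize) instead of O(1). */
static bool in_region_slow(const char *p, const char *regionStart, size_t regionSize)
{
    for (size_t i = 0; i < regionSize; i++) {
        if (regionStart + i == p)
            return true;
    }
    return false;
}
```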

That’s not portable. It will crash if you make a pointer to memory that doesn’t exist and the hardware checks for such pointers when they are loaded into registers rather than when the register is dereferenced. (Checking on load may be more efficient if a pointer is loaded once and dereferenced many times.)

> If they see a relational comparison between pointers, they are permitted to assume that the pointers point into the same aggregate or array
Okay
> Specifically, the only pointers that can legally be relationally compared with regionStart are the ones of the form regionStart, regionStart + 1, regionStart + 2, …

How does the second follow from the first? If the compiler can’t see where regionStart comes from, it may be possible that regionStart itself points inside an array, but not to the first element. For example:
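A hypothetical illustration of that situation (all names here are made up for the sketch):

```c
#include <stdbool.h>

/* If regionStart points into the middle of a larger array, pointers
 * before it within that array may legally be compared against it,
 * so the compiler cannot assume p >= regionStart. */
static bool before_region_start(void)
{
    static char big[100];
    char *regionStart = &big[40];  /* not the start of any object */
    char *p = &big[10];
    return p < regionStart;        /* well-defined: both point into big */
}
```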

You’re right. The optimization is available only if the compiler can observe that regionStart is the start of an array. (Which might be something people do. “If the pointer is in my “quick access” array, then don’t call free.”)

Because you’re not allowed to do that. 7.1.3 Reserved Identifiers: “[A]ll external identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has the same specification.” (Emphasis in original.) This is what allows, for example, the compiler to inline standard library functions.

It should be clear that the standard cannot provide standard-compliant ways of working around bugs in the implementation. It’s not like the standard can say “If the implementation implements integer division incorrectly, a conforming program can use this workaround to force the implementation to use an alternate division algorithm,” (And repeat for each possible thing an implementation could have a bug in.) Working around implementation bugs is clearly an implementation-specific matter, outside the scope of the standard.

Basically, the specification boils down to “only comparisons of pointers to the same object [array, structure, etc.] are defined”. As noted by Raymond, the only way to do it “properly” depends on the conversion from pointer to integer. And, as that operation is machine-dependent, there is no way to do it in a portable fashion.

Not that it matters much. It could be argued that code which compares pointers to different objects, and thus depends on memory allocation order, is broken. Range-checking only makes sense in the cases defined by the C specification.

Sometimes depending on the memory allocation order is exactly what you want. I have a few times used memory addresses to provide a total ordering for a group of objects. It’s especially handy for breaking ties (two objects compare equal in the ordering criteria? Fine, the one with the lower address is less), and sometimes the objects don’t even have any useful way to compare them (in which case you probably don’t care what order they are in, just that they are in some stable order, so memory order is fine).

Bits 0 and 1 contain the “requested RPL” for the segment. Basically, it’s a two-bit integer (0, 1, 2, or 3) telling which privilege level you are requesting when accessing the segment. If your requested privilege level is wrong (it is compared with the descriptor’s privilege level on segment register load), the request will be denied, and you won’t go to space today :-) .

Thank you for making me feel slightly less insane. I had a vague thought that that was the case, from a youth misspent doing assembly language for the 80286 on an operating system that actually used call gates and such, but I was feeling too lazy to look it up.

C++ has std::less, which returns consistent results and does not cause undefined behavior when comparing pointers into different arrays; but AFAICT it still isn’t required to be useful for checking whether an address is within an array. For instance, with far pointers, it could treat offsets as more significant than selectors. And unlike casting to uintptr_t, the behavior isn’t even implementation-defined.

Good point. std::less merely extends the existing partial order to a total order, so it removes the undefined behavior, but you could still have pointers p that satisfy x ≤ p < (x+n) even though p is not in the array of length n pointed to by x.

You could first use std::less etc. to check whether the pointer seems to be between the beginning and the end of the array. If the result is negative, then that is reliable. If the result is positive, then run a binary search to find the apparent position in the array, and finally use == to make sure. The complexity would be O(log(size of array)).

OTOH, this kind of code would be useful only as a fallback for (architecture, compiler) pairs on which the uintptr_t cast doesn’t work. So if you don’t support customers building your code on random DeathStations, then there’s hardly a reason to optimize the fallback.

It was even worse than that on OS/2 1.x, because you couldn’t just add 8 to the selector. You needed to call DosGetHugeShift(&usShiftCount), then shift 1 left by usShiftCount, and then use the resulting number as the selector increment. I can’t imagine why this system never caught on.

I recall being at a C++ standardization meeting where they had a similar debate. They wanted to say that something (an array or vector or such) was in “contiguous” memory. But what exactly is “contiguous” memory? When one byte is “next” to another? Do the diodes have to be adjacent on the chip? And an algorithmic method of defining it runs into problems when you remember that in C++ you can redefine the meaning of ++, [ ], and == for your own objects.

This is the kind of optimisation that would make me consider the compiler “broken”, because any compiler smart enough to see that it’s “UB that can be optimised away” should also be smart enough to see that the test was put here for a reason (much like, say, an attempt to detect overflow in the result of a mathematical operation).

Or at least, it should output a warning that the comparison relies on UB.

“But why do you add 8 to move to the next selector? Why not just increment the selector? Because the bottom three bits of the selector are used for other things. In particular, the bottom bit of the selector is used to choose the selector table. Let’s ignore bits 1 and 2 since they are not relevant to the discussion. Assume for convenience that they are always zero.²”

Back in the days of 16-bit Windows you, or more likely a C compiler, could write code which referenced a magic Windows kernel variable called (IIRC) _AHINCR. That told you how much to increment a selector by to move 64KB forward. So in 16-bit protected mode it would be 8, and in 16-bit real mode it would be 1000h (65536>>4).

So a n*64K buffer would have n 64K selectors assigned to it, and you add _AHINCR to the value in the segment register to move between them.

Once 32 bit protected mode was possible the kernel would set the limit to the first selector to span the whole buffer on a 386+. So if the application knew about 32 bit mode it could just use a 32 bit offset from the first selector to access the whole buffer and it didn’t need to reload segment values. Which was no bad thing because loading a selector value into a segment register was not a fast process.

I always thought the whole scheme was rather neat – you could write gnarly segment arithmetic code for huge pointer access which was portable between real and protected mode and ran on the 8086, 80286, and 80386+. And you could optionally write fast code that avoided reloading segment values if you knew you were on a 386+.

Looks like the text editor got confused by the < and & characters. What I typed had "this rather than that", with two expressions for this and that, but the text editor combined them to look like one monster expression. It’s pure luck it compiles at all.

It’s also interesting to note that there is a similar problem with UB in (signed) integer overflow if you’re trying to validate “(pointer + size)” where “pointer” and “size” are incorrect / untrusted and may cause integer overflow.
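For illustration, a sketch of the integer-domain version of such a check (the names here are assumptions):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Validating an untrusted (base, size) pair with `p + size < p` is
 * itself UB on overflow, so the compiler may delete the check.
 * Testing in the unsigned integer domain, where wraparound is
 * defined, avoids that. */
static bool size_fits(uintptr_t base, size_t size)
{
    return size <= UINTPTR_MAX - base;  /* defined: cannot overflow */
}
```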

There is an interesting security corollary to these facts: pointer tests that can only fail if they trigger undefined behaviour can be validly optimised away. An example of this with integer overflow UB was reported on GCC way back in 2006:

Though MSVC at the time wasn’t using this UB as an optimisation opportunity, it sparked a purge of such UB on integer overflow within MSFT. Interestingly, some 10 years later, the codegen (compiler) team created a new optimiser that can and does take advantage of integer overflow UB (cf. “Taking advantage of signed integer overflow being undefined”):

While they are still specifically trying to avoid optimising away anything that looks like a pointer test, the moral is that you need to avoid relying on UB in any integer arithmetic and in particular, if you rely on integer arithmetic UB, you may find parts of your code changed in ways that you didn’t expect (like having security checks removed)!

Of course, an 80286 OS might decide not to use GDT entries for “compatibility” so that no valid pointer value can occur in the middle of a huge allocation. (I understand that Windows Standard Mode made very sparse use of GDT entries and it wouldn’t surprise me if this was a contributing factor.)