I want to define a function that takes an unsigned int as argument and returns an int congruent modulo UINT_MAX+1 to the argument.

A first attempt might look like this:

int unsigned_to_signed(unsigned n)
{
    return static_cast<int>(n);
}

But as any language lawyer knows, casting from unsigned to signed for values larger than INT_MAX is implementation-defined.

I want to implement this such that (a) it only relies on behavior mandated by the spec; and (b) it compiles into a no-op on any modern machine and optimizing compiler.

As for bizarre machines... If there is no signed int congruent modulo UINT_MAX+1 to the unsigned int, let's say I want to throw an exception. If there is more than one (I am not sure this is possible), let's say I want the largest one.

I do not much care about the efficiency when I am not on a typical twos-complement system, since in my humble opinion that is unlikely. And if my code becomes a bottleneck on the omnipresent sign-magnitude systems of 2050, well, I bet someone can figure that out and optimize it then.
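Here is a sketch of that second attempt, reconstructed from the description that follows (the original listing is not shown; the fallback path and the exception type are illustrative, not part of the original):

```cpp
// Second attempt (sketch): do the cast, then verify it round-trips.
// The check relies only on the guaranteed modulo behavior of the
// int -> unsigned conversion; the throw is a placeholder fallback.
#include <stdexcept>

int unsigned_to_signed(unsigned n)
{
    int i = static_cast<int>(n);        // implementation-defined for large n
    if (static_cast<unsigned>(i) == n)  // guaranteed: modulo UINT_MAX+1
        return i;
    throw std::range_error("direct cast did not produce a congruent int");
}
```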

Now, this second attempt is pretty close to what I want. Although the cast to int is implementation-defined for some inputs, the cast back to unsigned is guaranteed by the standard to preserve the value modulo UINT_MAX+1. So the conditional does check exactly what I want, and it will compile into nothing on any system I am likely to encounter.

However... I am still casting to int without first checking whether it will invoke implementation-defined behavior. On some hypothetical system in 2050 it could do who-knows-what. So let's say I want to avoid that.

Question: What should my "third attempt" look like?

To recap, I want to:

Cast from unsigned int to signed int

Preserve the value mod UINT_MAX+1

Invoke only standard-mandated behavior

Compile into a no-op on a typical twos-complement machine with optimizing compiler

[Update]

Let me give an example to show why this is not a trivial question.

Consider a hypothetical C++ implementation with the following properties:

sizeof(int) equals 4

sizeof(unsigned) equals 4

INT_MAX equals 32767

INT_MIN equals -2^32 + 32768

UINT_MAX equals 2^32 - 1

Arithmetic on int is modulo 2^32 (into the range INT_MIN through INT_MAX)

On this hypothetical implementation, there is exactly one int value congruent (mod UINT_MAX+1) to each unsigned value. So my question would be well-defined.

I claim that this hypothetical C++ implementation fully conforms to the C++98, C++03, and C++11 specifications. I admit I have not memorized every word of all of them... But I believe I have read the relevant sections carefully. So if you want me to accept your answer, you either must (a) cite a spec that rules out this hypothetical implementation or (b) handle it correctly.

Indeed, a correct answer must handle every hypothetical implementation permitted by the standard. That is what "invoke only standard-mandated behavior" means, by definition.

Incidentally, note that std::numeric_limits<int>::is_modulo is utterly useless here for multiple reasons. For one thing, it can be true even if unsigned-to-signed casts do not work for large unsigned values. For another, it can be true even on one's-complement or sign-magnitude systems, if arithmetic is simply modulo the entire integer range. And so on. If your answer depends on is_modulo, it's wrong.

[Update 2]

hvd's answer taught me something: My hypothetical C++ implementation for integers is not permitted by modern C. The C99 and C11 standards are very specific about the representation of signed integers; indeed, they only permit twos-complement, ones-complement, and sign-magnitude (section 6.2.6.2 paragraph 2).

But C++ is not C. As it turns out, this fact lies at the very heart of my question.

The original C++98 standard was based on the much older C89, which says (section 3.1.2.5):

For each of the signed integer types, there is a corresponding (but
different) unsigned integer type (designated with the keyword
unsigned) that uses the same amount of storage (including sign
information) and has the same alignment requirements. The range of
nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type, and the representation of the
same value in each type is the same.

C89 says nothing about only having one sign bit or only allowing twos-complement/ones-complement/sign-magnitude. Here is the corresponding wording from C++11 (section 3.9.1):

For each of the signed integer types, there exists a corresponding
(but different) unsigned integer type: "unsigned char", "unsigned
short int", "unsigned int", and "unsigned long int", each of
which occupies the same amount of storage and has the same alignment
requirements (3.9) as the corresponding signed integer type; that
is, each signed integer type has the same object representation as
its corresponding unsigned integer type. The range of nonnegative
values of a signed integer type is a subrange of the corresponding
unsigned integer type, and the value representation of each
corresponding signed/unsigned type shall be the same.

The C++03 standard uses essentially identical language, as does C++98.

No standard C++ spec constrains its signed integer representations to any C spec, as far as I can tell. And there is nothing mandating a single sign bit or anything of the kind. All it says is that non-negative signed integers must be a subrange of the corresponding unsigned.

So, again I claim that INT_MAX=32767 with INT_MIN=-2^32+32768 is permitted. If your answer assumes otherwise, it is incorrect unless you cite a C++ standard proving me wrong.

@SteveJessop: Actually, I stated exactly what I want in that case: "If there is no signed int congruent modulo UINT_MAX+1 to the unsigned int, let's say I want to throw an exception." That is, I want the "right" signed int provided it exists. If it does not exist -- as might happen in the case of e.g. padding bits or ones-complement representations -- I want to detect that and handle it for that particular invocation of the cast.
–
NemoFeb 8 '13 at 17:52

Btw, I think that in your hypothetical tricky implementation, int needs at least 33 bits to represent its range. I know it's only a footnote, so you can argue it's non-normative, but I think footnote 49 in C++11 is intended to be true (since it's a definition of a term used in the standard) and it doesn't contradict anything explicitly stated in normative text. So all negative values must be represented by a bit pattern in which the highest bit is set, and hence you can't cram 2^32 - 32768 of them into 32 bits. Not that your argument relies in any way on the size of int.
–
Steve JessopFeb 8 '13 at 18:01

And regarding your edits in hvd's answer, I think you've mis-interpreted note 49. You say that sign-magnitude is forbidden, but it isn't. You've read it as: "the values represented by successive bits are additive, begin with 1, and (are multiplied by successive integral power of 2, except perhaps for the bit with the highest position)". I believe it should be read, "the values represented by successive bits (are additive, begin with 1, and are multiplied by successive integral power of 2), except perhaps for the bit with the highest position". That is, all bets are off if the high bit is set.
–
Steve JessopFeb 8 '13 at 18:06

@SteveJessop: Your interpretation may be correct. If so, it does rule out my hypothetical... But it also introduces a truly vast number of possibilities, making this question extremely hard to answer. This actually looks like a bug in the spec to me. (Apparently, the C committee thought so and fixed it thoroughly in C99. I wonder why C++11 did not adopt their approach?)
–
NemoFeb 8 '13 at 18:13
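The code analyzed in this answer did not survive extraction; reconstructed as a sketch, it is along these lines (the name f matches the __Z1fj symbol in the assembly listing below; the thrown value is illustrative):

```cpp
// Reconstruction (sketch) of the conversion under discussion. Note the
// second comparison intentionally compares x against INT_MIN after the
// usual arithmetic conversion of INT_MIN to unsigned.
#include <climits>

int f(unsigned x)
{
    if (x <= INT_MAX)
        return static_cast<int>(x);     // always well-defined

    if (x >= INT_MIN)                   // INT_MIN is converted to unsigned here
        return static_cast<int>(x - INT_MIN) + INT_MIN;

    throw x;                            // no congruent int exists
}
```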

If x >= INT_MIN (keep the promotion rules in mind, INT_MIN gets converted to unsigned), then x - INT_MIN <= INT_MAX, so this won't have any overflow.

If that is not obvious, take a look at the claim "If x >= -4u, then x + 4 <= 3.", and keep in mind that INT_MAX will be equal to at least the mathematical value of -INT_MIN - 1.

On the most common systems, where !(x <= INT_MAX) implies x >= INT_MIN, the optimizer should be able (and on my system, is able) to remove the second check, determine that the two return statements can be compiled to the same code, and remove the first check too. Generated assembly listing:

__Z1fj:
LFB6:
	.cfi_startproc
	movl	4(%esp), %eax
	ret
	.cfi_endproc

The hypothetical implementation in your question:

INT_MAX equals 32767

INT_MIN equals -2^32 + 32768

is not possible, so does not need special consideration. INT_MIN will be equal to either -INT_MAX, or to -INT_MAX - 1. This follows from C's representation of integer types (6.2.6.2), which requires n bits to be value bits, one bit to be a sign bit, and only allows one single trap representation (not including representations that are invalid because of padding bits), namely the one that would otherwise represent negative zero / -INT_MAX - 1. C++ doesn't allow any integer representations beyond what C allows.

Update: Microsoft's compiler apparently does not notice that x > 10 and x >= 11 test the same thing. It only generates the desired code if x >= INT_MIN is replaced with x > INT_MIN - 1u, which it can detect as the negation of x <= INT_MAX (on this platform).
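A sketch of that workaround (a reconstruction, not the answer's verbatim code; only the second comparison differs from the conversion described above):

```cpp
// Same conversion, with x >= INT_MIN rewritten as x > INT_MIN - 1u,
// a form Microsoft's compiler can recognize as !(x <= INT_MAX).
#include <climits>

int f(unsigned x)
{
    if (x <= INT_MAX)
        return static_cast<int>(x);

    if (x > INT_MIN - 1u)               // same as x >= (unsigned)INT_MIN
        return static_cast<int>(x - INT_MIN) + INT_MIN;

    throw x;                            // no congruent int exists
}
```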

[Update from questioner (Nemo), elaborating on our discussion below]

I now believe this answer works in all cases, but for complicated reasons. I am likely to award the bounty to this solution, but I want to capture all the gory details in case anybody cares.

Let's start with C++11, section 18.3.3:

Table 31 describes the header <climits>.

...

The contents are the same as the Standard C library header <limits.h>.

Here, "Standard C" means C99, whose specification severely constrains the representation of signed integers. They are just like unsigned integers, but with one bit dedicated to "sign" and zero or more bits dedicated to "padding". The padding bits do not contribute to the value of the integer, and the sign bit contributes only as twos-complement, ones-complement, or sign-magnitude.

Since C++11 inherits the <climits> macros from C99, INT_MIN is either -INT_MAX or -INT_MAX-1, and hvd's code is guaranteed to work. (Note that, due to the padding, INT_MAX could be much less than UINT_MAX/2... But thanks to the way signed->unsigned casts work, this answer handles that fine.)
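That inherited constraint can be spot-checked at compile time; a minimal sketch (not part of the original discussion):

```cpp
// If <climits> has C99 semantics, INT_MIN must be -INT_MAX (ones'
// complement / sign-magnitude) or -INT_MAX - 1 (two's complement).
#include <climits>

static_assert(INT_MIN == -INT_MAX || INT_MIN == -INT_MAX - 1,
              "unexpected signed integer representation");
```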

C++03/C++98 is trickier. It uses the same wording to inherit <climits> from "Standard C", but now "Standard C" means C89/C90.

All of these -- C++98, C++03, C89/C90 -- have the wording I give in my question, but also include this (C++03 section 3.9.1 paragraph 7):

The representations of integral types shall define values by use of a
pure binary numeration system.(44) [Example: this International
Standard permits 2’s complement, 1’s complement and signed magnitude
representations for integral types.]

Footnote (44) defines "pure binary numeration system":

A positional representation for integers that uses the binary digits 0
and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.

What is interesting about this wording is that it contradicts itself, because the definition of "pure binary numeration system" does not permit a sign/magnitude representation! It does allow the high bit to have, say, the value -2^(n-1) (twos complement) or -(2^(n-1)-1) (ones complement). But there is no value for the high bit that results in sign/magnitude.

Anyway, my "hypothetical implementation" does not qualify as "pure binary" under this definition, so it is ruled out.

However, the fact that the high bit is special means we can imagine it contributing any value at all: a small positive value, a huge positive value, a small negative value, or a huge negative value. (If the sign bit can contribute -(2^(n-1)-1), why not -(2^(n-1)-2)? etc.)

So, let's imagine a signed integer representation that assigns a wacky value to the "sign" bit.

A small positive value for the sign bit would result in a positive range for int (possibly as large as unsigned), and hvd's code handles that just fine.

A huge positive value for the sign bit would result in int having a maximum larger than unsigned, which is forbidden.

A huge negative value for the sign bit would result in int representing a non-contiguous range of values, and other wording in the spec rules that out.

Finally, how about a sign bit that contributes a small negative quantity? Could we have a 1 in the "sign bit" contribute, say, -37 to the value of the int? So then INT_MAX would be (say) 2^31-1 and INT_MIN would be -37?

This would result in some numbers having two representations... But ones-complement gives two representations to zero, and that is allowed according to the "Example". Nowhere does the spec say that zero is the only integer that might have two representations. So I think this new hypothetical is allowed by the spec.

Indeed, any negative value from -1 down to -INT_MAX-1 appears to be permissible as a value for the "sign bit", but nothing smaller (lest the range be non-contiguous). In other words, INT_MIN might be anything from -INT_MAX-1 to -1.

Now, guess what? For the second cast in hvd's code to avoid implementation-defined behavior, we just need x - (unsigned)INT_MIN to be less than or equal to INT_MAX. We just showed INT_MIN is at least -INT_MAX-1. Obviously, x is at most UINT_MAX. Casting a negative number to unsigned is the same as adding UINT_MAX+1. Put it all together: since the subtraction does not wrap in the branch where x >= (unsigned)INT_MIN,

x - (unsigned)INT_MIN <= INT_MAX
  iff  x <= INT_MAX + (unsigned)INT_MIN
  iff  x <= INT_MAX + INT_MIN + UINT_MAX + 1

and because x is at most UINT_MAX, it suffices that

UINT_MAX <= INT_MAX + INT_MIN + UINT_MAX + 1
  iff  INT_MIN >= -INT_MAX - 1

That last is what we just showed, so even in this perverse case, the code actually works.

That exhausts all of the possibilities, thus ending this extremely academic exercise.

Bottom line: There is some seriously under-specified behavior for signed integers in C89/C90 that got inherited by C++98/C++03. It is fixed in C99, and C++11 indirectly inherits the fix by incorporating <limits.h> from C99. But even C++11 retains the self-contradictory "pure binary representation" wording...

Question updated. I am down-voting this answer (for now) to discourage others... I will un-down-vote later because answer is interesting. (Correct for C, but wrong for C++. I think.)
–
NemoNov 3 '12 at 16:19

@Nemo The C standard applies to C++ in this case; at the very least, the values in <limits.h> are defined in the C++ standard as having the same meaning as in the C standard, so all of C's requirements for INT_MIN and INT_MAX are inherited in C++. You're correct that C++03 refers to C90, and C90 is vague about the allowed integer representations, but the C99 change (inherited at least via <limits.h> by C++11, hopefully also in a more straightforward way) to limit it to those three was one that codified existing practise: no other implementations existed.
–
hvdNov 3 '12 at 19:41

I agree that the meaning of INT_MIN etc. are inherited from C. But that does not mean the values are. (Indeed, how could they, since every implementation is different?) Your inference that INT_MIN is within 1 of -INT_MAX depends on wording that simply does not appear in any C++ spec. So while C++ does inherit the semantic meaning of the macros, the spec does not provide (or inherit) the wording that supports your inference. This appears to be an oversight in the C++ spec that prevents a fully-conforming efficient unsigned-to-signed cast.
–
NemoNov 3 '12 at 19:49

@Nemo If you (perhaps correctly) claim that C++ allows other representations, then on such an implementation, I claim that INT_MIN isn't required to be the minimal representable value of type int, because as far as C is concerned, if the type does not match the requirements of int, the C standard cannot possibly cover that implementation in any way whatsoever, and the C++ standard does not provide any definition of it other than "what the C standard says". I'll check if there's a more straightforward explanation.
–
hvdNov 3 '12 at 19:53

It's not so easy with requirement (b). This compiles into a no-op with gcc 4.6.3 (-Os, -O2, -O3) and with clang 3.0 (-Os, -O, -O2, -O3). Intel 12.1.0 refuses to optimize this. And I have no info about Visual C.

OK, this is awesome. I wish I could split the bounty 80:20... I suspect the compiler's reasoning goes: If the loop does not terminate, result overflows; integer overflow is undefined; therefore the loop terminates; therefore i == n at termination; therefore result equals n. I still have to prefer hvd's answer (for the non-pathological behavior on less-smart compilers), but this deserves more up-votes.
–
NemoNov 5 '12 at 15:52


Unsigned types are defined to be modulo. The loop is also guaranteed to terminate, because n is some unsigned value and i eventually must reach every unsigned value.
–
idupreeJul 9 '13 at 5:22

EDIT: Fixed up code to avoid a possible trap on non-modular-int machines (only one is known to exist, namely the archaically configured versions of the Unisys Clearpath). For simplicity this is done by not supporting the value -2^(n-1), where n is the number of int value bits, on such a machine (i.e., on the Clearpath). In practice this value will not be supported by the machine either (i.e., with sign-and-magnitude or 1's complement representation).

I am not trolling, but I am downvoting. You have claimed something is in the spec but refuse to cite it. Even if the spec says what you think it says (but refuse to prove), your answer is still wrong. -1.
–
NemoOct 31 '12 at 4:08


C++03 section 18.2.1.2 paragraph 56: "A type is modulo if it is possible to add two positive numbers and have a result that wraps around to a third number that is less." That is a direct quote. If you know otherwise, cite the spec. And again, even if is_modulo means what you think (which it may; so prove it), your answer is still wrong because your signed_from<false> casts to int unconditionally
–
NemoOct 31 '12 at 4:12


I have removed my downvote, but I still cannot accept this answer. First, I want chapter and verse of the spec that says is_modulo has anything to do with casting unsigned to int. (The section you quoted in chat only refers to arithmetic on int itself; not casting from unsigned.) Second, to my knowledge the standard does not guarantee that there is only one -- or any -- "trap" representation on non-twos-complement machines. I want a solution relying only on behavior specifically mandated by the standard, preferably with citations to back it up. (I did tag it "language-lawyer".)
–
NemoOct 31 '12 at 6:41


Homework? I have been out of school since before there was a C++ standard. Yes, of course it is an academic exercise; that does not change the fact that your answer is wrong. (Of course static_cast to unsigned is modulo; that has been guaranteed since C++98. It is also irrelevant to static_cast from unsigned.) Oh, by the way -- just out of curiosity -- did you use two accounts to downvote my question?
–
NemoOct 31 '12 at 7:36


@Industrial-antidepressant: Your first citation does not contradict anything I have said, because the spec does not require (max() - min() + 1) to be a power of 2. Your second citation, whose wording adds that requirement, was never adopted into the standard (look for yourself). Finally, none of this has any bearing whatsoever on the implementation-defined behavior of unsigned->signed casts. This answer is simply wrong.
–
NemoNov 4 '12 at 17:24

UINT_MAX is an expression of type unsigned int, and that makes your whole static_cast<int>(n + INT_MIN) - (UINT_MAX + INT_MIN + 1) have that type. It should be possible to fix that, though, and I expect it would then still compile to the same code.
–
hvdNov 3 '12 at 8:39

This answer is incorrect. See my comments on hvd's answer.
–
NemoNov 3 '12 at 16:20

Note that UINT_MAX+1 is zero on a traditional 2's complement system, the conversion to int was a no-op, and we subtracted k*INT_MAX then added it back to land on "the same value". So an acceptable optimizer should be able to erase all that tomfoolery!

That leaves the problem of whether x > INT_MAX or not. Well, we create 2 branches, one with x > INT_MAX, and one without. The one without does a straight cast, which the compiler optimizes to a no-op. The one with... does a no-op after the optimizer is done. The smart optimizer realizes both branches do the same thing, and drops the branch.

Issues: if UINT_MAX is really large or small relative to INT_MAX, the above might not work. I am assuming that k*INT_MAX <= UINT_MAX+1 implicitly.

which work out to 2 and 1 on a 2s complement system I believe (are we guaranteed for that math to work? That's tricky...), and do logic based on these that easily optimize away on non-2s complement systems...

This also opens up the exception case. It is only possible if UINT_MAX is much larger than (INT_MAX-INT_MIN), so you can put your exception code in an if block asking exactly that question somehow, and it won't slow you down on a traditional system.

I'm not exactly sure how to construct those compile-time constants to deal correctly with that.

UINT_MAX cannot be small relative to INT_MAX, because the spec guarantees that every positive signed int is representable as an unsigned int. But UINT_MAX+1 is zero on every system; unsigned arithmetic is always modulo UINT_MAX+1. Still there might be a kernel of a workable approach here...
–
NemoOct 31 '12 at 3:33

@Nemo Just following this thread, so pardon my potentially obvious question: Is your statement "UINT_MAX+1 is zero on every system" established in the '03 spec? If so, is there a specific subsection I should be looking under? Thanks.
–
WhozCraigOct 31 '12 at 4:51

@WhozCraig: Section 3.9.1 paragraph 4: "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer", with a footnote saying "This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type." Basically unsigned is specified to work the way you want/expect.
–
NemoOct 31 '12 at 6:35

Even if signed integers are two's complement, that does not say anything about how unsigned numbers cast to signed (which is implementation-defined). So this does not solve the problem. (For example, the C99/C11 standards specifically contemplate "padding bits". But that is just one example. "Implementation-defined" means just that. Any attempt along these lines is futile, I believe.)
–
NemoNov 7 '12 at 7:27

@Nemo - I changed is_2complement_system<T>::value a little; now I check how casting works for -1 (int(unsigned(-1))) - this is more than checking whether the system is 2's complement... I am thinking about adding verification of test casts for INT_MAX and other such numbers. Would that be a reliable way to test that the system is of your desired type (the most standard 2's complement system), or are there still other drawbacks here?
–
PiotrNyczNov 7 '12 at 9:22

The problem is that "implementation-defined" could mean just about anything, as long as the implementation defines it. So each unsigned int might cast to a different signed int, or some could throw some sort of exception, or whatever. So I do not think any approach along these lines is workable in a strict standard-conforming sense.
–
NemoNov 7 '12 at 19:30

This makes not-strictly-portable assumptions: the behaviour of reading cast.Out is undefined when the bits of cast.In do not represent a value of type int. It will work in practice on almost all systems, but so will a simple static_cast to int.
–
hvdNov 3 '12 at 21:10

Aliases are permitted for types that only differ by qualifier or sign.
–
Sergey K.Nov 3 '12 at 21:10


For two's complement with INT_MIN < -INT_MAX and no padding bits, you're correct. And that will indeed be the most common system. I read the question as requiring defined behaviour for uncommon systems too, but if only standard-defined behaviour is required for the common systems, your answer looks good to me.
–
hvdNov 3 '12 at 21:16


"This is perfectly standard-compliant": No, the standard does not say that you will get the right answer this way.
–
Johan LundbergNov 3 '12 at 23:14


@SergeyK. I agree. If I understand the standard correctly, it is legal and safe to do this, but the resulting value could be anything.
–
Johan LundbergNov 3 '12 at 23:26