First, there's no problem with the issue mentioned in your
subject line: It's perfectly all right to have several union members
with distinct names but the same type. If that were not so, even
something as simple as `union { int i; time_t t; } u;' could be in
trouble. See also 6.2.5p20, which says that union members have
"possibly distinct" types.

The "write one member, read another" question has been discussed
more than once, and my impression of the debates is that there have
been two camps: Not "It's legal" and "It's illegal," but "It's legal"
and "You'll probably get away with it, but it might not be squeaky-
clean, and my head hurts can we talk about something else, please?"
(I'm in the latter camp.)

It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
represent `3', and that `u.b' thereby receives the same bytes. No
argument there: The storage allocated to `u.b' holds a representation
of `3'.

The part that makes my head ache is figuring out whether the
compiler is required to "notice" that storing to `u.a' affects the
value of `u.b'. If the compiler has already loaded `u.b' into a
register, say, is it required to re-fetch because `u.a' was changed?
Is the compiler allowed to consider `u.b' uninitialized because it
has never been stored to, despite the store to `u.a'?

To those in the "It's legal" camp, I offer a few puzzling and
possibly disturbing points:

- The footnote to 6.2.5p21 points out that "an object with union
type can only contain one member at a time" -- meaning that if
`u' contains `u.a', it does not contain `u.b'. Footnotes, of
course, are suggestive but non-normative.

- The footnote to 6.5.2.3p3 supports the "It's legal" camp by
describing the mechanism of type punning. Footnotes, of course,
are suggestive but non-normative.

- 6.5.2.3p5 gives a "special guarantee" for union members that
are structs, but does not extend a similar guarantee for other
member types.

- 6.7.2.1p14 has the normative language for the first footnote
mentioned above: "The value of at most one of the members can be
stored in a union object at any time." Your `u' can hold `u.a'
or `u.b', but not both at once.

Those are the citations I can find (if I've missed any I'm sure
others will point them out). Their cumulative impression on me is
that the matter is not settled beyond doubt, but the aforementioned
angels may see things differently.

As a practical matter, it's not all that important what I think
or what the angels think, but what the providers of your compilers
think. If a compiler does something unfortunate with your code you
will find yourself retracing this same argument with implementors
who are trying to stamp NOT A BUG on your complaint. If the angels
weigh in on your side, the implementors of the offending compiler
may eventually accede and agree to ship a fix -- "In a forthcoming
release," oh joy, oh joy. I think you might choose better battles:
Fight over things you Really Really Need and are Really Solid Bugs,
and don't waste troops trying to subjugate the unpopulated hinterland.

In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined." The
exception referred to is not related to your example. So the answer
to your question is: yes if the implementation says it is and no if
the implementation says something else.

In C99, the reference to implementation defined is removed.
Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond
to other members take unspecified values." Since a and b occupy the
same bytes, none of those bytes become unspecified. And footnote 82
indicates the intended behavior is for the bits of b to be
"reinterpreted" for the type of b. Since both a and b have the same
type, it seems to me the intention is to retrieve the same value.
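
For concreteness, the snippet from the question can be wrapped in a small function. This is only a minimal sketch: the names same_type and store_a_read_b are made up for illustration and do not come from the thread. Under the reading given above, the read through b is expected to yield the value stored through a, since both members have type int and occupy the same bytes.

```c
/* Two members with distinct names but the same type, as in the question. */
union same_type {
    int a;
    int b;
};

/* Hypothetical helper: store through one member, read through the other. */
static int store_a_read_b(int value)
{
    union same_type u;
    u.a = value;   /* deposits the representation of value */
    return u.b;    /* reads the same bytes back as an int  */
}
```

Calling store_a_read_b(3) corresponds to the original `u.a = 3; printf("%d\n", u.b);'.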

On 12/12/2011 06:49 PM, christian.bau wrote:
> On Dec 11, 6:04 am, Barry Schwarz wrote:
> You are right, but that seems to have some awful consequences. Take
> this code:
>
> union {
> int a;
> long b;
> } u;
> u.a = 3;
> printf("%ld\n", u.b);
>
> So on an implementation where int and long have the same size and
> representation, this code would be well-defined and print "3"?
>
> Now take this code:
>
> void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }
>
> If I call f (&u.a, &u.b) is this required to set both to 6?
> And since the compiler doesn't know that I'm going to make this call,
> lots of optimization goes out of the window?

If I remember correctly, the aliasing rules state that the compiler is
allowed to assume that a and b (inside the function) point to different
objects because they are of different types. Thus in the second
assignment to *a the compiler can assume that *a is still 3 and store 5
in place.
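
To make the uncontroversial half of this concrete, here is a minimal sketch of f called with two genuinely separate objects, which is the situation the aliasing rules let the compiler assume. The helper call_f_distinct is hypothetical and only there for illustration. It deliberately avoids the disputed call f(&u.a, &u.b).

```c
/* The function from the quoted post. */
static void f(int *a, long *b) { *a = 3; *b = 4; *a = *a + 2; }

/* Hypothetical helper: calls f on two distinct objects, so the int and
 * the long cannot overlap and no aliasing question arises. */
static int call_f_distinct(long *b_out)
{
    int x = 0;
    long y = 0;
    f(&x, &y);
    *b_out = y;   /* the long ends up holding 4          */
    return x;     /* the int ends up holding 3 + 2 == 5  */
}
```

With distinct objects, folding `*a + 2' into a store of 5 is indistinguishable from re-reading *a, so the optimization is invisible here.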

Eric Sosman wrote:
> On 12/10/2011 5:06 PM, Edward Rutherford wrote:
>> Hello :
>>
>> Is the following code an undefined behavior?
>>
>>
>> union {
>> int a;
>> int b;
>> } u;
>> u.a = 3;
>> printf("%d\n", u.b);
>
> (I rush in where angels fear to tread...)
>
> First, there's no problem with the issue mentioned in your
> subject line: It's perfectly all right to have several union members
> with distinct names but the same type. If that were not so, even
> something as simple as `union { int i; time_t t; } u;' could be in
> trouble. See also 6.2.5p20, which says that union members have
> "possibly distinct" types.
>
> The "write one member, read another" question has been discussed
> more than once, and my impression of the debates is that there have been
> two camps: Not "It's legal" and "It's illegal," but "It's legal" and
> "You'll probably get away with it, but it might not be squeaky-clean,
> and my head hurts can we talk about something else, please?" (I'm in the
> latter camp.)
>
> It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
> represent `3', and that `u.b' thereby receives the same bytes. No
> argument there: The storage allocated to `u.b' holds a representation of
> `3'.
>
> The part that makes my head ache is figuring out whether the
> compiler is required to "notice" that storing to `u.a' affects the value
> of `u.b'. If the compiler has already loaded `u.b' into a register,
> say, is it required to re-fetch because `u.a' was changed? Is the
> compiler allowed to consider `u.b' uninitialized because it has never
> been stored to, despite the store to `u.a'?
>
> To those in the "It's legal" camp, I offer a few puzzling and
> possibly disturbing points:
>
> - The footnote to 6.2.5p21 points out that "an object with union
> type can only contain one member at a time" -- meaning that if
> `u' contains `u.a', it does not contain `u.b'. Footnotes, of
> course, are suggestive but non-normative.
>
> - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
> describing the mechanism of type punning. Footnotes, of course,
> are suggestive but non-normative.
>
> - 6.5.2.3p5 gives a "special guarantee" for union members that
> are structs, but does not extend a similar guarantee for other
> member types.
>
> - 6.7.2.1p14 has the normative language for the first footnote
> mentioned above: "The value of at most one of the members can be
> stored in a union object at any time." Your `u' can hold `u.a'
> or `u.b', but not both at once.
>
> Those are the citations I can find (if I've missed any I'm sure
> others will point them out). Their cumulative impression on me is that
> the matter is not settled beyond doubt, but the aforementioned angels
> may see things differently.
>
> As a practical matter, it's not all that important what I think
> or what the angels think, but what the providers of your compilers
> think. If a compiler does something unfortunate with your code you will
> find yourself retracing this same argument with implementors who are
> trying to stamp NOT A BUG on your complaint. If the angels weigh in on
> your side, the implementors of the offending compiler may eventually
> accede and agree to ship a fix -- "In a forthcoming release," oh joy, oh
> joy. I think you might choose better battles: Fight over things you
> Really Really Need and are Really Solid Bugs, and don't waste troops
> trying to subjugate the unpopulated hinterland.

Thanks for the explanation, Eric.

Does that mean the "It's Legal" brigade would say it's always legal to
read an unsigned char from a union, whatever was previously stored in
it, on the grounds that an unsigned char cannot contain a trap
representation?

On Sat, 10 Dec 2011 18:14:08 -0500, Eric Sosman wrote:
>
> As a practical matter, it's not all that important what I think
>or what the angels think, but what the providers of your compilers
>think. If a compiler does something unfortunate with your code you
>will find yourself retracing this same argument with implementors
>who are trying to stamp NOT A BUG on your complaint. If the angels
>weigh in on your side, the implementors of the offending compiler
>may eventually accede and agree to ship a fix -- "In a forthcoming
>release," oh joy, oh joy. I think you might choose better battles:
>Fight over things you Really Really Need and are Really Solid Bugs,
>and don't waste troops trying to subjugate the unpopulated hinterland.

On 12/12/2011 3:09 PM, Edward Rutherford wrote:
> [...]
> Does that mean the "It's Legal" brigade would say it's always legal to
> read an unsigned char from a union, whatever was previously stored in
> it, on the grounds that an unsigned char cannot contain a trap
> representation?

The varieties of `char' are something of a special case, because
C has always had the notion that it's possible to inspect and maybe
fiddle with the individual bytes of a multi-byte object. At your
peril, of course, since you might invalidate the multi-byte thing.
But still: Things like memcpy() are defined in terms of copying the
individual bytes, and the copy of a valid object must itself be
valid.

The Standard tightens this just a trifle, by allowing the `char'
flavors other than `unsigned' to have trap representations. Still,
`unsigned char' remains as the "atom" of C memory: Its mapping between
representations and values is one-to-one, which guarantees fidelity
both in value and in representation when copying or comparing, and
also guarantees that there are no trap representations.

But back to the `union' issue: I'm still not 100% comfortable
with the idea of writing to one member and reading another. It sort
of looks like it should work, but I've not heard a watertight argument
that it *must* work, even in the face of a ferociously aggressive
optimizer. I think the "It's legal" faction have found arguments they
deem satisfactory; perhaps they've looked more diligently than I have.

Down to nuts and bolts: Is this a theoretical question, or do you
have an actual use case in mind? If the latter, could you describe it?
Maybe someone will be able to say "Well, in *that* case it works" or
"If you did it *this other* way you wouldn't care."

On 12/14/2011 12:10 AM, christian.bau wrote:
> On Dec 12, 7:09 pm, Jens Gustedt wrote:
>
>> If I remember correctly the aliasing rules state that the compiler is
>> allowed to assume that a and b (inside the function) point to different
>> objects because they are of different types. Thus in the second
>> assignment to *a the compiler can assume that *a is still 3 and store 5
>> in place.
>
> You are right. On the other hand, footnote 82 says:
>
> "If the member used to access the contents of a union object is not
> the same as the member last used to store a value in the object, the
> appropriate part of the object representation of the value is
> reinterpreted as an object representation in the new type as described
> in 6.2.6 (a process sometimes called "type punning"). "
>
> Which is a direct contradiction. I am assuming that the rules for
> union members apply in the same way whether the compiler knows that it
> is accessing different members of the same union or not.

I think this assumption can't be made. Generally, inside a function the
compiler has no way to know that the pointers originate from the same
object. On the contrary, the aliasing rules were invented to assure that
under the given circumstances they *must* point to different objects.

And these things happen. gcc assumes (or at least some version of gcc
has assumed) that they are different, even if the function is inlined
and it could deduce that both point to the same address.

"christian.bau" <> writes:
> On Dec 13, 12:48 am, Eric Sosman wrote:
>> Down to nuts and bolts: Is this a theoretical question, or do you
>> have an actual use case in mind? If the latter, could you describe it?
>> Maybe someone will be able to say "Well, in *that* case it works" or
>> "If you did it *this other* way you wouldn't care."
>
> I bet more than one person has tried to read the representation of a
> float or double as a 32 or 64 bit integer. Last time I tried, I found
> one way that worked on one compiler and failed on another, and another
> way that worked on the other compiler and failed at the first (one
> method was using a union, one was casting the address of a float to
> "pointer to unsigned int"), but I couldn't find any code that worked
> on both compilers. And having code with an #ifdef checking the
> compiler that is used doesn't really inspire confidence in the code
> :-(

So use memcpy(). (I suppose that's not strictly portable to
freestanding implementations, but I'd expect memcpy() to be one of the
things that most freestanding implementations actually provide.)
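
A minimal sketch of the memcpy() approach, assuming float and uint32_t have the same size on the implementation (checked at compile time below). The function names float_bits and bits_to_float are hypothetical, chosen only for this illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on this implementation. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float and uint32_t must have the same size");

/* Copy the object representation of a float into a 32-bit integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse direction: rebuild a float from a stored representation. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Because memcpy() copies bytes, neither the union question nor the pointer-cast aliasing question comes up. Which integer value float_bits() returns for a given float still depends on the implementation's floating-point format.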

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

On 12/13/2011 06:17 PM, christian.bau wrote:
....
> I bet more than one person has tried to read the representation of a
> float or double as a 32 or 64 bit integer. Last time I tried, I found
> one way that worked on one compiler and failed on another, and another
> way that worked on the other compiler and failed at the first (one
> method was using a union, one was casting the address of a float to
> "pointer to unsigned int"), but I couldn't find any code that worked
> on both compilers.

Try reading it using unsigned char; if that doesn't work (for
appropriate values of "work"), the implementation is non-conforming.
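
One way to read the representation through unsigned char, as a minimal sketch. The helper names double_bytes and bytes_to_double are hypothetical. The byte order inside the buffer is whatever the implementation uses; the sketch does not try to normalize it.

```c
#include <stddef.h>

/* Copy the bytes of a double out through an unsigned char pointer.
 * Accessing any object as unsigned char is always permitted. */
static void double_bytes(double d, unsigned char out[sizeof(double)])
{
    const unsigned char *p = (const unsigned char *)&d;
    for (size_t i = 0; i < sizeof d; i++)
        out[i] = p[i];
}

/* Rebuild a double from bytes previously obtained with double_bytes(). */
static double bytes_to_double(const unsigned char in[sizeof(double)])
{
    double d;
    unsigned char *p = (unsigned char *)&d;
    for (size_t i = 0; i < sizeof d; i++)
        p[i] = in[i];
    return d;
}
```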

"christian.bau" <> writes:
> On Dec 13, 12:48 am, Eric Sosman wrote:
>
>> Down to nuts and bolts: Is this a theoretical question, or do you
>> have an actual use case in mind? If the latter, could you describe it?
>> Maybe someone will be able to say "Well, in *that* case it works" or
>> "If you did it *this other* way you wouldn't care."
>
> I bet more than one person has tried to read the representation of a
> float or double as a 32 or 64 bit integer. Last time I tried, I found
> one way that worked on one compiler and failed on another, and another
> way that worked on the other compiler and failed at the first (one
> method was using a union, one was casting the address of a float to
> "pointer to unsigned int"), but I couldn't find any code that worked
> on both compilers. [snip]

The straightforward method using a union is required to work.
More specifically, this (assuming uint64_t exists and double
is 64 bits):
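
The code that followed is not preserved in this copy of the thread. What follows is a hypothetical sketch of the kind of union-based read being described, not the poster's own example. The names double_bits, double_to_bits and bits_to_double are invented for illustration, and the size assumption stated above is checked at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumption from the post: uint64_t exists and double is 64 bits. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double and uint64_t must have the same size");

union double_bits {
    double   d;
    uint64_t u;
};

/* Store through the double member, read through the integer member.
 * Both accesses use the declared member types via the '.' operator. */
static uint64_t double_to_bits(double x)
{
    union double_bits pun;
    pun.d = x;
    return pun.u;
}

/* The reverse direction, for round-tripping. */
static double bits_to_double(uint64_t bits)
{
    union double_bits pun;
    pun.u = bits;
    return pun.d;
}
```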

Barry Schwarz writes:
> On Sat, 10 Dec 2011 22:06:08 +0000 (UTC), Edward Rutherford wrote:
>
>>Hello :
>>
>>Is the following code an undefined behavior?
>>
>>
>> union {
>> int a;
>> int b;
>> } u;
>> u.a = 3;
>> printf("%d\n", u.b);
>
> In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
> union object is accessed after a value has been stored in a different
> member of the object, the behavior is implementation-defined." The
> exception referred to is not related to your example. So the answer
> to your question is: yes if the implementation says it is and no if
> the implementation says something else.
>
> In C99, the reference to implementation defined is removed.
> Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
> member of an object of union type, the bytes of the object
> representation that do not correspond to that member but do correspond
> to other members take unspecified values." Since a and b occupy the
> same bytes, none of those bytes become unspecified. And footnote 82
> indicates the intended behavior is for the bits of b to be
> "reinterpreted" for the type of b. Since both a and b have the same
> type, it seems to me the intention is to retrieve the same value.

I agree with your analysis, but just wanted to add one
item. Practically speaking, the behavior under C89/C90
and C99 is likely to be the same. This idea is also
supported by DR 283 (which is what prompted adding the
footnote), which makes it clear that the intended
semantics in the two cases is meant to be the same.

"christian.bau" <> writes:
> On Dec 11, 6:04 am, Barry Schwarz wrote:
>> In C99, the reference to implementation defined is removed.
>> Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
>> member of an object of union type, the bytes of the object
>> representation that do not correspond to that member but do correspond
>> to other members take unspecified values." Since a and b occupy the
>> same bytes, none of those bytes become unspecified. And footnote 82
>> indicates the intended behavior is for the bits of b to be
>> "reinterpreted" for the type of b. Since both a and b have the same
>> type, it seems to me the intention is to retrieve the same value.
>
> You are right, but that seems to have some awful consequences. Take
> this code:
>
> union {
> int a;
> long b;
> } u;
> u.a = 3;
> printf("%ld\n", u.b);
>
> So on an implementation where int and long have the same size and
> representation, this code would be well-defined and print "3"?

"christian.bau" <> writes:
> On Dec 12, 7:09 pm, Jens Gustedt wrote:
>
>> If I remember correctly the aliasing rules state that the compiler is
>> allowed to assume that a and b (inside the function) point to different
>> objects because they are of different types. Thus in the second
>> assignment to *a the compiler can assume that *a is still 3 and store 5
>> in place.
>
> You are right. On the other hand, footnote 82 says:
>
> "If the member used to access the contents of a union object is not
> the same as the member last used to store a value in the object, the
> appropriate part of the object representation of the value is
> reinterpreted as an object representation in the new type as described
> in 6.2.6 (a process sometimes called "type punning"). "
>
> Which is a direct contradiction. I am assuming that the rules for
> union members apply in the same way whether the compiler knows that it
> is accessing different members of the same union or not.

It isn't a contradiction because how the objects are
accessed is different in the two cases. When a member
is accessed (ie, using '.' or '->') the effective type
is determined by the declared type of the member.
When an object is accessed through a pointer, there
is no declared type, so the rule for what the effective
type is or must be is different.

Eric Sosman writes:
> On 12/10/2011 5:06 PM, Edward Rutherford wrote:
>> Hello :
>>
>> Is the following code an undefined behavior?
>>
>>
>> union {
>> int a;
>> int b;
>> } u;
>> u.a = 3;
>> printf("%d\n", u.b);
>
> [snip]
>
> The "write one member, read another" question has been discussed
> more than once, and my impression of the debates is that there have
> been two camps: Not "It's legal" and "It's illegal," but "It's legal"
> and "You'll probably get away with it, but it might not be squeaky-
> clean, and my head hurts can we talk about something else, please?"
> (I'm in the latter camp.)

Let's see if we can get you over into that other camp.

> It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
> represent `3', and that `u.b' thereby receives the same bytes. No
> argument there: The storage allocated to `u.b' holds a representation
> of `3'.
>
> The part that makes my head ache is figuring out whether the
> compiler is required to "notice" that storing to `u.a' affects the
> value of `u.b'. If the compiler has already loaded `u.b' into a
> register, say, is it required to re-fetch because `u.a' was changed?
> Is the compiler allowed to consider `u.b' uninitialized because it
> has never been stored to, despite the store to `u.a'?

The case in question is quite straightforward, because the two
members must occupy the same bytes (on every implementation)
and also have the same type. Hence the accesses do not violate
the effective type rules, and must proceed as described by the
semantics.

The semantics in this case are defined principally by 6.2.5 p20
and 6.3.2.1 p2. There is also the question of how the two
objects line up relative to one another, but that follows by
virtue of unions not having any padding before any members. (I'm
sure interested parties can find the appropriate references.)
These paragraphs are pretty simple to read; I don't see any
room for uncertainty. Since the accesses in this case clearly
do not violate the effective type rules, the behavior is
correspondingly well-defined.

To respond to your other points:

> To those in the "It's legal" camp, I offer a few puzzling and
> possibly disturbing points:
>
> - The footnote to 6.2.5p21 points out that "an object with union
> type can only contain one member at a time" -- meaning that if
> `u' contains `u.a', it does not contain `u.b'. Footnotes, of
> course, are suggestive but non-normative.

This comment is made in the context of defining the term
"aggregate type". Clearly a union is not an aggregate type
because it cannot hold two (or more) independent values. I don't
think there's any mystery about that.

> - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
> describing the mechanism of type punning. Footnotes, of course,
> are suggestive but non-normative.

And the comment in the footnote is supported by normative text,
as noted above.

> - 6.5.2.3p5 gives a "special guarantee" for union members that
> are structs, but does not extend a similar guarantee for other
> member types.

It does, but notice that the guarantee made here is stronger
than just other member access. Under this passage we are
allowed to access struct members inside a union object _even
though no mention is made of a union at the point of access_.
It's a special guarantee because it's a stronger guarantee
than holds for other union member types.

> - 6.7.2.1p14 has the normative language for the first footnote
> mentioned above: "The value of at most one of the members can be
> stored in a union object at any time." Your `u' can hold `u.a'
> or `u.b', but not both at once.

What it says is that at most one member can be _stored_ at any
one time. That is obviously true since storing into another member
will eradicate the effects of the first store. The union can't
hold two independent values, but it does hold the object referred
to by u.b, and that happens to be the same object as the one
referred to by u.a. Again, I don't think there's any mystery
here -- all that's being described is the destructive effects
of a member store on previous stores, in much the same way
that the effects of 'i = 3;' are wiped out by a subsequent 'i = 4;'.
It isn't talking about read access, just stores.

> Those are the citations I can find (if I've missed any I'm sure
> others will point them out).

I've looked fairly carefully, and didn't find any others.

> Their cumulative impression on me is
> that the matter is not settled beyond doubt, but the aforementioned
> angels may see things differently.

Hopefully you're a little closer now to seeing the light.

> As a practical matter, it's not all that important what I think
> or what the angels think, but what the providers of your compilers
> think. [snip]

There always are practical considerations dealing with any C
language question on any compiler. My preference is to disentangle
the two sets of considerations, and work to understand one without
confusing myself thinking about the other. Then, having a thoroughly
considered understanding of questions in one area, that normally
helps make a more informed decision as regards the larger issues.
And I think that is a good course here.

Eric Sosman writes:
> [snip]
>
> But back to the `union' issue: I'm still not 100% comfortable
> with the idea of writing to one member and reading another. It sort
> of looks like it should work, but I've not heard a watertight argument
> that it *must* work, even in the face of a ferociously aggressive
> optimizer. I think the "It's legal" faction have found arguments they
> deem satisfactory; perhaps they've looked more diligently than I have.

Questions about optimization are complicated because the rules
regarding effective types (obviously pertinent to optimization)
are subtle. However, the simple cases are not subtle. If we
consider a case like this:

the effective type considerations are quite straightforward,
because all the accesses involved are done using declared types.
There is no doubt that the accesses here meet the requirements of
the effective type rules; so any optimizations, no matter how
aggressive, must be faithful to the defined semantics. (The
example of course assumes that double is 64 bits and uint64_t
is defined.)
