GCC aliasing rules: more aggressive than C99?

From: Joshua Haberman <jhaberman at gmail dot com>

To: gcc at gcc dot gnu dot org

Date: Sun, 3 Jan 2010 05:46:48 +0000 (UTC)

The aliasing rules that GCC implements seem to be stricter than what
the C99 standard requires. I am wondering if this is true or whether
I am mistaken (I am not an expert on the standard, so the latter is
definitely possible).

The relevant text (C99 6.5, paragraph 7) is:

    An object shall have its stored value accessed only by an lvalue
    expression that has one of the following types:

    * a type compatible with the effective type of the object,
    [...]
    * an aggregate or union type that includes one of the aforementioned
      types among its members (including, recursively, a member of a
      subaggregate or contained union), or
    [...]

To me this allows the following:

    int i;
    union u { int x; } *pu = (union u*)&i;
    printf("%d\n", pu->x);

In this example, the object "i", which is of type "int", has its
stored value accessed by an lvalue expression of type "union u", which
includes the type "int" among its members.
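
A compilable version of this fragment might look like the sketch
below. The helper names read_via_union and show, and the value 42, are
hypothetical and added only for illustration. Whether the access is
sanctioned by the standard is exactly the question raised here, so this
is a probe to experiment with, not a statement of defined behavior.

```c
#include <stdio.h>

union u { int x; };

/* Hypothetical helper: reads an int object through a pointer to a
   union whose only member is an int, as in the fragment above. */
int read_via_union(int *p)
{
    union u *pu = (union u *)p;
    return pu->x;
}

/* Prints the value read back; call this from main() to try it. */
void show(void)
{
    int i = 42;   /* initialized so the read is not of an indeterminate value */
    printf("%d\n", read_via_union(&i));
}
```

Building this with and without optimization, and with warnings enabled,
is one way to see how a given GCC version treats the cast.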

I have seen other articles that interpret the standard in this way.
See the section "Casting through a union (2)" in this article, which
claims that casts of this sort are legal and that GCC's warnings
against them are false positives:

    http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

However, this appears to be contrary to GCC's documentation. From the
manpage:

    Similarly, access by taking the address, casting the resulting
    pointer and dereferencing the result has undefined behavior, even
    if the cast uses a union type, e.g.:

        int f() {
          double d = 3.0;
          return ((union a_union *) &d)->i;
        }
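
The quoted fragment depends on a union declared earlier in the same
manpage passage. For contrast, here is a self-contained sketch of the
form the GCC documentation describes as allowed even with
-fstrict-aliasing, where the memory is accessed through the union
object itself. The declaration of union a_union follows the GCC
documentation; the function name pun_through_union is hypothetical.

```c
union a_union {
    int i;
    double d;
};

/* Form the GCC documentation describes as allowed under
   -fstrict-aliasing: the memory is accessed through the union type. */
int pun_through_union(void)
{
    union a_union t;
    t.d = 3.0;
    return t.i;   /* reads the first sizeof(int) bytes of the double */
}
```

The value returned depends on the representation of double and on byte
order, so it is not portable, but the access itself goes through an
object whose declared type is the union.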

I have also been able to verify experimentally that GCC will miscompile
the following fragment, if we expect the behavior the standard
specifies:

    int g;
    struct A { int x; };
    int foo(struct A *a) {
      if(g) a->x = 5;
      return g;
    }

With GCC 4.3.3 -O3 on x86-64 (Ubuntu), g is only loaded once:

    0000000000000000 <foo>:
       0:   8b 05 00 00 00 00       mov    eax,DWORD PTR [rip+0x0]   # 6 <foo+0x6>
       6:   85 c0                   test   eax,eax
       8:   74 06                   je     10 <foo+0x10>
       a:   c7 07 05 00 00 00       mov    DWORD PTR [rdi],0x5
      10:   f3 c3                   repz ret
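
But this is incorrect if foo() is called as follows, because the store
through a->x then modifies g and the value returned should reflect
that store:

    foo((struct A*)&g);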
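
A small driver for reproducing this might look like the sketch below.
The function call_foo_on_g is a hypothetical addition; foo, g and
struct A are as in the fragment above. The result is expected to depend
on the compiler version and flags, so no particular output is claimed.

```c
int g;
struct A { int x; };

int foo(struct A *a)
{
    if (g) a->x = 5;
    return g;
}

/* Hypothetical driver: makes g nonzero, then passes its address
   disguised as a struct A pointer. A return value of 5 means g was
   reloaded after the store; 1 means the first load was reused. */
int call_foo_on_g(void)
{
    g = 1;
    return foo((struct A *)&g);
}
```

Printing the result of call_foo_on_g() from main() at -O0 and at -O3,
or with and without -fno-strict-aliasing, is one way to check whether
the second load of g has been dropped.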

Here is another example:

    struct A { int x; };
    struct B { int x; };
    int foo(struct A *a, struct B *b) {
      if(a->x) b->x = 5;
      return a->x;
    }

When I compile this, a->x is only loaded once, even though foo()
could have been called like this:

    int i;
    foo((struct A*)&i, (struct B*)&i);
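
The same kind of driver can be sketched for this second example. The
function call_foo_on_same_int and the initial value 1 are hypothetical
additions; the original call leaves i uninitialized, which would make
the outcome meaningless regardless of aliasing.

```c
struct A { int x; };
struct B { int x; };

int foo(struct A *a, struct B *b)
{
    if (a->x) b->x = 5;
    return a->x;
}

/* Hypothetical driver: both arguments point at the same int.
   A return value of 5 means a->x was reloaded after the store through
   b; 1 means the first load of a->x was reused. */
int call_foo_on_same_int(void)
{
    int i = 1;
    return foo((struct A *)&i, (struct B *)&i);
}
```

As with the first example, the value returned may differ between
optimization levels, which is the behavior under discussion.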

From this I conclude that GCC diverges from the standard, in that it
does not allow casts of this sort. In one sense this is good (because
the policy GCC implements is more aggressive, and yet still reasonable),
but on the other hand it means (if I am not mistaken) that GCC will
incorrectly optimize strictly conforming programs.

Clarifications are most welcome!

Josh