On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote:
> 2010-09-22 H.J. Lu <hongjiu.lu@intel.com>> > PR middle-end/45234> * calls.c (expand_call): Don't call check all variable sized> adjustments are multiple of preferred stack boundary after> stack alignment when calling builtin functions.
> --- a/gcc/calls.c> +++ b/gcc/calls.c> @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore)> > preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;> > - if (SUPPORTS_STACK_ALIGNMENT)> + if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl))> {> /* All variable sized adjustments must be multiple of preferred> stack boundary. Stack alignment may change preferred stack
I'd like to hear explanation on why you think this is the right approach.
What is so special about builtins in this case? If they are to be expanded
as calls instead of expanded inline, I'd say they behave like any other
call.
Isn't the problem instead with nested expand_calls (which is happening
in the testcase that is now failing)? Not sure if we still ever TER const/pure
calls into other call arguments or not[*], so if that is possible for arbitrary
calls, or the problem is just memcpy emitted through
emit_block_move_hints with method BLOCK_OP_CALL_PARM.
I admit I haven't debugged the original PR45234 testcase, so I don't know
why exactly is your patch needed and thus why it isn't needed when expanding
the call param move calls.
[*] In
long long foo (int, int) __attribute__((const));
int bar (int, int, long long, int, int);
int baz (int x)
{
return bar (2, 3, foo (4, 5), 6, 7) + 1;
}
foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.
I vaguely remember const/pure calls used to be TERed at some point, but
perhaps I'm just using a wrong testcase.
Jakub

On Wed, Sep 22, 2010 at 11:56 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote:>> 2010-09-22 H.J. Lu <hongjiu.lu@intel.com>>>>> PR middle-end/45234>> * calls.c (expand_call): Don't call check all variable sized>> adjustments are multiple of preferred stack boundary after>> stack alignment when calling builtin functions.>>> --- a/gcc/calls.c>> +++ b/gcc/calls.c>> @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore)>>>> preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;>>>> - if (SUPPORTS_STACK_ALIGNMENT)>> + if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl))>> {>> /* All variable sized adjustments must be multiple of preferred>> stack boundary. Stack alignment may change preferred stack>> I'd like to hear explanation on why you think this is the right approach.> What is so special about builtins in this case? If they are to be expanded> as calls instead of expanded inline, I'd say they behave like any other> call.>> Isn't the problem instead with nested expand_calls (which is happening> in the testcase that is now failing)? Not sure if we still ever TER const/pure> calls into other call arguments or not[*], so if that is possible for arbitrary> calls, or the problem is just memcpy emitted through> emit_block_move_hints with method BLOCK_OP_CALL_PARM.> I admit I haven't debugged the original PR45234 testcase, so I don't know> why exactly is your patch needed and thus why it isn't needed when expanding> the call param move calls.>> [*] In> long long foo (int, int) __attribute__((const));> int bar (int, int, long long, int, int);> int baz (int x)> {> return bar (2, 3, foo (4, 5), 6, 7) + 1;> }> foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.> I vaguely remember const/pure calls used to be TERed at some point, but> perhaps I'm just using a wrong testcase.>
I didn't find exactly why. From what I can see, builtin functions are
treated differently from the normal ones.

On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote:
> > I'd like to hear explanation on why you think this is the right approach.> > What is so special about builtins in this case? If they are to be expanded> > as calls instead of expanded inline, I'd say they behave like any other> > call.> >> > Isn't the problem instead with nested expand_calls (which is happening> > in the testcase that is now failing)? Not sure if we still ever TER const/pure> > calls into other call arguments or not[*], so if that is possible for arbitrary> > calls, or the problem is just memcpy emitted through> > emit_block_move_hints with method BLOCK_OP_CALL_PARM.> > I admit I haven't debugged the original PR45234 testcase, so I don't know> > why exactly is your patch needed and thus why it isn't needed when expanding> > the call param move calls.> >> > [*] In> > long long foo (int, int) __attribute__((const));> > int bar (int, int, long long, int, int);> > int baz (int x)> > {> > return bar (2, 3, foo (4, 5), 6, 7) + 1;> > }> > foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.> > I vaguely remember const/pure calls used to be TERed at some point, but> > perhaps I'm just using a wrong testcase.> >> > I didn't find exactly why. From what I can see, builtin functions are> treated differently from the normal ones.
I've briefly looked at the original PR45234 testcase, and the specifics of
the call are that it increased crtl->preferred_stack_boundary in between
the start of the expand_call function and the if (SUPPORTS_STACK_ALIGNMENT)
block you've added, I'd quite understand we need to align the stack
sufficiently in that case somewhere. The question is where exactly and by
what exact value. On the currently failing testcase obviously no increase
in crtl->preferred_stack_boundary for the nested memcpys happens, so if that
could be used as a condition, that testcase would work. I guess what should
be investigated is where the nested memcpy calls on that testcase actually
do align the stack properly (and how), because it looks like they all have
%esp % 16 == 0 on the call insn.
Jakub

On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote:
> I didn't find exactly why. From what I can see, builtin functions are> treated differently from the normal ones.
I have looked into it some more and I believe the PR45234 "fix" is just
completely wrong, the problem is elsewhere (allocate_dynamic_stack_space)
and there are actually many issues in there.
So, IMNSHO that calls.c change should be reverted.
The problems I see in allocate_dynamic_stack_space for
SUPPORTS_STACK_ALIGNMENT targets are:
1) In
/* We can't attempt to minimize alignment necessary, because we don't
know the final value of preferred_stack_boundary yet while executing
this code. */
crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
I'm afraid this can actually decrease crtl->preferred_stack_boundary
if it has been increased above PREFERRED_STACK_BOUNDARY already earlier.
IMNSHO, if it needs to be done, it should be
if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
2) I'd say MUST_ALIGN macro could use
crtl->preferred_stack_boundary instead of PREFERRED_STACK_BOUNDARY,
if we already know we want to align stack more than
PREFERRED_STACK_BOUNDARY, we wouldn't need to try to align the result.
3) Most importantly, in
if (!known_align_valid || known_align % PREFERRED_STACK_BOUNDARY != 0)
{
size = round_push (size);
if (flag_stack_usage)
{
int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
stack_usage_size = (stack_usage_size + align - 1) / align * align;
}
}
I think the first % PREFERRED_STACK_BOUNDARY should be
% MAX_SUPPORTED_STACK_ALIGNMENT, because crtl->preferred_stack_boundary
can always grow afterwards, and round_push if
crtl->preferred_stack_boundary is smaller than
MAX_SUPPORTED_STACK_ALIGNMENT should as align in bytes use a new
virtual register which would be replaced by constant equal to the actual
final crtl->preferred_stack_boundary (and let cse/combiner optimize
all the calculations up). For flag_stack_usage, I'm afraid it can't
be computed reliably.
4) When this function uses anti_adjust_stack_and_probe or anti_adjust_stack,
it should IMNSHO ensure that stack_pointer_delta isn't modified
even when size is constant (e.g. by saving/restoring the value around
it). The constant allocas should have the same properties as dynamic
allocas. For i?86/x86_64 with the change in 3) constants won't happen
any longer, but for other targets. For the future, it would be nice
if we had some REG note for al the alloca sp adjustments, because e.g.
the unwinding code is confused by the constant size allocas e.g. when
computing DW_OP_GNU_args_size.
5) == CONST_INT and == REG should be replaced with CONST_INT_P and REG_P
The alternative to this would be to prescan the trees right before actual
expansion and (conservatively) guess what the stack alignment will actually
need to be (e.g. take into account call argument alignments, etc.), and
don't allow crtl->preferred_stack_boundary etc. actually grow during the
expansion.
I'm attaching a testcase that shows that non-constant alloca has exactly
the same issue as the original PR testcase, but the "fix" obviously can't do
anything about it.
Jakub
/* PR middle-end/45234 */
/* { dg-do run { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
/* { dg-options "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */
#include "check.h"
void
__attribute__ ((noinline))
bar (__float128 f)
{
check (&f, __alignof__(f));
}
volatile int z = 6;
int
main (void)
{
char *p = __builtin_alloca (z);
bar (0);
__builtin_strncpy (p, "good", 5);
if (__builtin_strncmp (p, "good", 5) != 0)
{
#ifdef DEBUG
p[z - 1] = '\0';
printf ("Failed: %s != good\n", p);
#endif
abort ();
}
return 0;
}