On Fri, Jan 17, 2014 at 6:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Jan 17, 2014 at 3:19 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> ix86_split_lea_for_addr transforms a single LEA instruction into a series
>> of MOV and ADD instructions. For
>>
>> lea 0x400(%eax, %ecx, 8), %edx
>>
>> we get
>>
>> mov %eax, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add %ecx, %edx
>> add $0x400, %edx
>>
>> For -mtune=intel, we want to turn on X86_TUNE_OPT_AGU, but avoid
>> ix86_split_lea_for_addr. This patch adds X86_TUNE_AVOID_LEA_FOR_ADDR
>> and PROCESSOR_INTEL. We keep PROCESSOR_INTEL the same as
>> PROCESSOR_SILVERMONT, except that X86_TUNE_AVOID_LEA_FOR_ADDR isn't
>> turned on for PROCESSOR_INTEL. OK for trunk?
>
> As said earlier, m_INTEL is not a processor, but equals a REAL
> processor, so the patch is not acceptable.

-mtune=intel, similar to -mtune=generic, isn't equal to a single processor.
From invoke.texi:
---
@item intel
Produce code optimized for the most current Intel processors, which are
Haswell and Silvermont for this version of GCC.
---
We don't want -mtune=intel to define __tune_silvermont__, and we
want to generate code that is balanced between Haswell and Silvermont.
-mtune=intel started as -mtune=silvermont. I am working on incremental
changes like this to better tune for Haswell without significantly impacting
Silvermont.
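
For reference, the split sequence quoted at the top of this message can be checked against the address that the single LEA computes. The following is a minimal sketch in plain C, not code from the patch: lea_value, split_value, and split_insn_count are hypothetical helper names, and the loop only mirrors the MOV/ADD pattern shown in the quoted example (one ADD of the index per unit of scale, then one ADD of the displacement).

```c
#include <assert.h>
#include <stdint.h>

/* Value computed by: lea disp(base, index, scale), dest
   32-bit unsigned arithmetic wraps, as the hardware does. */
static uint32_t lea_value(uint32_t base, uint32_t index,
                          uint32_t scale, uint32_t disp)
{
  return base + index * scale + disp;
}

/* Same value, built the way the quoted split sequence builds it. */
static uint32_t split_value(uint32_t base, uint32_t index,
                            uint32_t scale, uint32_t disp)
{
  uint32_t dest = base;                 /* mov %eax, %edx   */
  for (uint32_t i = 0; i < scale; i++)
    dest += index;                      /* add %ecx, %edx   */
  dest += disp;                         /* add $0x400, %edx */
  return dest;
}

/* Instruction count of the split form: 1 MOV, `scale` ADDs of the
   index, and 1 ADD of the displacement. */
static unsigned split_insn_count(uint32_t scale)
{
  return 1u + scale + 1u;
}
```

With scale 8, the split form needs 10 instructions where the LEA needs one, which is the cost this patch avoids for -mtune=intel.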

On Fri, Jan 17, 2014 at 3:46 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
OK, this clarifies the situation.
So, -mtune=generic is too broad, and -mtune=intel is needed, as a
generic tuning for latest Intel processors (note the plural). We want
tuning options that cover Haswell and Silvermont for this version, but
not something that degrades runtime too much (or unnecessarily
increases code size too much).
If this is the case, I agree with the approach.
BTW: There are some ix86_tune == XXX conditions scattered throughout
LEA handling code. Can these be substituted with appropriate TARGET_*
defines?
Uros.

On Fri, Jan 17, 2014 at 7:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> OK, this clarifies the situation.
>
> So, -mtune=generic is too broad, and -mtune=intel is needed, as a
> generic tuning for latest Intel processors (note the plural). We want
> tuning options that cover Haswell and Silvermont for this version, but
> not something that degrades runtime too much (or unnecessarily
> increases code size too much).

Yes, that is correct.
> If this is the case, I agree with the approach.
I will check it in.
> BTW: There are some ix86_tune == XXX conditions scattered throughout
> LEA handling code. Can these be substituted with appropriate TARGET_*
> defines?

I have been looking at them closely to check their impacts on
both Haswell and Silvermont. I am planning to keep
the simple LEA -> ADD transformation, but avoid
the complex LEA -> ADD/MOV/SHL transformation.
Thanks.

On Fri, Jan 17, 2014 at 4:17 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> BTW: There are some ix86_tune == XXX conditions scattered throughout
>> LEA handling code. Can these be substituted with appropriate TARGET_*
>> defines?
>
> I have been looking at them closely to check their impacts on
> both Haswell and Silvermont. I am planning to keep
> the simple LEA -> ADD transformation, but avoid
> the complex LEA -> ADD/MOV/SHL transformation.

No, I wasn't talking about a functional change, but about an equivalent
TARGET_* define that can be used instead of "(ix86_tune ==
PROCESSOR_SILVERMONT) || (ix86_tune == PROCESSOR_INTEL)".
Uros.
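
To illustrate the kind of substitution meant here, below is a rough, self-contained sketch and not GCC's actual source. It assumes a per-feature processor mask table in the style of the existing tuning flags. The enum values, the tune_masks table, set_tune, and in particular X86_TUNE_LEA_HANDLING / TARGET_LEA_HANDLING are hypothetical names made up for this example. The masks for X86_TUNE_OPT_AGU and X86_TUNE_AVOID_LEA_FOR_ADDR only reflect what this thread states about Silvermont and -mtune=intel.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for illustration; not GCC's real definitions. */
enum processor_type {
  PROCESSOR_GENERIC,
  PROCESSOR_HASWELL,
  PROCESSOR_SILVERMONT,
  PROCESSOR_INTEL,
  PROCESSOR_max
};

#define m_SILVERMONT (1u << PROCESSOR_SILVERMONT)
#define m_INTEL      (1u << PROCESSOR_INTEL)

enum tune_index {
  X86_TUNE_OPT_AGU,
  X86_TUNE_AVOID_LEA_FOR_ADDR,
  X86_TUNE_LEA_HANDLING,        /* hypothetical flag for this sketch */
  X86_TUNE_LAST
};

/* One processor mask per tuning flag. */
static const unsigned tune_masks[X86_TUNE_LAST] = {
  [X86_TUNE_OPT_AGU]            = m_SILVERMONT | m_INTEL,
  [X86_TUNE_AVOID_LEA_FOR_ADDR] = m_SILVERMONT,
  [X86_TUNE_LEA_HANDLING]       = m_SILVERMONT | m_INTEL,
};

static enum processor_type ix86_tune = PROCESSOR_GENERIC;
static bool tune_features[X86_TUNE_LAST];

/* Select the tuning target and derive every flag from its mask. */
static void set_tune(enum processor_type cpu)
{
  ix86_tune = cpu;
  for (int i = 0; i < X86_TUNE_LAST; i++)
    tune_features[i] = (tune_masks[i] >> cpu) & 1u;
}

#define TARGET_OPT_AGU            tune_features[X86_TUNE_OPT_AGU]
#define TARGET_AVOID_LEA_FOR_ADDR tune_features[X86_TUNE_AVOID_LEA_FOR_ADDR]
#define TARGET_LEA_HANDLING       tune_features[X86_TUNE_LEA_HANDLING]

/* The open-coded condition currently scattered through the LEA code. */
static bool open_coded_test(void)
{
  return ix86_tune == PROCESSOR_SILVERMONT || ix86_tune == PROCESSOR_INTEL;
}

/* True if the macro agrees with the open-coded test for every target. */
static bool macro_matches_open_coded_test(void)
{
  for (int cpu = 0; cpu < PROCESSOR_max; cpu++)
    {
      set_tune((enum processor_type) cpu);
      if (open_coded_test() != TARGET_LEA_HANDLING)
        return false;
    }
  return true;
}
```

The point of the define is that adding another processor to the group later means touching only the mask, not every condition in the LEA handling code.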