I've only been reading this list in batches recently, so this reply
has quotes from several separate messages, sorry.

First some general comments:
- Explicit relocs for n64 non-PIC weren't implemented before the gcc 3.4
feature freeze. gcc 3.4.x therefore still uses symbolic addresses when
compiling a 64-bit kernel.
- gcc 4.0 knows how to generate explicit relocs for n64 non-PIC and
(by default) will use them instead of symbolic addresses.
- As Thiemo says, there was talk of a -msym32 option (more below), but it
hasn't been implemented yet. This means that if you want to use
the 2-instruction dla hack, you'll need to use -mno-explicit-relocs
when compiling with 4.0. Don't count on that option being around
forever though!

"Maciej W. Rozycki" <macro@linux-mips.org> writes:
> On Wed, 1 Dec 2004, Ralf Baechle wrote:
>> this problem here is specific to inline assembler. The splitlock code for
>> a reasonable CPU is:
>>
>> static __inline__ void atomic_add(int i, atomic_t * v)
>> {
>>         unsigned long temp;
>>
>>         __asm__ __volatile__(
>>         "1:     ll      %0, %1          # atomic_add    \n"
>>         "       addu    %0, %2                          \n"
>>         "       sc      %0, %1                          \n"
>>         "       beqz    %0, 1b                          \n"
>>         : "=&r" (temp), "=m" (v->counter)
>>         : "Ir" (i), "m" (v->counter));
>> }
>>
>> For the average atomic op generated code is going to look about like:
>>
>> 80100634: lui a0,0x802c
>> 80100638: ll a0,-24160(a0)
>> 8010063c: addu a0,a0,v0
>> 80100640: lui at,0x802c
>> 80100644: addu at,at,v1
>> 80100648: sc a0,-24160(at)
>> 8010064c: beqz a0,80100634 <init+0x194>
>> 80100650: nop
>>
>> It's significantly worse for 64-bit due to the excessive code sequence
>> generated for loading a 64-bit address. One outside CKSEGx that is.
>
> Only for old compilers. For current (>= 3.4) ones you can use the "R"
> constraint and get exactly what you need.

Right. IMO, this is exactly the right fix. It should be backward
compatible with old toolchains too.

FYI, the 'R' constraint has been kept around specifically for inline asms.
gcc itself no longer uses it.
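
For illustration, a rough sketch of what the "R" version of the asm
quoted above might look like. The only change to the asm itself is the
pair of memory constraints ("=m"/"m" become "=R"/"R"). The atomic_t
typedef and the non-MIPS fallback branch are assumptions added so the
snippet stands alone; they are not taken from the kernel source.
Depending on the -march setting, the asm may also need a .set directive
to enable ll/sc, which is left out here.

```c
/* Stand-in for the kernel's atomic_t (assumed layout, for illustration). */
typedef struct { volatile int counter; } atomic_t;

static inline void atomic_add(int i, atomic_t *v)
{
#if defined(__mips__)
        unsigned long temp;

        /*
         * "R" asks for an address that a single (non-macro) load or
         * store can use, so ll and sc should each stay one instruction.
         */
        __asm__ __volatile__(
        "1:     ll      %0, %1          # atomic_add    \n"
        "       addu    %0, %2                          \n"
        "       sc      %0, %1                          \n"
        "       beqz    %0, 1b                          \n"
        : "=&r" (temp), "=R" (v->counter)
        : "Ir" (i), "R" (v->counter));
#else
        /* Fallback so the sketch builds on non-MIPS hosts (assumption). */
        __atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
#endif
}
```

With "R", the compiler is expected to put the address in a register
(plus a small offset) before the asm, rather than leaving the assembler
to expand ll and sc into multi-instruction macros.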
"Maciej W. Rozycki" <macro@linux-mips.org> writes:
> On Wed, 1 Dec 2004, Ralf Baechle wrote:
>> On 64-bit the savings would be even more significant. But what we actually
>> want would be using the "o" constraint. Which just at least on the
>> compilers where I've tried it, didn't produce code any different from "m".
>
> No surprise as the "o" constraint doesn't mean anything particular for
> MIPS. All addresses are offsettable -- there is no addressing mode that
> would preclude it, so "o" is exactly the same as "m".

Right!

Thiemo Seufer <ica2_ts@csv.ica.uni-stuttgart.de> writes:
> Current 64bit MIPS kernels run in (C)KSEG0, and exploit sign-extension
> to optimize symbol loads (2 instead of 6/7 instructions, the same as in
> 32bit kernels). This optimization relies on an assembler macro
> expansion mode which was hacked in gas for exactly this purpose. Gcc
> currently doesn't have something similar, and would try to do a regular
> 64bit load with explicit relocs.
>
> I discussed this with Richard Sandiford a while ago, and the conclusion
> was to implement an explicit --msym32 option for both gcc and gas to
> improve register scheduling and get rid of the gas hack. So far, nobody
> came around to actually do the work for it.

True. FWIW, it's trivial to add this option to gcc. As far as I remember,
the stumbling block was whether we should mark the objects in some way,
and whether the linker ought to check for overflow.
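
To make the overflow question concrete: the 2-instruction sequence can
only form addresses that a sign-extended lui plus a signed 16-bit offset
can reach. A minimal sketch of that arithmetic follows. The helper names
(sym_hi16, sym_lo16, rebuild_sym32, fits_two_insn) are hypothetical and
not part of gcc, gas or the linker; the narrowing casts assume the usual
two's-complement wraparound that gcc provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* %hi-style half: upper 16 bits of the low word, rounded so that the
 * sign-extended low half can be added back afterwards. */
static uint16_t sym_hi16(uint64_t addr)
{
        return (uint16_t)(((uint32_t)addr + 0x8000u) >> 16);
}

/* %lo-style half: low 16 bits, interpreted as signed. */
static int16_t sym_lo16(uint64_t addr)
{
        return (int16_t)(uint16_t)addr;
}

/* What "lui reg, hi" followed by a signed 16-bit offset produces on a
 * 64-bit CPU: lui's 32-bit result is sign-extended to 64 bits. */
static uint64_t rebuild_sym32(uint16_t hi, int16_t lo)
{
        int64_t upper = (int32_t)((uint32_t)hi << 16);

        return (uint64_t)(upper + lo);
}

/* The kind of check a linker could make: does the 2-instruction form
 * reproduce the full 64-bit address? */
static bool fits_two_insn(uint64_t addr)
{
        return rebuild_sym32(sym_hi16(addr), sym_lo16(addr)) == addr;
}
```

For 0xffffffff802ba1a0 (the address in Ralf's disassembly, sign-extended
to 64 bits) this gives hi = 0x802c and lo = -24160, matching the
"lui 0x802c" and "-24160(reg)" pair above. An address such as
0x100000000 does not survive the round trip, which is the case an
overflow check would need to catch.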
Richard