What do we know about inlining except that the compiler
will do it when it feels so?
Is there a guarantee that a simple macro-like definition like
void MemClear(char *p,int size)
{
    memset(p,0,size);
}
will be inlined? What if this goes through multiple levels?
--
Helmut Leitner leitner hls.via.at
Graz, Austria www.hls-software.com

afaik, it is entirely up to the compiler, which is where it should be in
almost all cases.
I think I remember there being discussion about the use of the inline
keyword as something to _force_ the compiler to inline, which I kind of
like, but maybe using that keyword is bad, since all the C++ programmers
will use it everywhere, which may not be appropriate.
force_inline or forceinline might be better, as they're uglier, or even
forceinline
{
void MemClear(char *p,int size)
{
memset(p,0,size);
}
}
which would be unambiguous and quite obvious

Why not inline(always), inline(prefer), inline(never),
inline(SomeConstantComparedToStandardizedInlinabilityIndex)?
Like the way version already works?

What do we know about inlining except that the compiler
will do it when it feels so?
Is there a guarantee that a simple macro-like definition like
void MemClear(char *p,int size)
{
memset(p,0,size);
}
will be inlined? What if this goes through multiple levels?

Think of inlining like the obsolete register keyword in C.
Whether obvious inlining is done or not is a quality of implementation
issue, not a language issue.

Think of inlining like the obsolete register keyword in C.
Whether obvious inlining is done or not is a quality of implementation
issue, not a language issue.

It'd still be nice to have a way of explicitly saying that a function
either must or must not be inlined. For example, the dynamic linker
in the GNU libc will break if certain functions are not inlined,
because the relocation has not yet been done. The schedule()
function in Linux will break on sparc (and perhaps some other
platforms) if it is inlined, if you switch to a task that entered the
scheduler via a different containing function.
As for the analogy with the register keyword, GCC extends that to
allow you to explicitly place variables in specific registers, which
is useful in conjunction with assembly code. The uselessness of the
original keyword does not mean that anything similar is also useless.
Neither of these are things you'd need very often, but when you do,
it'd be really unpleasant if they weren't there. After all, D
claims to support "Down and dirty programming". :-)
-Scott
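
For reference, GCC already spells both directions of the must/must-not request as function attributes. The following is a minimal sketch, assuming GCC or Clang; the macro names FORCE_INLINE and NEVER_INLINE and the helper count_nonzero are made up for illustration, and other compilers may offer nothing equivalent.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang-specific attributes, hidden behind hypothetical macros.
   On other compilers the macros degrade to a plain hint or to nothing. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#define NEVER_INLINE __attribute__((noinline))
#else
#define FORCE_INLINE static inline
#define NEVER_INLINE
#endif

/* The thin wrapper from the start of the thread: requested to be
   inlined at every call site. */
FORCE_INLINE void mem_clear(char *p, size_t size)
{
    memset(p, 0, size);
}

/* A function that is requested to stay a real, out-of-line call. */
NEVER_INLINE size_t count_nonzero(const char *p, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        n += (p[i] != 0);
    return n;
}
```

Calling mem_clear(buf, sizeof buf) behaves exactly like the memset it wraps; the attributes only affect code generation, not semantics, which is the property argued for elsewhere in this thread.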

Who says the register keyword is useless?
I remember some case of some guys using a fairly recent GCC, where they
could raise performance by 20% by putting in the simple register hint in
a couple of spots. While the compilers are getting smart, they don't
know anything particular about the program's typical input values, as
the programmer usually does.
-i.

Think of inlining like the obsolete register keyword in C.
Whether obvious inlining is done or not is a quality of implementation
issue, not a language issue.

It'd still be nice to have a way of explicitly saying that a function
either must or must not be inlined. For example, the dynamic linker
in the GNU libc will break if certain functions are not inlined,
because the relocation has not yet been done. The schedule()
function in Linux will break on sparc (and perhaps some other
platforms) if it is inlined, if you switch to a task that entered the
scheduler via a different containing function.

I suspect those functions are heavily dependent on how a *particular*
compiler generates code for that. Depending on that is going outside of the
language definition. It makes successful operation of the code overly
sensitive to particular compiler versions, etc. (Some linux kernel
developers are open about the kernel code being heavily dependent on how a
particular revision of GCC generates code.) You could as easily write code
in D that depends on a particular implementation of D, though with D's
support for inline assembler I'd argue that is unnecessary.

As for the analogy with the register keyword, GCC extends that to
allow you to explicitly place variables in specific registers, which
is useful in conjunction with assembly code. The uselessness of the
original keyword does not mean that anything similar is also useless.

Those features are not part of the C language; although they are part of
GCC, they will not work with every version of GCC, and will not work with
any other C compiler. Contrast that with D, which has defined support for
inline assembler. Try doing some inline assembler work in GCC, then with D.
I think you'll find it supported far better in D, despite GCC's extensions.

Neither of these are things you'd need very often, but when you do,
it'd be really unpleasant if they weren't there. After all, D
claims to support "Down and dirty programming". :-)

Those things are what the inline assembler is for, and D has very strong
support for inline assembler. The C language itself has no support at all
for inline assembler, and GCC's support for it is very weak and error-prone
(for example, there's an arcane syntax you have to add to say which
registers were read and which were written by each asm block - get that
wrong, and your code will behave unpredictably. D, on the other hand, keeps
track of that automatically).
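
For readers who have not seen the GCC syntax being referred to, here is a minimal sketch of it, assuming GCC or Clang on x86 (elsewhere it falls back to plain C so that it still builds). The function name add_u32 is made up for illustration. The three colon-separated lists declare what the block writes, what it reads, and what else it changes; the "r" constraint asks for any general register rather than a specific one.

```c
#include <stdint.h>

/* Adds b to a with 32-bit wrap-around. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(a)   /* operand 0: read and written, any register */
            : "r"(b)    /* operand 1: read only, any register */
            : "cc");    /* clobber: the condition flags change */
    return a;
#else
    return a + b;       /* portable fallback, same result */
#endif
}
```

If an operand or clobber is left out of these lists, the compiler may keep a live value in a register that the block overwrites, which is the unpredictable behaviour described above.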

It'd still be nice to have a way of explicitly saying that a function
either must or must not be inlined. For example, the dynamic linker
in the GNU libc will break if certain functions are not inlined,
because the relocation has not yet been done. The schedule()
function in Linux will break on sparc (and perhaps some other
platforms) if it is inlined, if you switch to a task that entered the
scheduler via a different containing function.

I suspect those functions are heavily dependent on how a *particular*
compiler generates code for that.

Not particularly, at least in the case of the scheduler. The
scheduler's only concern with inlining is that the destination
thread doesn't resume in the wrong inlined instance. The inline
assembly is non-portable as well, but only because inline assembly is
not part of C.

Depending on that is going outside of the language definition.

That depends on what the language definition is. :-)

It makes successful operation of the code overly sensitive to
particular compiler versions, etc. (Some linux kernel developers
are open about the kernel code being heavily dependent on how a
particular revision of GCC generates code.)

Some bits have been, but it's mainly been due to Linux developers
ignoring GCC's own rules for things like inline assembly constraints,
or making assumptions about weird stuff like "inline" assembly
outside of any function.

Those things are what the inline assembler is for, and D has very strong
support for inline assembler.

How do you use the inline assembler to tell the compiler not to
inline a certain function written in D, not assembly?

The C language itself has no support at all
for inline assembler, and GCC's support for it is very weak and error-prone
(for example, there's an arcane syntax you have to add to say which
registers were read and which were written by each asm block - get that
wrong, and your code will behave unpredictably. D, on the other hand, keeps
track of that automatically).

Is there a way in D inline assembly to ask for a temporary register
without mandating a specific one? How about specifying clobbers that
aren't explicitly in the code, such as when calling a function with
an unusual calling convention, or when switching threads?
Also, one of the example code sequences is this:
void *pc;
asm
{
    call L1 ;
L1: ;
    pop EBX ;
    mov pc[EBP],EBX ; // pc now points to code at L1
}
Why do you need to specify EBP when accessing pc? Shouldn't the
compiler know what the best way to access pc is? It might want to
get rid of the frame pointer, or it might want to keep it around in a
register for use after the asm block, etc.
GCC's inline assembly also has the sometimes desirable attribute
that the compiler doesn't touch the instructions you specify, other
than to schedule the block and substitute the things you asked it to.
Will a D compiler be allowed to stick code in the middle of it, in
order to satisfy symbolic references, or to schedule instructions?
Is it allowed to optimize away mov instructions if it can get the
data there on its own? Can it move memory accesses across the asm
block?
Usually, those sorts of things would be beneficial, but there should
be a way to tell it not to do it.
-Scott

Some bits have been, but it's mainly been due to Linux developers
ignoring GCC's own rules for things like inline assembly constraints,
or making assumptions about weird stuff like "inline" assembly
outside of any function.

Those things are what the inline assembler is for, and D has very strong
support for inline assembler.

How do you use the inline assembler to tell the compiler not to
inline a certain function written in D, not assembly?

The compiler does not optimize inline assembly that you write. Therefore, if
you use the inline assembler to call a function, that function won't be
inlined.

The C language itself has no support at all
for inline assembler, and GCC's support for it is very weak and error-prone
(for example, there's an arcane syntax you have to add to say which
registers were read and which were written by each asm block - get that
wrong, and your code will behave unpredictably. D, on the other hand, keeps
track of that automatically).

Is there a way in D inline assembly to ask for a temporary register
without mandating a specific one?

No. The idea is "what you write is what you get" with the inline assembler.

How about specifying clobbers that
aren't explicitly in the code, such as when calling a function with
an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is
an unusual function that clobbers other registers, you'll need to
save/restore them in the inline assembler.

Also, one of the example code sequences is this:
void *pc;
asm
{
    call L1 ;
L1: ;
    pop EBX ;
    mov pc[EBP],EBX ; // pc now points to code at L1
}
Why do you need to specify EBP when accessing pc? Shouldn't the
compiler know what the best way to access pc is? It might want to
get rid of the frame pointer, or it might want to keep it around in a
register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler
is used, because the results of the inline assembler shouldn't be affected
by whether optimization is on or off. If you want, though, you can use the
'naked' pseudo-op and write the entire function in assembler, and what you
write is what you get.

GCC's inline assembly also has the sometimes desirable attribute
that the compiler doesn't touch the instructions you specify, other
than to schedule the block and substitute the things you asked it to.
Will a D compiler be allowed to stick code in the middle of it, in
order to satisfy symbolic references, or to schedule instructions?
Is it allowed to optimize away mov instructions if it can get the
data there on its own? Can it move memory accesses across the asm
block?

The D compiler does not schedule, move around, optimize, or alter the inline
assembler instructions. The assumption is that if the programmer is going to
use inline assembler, the programmer knows exactly what he wants, and will
write it that way. What you write is what you get.

Usually, those sorts of things would be beneficial, but there should
be a way to tell it not to do it.

I guess I'm philosophically opposed to such things. I much prefer the
straightforward approach of inline assembler that what you write is what you
get. I also find it odd that gcc provides such things, yet still requires me
to specify which registers were read/written for the simplest inline asm.

The compiler does not optimize inline assembly that you write. Therefore, if
you use the inline assembler to call a function, that function won't be
inlined.

I suppose, though it'd be a little awkward to use the assembler just
to call a function without it being inlined.
Still, I'm a bit uncomfortable with the idea that the compiler's
always right and cannot be corrected, even explicitly. I've seen GCC
silently decide not to inline a function (on which inlining was
requested) because it was "too big", even though it was just a large
switch statement on a constant, which ended up being one or two
instructions after optimization. Given that no compiler is going to
make the right choice all the time, it's nice to be able to declare
one's intent when there's a clear reason to do so.
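
A minimal sketch of the kind of function described here, assuming GCC or Clang for the always_inline attribute. The names put_sized and PUT are hypothetical and are not the actual kernel code being recalled. The source looks large, but every caller passes a compile-time constant size, so once the function is inlined only one case survives.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline   /* plain hint on other compilers */
#endif

/* Stores the low `size` bytes of `value` at dst.
   Returns 0 on success, -1 for an unsupported size. */
FORCE_INLINE int put_sized(void *dst, uint64_t value, size_t size)
{
    switch (size) {
    case 1: { uint8_t  v = (uint8_t)value;  memcpy(dst, &v, 1); return 0; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(dst, &v, 2); return 0; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(dst, &v, 4); return 0; }
    case 8: { uint64_t v = value;           memcpy(dst, &v, 8); return 0; }
    default: return -1;
    }
}

/* Callers go through this macro, so `size` is always sizeof(...),
   a constant the optimizer can fold the switch on after inlining. */
#define PUT(dst, value) put_sized((dst), (uint64_t)(value), sizeof(*(dst)))
```

An inliner that judges size before constant folding sees four cases plus a default; one that judges it afterwards sees a single store.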

How about specifying clobbers that
aren't explicitly in the code, such as when calling a function with
an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is
an unusual function that clobbers other registers, you'll need to
save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention. For
example, on a mutex implementation, one might want to make the
contended case call a function that saves all registers, so that
the common case doesn't have to spill any registers (other than
whatever's needed to test the mutex).
Thread switching would also be slower on architectures with a
reasonable number of registers if you have to manually save all of
them just because you can't tell the compiler to save (or
reconstruct) the 2 or 3 it might still care about.
BTW, will there be any way to tell the inline assembler to put some
code out-of-line? Something like:
inline int lock_mutex(Mutex m)
{
    int new = whatever_goes_in_there;
    asm {
        eax = 0; /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses. Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */
        lock; cmpxchg [m.lock], new;
        jz failed;
        outofline {
        failed: /* I hope this label isn't visible outside of this
                   instantiation of this assembly block... */
            push ecx;
            push edx;
            call handle_failed;
            pop edx;
            pop ecx;
            return; /* This tells the compiler to exit the assembly
                       block. Alternatively, a return label could
                       be declared. */
        }
        /* Tell the compiler that these registers were not, in fact,
           clobbered. It can't assume it automatically, though, since
           it has no idea what handle_failed might be doing to those
           values on the stack. Or, to save space, I may have buried
           those pushes into a wrapper assembly function instead,
           where the compiler probably won't see them. */
        noclobber ecx, edx;
        /* Tell the compiler that, since this thing acts as a mutex,
           no memory accesses can be reordered across it. It's
           probably not necessary in this case, though, as it contains
           a function call. */
        clobber memory;
    }
}
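
The fast-path/slow-path split that this pseudo-syntax is after can be approximated without inline assembly. The following is a rough sketch using C11 atomics, assuming GCC or Clang for the noinline attribute; the Mutex layout and all function names are hypothetical. It keeps the contended case out of line, but it does not give the register-preservation guarantees that noclobber asks for, since the slow path is an ordinary call under the normal convention.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } Mutex;   /* 0 = free, 1 = held */

#if defined(__GNUC__)
#define NEVER_INLINE __attribute__((noinline))
#else
#define NEVER_INLINE
#endif

/* Contended case, kept out of line. This version simply spins;
   a real implementation would block or yield instead. */
NEVER_INLINE void lock_mutex_slow(Mutex *m)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&m->locked, &expected, 1))
        expected = 0;   /* a failed exchange overwrites expected */
}

/* Single compare-and-swap attempt; true if the lock was taken. */
static inline bool try_lock_mutex(Mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

/* Common case: one compare-and-swap inline, a call only on failure. */
static inline void lock_mutex(Mutex *m)
{
    if (!try_lock_mutex(m))
        lock_mutex_slow(m);
}

static inline void unlock_mutex(Mutex *m)
{
    atomic_store(&m->locked, 0);
}
```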

Also, one of the example code sequences is this:
void *pc;
asm
{
    call L1 ;
L1: ;
    pop EBX ;
    mov pc[EBP],EBX ; // pc now points to code at L1
}
Why do you need to specify EBP when accessing pc? Shouldn't the
compiler know what the best way to access pc is? It might want to
get rid of the frame pointer, or it might want to keep it around in a
register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler
is used, because the results of the inline assembler shouldn't be affected
by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the
assignment to pc rather than the programmer. And what if I move to a
compiler that *never* uses frame pointers? The code is now broken,
because I had to make an assumption about what the compiler was doing
with its registers.
Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

If you want, though, you can use the
'naked' pseudo-op and write the entire function in assembler, and what you
write is what you get.

Yes, but you can get that by using an external assembler as well.
The point of inline assembly is to, well, be inline. :-)

The D compiler does not schedule, move around, optimize, or alter the inline
assembler instructions. The assumption is that if the programmer is going to
use inline assembler, the programmer knows exactly what he wants, and will
write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants,
without knowing some decisions that the compiler will make. GCC's
syntax allows the programmer to tell the compiler exactly where to
substitute those decisions. Removing the ability of the compiler to
make the decisions will lead to slower code.

I guess I'm philosophically opposed to such things. I much prefer the
straightforward approach of inline assembler that what you write is what you
get. I also find it odd that gcc provides such things, yet still requires me
to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make
up for its inability to parse the assembly code itself. However,
those features end up granting the programmer more power than what
they replace.
-Scott

On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
I suppose, though it'd be a little awkward to use the assembler just
to call a function without it being inlined.

I'd agree with that.

Still, I'm a bit uncomfortable with the idea that the compiler's
always right and cannot be corrected, even explicitly. I've seen GCC
silently decide not to inline a function (on which inlining was
requested) because it was "too big", even though it was just a large
switch statement on a constant, which ended up being one or two
instructions after optimization. Given that no compiler is going to
make the right choice all the time, it's nice to be able to declare
one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a
particular routine is a major bottleneck in your program (and it does
usually come down to one!), and you want to make the effort to tune it to
the max, write it in inline assembler.

How about specifying clobbers that
aren't explicitly in the code, such as when calling a function with
an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is
an unusual function that clobbers other registers, you'll need to
save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention. For
example, on a mutex implementation, one might want to make the
contended case call a function that saves all registers, so that
the common case doesn't have to spill any registers (other than
whatever's needed to test the mutex).
Thread switching would also be slower on architectures with a
reasonable number of registers if you have to manually save all of
them just because you can't tell the compiler to save (or
reconstruct) the 2 or 3 it might still care about.
BTW, will there be any way to tell the inline assembler to put some
code out-of-line? Something like:
inline int lock_mutex(Mutex m)
{
int new = whatever_goes_in_there;
asm {
eax = 0; /* This tells the compiler to get a zero into eax,
in whatever way it chooses. Maybe the caller
(which is inlining this function) had one lying
around in a register, and it can now choose to use
eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability
for 15 years it just never proved out to be very useful.

}
/* Tell the compiler that these registers were not, in fact,
clobbered. It can't assume it automatically, though, since
it has no idea what handle_failed might be doing to those
values on the stack. Or, to save space, I may have buried
those pushes into a wrapper assembly function instead,
where the compiler probably won't see them. */
noclobber ecx, edx;

That might be a reasonable addition.

/* Tell the compiler that, since this thing acts as a mutex,
no memory accesses can be reordered across it. It's
probably not necessary in this case, though, as it contains
a function call. */
clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

}
}
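
For comparison, GCC makes the memory assumption opt-in per asm statement rather than universal. A minimal sketch, assuming GCC or Clang; the names COMPILER_BARRIER, publish and opaque are made up for illustration. The first form carries a "memory" clobber, the second deliberately does not.

```c
#if defined(__GNUC__)
/* Empty asm with a "memory" clobber: emits no instruction, but the
   compiler may not keep memory values cached in registers across it
   or move memory accesses over it. It is not a CPU-level fence. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)   /* no equivalent assumed */
#endif

/* Writes the data, then the flag; the barrier stops the compiler
   from reordering the two stores. */
void publish(int *data, int *ready, int value)
{
    *data = value;
    COMPILER_BARRIER();
    *ready = 1;
}

/* Passes x through a register-only asm. There is no "memory" clobber,
   so the compiler remains free to reorder memory accesses around it. */
static inline int opaque(int x)
{
#if defined(__GNUC__)
    __asm__("" : "+r"(x));
#endif
    return x;
}
```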

Also, one of the example code sequences is this:
void *pc;
asm
{
    call L1 ;
L1: ;
    pop EBX ;
    mov pc[EBP],EBX ; // pc now points to code at L1
}
Why do you need to specify EBP when accessing pc? Shouldn't the
compiler know what the best way to access pc is? It might want to
get rid of the frame pointer, or it might want to keep it around in a
register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler
is used, because the results of the inline assembler shouldn't be affected
by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the
assignment to pc rather than the programmer. And what if I move to a
compiler that *never* uses frame pointers? The code is now broken,
because I had to make an assumption about what the compiler was doing
with its registers.

When using inline asm, you'll always run the risk of nonportability between
compilers - after all, things like register conventions, calling
conventions, etc., are not defined by the language. Only the syntax of the
inline assembler is.

Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

Because the inline assembler assembles the code long before any register
assignments are done.

If you want, though, you can use the
'naked' pseudo-op and write the entire function in assembler, and what you
write is what you get.

Yes, but you can get that by using an external assembler as well.
The point of inline assembly is to, well, be inline. :-)

I'm currently porting D to linux. Believe me, the inline assembler is a
great boon to that. Just try converting MASM files to gas files! To me,
using gas is like trying to write code looking in a mirror.

The D compiler does not schedule, move around, optimize, or alter the inline
assembler instructions. The assumption is that if the programmer is going to
use inline assembler, the programmer knows exactly what he wants, and will
write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants,
without knowing some decisions that the compiler will make. GCC's
syntax allows the programmer to tell the compiler exactly where to
substitute those decisions. Removing the ability of the compiler to
make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference
to be negligible. I profile code extensively to make it faster. The
bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
Those I just write completely in hand-tuned inline assembler.

I guess I'm philosophically opposed to such things. I much prefer the
straightforward approach of inline assembler that what you write is what you
get. I also find it odd that gcc provides such things, yet still requires me
to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make
up for its inability to parse the assembly code itself. However,
those features end up granting the programmer more power than what
they replace.

I understand what you're driving at. It is heavily integrated with how
gcc parses, optimizes, and generates code. I don't think that's a good thing
to put in a language spec, as it may unnecessarily constrain how the
compiler is built.

return; /* This tells the compiler to exit the assembly
block. Alternatively, a return label could
be declared. */

Exit the assembly block? I don't know what you mean by that.

If that means what I think is intended, would 'break' be more
appropriate?

}
/* Tell the compiler that these registers were not, in fact,
clobbered. It can't assume it automatically, though, since
it has no idea what handle_failed might be doing to those
values on the stack. Or, to save space, I may have buried
those pushes into a wrapper assembly function instead,
where the compiler probably won't see them. */
noclobber ecx, edx;

Still, I'm a bit uncomfortable with the idea that the compiler's
always right and cannot be corrected, even explicitly. I've seen GCC
silently decide not to inline a function (on which inlining was
requested) because it was "too big", even though it was just a large
switch statement on a constant, which ended up being one or two
instructions after optimization. Given that no compiler is going to
make the right choice all the time, it's nice to be able to declare
one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a
particular routine is a major bottleneck in your program (and it does
usually come down to one!), and you want to make the effort to tune it to
the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it
worse. The code was expecting to have the switch(constant) optimized
away to just the relevant case. Writing the containing functions in
assembly would not have been realistic, as the to-be-inlined
functions were used all over the source tree (they were used to move
data to/from userspace).

eax = 0; /* This tells the compiler to get a zero into eax,
in whatever way it chooses. Maybe the caller
(which is inlining this function) had one lying
around in a register, and it can now choose to use
eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability
for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a
non-constant, that is almost guaranteed to be in some register before
the asm statement?

Yes, it is visible outside. All labels are in one scope per function,
including the inline asm labels.

I was more worried about it being visible throughout the file (or
caller of the inline function), like it would have been in GCC, since
there's no support for find-the-first-one-in-a-given-direction
labels.

Just a shortcut for declaring a new label at the end and branching
there, which is a rather common construct (especially when using
out-of-line sections). I agree with "C" that break would be a better
keyword, though.

/* Tell the compiler that, since this thing acts as a mutex,
no memory accesses can be reordered across it. It's
probably not necessary in this case, though, as it contains
a function call. */
clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in
all cases, though. For instance, it's not necessary when just
reading timestamps, or making use of some fancy computational
instruction for which the compiler doesn't have an intrinsic, or as a
touch-up in a critical function that the compiler doesn't optimize
well enough. At the very least, "noclobber memory" should exist, but
a compiler should also be allowed to look for itself. If the
compiler doesn't support this, it could always fall back on assuming
"clobber memory" for everything.

When using inline asm, you'll always run the risk of nonportability between
compilers - after all, things like register conventions, calling
conventions, etc., are not defined by the language. Only the syntax of the
inline assembler is.

But would it not be better to reduce the potential sources of
nonportability, by letting the programmer tell the compiler to handle
certain details? If the compiler can know the offset from EBP at
assembly time, it presumably knows that it's on the stack, and thus
that it should index off of EBP.

Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

Because the inline assembler assembles the code long before any register
assignments are done.

That's a compiler implementation detail. Other compilers might not
have that restriction (for example, they may allow the registers to
be patched into the assembled code later on, or use an external
assembler).
If the compiler has to choose registers for the asm block in advance,
it could just add the store instruction itself at the time it handles
the inline assembly (in which case you get exactly the same code as
you do now), or it could remember which register the asm block used
and use that in the subsequent non-asm code.

The problem is that the programmer can't know exactly what he wants,
without knowing some decisions that the compiler will make. GCC's
syntax allows the programmer to tell the compiler exactly where to
substitute those decisions. Removing the ability of the compiler to
make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference
to be negligible. I profile code extensively to make it faster. The
bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and
most of that needs to stay portable, so any assembler has to be
buried in separate inline functions. In any case, I don't think the
language should throw away the opportunity for such optimizations
just because they don't help the majority of programs. The compiler
is free to not implement them if it doesn't feel they're important.
-Scott

Still, I'm a bit uncomfortable with the idea that the compiler's
always right and cannot be corrected, even explicitly. I've seen GCC
silently decide not to inline a function (on which inlining was
requested) because it was "too big", even though it was just a large
switch statement on a constant, which ended up being one or two
instructions after optimization. Given that no compiler is going to
make the right choice all the time, it's nice to be able to declare
one's intent when there's a clear reason to do so.
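
For illustration, the switch-on-a-constant situation described above might look like the following sketch in C. The names copy_small and load_u32 are hypothetical and not taken from any real kernel, and __attribute__((always_inline)) is a GCC extension (also accepted by Clang), not something D or standard C provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: looks "big" to an inlining heuristic because of the
   switch, but collapses to a single case once `size` is a known constant.
   always_inline is a GCC/Clang extension that states the intent explicitly. */
static inline __attribute__((always_inline))
void copy_small(void *dst, const void *src, size_t size)
{
    switch (size) {
    case 1:  memcpy(dst, src, 1); break;
    case 2:  memcpy(dst, src, 2); break;
    case 4:  memcpy(dst, src, 4); break;
    case 8:  memcpy(dst, src, 8); break;
    default: memcpy(dst, src, size); break;
    }
}

/* `sizeof v` is a compile-time constant, so after inlining the optimizer
   can drop every branch except `case 4`. */
uint32_t load_u32(const void *src)
{
    uint32_t v;
    copy_small(&v, src, sizeof v);
    return v;
}
```

Without the attribute, whether copy_small gets inlined is left to the compiler's size heuristics, which is the behaviour being complained about.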

I think that comes with the territory of using a high level language. If a
particular routine is a major bottleneck in your program (and it does
usually come down to one!), and you want to make the effort to tune it to
the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it
worse. The code was expecting to have the switch(constant) optimized
away to just the relevant case. Writing the containing functions in
assembly would not have been realistic, as the to-be-inlined
functions were used all over the source tree (they were used to move
data to/from userspace).

I see the inline/not inline as a quality of implementation issue. The
language design should specify semantics, and the semantics should not
change if something is inlined or not. I want to allow the compiler writer
to be as free as possible to innovate how D is implemented. Trying to
specify exactly what optimizations are performed in the language spec can
forestall that. Note that DMD has a compiler switch to turn inlining on or
off.

eax = 0; /* This tells the compiler to get a zero into eax,
in whatever way it chooses. Maybe the caller
(which is inlining this function) had one lying
around in a register, and it can now choose to use
eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability
for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a
non-constant, that is almost guaranteed to be in some register before
the asm statement?

It's not worth it. I have a lot of practice writing fast applications (DMC
is the fastest compiler, and has been for 15 years).

/* Tell the compiler that, since this thing acts as a mutex,
no memory accesses can be reordered across it. It's
probably not necessary in this case, though, as it contains
a function call. */
clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in
all cases, though. For instance, it's not necessary when just
reading timestamps, or making use of some fancy computational
instruction for which the compiler doesn't have an intrinsic, or as a
touch-up in a critical function that the compiler doesn't optimize
well enough. At the very least, "noclobber memory" should exist, but
a compiler should also be allowed to look for itself. If the
compiler doesn't support this, it could always fall back on assuming
"clobber memory" for everything.

I misspoke. It doesn't do it in cases where none of the asm instructions
could possibly modify memory.
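
As a point of comparison, GCC spells the "clobber memory" request as a "memory" entry in the clobber list of its extended asm, and leaving it out is the equivalent of "noclobber memory". The sketch below assumes GCC or Clang and uses empty asm templates so that it does not depend on any particular instruction set; the function names are made up for the example.

```c
/* The "memory" clobber tells the compiler the asm may read or write any
   memory, so it must not keep memory values cached in registers across
   the statement or move memory accesses over it. The template is empty,
   so no instruction is emitted. This only constrains the compiler; it is
   not a hardware fence. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* No "memory" clobber: the asm only touches the register holding x, so the
   compiler stays free to reorder memory accesses around it. */
static inline int opaque_int(int x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Hypothetical use: the compiler will not sink the store to *data below
   the barrier. */
int store_then_flag(int *data, int *flag, int value)
{
    *data = value;
    compiler_barrier();
    *flag = 1;
    return opaque_int(*data);
}
```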

When using inline asm, you'll always run the risk of nonportability between
compilers - after all, things like register conventions, calling
conventions, etc., are not defined by the language. Only the syntax of the
inline assembler is.

But would it not be better to reduce the potential sources of
nonportability, by letting the programmer tell the compiler to handle
certain details? If the compiler can know the offset from EBP at
assembly time, it presumably knows that it's on the stack, and thus
that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame
registers. The variable name gives me an offset as if I hadn't - I then
adjust it as necessary.

Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

Because the inline assembler assembles the code long before any register
assignments are done.

That's a compiler implementation detail. Other compilers might not
have that restriction (for example, they may allow the registers to
be patched into the assembled code later on, or use an external
assembler).

They may not have that restriction, yes, but I don't want to force the
compiler to be built that way. I want to keep the bar low for building a
basic spec compliant D compiler, while making it possible to build very
advanced spec compliant ones.

The problem is that the programmer can't know exactly what he wants,
without knowing some decisions that the compiler will make. GCC's
syntax allows the programmer to tell the compiler exactly where to
substitute those decisions. Removing the ability of the compiler to
make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference
to be negligible. I profile code extensively to make it faster. The
bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and
most of that needs to stay portable, so any assembler has to be
buried in separate inline functions. In any case, I don't think the
language should throw away the opportunity for such optimizations
just because they don't help the majority of programs. The compiler
is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the
language spec. D doesn't preclude any vendors from adding extensions,
though. Extensions are important as they're how new innovations get tried
out. The good ones will wind up getting folded into D. I'm not sure what you
mean by portable, as GCC's way of doing inline assembler is not portable to
any other compiler. As far as I've been able to figure out (with google),
most of it isn't even documented. I figured out how to use it by reading the
kernel listings.
I'm currently in the process of building a linux version of D. It's pretty
sweet to be able to take the inline asm code from win32 and recompile it
under linux and it works just the same with no modification. That's a
hopeless task if you're using separate asm files, or if you're using the
inline assembler from a C compiler. I've even got obj2asm to work on elf
files, so now you can disassemble .o files and see it in intel syntax!
P.S. How I write a whole function in hand-tuned asm is write it in C,
compile it, disassemble it with obj2asm, cut & paste the code back into the
C source in an asm block, and then tune.

For the default case, sure. I'll wait until compilers have a full,
working AI built in before I trust even the best compiler to *always*
get it right, though.

The language design should specify semantics, and the semantics
should not change if something is inlined or not. I want to allow
the compiler writer to be as free as possible to innovate how D is
implemented. Trying to specify exactly what optimizations are
performed in the language spec can forestall that.

I'm not suggesting that the language mandate certain optimizations;
just that there be a standard way of communicating one's intentions
to the compiler. If the compiler doesn't support inlining at all,
then fine, don't inline; however, if it does support it, it should
pay attention to the programmer's request.

It's a pretty small gain in this case, but what if it were a
non-constant, that is almost guaranteed to be in some register before
the asm statement?

It's not worth it.

If there's no cost to it (as is the case with compilers which already
implement such things, including GCC), then any optimization is
worth it. It doesn't make the language any harder to write a
compiler for, as a compiler can choose to always interpret an
assignment as a mov statement.

I have a lot of practice writing fast applications (DMC
is the fastest compiler, and has been for 15 years).

But how much do you need to use assembly in a compiler? Take
something like a kernel instead, which often needs to use assembly
for various things, including the aforementioned copying of data
between user and kernel. This is done a lot, and saving a few cycles
on every such occurrence *does* show up in the benchmarks, especially
since so many of them are just copying one or two words (making the
overhead very visible). Loading the value from userspace, then
storing it on the stack, then loading it again immediately after the
asm block is over will be noticeable. If you're on anything but a
non-regparm x86, add the cost of storing the user address to the
stack (since it was passed in a register) and then loading it again.
The compiler will generally do these sorts of things for its own
generated code; it doesn't strike me as a freak occurrence for a
compiler to allow the user access to the same thing when using inline
assembly.

But would it not be better to reduce the potential sources of
nonportability, by letting the programmer tell the compiler to handle
certain details? If the compiler can know the offset from EBP at
assembly time, it presumably knows that it's on the stack, and thus
that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame
registers. The variable name gives me an offset as if I hadn't - I then
adjust it as necessary.

If you can specify that the value must be in a register in the
beginning and/or end of the block, you don't need to worry about the
validity of the address in the middle of the block.

Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

Because the inline assembler assembles the code long before any register
assignments are done.

That's a compiler implementation detail. Other compilers might not
have that restriction (for example, they may allow the registers to
be patched into the assembled code later on, or use an external
assembler).

They may not have that restriction, yes, but I don't want to force the
compiler to be built that way.

If the compiler isn't built that way, just act as if the user put a
mov instruction there. If the syntax allows the user to ask the
compiler to choose the register, it can pick one arbitrarily if it's
not capable of picking a good one.
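
To make the "ask the compiler to choose the register" idea concrete, here is a small sketch using GCC's extended asm on x86, where an "r" constraint leaves the register choice to the compiler. The function name add_u32 is hypothetical, the asm uses GCC's default AT&T operand order, and the plain C fallback for other targets is my own addition so the example compiles anywhere.

```c
#include <stdint.h>

static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0 and %1 are placeholders. "+r" means "a, read and written, in any
       general register"; "r" means "b in any general register". The
       compiler picks the registers and substitutes their names, so no
       trip through the stack is forced. "cc" notes that flags change. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    /* Fallback for other compilers or architectures (assumption of this
       sketch, not part of the discussion above). */
    a += b;
#endif
    return a;
}
```

A simple compiler could satisfy the same syntax by always loading the operands into fixed registers with mov instructions before the block, which is the "pick one arbitrarily" case.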

It's a little harder when it's 30,000 lines out of a few million, and
most of that needs to stay portable, so any assembler has to be
buried in separate inline functions. In any case, I don't think the
language should throw away the opportunity for such optimizations
just because they don't help the majority of programs. The compiler
is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the
language spec.

The semantics behind what the programmer requests must be
implemented; it's the optimization that the semantics allow that
does not need to be there in simpler compilers.

D doesn't preclude any vendors from adding extensions, though.
Extensions are important as they're how new innovations get tried
out. The good ones will wind up getting folded into D.

Sure. However, this often leads to different compilers implementing
the same feature in incompatible ways, requiring programs that want
to use the feature to use lots of conditional compilation to remain
semi-portable.
If the new feature would require significant effort to implement
correctly (not necessarily efficiently), then I agree that it should
stay out of the language unless it is demonstrated to be sufficiently
useful (though it might sometimes be beneficial to formalize it into
an optional yet standardized extension, so that if it is implemented,
it's implemented in the same way). However, some of these things
could be implemented (poorly, but correctly and no worse than if the
feature weren't used) with a sed script if one were so inclined.
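
The conditional compilation mentioned above tends to look something like this sketch in C, where "force inline" is spelled differently by each vendor. __forceinline (Microsoft) and __attribute__((always_inline)) (GCC) are real extensions; the FORCE_INLINE macro and the function names are made up for the example, with mem_clear echoing the MemClear function from the start of the thread.

```c
#include <stddef.h>
#include <string.h>

/* Each compiler has its own spelling for the same request. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline   /* plain hint; may be ignored */
#endif

FORCE_INLINE void mem_clear(char *p, size_t size)
{
    memset(p, 0, size);
}

/* Hypothetical caller; with forced inlining this becomes a direct memset. */
void clear_header(char *buf)
{
    mem_clear(buf, 16);
}
```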

I'm not sure what you mean by portable, as GCC's way of doing
inline assembler is not portable to any other compiler.

Intel's compiler claims to support GCC inline assembly on x86 (their
IA64 compiler apparently doesn't support inline assembly at all).
However, in general, the lack of portability of inline assembly
between compilers for the same architecture is a bit annoying.
I was hoping that, with D's placing it into the language itself, it
would cease to be an issue. However, once extensions to the basic
syntax are relied on, you're right back to the current state of
incompatibility.

As far as I've been able to figure out (with google), most of it
isn't even documented. I figured out how to use it by reading the
kernel listings.

It's documented in the GCC info pages. Look for the "Extended Asm"
node, as well as the section on constraints.

I'm currently in the process of building a linux version of D. It's pretty
sweet to be able to take the inline asm code from win32 and recompile it
under linux and it works just the same with no modification. That's a
hopeless task if you're using separate asm files,

Not really. There are Intel-syntax assemblers for Linux (even gas
can be told to use it now), and gas is available for Windows should
one want to go the other way.

or if you're using the inline assembler from a C compiler.

Unless you're using the same C compiler on both platforms.

I've even got obj2asm to work on elf files, so now you can
disassemble .o files and see it in intel syntax!

GNU objdump can do that as well, by passing "-m i386:intel".

P.S. How I write a whole function in hand-tuned asm is write it in C,
compile it, disassemble it with obj2asm, cut & paste the code back into the
C source in an asm block, and then tune.

And do it over again every time the C code changes, or when a header
it depends on changes (if you notice!). Each time, doing it for
every supported architecture. It's still a useful technique for
certain situations, but it's not a replacement for flexible inline
assembly.
-Scott

I'm currently porting D to linux. Believe me, the inline assembler is a
great boon to that. Just try converting MASM files to gas files! To me,
using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both
use a (cleaned-up?) Intel syntax.
There are also a number of converters NASM <-> GAS <-> MASM. And
besides, the new GAS is said to be able to use Intel syntax.
BTW, I didn't find a reliable way to use NASM with DigitalMars compilers
for Windows. It has Borland format, but it somehow didn't work. I'll try
to reproduce this problem sometime later.
-i.

I did find a reliable way to use NASM with Digital Mars for Win32 and DOSX
targets.
The problem is that the common statement
section .data
or
section .code
in COFF and other formats is expanded to something like 'dword aligned
32-bit segment of code (or text)'.
When the same statement is used for the OBJ format, it is not treated the
same way.
To make them identical, you should write
section .code align=4 use32
For the DOSX target, that is not sufficient. You should write
section _DATA class=DATA align=4 use32
or
section _CODE class=CODE align=4 use32
and, moreover, place somewhere the directive
group DGROUP _DATA
to tell the linker to group the data segment in this module with the others.
The technique just described (the one for the DOSX target) is fully
compatible with Win32 target code.
I used this to compile the XVID codec sources for both Win32 and DOSX
with DMC, and it works.
BTW, with optimizations turned on, the C version of the codec (when asm is
not used) runs almost twice as fast as the unoptimized one. I think the DMC
optimizer is cool!
Nic Tiger.
"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b9j6pl$32m$1 digitaldaemon.com...

Walter wrote:

I'm currently porting D to linux. Believe me, the inline assembler is a
great boon to that. Just try converting MASM files to gas files! To me,
using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both
use a (cleaned-up?) Intel syntax.
There are also a number of converters NASM <-> GAS <-> MASM. And
besides, the new GAS is said to be able to use Intel syntax.
BTW, I didn't find a reliable way to use NASM with DigitalMars compilers
for Windows. It has Borland format, but it somehow didn't work. I'll try
to reproduce this problem sometime later.
-i.