> On Fri, 4 Jan 2002, Richard Earnshaw wrote:
>
> > Yes, but at the other end, we currently have to read the instruction,
> > extract the call number, test whether it is zero (and if so do some
> > shuffling), check the range and then switch to the call value. If we
> > *know* that a netbsd-elf image always puts the syscall number in r0, then
> > we can eliminate all of the decoding and skip straight to the range check.
>
> We'd still need to look at the SWI number, since we need to check for IMB
> and suchlike, so the only win is that you can drop the checks for
> SYS_syscall (and probably SYS___syscall, if you convert it to SYS_syscall
> in userland). On the down side, you increase the number of syscalls for
> which you need to copyin some arguments from the stack.

I know it isn't strictly ARM ARM compatible (though that is only a
recommendation), but since you can't do an IMB in Thumb, I think we should
use the same approach there as well (i.e. put the IMB code in r0).
>
> > So we pay a small price in the userland code for a large performance gain
> > inside the kernel...
>
> I think "large" may be an overstatement. Testing it shouldn't be hard,
> anyway.

I also forgot that before you can fetch the code you have to test whether
you are in ARM mode or Thumb mode, so the code would now be something
like:

    code = 0;
    if ((saved_pc & THUMB_BIT) == 0) {
        instruction = load[saved_pc - 4];
        code = instruction & 0xffffff;
    }
    if (code == 0) {
        code = reg0;
        shuffle_regs();
    }

While the alternative would be:

    code = reg0;
    /* No need to shuffle regs, since we can adjust all uses. */

Once you take into account that the instruction word at saved_pc - 4 will
probably not be in the Dcache (normally it will be only in the Icache), the
second sequence should execute significantly faster; the saving should
certainly outweigh the overhead of shuffling the registers in userland
before the call.
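
For anyone who wants to poke at the difference, here is a rough,
self-contained C sketch of the two sequences above. It is not the actual
kernel code: the struct, the function names, the value of THUMB_BIT, and
the choice to model saved_pc as a byte offset into a small word array are
all assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values and names, for illustration only. */
#define THUMB_BIT 0x1u        /* assumed flag: caller was in Thumb mode */
#define SWI_MASK  0x00ffffffu /* 24-bit number field of an ARM SWI */

struct trapframe_sketch {
    uint32_t r[4];     /* r0-r3 as saved on kernel entry */
    uint32_t saved_pc; /* modelled as a byte offset just past the SWI */
};

/*
 * First sequence: fetch the SWI instruction, extract its number, and
 * fall back to r0 (shuffling the arguments down) when the number is 0.
 */
static uint32_t
decode_from_insn(struct trapframe_sketch *tf, const uint32_t *mem)
{
    uint32_t code = 0;

    if ((tf->saved_pc & THUMB_BIT) == 0) {
        assert(tf->saved_pc >= 4);
        /* This load goes through the data side, hence the Dcache miss. */
        uint32_t instruction = mem[(tf->saved_pc - 4) / 4];
        code = instruction & SWI_MASK;
    }
    if (code == 0) {
        code = tf->r[0];
        /* Stand-in for shuffle_regs(): r1-r3 move down by one. A real
         * kernel would also have to pull a further argument off the
         * user stack; that step is omitted here. */
        tf->r[0] = tf->r[1];
        tf->r[1] = tf->r[2];
        tf->r[2] = tf->r[3];
    }
    return code;
}

/* Second sequence: the call number is always in r0, so no fetch. */
static uint32_t
decode_from_r0(const struct trapframe_sketch *tf)
{
    return tf->r[0];
}
```

The second function touches only the saved registers, which is the whole
point: there is no mode test, no instruction fetch, and no shuffling.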