The purpose of this change is to remove all arosc-specific extensions from
the AROS Task structure. After the changes arosc.library will behave as any
other shared library. It can then also be replaced with other C libraries if
wanted. For the implementation I'm extending the auto-generated code from
genmodule so it provides the features so that arosc.library can store all
its state in its own per-opener/per-task libbase.

STATUS: IN PROGRESS (branches in SVN: branches/ABI_V1/trunk-genmodule_pob and
branches/ABI_V1/trunk-arosc_etask; first big milestone is done as the iet_acpd
field is removed from IntETask in arosc_etask branch).

Switch everything to DOS packets and remove the current system based on
devices. This has been heavily discussed on the mailing list. The current
AROS implementation uses devices for message passing and the classic system
used Port and DOS Packets. The latter is considered by many people as a hack
on AmigaOS without fully following the rest of the OS internal structure.
That's also reason why the alternative for AROS was developed in the first
place. But it became clear that we'll have to implement a DOS packets
interface anyway and thus the question became if we should have two
alternative implementations in AROS. In the beginning I too was a big
opponent of the DOS packets, mostly because the device interface allowed to
run some code on the callers task and thus avoid task switches to reduce
throughput. Using the PA_CALL and PA_FASTCALL feature of AROS ports the same
could be achieved. In the end it was concluded that everything you could do
with the device interface could also be done in the ports interface and
vice versa and that having two systems next to each other is bloat.

In current implementation file handles and locks are the same and this
should also be changed when switching to the DOS packets interface.

STATUS: NOT STARTED

DECISION: A decision was made on 2008-11-13 to include this item in full
scope into ABI_V1.

REMARKS:
A proposal for decrease of scope and work needed was made:

Since the file systems we use today (SFS, AmberRAM, CDROM, FAT) are all
package base (either via packet.handler or SFS packet emulation), why
not instead of whole system change do the following:

add missing features to package.handler

migrate SFS to use packet.handler

all newly written modules need to be packet-based

Advantages: less work for 1.0, preparing for future change
Disadvantages: this solution would probably not change until next major
release

This will involve a lot of discussion on the mailing list to nail down the
V1.0 version several APIs. During the writing of this manual a lot of things
will pop up that have not be thought through yet, so ABI V1 and AROS 1.0 can
only be released if this task is finished.

One of the important discussions is to clear out what is handled by the
compiler and what is defined by the ABI of AROS itself. This border is quite
vague at the moment.

We will have to go over all libraries and determine/document what a
developer can or can't expect from all their functions. We should also be
sure that all functions then also act as we describe in the ABI.

The manual will have to list all API that are not yet considered stable and
should not be used by applications which want to be future-compatible:

Currently SysBase is a special global variable handled by the ELF loader;
but some people don't seem to like it... I would like to keep it like that.

Another problem is that currently SysBase is one global pointer shared by
all tasks. This is not SMP-compatible as we need a SysBase at least for
each CPU/core separately:

Michal proposed to call a function every time you need SysBase.

Staf proposed to use virtual memory for that; e.g. that SysBase pointer
has the same virtual address for each task but points to a different
physical address for each CPU. Each task is allowed to access some
fields of SysBase without using any exec calls. For example, it is
allowed (or at least not forbidden) to access SysBase->ThisTask field,
which would be different on different CPU cores. Having SysBase at
fixedvirtual address means, any task can cache the SysBase value
and use it in any way. Each CPU core (or each Task if one prefers) would
have it's own exclusive copy of SysBase - all of them at the same
virtual address.

On topic of SMP: The "virtual" SysBase is just one tiny change among many
others needed to make SMP work properly. Not to mention one of drawbacks of
SMP on Amiga-like systems. The Forbid() call as well as Disable() has to
block scheduler/interrupts on ALL CPUs.

Is currently heavily discussed on the mailing list, but I myself don't have
the time to delve into it. A summary of different proposals with some
example code to see the effect on real code would be handy. My proposal is
to switch to the startvalinear & co. from the adtools gcc compiler. This
will use the same solution as is used for OS4. Advantage is that a limited
change of code is needed to adapt this. Disadvantage is that the adtools gcc
has to be used for compiling AROS.

A decision needs to be made whether we implement varags 'physically' the
same on all architectures or we use the same macros/approach on every
architecture, but this macro/approach can be implemented differently on each
architecture. If the second approach is chosen, nothing has to be changed
for i386. If the first approach is chosen (first proven possible) current
code will have to be rewritten

no static version of arosc.library - This makes building arosc quite complex
and all programs should be able to use the shared version anyway.

remove librom.a and arosc.library. arosc.library is split in three parts:

a std C subset that can be put in ROM as the first library, that thus
does not need dos.library. This should then replace all usage of
librom.a.
This library is called stdc.library. It contains also the math
functions which may make it too big for for example m68k ROMs.

a std C implementation of stdio is done as a light weight wrapper around
AmigaOS file handles. The library is stdcio.library

Full POSIX implementation; possibly also including some BSD functions.
This would then provide a (nearly) full POSIX implementation. This
library should be optional and real Amiga programs should not need this
library. The library is called posixc.library.

this change will influence all already compiled programs as arosc.library
is gone.

a possibility to retain the compatibility by creating
arosc2.library (new library) and reimplementing arosc.library as stub to
arosc2.library was rejected - the ABI_V1 should be clean of such hacks

This resource is meant to gather all arch and CPU-specific code for
exec.library. This way exec.library would not need any arch-specific
code but would use a call to kernel.resource functions when arch-specific
information or actions are needed. This change can be delayed until after
ABI V1.0 as exec.library is fixed; but programs that use the kernel.resource
directly will not be compatible with the next iteration of the ABI

STATUS: NOT REQUIRED FOR 1.0

DECISION: A decision was made on 2008-11-28 not to require this item
for ABI_V1 changes.

I think current oop.library and the hidds are still sub-optimal and need
some more investigation. I think this work can wait till after ABI V1.0 and
mentioning in the ABI V1.0 that programs that use oop.library or hidds
directly won't be compatible with ABI V2.0.

STATUS: NOT REQUIRED FOR 1.0

DECISION: A decision was made on 2008-12-04 not to require this item
for ABI_V1 changes

In my opinion, some cruft has been building up in these files and they could
use a good discussion of what things can be removed and what things can be
combined and then properly documented. Probably this does not impact the ABI
so it may be delayed but changes to these files may cause source
compatibility problems.
E.g. programs will not compile any more with the new version if they depend
on some specific features and need to be adapted but programs
compiled with the old version will keep on running on AROS.

STATUS: NOT REQUIRED FOR 1.0

DECISION: A decision was made on 2008-12-04 not to require this item
for ABI_V1 changes

the current state of the headers is anything that is releasable or can be
fixed; it is more a hacked together bunch of headers then a well thought out
set.

Luckily this is not an ABI problem but a source compatibility problem. When
we clean-up the header files we may introduce some compilation problems but
when the source code is fixed the program should still run on all ABI V1.0
conforming systems.

This task IMO belongs in the SDK development department. We may want to
introduce a SDK V1.0 somewhere in the future but this can happen after
releasing ABI V1.0 and AROS V1.0.

STATUS: NOT REQUIRED FOR 1.0

DECISION: A decision was made on 2008-12-14 not to require this item
for ABI_V1 changes

For m68k the SAS/C compiler had the possibility to use the A4 register as a
base address for accessing global variables. This way so called pure programs
can be generated; these are programs that can be loaded once into memory and
run several times. Every time when then the program is started, room is
allocated for the variables and the A4 variable is set to the allocated
memory. Such a feature is also needed on the i386 ABI.

Implementation

Different options have been examined in the past to see how this feature could
best be provided on i386. Currently it has been settled on using the %ebx
register for this purpose. This is analog to the SYSV i386 ABI where %ebx can
be reserved for use as global offset table (GOT) for position-independent
code.
The register has been reserved for this purpose by adding the -ffixed-ebx
option to the gcc specs file. This means that gcc will not generate code that
accesses the %ebx register, so it is reserved for other system use.
The consequence is that you can't use the "b" constraint any more on inline
asm in gcc for input or output arguments. This constraint would allow putting
arguments in the %ebx register but gcc gives an error because it isn't
supposed to use this register; inline asm statements of external code may thus
need to be changed to work on AROS.
Two macros (AROS_GET_RELBASE and AROS_SET_RELBASE) are defined to access this
variable. These must to be defined by the cpu.h of a specific cpu, or you'll
get a compile error.
At the moment no patch for gcc is available for using the --baserel option on
i386 for producing pure code for AROS. Feel free to implement it.
At the moment on i386 this register is also used for passing the libbase
pointer.

Other options

Other things have been considered for, and some even tested as, a good place
to store the relbase. Here are some summaries:

A __relbase system global variable. Analog currently to the SysBase variable
there would be a __relbase global variable where the value is provided by
the ELF loader. This approach works but this path was left when %ebx was
discovered and it's analogy to the SYSV usage of this register.

Registers %fs:n or %gs:n: Another possibility would be to use the i386
segment registers for accessing the relbase. Also here %ebx is currently
considered a better solution. The impact of using the segment registers on
a hosted version of AROS is not fully clear yet.

On m68k Amiga the A6 register was used for the pointer to the base address of
a library structure. This A6 register was also not used for other code so
anywhere in the code of a library you could access the libbase through this
register without explicitly passing it to the function as a function argument.

The first i386 ABI put the libbase on the stack as last argument to the
function. This approach has some disadvantages:

When you want to use the libbase in subfunctions of the library you need to
pass the libbase to that subfunction. In the end this means that the libbase
has to be added as an argument to all functions in a call chain of functions
where the libbase is needed in the bottom function.

This way only a libbase can be used in function that use Amiga library style
argument passing. It is difficult to use the libbase in libraries that use
normal C argument passing, like for example arosc.library.

Current way of passing argument on i386 is incompatible with vararg
functions as the libbase is put as last argument on the stack.

There has been comments of this approach on the efficiency due to the stack
usage by the extra argument.

The goal for the ABI and the libbase would also be to be compatible with other
compilers than gcc; for example the vbcc compiler.
Another goal should be to make it very easy to port external libraries to
AROS.

Implementation

The current implementation in the ABI V1 branch (trunk/branches/ABI_V1) uses
the %ebx register to pass the libbase to the library function. This register
is also reserved for base relative addressing. As this register is currently
reserved and not used by code generated by the compiler you could access it
everywhere in the library code.
AROS_GET_LIBBASE and AROS_SET_LIBBASE macros are defined to be able to access
the libbase. These are by default defined as respectively AROS_GET_RELBASE and
AROS_SET_RELBASE when they are not provided by the cpu.h include file for the
cpu.
When no cpu-specific versions of the AROS_L[HCPD] macros are defined, default
versions are defined in aros/libcall.h that will use the AROS_GET_LIBBASE and
AROS_SET_LIBBASE macros for passing the library base to the functions of a
library.

Other solutions:

Use relbase for passing libbase: in the description of the base pointer for
relative addressing different
solutions are provided. These could all also be used for libbase passing.
They could also be used even when another solution would be used

Use %eax register; two different approaches have been tried:

A first approach is to add a gcc attribute to the function for putting
the first argument into a register and then passing the libbase as the
first argument (the gcc attribute is __attribute((regparm(1)) ). The big
disadvantage is that now the libbase is the first argument. This means
that in warnings and errors the first normal parameter is reported as
the second parameter and so on. This is very confusing for the
programmer and that is why a better way was sought.

A second approach was to try to implement the AROS_LH macros fully in
inline asm code that would push the arguments on the stack and call the
function. This approach seemed to get very complex as now the rules for
sizes of arguments have to be implemented in the macro itself. This
approach was not further developed when work with the %ebx was started.

Q:
Why use the MMU? Just put it in thread local storage and don't worry about the
MMU. Using the MMU to remap different instances to the same location means
changing the MMU on every context switch.

A:
TLS is used already. The TLS on x86_64 (well, it's not really thread local
storage yet) is used to fetch SysBase and KernelBase. Keep in mind, however,
that on SMP system, several SysBases would need to exist in parallel. Since
some applications access SysBase fields, like SysBase->ThisTask directly,
each SysBase would need to reflect the state of one CPU core. Now imagine the
following pseudo code:

If the task switch would happen between these two lines of code, and the task
would migrate to another CPU (that might happen, depending on scheduler
strategy used), the SysBase->ThisTask would not point to the right task. Yes,
I do agree that such code sucks most. Only AROS (Amiga!) makes it possible ;)

Now, imagine we have a SysBase at some fixed virtual address, whereas each CPU
maps different portion of physical address space there. If a task switch would
happen between two lines of code mentioned above, nothing particular
(nothing bad!) would happen. No matter what CPU the task runs on, it would
always fetch the SysBase valid for the very CPU.

No, we would not need to change MMU table on every context switch. Every CPU
would get it's own MMU map during the boot process. Moreover x86_64 needs MMU
in 64-bit mode anyway.

The original API on 68k Amiga computers had variadic functions, like DoMethod.
The implementation and use of those functions relied on the way compilers
were placing the arguments to such functions. At the time these API functions
were invented, arguments were placed on the stack right before the function
was called.

Nowadays the placement of function arguments is often referred to as calling
conventions and as such are defined in ABI documents. The ABI defines a common
set of rules so binaries created by different compilers for the same CPU
architecture that claim to support a certain ABI can be linked together. Today
68k and x86 are examples of architectures that still place arguments on the
stack.

The X86_64 and PowerPc are examples of architectures that place a fixed number
of arguments in registers and additional arguments are then placed on the
stack of the caller of a function.

Examples

What does all that mean in practice? Look at the current implementation of
DoMethod for x86:

DoMethod is used to call a method of an object belonging to a certain class
(you need to know a little about OOP to understand all this). It is done by
calling a hook function of that class. This hook function is actually the
class dispatcher function that looks at the method Id and calls the
corresponding method function with additional arguments given to DoMethod.
A call to DoMethod can look like this:

For x86 all the arguments are placed on the stack. That means they are
linearized in memory, as if they were the elements of an array or a struct.
It can look like this:

Lower address
MyObject MyId 1 2 3 4 5 6 7 8

So if we take the address of MethodId inside of DoMethod we can access the
additional actual arguments by using pointer arithmetic:

*(&MethodId + 2)

would reference the address were the number 2 is stored. This can also be done
in a method function were we got the address of MethodId from the call to
DoMethod.

For X86_64 and PowerPc it is impossible to do it this way. Remember the first
arguments of DoMethod are passed in registers, up to 8 registers for PowerPc
and 6 for X86_64. So if we take the address of MethodId the compiler puts the
registers value holding MethodId onto the stack frame of DoMethod (not onto
the stack frame of the caller of DoMethod where additional arguments were
placed) and returns the address of that stack location. Then we add to that
address and expect the number 2 at the resulting location, which actually is
still in a register. What we get will normally simply lead to a crash, in real
application code.

Solutions

Normally, if you start searching for solutions to a problem, you first check
what others have done. Well, there are MOS and OS4, both being successors of
the original AmigaOS, running on PowerPc. They have solved this problem, both
of them basically in the same way. But this is only one of the possible
solutions. If possible, you want to choose from more solutions. The following
is a list of solutions proposed on the AROS developer mailing list.

Throw away the variadic functions and use the ones that accept a pointer to
a structure, i.e. use DoMethodA.

Use macros. With DoMethod not being a real function but a macro, its
arguments can be serialized into a local array.

Use a patched compiler to always pass arguments to some special attributed
functions on the stack .

Use macros from stdarg.h, i.e. va_start, va_end, va_copy, va_arg.

Use AROS' already existing SLOWSTACK macros.

Use assembler stubs that extend the callers stack frame to store the
register arguments at the right position, then call the function (DoMethod).
Upon function return all stack manipulations are reverted and control goes
back to the function caller. Example of asm stub in PowerPC assembler.

Pros & Cons

Lots of code will have to be rewritten. People will probably not accept this
decision. The original API had those functions we should support them as
well.

Is already used at some places, but especially together with DoMethod there
is a huge usage of stack space.

This is what MOS and OS4 did, and they failed to get the patches into the
main gcc tree. I.e. the patch has to be done for every new gcc version in
use. If we would use their compiler we'll be getting this work done for
PowerPc for free and we'll have to do it for all the other architectures
that pass args in registers on our own. We also had our own patch for the
PowerPc port for some time and it worked well, but there were only few
people working on PowerPc port. Maybe this was one of the reasons.

Is the only clean and standard way to do it. But it also requires a lot of
work. Regarding DoMethod we'll need an additional variadic dispatcher for
every class. Here va_arg and va_copy are used to get the arguments and call
the methods with them.

Is what X86_64 and PowerPc are using at the moment. It is slow and there
are potential buffer overruns because some of these use fixed size arrays.

This is a hack, because we are tinkering with stack frames but it works. It
has already been tested on PowerPc together with DoMethod. An assembler stub
would be needed for each architecture.