This is a (non-POSIX) threads package that uses the basic clone interface
of the Linux kernel.

Now, what is the difference between threads and
processes? A process runs totally isolated from the other processes in the
Unix system; it requests resources concurrently and is controlled by the
operating system such that it does no harm to the other processes.
Interfacing between processes is expensive performance-wise.

A thread, on the other hand, is scheduled within the context of _one_ process,
shares most (if not all) of its resources, and has to cooperate with the
other threads in order to do no harm to the whole thread group. Interfacing
between threads is very cheap performance-wise; hence, threads are what
you should use when _one_ application needs to do work in
parallel.

This thread package does _NOT_ provide POSIX compatible threading.
As its name says, it is a tiny, fast alternative using Linux cloning.
The aim was to avoid as many unnecessary syscalls as possible.
Locking is _NOT_ done via IPC semaphores; instead we use a user-level
atomic technique, so when the resource is free there is no syscall at all.

Each thread has its own 'Thread Control Block' (TCB) at the bottom of its
stack. Hence, to identify itself it does not need to go through getpid().

However, there are restrictions given by the technique used:

- The total maximum number of threads is 27.
- The size of the stack must be a power of 2 and is equal
for all threads. It is not a problem to make the stack area huge,
because only those pages actually used will be allocated (paged in)
by the system. Though you may be astonished by what /proc/<pid>/status
tells you about stack consumption, ... just ignore it ;-)
- You have to lock/unlock resource_libc when using non-reentrant parts
of libc (malloc _is_ not reentrant).
- You must not use atexit, exit or _exit at all. Instead use the techniques
and functions supplied by lt-threads.

Making room on the stack

The kernel does not allow the stack to grow
below the current stack pointer.
This makes sense in single-threaded applications,
but is bad for threading purposes:

1. You are forced to put the stacks into the heap (below the 1st gigabyte),
hence they are less protected against overwriting than normally.
2. The 1 GB stack address space isn't used any more, and
you may run short of address space in huge programs
that use huge arrays (databases, matrices).

We trick the kernel by expanding the stack vma to a given
size (setting ESP to the bottom and touching it).
( look at do_page_fault in arch/i386/mm/fault.c to see how it
treats growing the stack down )
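The trick can be sketched in portable C. This is only an approximation under stated assumptions: the real code sets ESP to the bottom of the area directly, whereas alloca() is used here so the example compiles anywhere; grow_stack() is a name made up for the example and the 4096-byte page size is an i386 assumption.

```c
#include <alloca.h>
#include <stddef.h>

/* Touch one byte in every page of a freshly reserved stack area so the
 * kernel grows the stack vma down to that depth.  Approximation only:
 * the real code sets ESP directly; alloca() stands in here. */
static size_t grow_stack(size_t bytes)
{
    volatile char *area = alloca(bytes);
    size_t pages = 0;
    size_t off;

    for (off = 0; off < bytes; off += 4096, pages++)
        area[off] = 0;                  /* fault the page in */
    return pages;
}
```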

Setting User Space Stack limits

int make_stack_unlimited(int drop_privs)

When running as an ordinary user, current Linux has a stack limit of 8 MB.
( no way to change that via bash's ulimit )
This is enough for normal stack sizes; however, if you need more,
you need some way to set it to 'unlimited'. This can only be done
as root, hence setting the suid bit is needed.
make_stack_unlimited() sets the limit high and (if drop_privs is set)
drops root privileges before returning.
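A sketch of how such a function might be implemented (the real make_stack_unlimited() in the package may differ): raise the soft limit to the hard limit, which is RLIM_INFINITY when running as root via the suid bit, then optionally drop privileges.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Sketch of make_stack_unlimited(): raise the stack limit as far as the
 * hard limit allows (RLIM_INFINITY when the hard limit is unlimited,
 * which needs root / the suid bit), then optionally drop privileges.
 * Returns 0 on success, -1 on error. */
static int make_stack_unlimited_sketch(int drop_privs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;          /* soft limit up to the hard limit */
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (drop_privs && geteuid() != getuid())
        return setuid(getuid());        /* give root privileges back */
    return 0;
}
```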

Queuing tools

These two functions have to be used when maintaining queues of various
sorts. You may cast to your private queue entry structure as long as
the top of this structure matches `struct queue_entry':

Note: an existing entry _must_ have valid pointers in `next' and `last'.
We build the head of a queue with an empty queue_entry having
`next' and `last' pointing to itself.
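Going by the description above, the head convention can be sketched like this (field names follow the text; the real declarations live in lt-threads.h, and the queue_init/queue_append/queue_is_empty helpers are made up for the example):

```c
#include <stddef.h>

/* The head of a queue is an empty queue_entry whose `next' and `last'
 * point to itself; user entries embed a struct queue_entry at the top
 * so they can be cast. */
struct queue_entry {
    struct queue_entry *next;
    struct queue_entry *last;
};

static void queue_init(struct queue_entry *head)
{
    head->next = head->last = head;     /* empty queue points to itself */
}

static void queue_append(struct queue_entry *head, struct queue_entry *e)
{
    e->next = head;
    e->last = head->last;
    head->last->next = e;
    head->last = e;
}

static int queue_is_empty(struct queue_entry *head)
{
    return head->next == head;
}
```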

Avoiding Libc reentrancy Problems

Because all threads share the same VM, libc needs to be reentrant.
This isn't the case for a lot of functions (malloc() is one of them).
Though newer libc can be made reentrant (this needs -D_REENTRANT on compilation
of _each_ object file), the feature costs performance due to the
overhead added to every libc call. As we know exactly _when_ we
get into trouble with libc, the faster (and IMHO better) solution is
to take care of this at the application level.

To accomplish this we have to put a (un)lock_resource(resource_libc)
bracket around the libc functions involved. The most frequently used ones
are offered by the threads package with such locking included.
Some of those are:

void *locked_malloc(size_t size)

void locked_free(void *p)

The prototypes are exactly what you expect from the libc functions.
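A self-contained sketch of such a bracket. The package brackets the calls with lock_resource(resource_libc); since that code is not reproduced here, a C11 atomic_flag spinlock stands in for the resource (an assumption made so the example compiles on its own):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for resource_libc: a C11 spinlock serializing libc calls. */
static atomic_flag libc_lock = ATOMIC_FLAG_INIT;

void *locked_malloc(size_t size)
{
    void *p;

    while (atomic_flag_test_and_set(&libc_lock))
        ;                               /* spin until the resource is free */
    p = malloc(size);
    atomic_flag_clear(&libc_lock);
    return p;
}

void locked_free(void *p)
{
    while (atomic_flag_test_and_set(&libc_lock))
        ;
    free(p);
    atomic_flag_clear(&libc_lock);
}
```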

Getting page aligned memory from the heap

If you need a page-aligned piece of memory, you would usually use valloc().
However, you can't free that memory later. For this purpose you may use the
functions below, which are built on top of malloc()/free().

void *page_malloc(size_t size)

void page_free(void *ptr)
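A sketch of how page_malloc()/page_free() can be built on top of malloc()/free() (the actual implementation in the package may differ): over-allocate, round up to a page boundary, and stash the original pointer just below the returned block.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate extra room for alignment plus one pointer, round up to a
 * page boundary and remember the original malloc() pointer just below
 * the returned block, so page_free() can find it again. */
void *page_malloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *raw = malloc(size + page + sizeof(void *));
    uintptr_t aligned;

    if (!raw)
        return NULL;
    aligned = ((uintptr_t)raw + sizeof(void *) + page - 1) & ~(uintptr_t)(page - 1);
    ((void **)aligned)[-1] = raw;       /* remember what to free */
    return (void *)aligned;
}

void page_free(void *ptr)
{
    if (ptr)
        free(((void **)ptr)[-1]);
}
```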

Name List Tools

The functions below are used for handling small name lists (you may call
them directories). All major resources in lt-threads are also accessible by
their names, which are defined at creation time. Especially when it comes
to linking between different processes (thread cores) and different machines,
we look up the resources (such as mailboxes, threads, services) via name lists.

The global structures used for building namelists (as defined in lt-threads.h)
are:

Exiting a thread

Once the threading system is set up, you should never use exit() or
_exit(). Instead use exit_thread(0) to exit the running thread,
exit_thread(tcb) to kill another thread, and exit_all() to terminate
all threads.

void exit_thread(struct tcb *tcb)

void exit_all(int exit_code)

Suspending / resuming a thread

When a thread has nothing valuable to do (e.g. it is waiting for
some event to happen), it should go to sleep. The following function
suspends the currently running thread:

void suspend_thread()

With the following function a thread can awaken a sleeping thread.
This may (but need not) happen immediately, in which case the calling
thread sleeps before returning from resume_thread().

void resume_thread(struct tcb *tcb)

To suspend a thread for a given time interval (no resume_thread() needed
to awaken it later), the function thread_usleep() can be used.

Note: we use our own usleep to avoid
A) problems with signal stuff
B) problems with libc

void thread_usleep(int useconds)

Locking Resources

The locking scheme below implements userspace semaphores
that in the most frequent cases do _not_ enter the kernel.
( it doesn't use IPC or kernel semaphores )
It depends on the atomic_reserv/free() algorithm (see `Atomic inline
Functions' below) and won't work with other locking strategies that
cannot `reserve and queue' with _one_ atomic operation.

The win of this algorithm is _much_ more speed;
the disadvantage is that you can't have more than 27 threads.
(though, on 64-bit machines it could be 58 threads)

create_resource() returns a pointer to a newly created resource.
The resource is also added to the resources name list, so you may
look it up by `name'. If there are too many resources, NULL will
be returned.
The following two functions are used to lock and unlock a previously
created resource. A call to lock_resource() will put the thread to
sleep when the resource is already locked by another thread.
A call to unlock_resource() will awaken a thread that has been waiting on
the resource.

void lock_resource(struct lock_struct *lock)

void unlock_resource(struct lock_struct *lock)

The function below is a special one: the owner of a lock may dedicate
the lock to a given other thread instead of just releasing it.
This changes the normal scheduling of locks.
(for example, this is used by the message routines)

void transfer_resource(struct lock_struct *lock, int successor_id)

Sending and receiving messages

A threading system without message transfer would be worthless.
We have it ;-)

All messages are sent to `mailboxes', which must have been created before.
The size of the message queue within a mailbox can be defined at
creation time. When a sending thread hits a queue overflow in the mailbox,
it is put to sleep and queued to be awakened later.
When a receiving thread hits an empty mailbox, it also goes to sleep.
A sending thread on an empty mailbox awakens a sleeping receiver;
a receiving thread on a full mailbox awakens a sleeping sender.

mbox_handle create_local_mailbox(char *name, int numentries)

mbox_handle get_mailbox(char *name)

mbox_handle get_mailbox_wait(char *name)

void sendmessage(mbox_handle mbx, struct msg *msg)

struct msg *receivemessage(mbox_handle mbx)

int mailbox_is_empty(mbox_handle mbx)
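The queue bookkeeping behind a mailbox can be illustrated with a hypothetical, single-threaded ring buffer. All names here (mini_mbox, mini_send, mini_receive) are made up for the illustration; the real mailboxes additionally suspend and resume threads on overflow and underflow and are looked up by name, none of which is shown:

```c
#include <stddef.h>

struct msg;                             /* message type, defined elsewhere */

#define MBOX_ENTRIES 4                  /* `numentries' from creation time */

struct mini_mbox {
    struct msg *ring[MBOX_ENTRIES];
    int head, tail, count;
};

static int mini_send(struct mini_mbox *mb, struct msg *m)
{
    if (mb->count == MBOX_ENTRIES)
        return -1;                      /* real sendmessage() sleeps here */
    mb->ring[mb->tail] = m;
    mb->tail = (mb->tail + 1) % MBOX_ENTRIES;
    mb->count++;
    return 0;
}

static struct msg *mini_receive(struct mini_mbox *mb)
{
    struct msg *m;

    if (mb->count == 0)
        return NULL;                    /* real receivemessage() sleeps here */
    m = mb->ring[mb->head];
    mb->head = (mb->head + 1) % MBOX_ENTRIES;
    mb->count--;
    return m;
}
```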

Creating (starting) a thread

Any thread can create child threads. You must pass the address of
the function that contains the thread's code and (optionally)
a private parameter pointer for the thread.

A little bit different is the creation of the `father of all threads',
the original Unix process itself. In order to let it also use the
threads related functions, it must be converted into a thread.

`init_zero_thread()' makes the starting parent process behave
as a normal thread; therefore this function call must come before
any other thread-related call. It initializes the threads system.

struct tcb *init_zero_thread(int stacksize)

To obtain the pid of any thread, please use threads_getpid() instead
of the system call getpid(). It may be called at any time and returns the
pid of the current process when no threading is active (before
init_zero_thread()).
If `tcb' is NULL, it returns the pid of the current thread;
if it is -1 or THREAD0_TCB, it returns the pid of `the father of all
threads' (thread0).

pid_t threads_getpid(struct tcb *tcb)

Some debugging aids

To make your life a bit easier (and because GDB has problems
debugging a thread group), here are some useful functions.

The thread itself

All threads are functions of this type; when control reaches the end of
such a function, the thread exits just as with exit_thread(0).
The 'params' pointer can be passed on create_thread().

A thread's exit function

typedef void thread_exit_function_type(void);

A thread must _not_ use atexit() to register an exit function
(problems with non-reentrancy of libc).
Instead, it may point the tcb->exit_func pointer to a function
of this type. However, the old value of tcb->exit_func must
be saved in a _static_ place and restored within the exit function.
This way a chain of exit functions will be called in
`last-in first-out' order.
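The save/restore rule can be sketched with a tcb reduced to the one field the example needs (the real struct tcb has more fields, and run_exit_chain() merely stands in for what exit_thread() would do):

```c
typedef void thread_exit_function_type(void);

struct tcb {                            /* reduced for the example */
    thread_exit_function_type *exit_func;
};

static struct tcb *own_tcb;                         /* stands in for OWN_TCB */
static thread_exit_function_type *saved_exit_func;  /* the _static_ save slot */
static int order[2], norder;

static void earlier_exit_func(void)     /* registered first, runs last */
{
    order[norder++] = 1;
}

static void my_exit_func(void)          /* registered last, runs first */
{
    order[norder++] = 2;
    own_tcb->exit_func = saved_exit_func;   /* restore the old function */
    if (saved_exit_func)
        saved_exit_func();                  /* ... and chain to it */
}

static void run_exit_chain(struct tcb *tcb)
{
    if (tcb->exit_func)
        tcb->exit_func();
}
```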

Accessing a thread's TCB

All thread-private data is on the stack; there may be private data
allocated on the heap, but the pointers to those areas should themselves
be on the stack. As long as access happens within the scope of the thread
function itself (or within local functions of that function), this needs
no extra handling. However, calling another function that needs to access
thread-private data may be a problem. This problem can be solved by putting
all this data into a private structure and starting the thread by passing
it a pointer to this structure. Within the scope of the thread one then has
access to this pointer via:

OWN_TCB->params;

The TCB itself is always at the bottom of the stack, hence it is also
available within a signal handler.

Looping through all TCBs

It may be necessary to scan data of all running threads or
to do something special like notifying them, etc. To accomplish this
you should use the macro FOR_ALL_TCB.
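The macro's real definition is in lt-threads.h and is not reproduced in this text; purely as a hypothetical sketch, assuming it walks a global table of TCB pointers and binds each live one to a loop variable:

```c
#include <stddef.h>

/* Hypothetical only: the real FOR_ALL_TCB may look different.  Assumed
 * here: a global table of up to 27 TCB pointers, NULL for free slots. */
struct tcb { int id; };                 /* reduced for the example */

#define MAX_THREADS 27
static struct tcb *tcb_table[MAX_THREADS];

#define FOR_ALL_TCB(t) \
    for (int _i = 0; _i < MAX_THREADS; _i++) \
        if (((t) = tcb_table[_i]) != NULL)

static int count_threads(void)
{
    struct tcb *t;
    int n = 0;

    FOR_ALL_TCB(t)
        n++;
    return n;
}
```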

Atomic inline Functions

This set of functions is mainly used within the threads package itself;
however, you may find them useful for your own code too.

The function below atomically reserves 'resource' and queues ID.
When the reservation was successful (i.e. if 'resource' was -1 before),
the function returns -1, otherwise 0.
'id' must be a number in the range 0..27.
In any case bit 'id' is set in the resource. Due to the technique used
for being atomic, only 27 of the 32 bits in the integer can be used.

static __inline__ int atomic_reserv(int *resource, int id)

The function below frees a resource previously reserved with atomic_reserv.
If there are still IDs queued, the function returns 0, otherwise -1.

static __inline__ int atomic_free(int *resource, int id)

The function below returns the highest-priority queued ID in the resource,
or -1 if none is queued.
The format of 'resource' is the same as in atomic_reserv and atomic_free.
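The described semantics can be sketched with C11 atomics. This is not the original x86 sequence: as an assumption, a cleared bit here marks a queued ID so that -1 (all bits set) means `free', matching the return values described above; the priority rule in the last helper (lowest queued ID wins) is also an assumption.

```c
#include <stdatomic.h>

/* Reserve-and-queue with a single atomic op: clear our bit and look at
 * the previous value.  -1 before means the resource was free. */
static int atomic_reserv_sketch(atomic_int *resource, int id)
{
    int old = atomic_fetch_and(resource, ~(1 << id));   /* queue our bit */
    return old == -1 ? -1 : 0;          /* -1: resource was free, we own it */
}

/* Free our reservation: set our bit back; if the result is not -1,
 * other IDs are still queued. */
static int atomic_free_sketch(atomic_int *resource, int id)
{
    int now = atomic_fetch_or(resource, 1 << id) | (1 << id);
    return now == -1 ? -1 : 0;          /* 0: other IDs are still queued */
}

/* Assumed priority rule: the lowest queued ID wins. */
static int atomic_queued_sketch(atomic_int *resource)
{
    int v = atomic_load(resource);
    int id;

    for (id = 0; id < 27; id++)
        if (!(v & (1 << id)))
            return id;
    return -1;
}
```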