Revision as of 05:44, 5 November 2004


Status

NPTL for Linux/MIPS is currently a work in progress. Ulrich Drepper's NPTL Design Document contains some information on the NPTL design, including implementation details on other architectures.

Overview

This document presents a design for implementing Thread Local Storage
(TLS) for MIPS Linux, in both 32-bit and 64-bit mode. This design
specifies the code that must be generated by the compiler, the
relocations that must be generated by the assembler, and the processing
that must be performed by the linker.

Design Choices

* There are no available hardware registers to designate as the thread register.

  Therefore, kernel magic will be used to make the thread pointer available to userspace. This specification does not prescribe a mechanism for that; the mechanism for obtaining the thread pointer will be encapsulated in the |__tls_get_addr| function.

* Use TLS Variant II (in which the TLS data areas precede the TCB in memory).

  As noted in Drepper's paper, this design permits the compiler to generate efficient code for the case that the main executable accesses TLS variables from the executable itself.

* The |__tls_get_addr| function has the prototype:

    extern void *__tls_get_addr (tls_index *ti);

  where the type 'tls_index' is defined as::

    typedef struct {
      unsigned long ti_module;
      unsigned long ti_offset;
    } tls_index;

  The type 'unsigned long' is used because it is a 32-bit type in 32-bit mode and a 64-bit type in 64-bit mode; thus, the members will fit correctly into two consecutive GOT entries in both modes.

* The Initial Exec and Local Exec models are not yet specified.

  These models require that the compiler be able to directly access the thread pointer without using |__tls_get_addr|. Whether or not that is possible will depend on the kernel mechanism used to implement the thread pointer.

* The compiler is not allowed to schedule the sequences below.

  The sequences below must appear exactly as written in the code generated by the compiler. This restriction is present because the Initial Exec and Local Exec models are not yet specified, and so it is not clear what linker optimizations may be possible. To allow linker optimizations to be added in the future without recompiling existing code, the compiler is restricted from scheduling these sequences.
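The |__tls_get_addr| interface named above can be illustrated with a small, non-authoritative C sketch. It checks at compile time that each 'tls_index' member is one pointer wide (true on the ILP32 and LP64 models Linux uses, which is the assumption here) and then resolves a module/offset pair against a toy table. The names 'demo_dtv', 'demo_block1', 'demo_block2', and 'demo_tls_get_addr' are hypothetical stand-ins; the real function would obtain its per-thread data through the kernel mechanism that this design leaves unspecified.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the tls_index type in this design. */
typedef struct {
  unsigned long ti_module;
  unsigned long ti_offset;
} tls_index;

/* Each member should occupy exactly one GOT entry (one pointer). */
_Static_assert(sizeof(unsigned long) == sizeof(void *),
               "unsigned long must be pointer-sized");
_Static_assert(sizeof(tls_index) == 2 * sizeof(void *),
               "tls_index must span two consecutive GOT entries");

/* HYPOTHETICAL: a toy table mapping module IDs to TLS blocks.
   Module IDs are assumed to start at 1, so slot 0 is unused. */
#define DEMO_MAX_MODULES 4
static char demo_block1[64];
static char demo_block2[64];
static char *demo_dtv[DEMO_MAX_MODULES] = {
  NULL, demo_block1, demo_block2, NULL
};

/* HYPOTHETICAL stand-in for __tls_get_addr: returns the address of the
   variable at ti_offset within the TLS block of module ti_module. */
void *demo_tls_get_addr(const tls_index *ti) {
  assert(ti->ti_module > 0 && ti->ti_module < DEMO_MAX_MODULES);
  assert(demo_dtv[ti->ti_module] != NULL);
  return demo_dtv[ti->ti_module] + ti->ti_offset;
}
```

Because the two members are laid out back to back, a linker could fill a pair of adjacent GOT slots with the module ID and the offset and pass the address of the first slot as the argument.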

Conventions

In what follows, all references to registers other than |$2| (when it is
used as the return register), |$4| (when it is used as an argument
register), |$25| (the address of a called function), and |$28| (the
global pointer) are arbitrary; the compiler is free to use other
registers instead.

Where |...| appears in a code sequence the compiler may insert zero or
more arbitrary instructions.

Here, |lw| may be replaced with any other load/store instruction using
the same opcode format as |lw|, such as |lb|, |lbu|, |lh|, |lwu|, |ld|,
|sb|, |sh|, |sw|, |sd|, |ll|, |lld|, |lwl|, |lwr|, |ldl|, or |ldr|.
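The reason these instructions are interchangeable can be shown with a short sketch of the MIPS I-type encoding they share: a 6-bit opcode, a 5-bit base register, a 5-bit data register, and a signed 16-bit offset. The helper name 'encode_itype' is hypothetical, and only three opcodes are listed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Primary opcode values for a few I-type load/store instructions. */
enum { OP_LB = 0x20, OP_LW = 0x23, OP_SW = 0x2B };

/* HYPOTHETICAL helper: builds the 32-bit word for "op rt, offset(base)".
   Layout: opcode[31:26] base[25:21] rt[20:16] offset[15:0]. */
uint32_t encode_itype(unsigned opcode, unsigned base, unsigned rt,
                      int16_t offset) {
  assert(opcode < 64 && base < 32 && rt < 32);
  return ((uint32_t)opcode << 26) |
         ((uint32_t)base << 21) |
         ((uint32_t)rt << 16) |
         (uint32_t)(uint16_t)offset;
}
```

Swapping |lw| for |sw| changes only the top six bits of the word, so the offset field lives in the same place for every instruction in the list.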

If the size of the TLS area is known to be smaller than 32K, then the
following sequences can be used instead of those above.

If, rather than needing the address of the variable, the variable is to
be read or written, and the size of the TLS area is known to be smaller
than 32K, then the following code sequences may be used instead of
either of the sequences above.
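The 32K bound plausibly reflects the signed 16-bit offset field of MIPS load/store instructions, which reaches from -32768 to 32767 bytes around the base register. The sketch below checks that condition under the assumption (not stated in this section) that, with Variant II, TLS data sit at negative offsets from the thread pointer. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if the offset fits the signed 16-bit immediate of a load/store. */
bool fits_simm16(long offset) {
  return offset >= -32768 && offset <= 32767;
}

/* ASSUMPTION: the TLS area ends at the thread pointer, so its first
   byte lies tls_size bytes below it. */
long farthest_tls_offset(unsigned long tls_size) {
  assert(tls_size <= (unsigned long)LONG_MAX);
  return -(long)tls_size;
}
```

Under that assumption, any TLS area smaller than 32K can be addressed with a single offset from the thread pointer, while a larger one cannot.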