digitalmars.D - Explicit Thread Local Heaps

There was some discussion around here a while back about the possibility of
using thread-local heaps in the standard GC. This was rejected largely
because of the complexity it would add when casting to shared/immutable.
I'm wondering if it would be a good idea to allow memory to be explicitly
allocated as thread-local through a separate GC. Such a GC would be designed
from the ground up to assume thread-local data and would never be used to
allocate in standard Phobos or Druntime functions. It would simply be a
Phobos module, something like std.localgc. The only way to use it would be to
explicitly call something like ThreadLocal.malloc, or pass it as a parameter
to something that needs an allocator.
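
To make the calling convention concrete, here is a minimal sketch of what opting in might look like. Everything in it is an assumption for illustration: `ThreadLocal`, `bytesAllocated` and `makeArray` are hypothetical names, no `std.localgc` module exists, and nothing here collects garbage. It only shows the two entry points described above: calling the allocator directly, and passing it as a parameter.

```d
import core.stdc.stdlib : cMalloc = malloc, cFree = free;

/// Hypothetical stand-in for the proposed interface (not a real Phobos type).
struct ThreadLocal
{
    /// Bytes currently handed out to the calling thread. A static field
    /// without `shared` or `__gshared` lives in thread-local storage in D.
    static size_t bytesAllocated;

    /// Allocate `size` bytes for the calling thread; returns null on failure.
    static void[] malloc(size_t size) nothrow @nogc
    {
        auto p = cMalloc(size);
        if (p is null) return null;
        bytesAllocated += size;
        return p[0 .. size];
    }

    /// Return a block obtained from `malloc` above.
    static void free(void[] block) nothrow @nogc
    {
        if (block.ptr is null) return;
        bytesAllocated -= block.length;
        cFree(block.ptr);
    }
}

/// Takes its allocator as a compile-time parameter, so the caller has to
/// ask for thread-local memory explicitly.
T[] makeArray(Alloc, T)(size_t n)
{
    auto raw = Alloc.malloc(n * T.sizeof);
    if (raw.ptr is null) return null;
    auto arr = cast(T[]) raw;   // reinterpret the bytes as n elements of T
    arr[] = T.init;
    return arr;
}

unittest
{
    auto xs = makeArray!(ThreadLocal, int)(4);
    assert(xs.length == 4 && xs[0] == 0);
    ThreadLocal.free(xs);       // int[] converts implicitly to void[]
}
```

Code that never names `ThreadLocal` keeps allocating from the regular GC, which is the point of keeping this out of the runtime.
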
The collector would (unsafely) assume that you always maintain at least one
pointer to all thread-locally allocated data on the owning thread's stack,
in the thread-local heap, or in thread-local storage. The global heap,
__gshared storage and other threads' stacks would not be scanned.
A major issue I see is interfacing such a GC with the regular GC such that
pointers from the thread-local memory to shared memory are dealt with
properly, without being excessively conservative. The thread-local GC would
likely use core.stdc.stdlib.malloc() to allocate large blocks of memory, and
would need a way to signal to the shared GC which blocks might contain pointers
without synchronizing on every update.
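
One coarse way to do this with what druntime already offers is to register each large block once with `GC.addRange` from `core.memory`, so the shared GC scans it conservatively. The sketch below assumes that approach; `LocalChunk`, `acquireChunk` and `releaseChunk` are hypothetical names, not part of any proposal. The shared GC is only touched when a chunk is acquired or released, never on individual pointer writes, but the price is that the whole chunk is treated as possible pointers.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

/// One large chunk owned by a hypothetical thread-local heap.
struct LocalChunk
{
    void* base;
    size_t size;
    bool registered;   // true if the shared GC was told to scan this chunk
}

/// Allocate a chunk; if it may hold pointers into the shared heap,
/// register it so the shared GC treats its contents as roots.
LocalChunk acquireChunk(size_t size, bool mayHoldPointers)
{
    auto p = malloc(size);
    if (p is null) return LocalChunk.init;
    // Zero the chunk so stale bytes are not mistaken for pointers.
    (cast(ubyte*) p)[0 .. size] = 0;
    if (mayHoldPointers)
        GC.addRange(p, size);
    return LocalChunk(p, size, mayHoldPointers);
}

/// Unregister (if needed) and free a chunk.
void releaseChunk(ref LocalChunk c)
{
    if (c.base is null) return;
    if (c.registered)
        GC.removeRange(c.base);
    free(c.base);
    c = LocalChunk.init;
}

unittest
{
    auto chunk = acquireChunk(4096, true);
    auto slot = cast(int**) chunk.base;
    *slot = new int;        // shared-heap object referenced from local memory
    **slot = 42;
    GC.collect();           // the registered range keeps the int alive
    assert(**slot == 42);
    releaseChunk(chunk);
}
```

Chunks allocated with `mayHoldPointers` set to false are never scanned, which is the only lever this sketch gives for being less conservative.
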
If this sounds like a good idea, maybe I'll start prototyping it. Overall,
the idea is that thread-local heaps are an optimization that should be applied
explicitly when and if you need them, not something that needs to be built deep
into the language runtime.

In my code the lock during allocation is more of an issue than GC scanning.
Having thread-local (or better, NUMA-node-local) pools for the
allocation with separate locks would solve the main bottleneck.
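
As a rough illustration of the per-thread pool idea, here is a sketch of a fixed-size block pool kept in thread-local storage. `BlockPool` and `tlsPool` are hypothetical names, not anything in druntime. Because each thread owns its pool outright, this variant needs no lock at all on the fast path; a NUMA-node-local pool shared by several threads would still need one lock per pool.

```d
import core.stdc.stdlib : malloc, free;

/// Fixed-size block pool with an intrusive free list (illustrative only).
struct BlockPool(size_t blockSize)
{
    static assert(blockSize >= (void*).sizeof,
                  "a free block must be able to hold the next-pointer");

    private void* freeList;   // singly linked list threaded through free blocks
    size_t refills;           // number of times we fell back to malloc

    /// Pop a block from the free list, or get a fresh one from malloc.
    void* allocate() nothrow @nogc
    {
        if (freeList !is null)
        {
            auto p = freeList;
            freeList = *cast(void**) p;
            return p;
        }
        ++refills;
        return malloc(blockSize);
    }

    /// Push a block back onto this thread's free list.
    void release(void* p) nothrow @nogc
    {
        if (p is null) return;
        *cast(void**) p = freeList;
        freeList = p;
    }
}

/// Module-level variables are thread-local by default in D, so every
/// thread gets its own pool instance.
BlockPool!64 tlsPool;

unittest
{
    auto a = tlsPool.allocate();
    tlsPool.release(a);
    auto b = tlsPool.allocate();   // reuses the block just released
    assert(a is b);
    free(b);                       // hand the block back to the C heap
}
```

A block must be released on the thread that allocated it, or it ends up in another thread's pool; that restriction is the same kind of burden on the programmer that the thread-local GC proposal carries.
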
I have always disliked extra memory hierarchies; I feel that their
benefit/complexity ratio is too small, but I might be wrong.
The problem you identified of pointers to "global" memory is difficult
to solve in a way that really gives the local GC an advantage over
a good GC implementation that uses several pools, without burdening
the programmer.
Still I imagine that having a localgc library implementation could be
useful to some.
I suspect that using it for general types that might allocate memory
on their own would be difficult, but since it would be used only in
special cases, that probably isn't an issue.