If anyone noticed this week I posted a few questions, and then a few bis
of code relating to threading and the GC.
Between tearing my hair out and generally verging on total madness, this
week I'm beginning to get to grips with the GC in a threaded environment.
Basically this statement is true for both phobos and tango (although
I've examined the phobos GC far more than tango)
-------------------------------------------------
"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS,
and you should not try and write threaded applicaitons without expecting
to do a large amount of hacking to the GC"
-------------------------------------------------
Background - HOW THE GC WORKS:
-------------------------------------------------
let's first explain the basic ideas behind the Garbage collector:
Since most things D land are malloc'd via the GC - the GC knows all
about the memory you have allocated.. - It can also look at the stack
(currently running memory, eg. int/pointers that you have created in
your methods)..
Basically what it does is this:
- loop through all the memory it knows about, see's if any of it points
to any of the memory it has allocated..
- if the memory it has allocated has not been reference by the memory it
knows about.. then it's dead meat, and can be returned to the 'stack'
so it can be made available to any 'new' malloc() call.
There is quite a bit more in there, like a really nice design for
buckets/ and linked lists that default fill it... enabling quick
allocation depending on what size of memory is requested...
Anyway I hope you get the point..
The problems..
-------------------------------------------------
A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS
The current GC implementations free across threads both explicity with
delete, and implicitly with genCollect() which is just downright dangerous.
Consider this example - const char Gotcha
.......................................
extern (C) {
void * x = mylib_new();
void mylib_dosomething(void x* const char* z);
}
... d code ...
function risky_function(char[] a) {
mylib_dosomething(mylib, a.toStringz());
}
.......................................
In the above code I've left const char* in there, as that's what's in
the C headers, but is not actually in the D code.
What is critical about this code is that while mylib* x is active, and
has been passed 'char* z' in D, as in the risky_function() examples.
For the purpose of the garbage collecter which can only look at memory
it knows about to find pointers to 'char *z' z is ready to be
collected... This problem can easily crop up with the libc functions
that expect const char*'s and store data which another method may
queried later, and find has been overwritten with something completely
different (imagine function pointers!!)
the only workaround for this is to addRoot() and store the return value
to any 'toStringz()' call which is delivered into a const char* C call..
- even this may not help as you can accidentally delete these objects..
Normally all of this is not a common problem in a non-threaded
application, as you would probably just run the genCollect() after you
have finished dealing with mylib() - and everything will get cleaned up
nicely.
Unfortunatly with a multi-threaded application - if you run genCollect()
at the wrong time (eg. in another thread), you will destroy the char*
that was supposed to be constant...
-------------------------------------------------
B) NOT RUNNING genCollect() JUST WILL KILL MEMORY
as in the example above, you may think that turning off the GC and not
running getCollect(), they trying to free memory as you get it might be
a better idea.
Unfortunatly generally D is dependant and expects you to be running the
GC! a really good example is the toStringz() code.
what toStrings does is this
char* toStringz(char[] s) {
char[] copy = new char[s.length+1];
copy[0..s.length] = s;
copy[s.length] =0; // pad end with \0
return copy.ptr;
}
what char[] looks like under gdb is
struct { length = 123; ptr = "xxxxxxxxxx" }
the above code is probably doing a malloc() for the struct and a
malloc() for the ptr data, but leaves the struct hanging and expected to
be garbage collected, returning the ptr which is then sent on to a
external method, and might be expected to be constant. ***Well not 100%
sure about that behaviour, the struct may not be malloced, but just
allocated on the stack..***
There are many other examples in D of code where trying to manage and
clear memory explicitly is dificult / or near impossible (assoative
arrays for example) - and the expectation is that the garbage collecter
be used to sort it out..
The reality is that it's extremely difficult to be sure that you are not
leaking memory, there are no built in tools currently for testing this
(there is code in dsource = i've not tested it yet) , and it's kind of
goes against the principle of D, where you can for the most part throw
away all that memory allocation code, and concentrate on getting things
done..
-------------------------------------------------
POSSIBLE SOLUTION?
-------------------------------------------------
What I've hacked into the GC so far:
- getIdx() for std.thread.Thread exposing the private thread.idx ???<
how reliable is this??? - should be OK as long as you never delete a
thread....
- A GC Log like Array for associating each bit of malloc()'d memory is
owned by which thread.
- Checks in genCollect code to ensure that it does not try and free
other thread's memory.
- Warnings in free() when the program does exactly that..
- Information in LOGGING so that malloc can be traced to a line of code
(class/method etc.- backtrace_symbols does not appear to return line
numbers...) - see recent post
At this point running a threaded server under heavy load is alot more
stable than before. however as you can understand by looking at the
descriptions above. it still has some serious issues.
- Searching all the threads memory each time it wants to tidy memory a
little is horribly inefficient..
- There are still issues with the const char* that should be checked for..
ONGOING IDEAS
-------------------------------------------------
- One gcx pool for each thread.
This seems like the most sensible solution.. - it would mean that the
genCollect() run on a specific thread would only look at memory related
to that thread.
** Implications
+ memory allocated in one thread and used in another could not be
freed by that thread. -- need to think more about this.
++ if you pass a complex structure when starting a thread, you will
have to flag the root in the parent thread to ensure that it's not
freed by the starting thread.
++ the child thread will then need a way to flag so the starting
thread knows it can free it... -
++ making changes to a object in the child thread may make memory
allocation problematic.... -> the child thread does not know about the
parent?!?!
- rooting the results from getStringz() - and warning users!!!!!
- using strncpy and removing the need for the struct?? ( not sure if it
makes much difference - it's based on struct assumtion previously)
Suggested code for toStringz() = something like:
char* toStringz(char[] s) {
char *c = malloc(s.length+1;
strncpy(c, s.ptr, s.length);
c[s.length]=0;
gc.addRoot(c);
return c;
}
free should check Root() to see if it's registed there, to prevent it
being free'd by accident. (genCollect already does) an additional method
rootFree() - which basically does removeRoot() + free()...
Anyway it'd be interested in feedback.
Regards
Alan

There's three trivial steps to solve ALL of this:
a) don't manually delete things, at all
b) store pointers that have been passed to C functions where the GC can see
them.
c) For memory that's explicitly supposed to _belong_ to the C functions, use
cstdlib malloc.
If you keep to these three points, things become very simple.

The problems..
-------------------------------------------------
A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS
The current GC implementations free across threads both explicity with
delete, and implicitly with genCollect() which is just downright dangerous.
Consider this example - const char Gotcha
.......................................
extern (C) {
void * x = mylib_new();
void mylib_dosomething(void x* const char* z);
}
... d code ...
function risky_function(char[] a) {
mylib_dosomething(mylib, a.toStringz());
}
.......................................
In the above code I've left const char* in there, as that's what's in
the C headers, but is not actually in the D code.
What is critical about this code is that while mylib* x is active, and
has been passed 'char* z' in D, as in the risky_function() examples. For
the purpose of the garbage collecter which can only look at memory it
knows about to find pointers to 'char *z' z is ready to be collected...
This problem can easily crop up with the libc functions that expect
const char*'s and store data which another method may queried later, and
find has been overwritten with something completely different (imagine
function pointers!!)

There are very few C standard library routines which store data that
other functions reference, largely because they aren't thread-safe. So
I don't think this is actually an issue in practice. But more
generally, you're right. If you're using an external library that
stores a reference to GCed data and you intend to discard your own
reference to that data then you have to tell the GC not to collect it.

-------------------------------------------------
B) NOT RUNNING genCollect() JUST WILL KILL MEMORY
as in the example above, you may think that turning off the GC and not
running getCollect(), they trying to free memory as you get it might be
a better idea.
Unfortunatly generally D is dependant and expects you to be running the
GC! a really good example is the toStringz() code.
what toStrings does is this
char* toStringz(char[] s) {
char[] copy = new char[s.length+1];
copy[0..s.length] = s;
copy[s.length] =0; // pad end with \0
return copy.ptr;
}
what char[] looks like under gdb is
struct { length = 123; ptr = "xxxxxxxxxx" }
the above code is probably doing a malloc() for the struct and a
malloc() for the ptr data, but leaves the struct hanging and expected to
be garbage collected, returning the ptr which is then sent on to a
external method, and might be expected to be constant. ***Well not 100%
sure about that behaviour, the struct may not be malloced, but just
allocated on the stack..***

You're right. String operations in D necessarily assume the presence of
a GC. Concatenation, appending, etc, all allocate new memory and leave
the old memory untouched... assuming that it may still be referenced by
something.

-------------------------------------------------
POSSIBLE SOLUTION?
-------------------------------------------------
What I've hacked into the GC so far:
- getIdx() for std.thread.Thread exposing the private thread.idx ???<
how reliable is this??? - should be OK as long as you never delete a
thread....
- A GC Log like Array for associating each bit of malloc()'d memory is
owned by which thread.
- Checks in genCollect code to ensure that it does not try and free
other thread's memory.
- Warnings in free() when the program does exactly that..

What about transferring data between threads? That aside, I think the
new "shared" semantics in D2 address this fairly well.

ONGOING IDEAS
-------------------------------------------------
- One gcx pool for each thread.

This needs language support to work well, since some restrictions must
be placed on the use of static data if GC scanning of all threads is to
be avoided.
Sean

"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS

Isn't it a bit overstated?
You basically say that memory could be prematurely freed if an external
non-D function stores a pointer to GC-allocated memory internally. This
is quite different from "COMPLETELY BROKEN" to me. You don't even
mention that for both extern(C) and extern(Windows) pointers are pushed
on stack so that memory is protected at least until the external
function returns. Well it *can* overwrite its arguments with an
unrelated data, but in all other scenarios your memory is protected.
The solutions are:
1. Don't bother. This is the correct approach for most of CRT and Win32
functions because they do not store pointers.
2. Write wrappers. If you must use a non-pure, non-reentrant C library
which stores pointers, or returns malloc'd memory,---then yes, you must
understand how GC works, how to protect GC-allocated memory, and how to
make GC handle externally-allocated memory.
3. Write D-aware external functions. If you implement them yourself you
can always copy memory instead of storing raw pointers, so that you can
follow solution 1 when writing D code.

"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS

Isn't it a bit overstated?

through trying (and still am) random segfaults and large memory usage. -
even by writing it down helped a lot in understanding the whole issue.
(and I hope helps others a bit..)
I'm alot closer now to solving the issues than I think I was... In part
by writing the email and explaining the issues to myself..
- But I still think that a large multi-threaded application having to
rely on the GC is not a pretty prospect, and that being able to manage
memory in a tighter clearer, and cleaner way would be preferable.. -
imagine 500 threads being paused and scanned every few seconds to ensure
memory doesn't get out of hand - this doesn't sound efficient...
(althought I'm not sure I've got any great answers yet)
It's also a wake-up call to the authors of things like DBI - which
really should read this and understand that you can not send connect()
the username/password etc. without adding root to them.... - which I
doubt currently many libraries in dsource address..

You basically say that memory could be prematurely freed if an external
non-D function stores a pointer to GC-allocated memory internally. This
is quite different from "COMPLETELY BROKEN" to me. You don't even
mention that for both extern(C) and extern(Windows) pointers are pushed
on stack so that memory is protected at least until the external
function returns. Well it *can* overwrite its arguments with an
unrelated data, but in all other scenarios your memory is protected.
The solutions are:
1. Don't bother. This is the correct approach for most of CRT and Win32
functions because they do not store pointers.
2. Write wrappers. If you must use a non-pure, non-reentrant C library
which stores pointers, or returns malloc'd memory,---then yes, you must
understand how GC works, how to protect GC-allocated memory, and how to
make GC handle externally-allocated memory.
3. Write D-aware external functions. If you implement them yourself you
can always copy memory instead of storing raw pointers, so that you can
follow solution 1 when writing D code.

- But I still think that a large multi-threaded application having to
rely on the GC is not a pretty prospect, and that being able to manage
memory in a tighter clearer, and cleaner way would be preferable.. -
imagine 500 threads being paused and scanned every few seconds to ensure
memory doesn't get out of hand - this doesn't sound efficient...
(althought I'm not sure I've got any great answers yet)

Unless you've got 500 cores, you're actually pausing 2, 4, okay 8
threads, in case of 100% efficiency of your code. There *should* be a
benefit from collecting garbage in parallel, especially after number of
cores raises to dozens.
Also, overhead of scanning 500 threads is to scan 500 stacks instead of
one. Doesn't sound like a big deal.

It's also a wake-up call to the authors of things like DBI - which
really should read this and understand that you can not send connect()
the username/password etc. without adding root to them.... - which I
doubt currently many libraries in dsource address..

Well, yes, connect() is an asynchronous call which requires specifically
that buffers passed must stay valid while request is in progress. This
causes troubles in C either.

There *should* be a
benefit from collecting garbage in parallel, especially after number of
cores raises to dozens.

Do you know something I don't? According to my research, a separate thread
touching everyone else's memory causes huge amounts of cache misses and page
faults. These far outweigh any benefit gleaned from running the GC in
parallel.
-Craig

There *should* be a
benefit from collecting garbage in parallel, especially after number of
cores raises to dozens.

Do you know something I don't? According to my research, a separate thread
touching everyone else's memory causes huge amounts of cache misses and page
faults. These far outweigh any benefit gleaned from running the GC in
parallel.

A 'separate thread touching everyone else's memory' would inevitably
require to pause 'everyone' so that wasn't what I was talking about. I
was talking about a hypothetical situation when every thread were
collecting its own garbage leaving other threads' garbage alone. This
approach, assuming large amounts of thread-local garbage, would
definitely benefit from parallelization. But, even being hypothetical,
it couldn't be free, and inevitable overhead may outweight benefits if
number of parallel units is small.