As far as I can see, the D garbage collector is a conservative
implementation. Is that correct?
Conservative GC means that the GC does not know where the pointers are
located. Every 4-byte word is interpreted as a potential pointer. If its
value is in the address range of the GC heap, it can prevent objects or
entire trees of objects from being freed.
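To make the mechanism concrete, here is a minimal sketch, written in C for brevity, of the root check that a conservative mark phase performs. It assumes a toy heap of fixed-size blocks in one contiguous range. `toy_heap`, `BLOCK_SIZE` and `scan_conservative` are hypothetical names, not the actual Phobos GC code, and the recursive marking of block contents is left out. Any word whose value falls inside the heap range marks the block it points into, whether it really was a pointer or just an integer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64   /* hypothetical fixed block size */
#define NUM_BLOCKS 16   /* hypothetical heap of 16 blocks */

/* Toy heap: one contiguous range, plus one mark flag per block. */
static unsigned char toy_heap[BLOCK_SIZE * NUM_BLOCKS];
static unsigned char marked[NUM_BLOCKS];

/* Scan a range of machine words (e.g. a stack or a data segment).
   With no type information, every word whose value lies inside the heap
   is treated as a pointer and keeps the block it points into alive.
   Returns the number of distinct blocks marked by this scan. */
static size_t scan_conservative(const uintptr_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    uintptr_t lo = (uintptr_t)toy_heap;
    uintptr_t hi = lo + sizeof toy_heap;
    size_t newly_marked = 0;

    memset(marked, 0, sizeof marked);
    for (size_t i = 0; i < count; i++) {
        uintptr_t w = words[i];
        if (w >= lo && w < hi) {               /* "looks like" a heap address */
            size_t block = (w - lo) / BLOCK_SIZE;
            if (!marked[block]) {
                marked[block] = 1;
                newly_marked++;
            }
        }
    }
    return newly_marked;
}
```

An integer that happens to equal an address inside block 3, whether random or supplied by an attacker, keeps block 3 alive exactly as a real pointer would; a real collector would then also scan block 3 and retain everything reachable from it.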
This is no problem for most applications. But isn't this a show stopper
for secure applications, like server processes?
How can such hacks be prevented? If someone somehow knows critical addresses
and supplies them in input values (data fields), he can make the
application run out of memory and go down.
Frank

As far as I can see, the D garbage collector is a conservative
implementation. Is that correct?

Yes.

Conservative GC means that the GC does not know where the pointers are
located. Every 4-byte word is interpreted as a potential pointer. If its
value is in the address range of the GC heap, it can prevent objects or
entire trees of objects from being freed.
This is no problem for most applications. But isn't this a show stopper
for secure applications, like server processes?

I suppose that depends on the security constraints. A sufficiently
paranoid programmer could always store data encrypted in memory, or
explicitly call delete on temporary data.

How can such hacks be prevented? If someone somehow knows critical addresses
and supplies them in input values (data fields), he can make the
application run out of memory and go down.

And if the attacker has physical access to the machine he can extract
sideband information simply by detecting voltage variations in the
motherboard. While I agree that the GC could be tuned a bit, I don't
find the security argument to be terribly persuasive, as such
applications must already be careful about how data is managed.
Sean

An attacker has as much time as he needs. He will get all the knowledge
necessary, and perhaps he can calculate such addresses without physical
access to the machine. But this is not the real problem. Random data in
integers and floating point values can also cause problems.
I think this is a really big security problem and makes reliable
programs impossible. If mere knowledge or random data can cause memory
leaks, then this is a problem.
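A rough back-of-the-envelope estimate of the "random data" risk, sketched in C. It assumes a 32-bit address space, a single contiguous heap, and word values that are uniformly distributed (real data usually is not), so treat it as an order-of-magnitude guide only; the helper names are hypothetical. The chance that one word looks like a heap pointer is heap size / 2^32, and the expected number of such words among N words is N times that:

```c
#include <assert.h>
#include <stdint.h>

/* Probability that one uniformly random 32-bit word lands inside a GC heap
   of heap_bytes bytes, assuming the heap is one contiguous address range. */
static double false_hit_probability(uint64_t heap_bytes)
{
    const uint64_t address_space = UINT64_C(1) << 32;   /* 2^32 addresses */
    assert(heap_bytes <= address_space);
    return (double)heap_bytes / (double)address_space;
}

/* Expected number of "accidental pointers" among word_count random words. */
static double expected_false_hits(uint64_t heap_bytes, uint64_t word_count)
{
    return (double)word_count * false_hit_probability(heap_bytes);
}
```

For a 64 MiB heap the per-word probability is 2^26 / 2^32 = 1/64, so one megabyte of arbitrary 32-bit values (262144 words) would be expected to contain about 4096 words that look like heap pointers.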
What about a program that runs out of memory after 3 days? Is it a program
bug, or is it caused by randomly matching data values?
If the solution is to call delete manually, then the GC makes no sense
at all.
This is a show stopper, because everything is based on GC-allocated
memory. You will only get predictable behaviour with a precise (as opposed
to conservative) GC.
Frank

An attacker has as much time as he needs. He will get all the knowledge
necessary, and perhaps he can calculate such addresses without physical
access to the machine. But this is not the real problem. Random data in
integers and floating point values can also cause problems.

An attacker could even use the library source code to figure it out, but...

I think this is a really big security problem and makes reliable
programs impossible. If mere knowledge or random data can cause memory
leaks, then this is a problem.
What about a program that runs out of memory after 3 days? Is it a program
bug, or is it caused by randomly matching data values?
If the solution is to call delete manually, then the GC makes no sense
at all.
This is a show stopper, because everything is based on GC-allocated
memory. You will only get predictable behaviour with a precise (as opposed
to conservative) GC.

I agree that it can be a problem but disagree that it is a show-stopper...
What I think Sean's reply was getting at is that a "proper" design would go far
in mitigating the problem no matter the type of collector or memory mgmt.
strategy used.
That doesn't necessarily mean a hack or work-around either, because in any case
good design of server-type software shouldn't hand control of resource
mgmt. over to a "third party" like a GC (IMHO); it should be more tightly
controlled no matter how good the GC is.
One way to design for the issue you raise can be a "revolving buffer" where the
same chunk of memory is allocated once and used to service the requests and
responses until shutdown. E.g. there can be a bound on the buffer because the
slice read (from e.g. a socket) is controlled, and buffer mgmt. is made a lot
easier with D array semantics (like slicing). Then issues like data values
within the address range of the heap won't matter; this is not only more secure
but potentially a lot more efficient as well. IIRC, that is similar to what the
Mango HTTP server does (as an example): http://mango.dsource.org/
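A minimal C sketch of the "revolving buffer" idea, under stated assumptions: `request_buf`, `REQUEST_BUF_SIZE` and `read_request` are hypothetical names, the 4096-byte bound is arbitrary, and a plain memory copy stands in for a bounded socket read. This is not Mango's actual code. In D the handler would then work on a slice of the buffer; in C it is simply the pointer plus the returned length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request buffer: allocated once at startup and reused for
   every request until shutdown, so there is no per-request allocation. */
#define REQUEST_BUF_SIZE 4096
static char request_buf[REQUEST_BUF_SIZE];

/* Copy at most REQUEST_BUF_SIZE bytes of incoming data into the shared
   buffer (standing in for a bounded socket read) and return how many bytes
   were accepted. The caller then works on request_buf[0 .. n). */
static size_t read_request(const char *incoming, size_t incoming_len)
{
    assert(incoming != NULL || incoming_len == 0);
    size_t n = incoming_len < REQUEST_BUF_SIZE ? incoming_len : REQUEST_BUF_SIZE;
    if (n > 0)
        memcpy(request_buf, incoming, n);   /* overwrites the previous request */
    return n;
}
```

Because the bound is enforced at the read, an oversized request is truncated (or could be rejected) instead of growing memory use.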
The good news - since D's GC is not really a "bolt-on" like it has to be for C
or C++ - is that probably a type-aware collector can be done w/o affecting D's
current C-like pointer semantics. But I think ruling D out as-is for server
software is a little strong since D gives us so many options for managing
memory.

The intention of the GC is to relieve the programmer of all memory
management. He only has to take care of setting unused references
to null, and he can rely on the collector to find unused memory chunks.
But now it turns out that this is not true.
So, if I want to rely on the GC, this is a show stopper for me.
A large piece of audio data in the GC heap can completely corrupt the GC.
If I only have a few integer variables, this risk is extremely low, but it
is not gone.
So, how many variables are allowed until we have to call it a show stopper?
Sorry for being so pedantic.
FrankBenoit
keinfarbton

The intention of the GC is to relieve the programmer of all memory
management. He only has to take care of setting unused references
to null, and he can rely on the collector to find unused memory chunks.
But now it turns out that this is not true.
So, if I want to rely on the GC, this is a show stopper for me.
A large piece of audio data in the GC heap can completely corrupt the GC.
If I only have a few integer variables, this risk is extremely low, but it
is not gone.
So, how many variables are allowed until we have to call it a show stopper?

Well said, Frank.
And very good point.

Sorry for being so pedantic.

I think D needs some pedantic league around. Thomas is the only
gentleman so far who is trying to bring some "Ordnung" (order) here.
Joke. Well, sort of.

The intention of the GC is to relieve the programmer of all memory
management. He only has to take care of setting unused references
to null, and he can rely on the collector to find unused memory chunks.
But now it turns out that this is not true.
So, if I want to rely on the GC, this is a show stopper for me.
A large piece of audio data in the GC heap can completely corrupt the GC.
If I only have a few integer variables, this risk is extremely low, but it
is not gone.
So, how many variables are allowed until we have to call it a show stopper?

Well said, Frank.
And very good point.

Yes.
There are two problems with this "audio data". First, it takes quite a
long time in the mark phase to scan such a vast stretch of data. Second, as
noted, it contains enough "pointers" to shoot down Air Force One.

Sorry for being so pedantic.

I think D needs some pedantic league around. Thomas is the only
gentleman so far who is trying to bring some "Ordnung" (order) here.
Joke. Well, sort of.

Yes, Fritz, order, but only _almost_ above all.
---
Would it be too hard to have the compiler automatically mark "obvious
data" as non-scannable?
I mean, data read from streams or files should never contain pointers
anyhow. Likewise, the compiler _should_ know whether a large array
contains pointers or not. If not, then the entire array might be marked
as non-scannable.
Doing this non-pedantically might gain a lot of GC speed without making
the program itself much slower. In other words, the compiler should not
bother with _every_ item known not to contain pointers, because this
would result in a long list of scan/no-scan areas for the GC. But if
there were a "lower size" threshold or something like it, then this might
actually work as intended.
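The proposal can be sketched in C as follows, assuming a hypothetical per-block `no_scan` flag kept by the collector and an arbitrary 1024-byte threshold standing in for the "lower size". The names are illustrative only. Small pointer-free allocations stay scannable, as suggested, so the collector does not have to track many tiny no-scan areas:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold below which no no-scan hint is emitted. */
#define NO_SCAN_MIN_BYTES 1024

/* Hypothetical per-block information kept by the collector. */
typedef struct {
    size_t size;
    bool   no_scan;   /* true: the block is known to hold no pointers */
} block_info;

/* Decision a compiler could make at an allocation site: mark the block
   no-scan only when the element type has no pointers AND the block is
   large enough for skipping it to pay off. */
static bool should_mark_no_scan(bool element_has_pointers, size_t bytes)
{
    return !element_has_pointers && bytes >= NO_SCAN_MIN_BYTES;
}

/* Number of bytes the mark phase has to examine for one block. */
static size_t bytes_to_scan(const block_info *b)
{
    assert(b != NULL);
    return b->no_scan ? 0 : b->size;
}
```

With this, a 1 MiB audio buffer costs the mark phase nothing and contributes no false pointers, while a 16-byte struct of ints is simply scanned as before.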
Opinions?

Would it be too hard to have the compiler automatically mark "obvious
data" as non-scannable?
I mean, data read from streams or files should never contain pointers
anyhow. Likewise, the compiler _should_ know whether a large array
contains pointers or not. If not, then the entire array might be marked
as non-scannable.
Doing this non-pedantically might gain a lot of GC speed without making
the program itself much slower. In other words, the compiler should not
bother with _every_ item known not to contain pointers, because this
would result in a long list of scan/no-scan areas for the GC. But if
there were a "lower size" threshold or something like it, then this might
actually work as intended.

I think someone actually wrote a GC patch a while back that did this, so
it's definitely possible with D as-is. I would like to see this done in
Phobos before 1.0, and it's on my mental to-do list for Ares as well.
Sean

As far as I know the D GC can be replaced.
There are many GC theories, and I think most of them cannot be corrupted with
garbage (they work with working sets, aging and so on).
The problem is not a problem IMHO
Tamas Nagy
In article <dvhkut$541$1 digitaldaemon.com>, Frank Benoit says...

The intention of the GC is to relieve the programmer of all memory
management. He only has to take care of setting unused references
to null, and he can rely on the collector to find unused memory chunks.
But now it turns out that this is not true.
So, if I want to rely on the GC, this is a show stopper for me.
A large piece of audio data in the GC heap can completely corrupt the GC.
If I only have a few integer variables, this risk is extremely low, but it
is not gone.
So, how many variables are allowed until we have to call it a show stopper?
Sorry for being so pedantic.
FrankBenoit
keinfarbton

As far as I know the D GC can be replaced.
There are many GC theories, and I think most of them cannot be corrupted with
garbage (they work with working sets, aging and so on).
The problem is not a problem IMHO
Tamas Nagy

It seems that a conservative mark-and-sweep GC is the only option for D
(I mean as the default GC), which in fact is not bad. It is simple and
compact to implement.
GC is one of several possible memory managers. In efficient systems it is
used in cooperation with explicit memory management.
And this is what is extremely good about D: it allows using the best of both
worlds.
Speaking about the server side:
in fact there should be no GC there in the usual sense.
Memory allocation while executing a request should be done in a
memory pool. Such a pool (a raw memory chunk) can be dropped as a whole at
the end of the request, without any dtors and the like.
Something like Apache memory pools.
I think that D-on-the-server frameworks should use this approach.
This will eliminate the problem mentioned by Frank completely and will
make D servers lightning fast.
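A minimal C sketch of such a per-request pool (an arena, or bump-pointer allocator), in the spirit of Apache's pools but not their actual API: `request_pool`, `pool_alloc`, `pool_reset` and the other names are hypothetical. Nothing allocated from the pool is freed individually, so dropping everything at the end of a request is a single assignment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-request pool: one raw chunk plus a fill level. */
typedef struct {
    unsigned char *base;
    size_t         capacity;
    size_t         used;
} request_pool;

static int pool_init(request_pool *p, size_t capacity)
{
    p->base = malloc(capacity);
    p->capacity = capacity;
    p->used = 0;
    return p->base != NULL;
}

/* Bump-pointer allocation, aligned to sizeof(void *) (a power of two on
   common platforms). Returns NULL when the pool is exhausted. */
static void *pool_alloc(request_pool *p, size_t n)
{
    const size_t align = sizeof(void *);
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->capacity || n > p->capacity - start)
        return NULL;
    p->used = start + n;
    return p->base + start;
}

/* End of request: forget every allocation at once, with no destructors. */
static void pool_reset(request_pool *p)
{
    p->used = 0;
}

static void pool_destroy(request_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->capacity = p->used = 0;
}
```

The pool is created once per worker and reset after each request, so the steady-state cost of memory management per request is essentially zero.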
Andrew.

Speaking about the server side:
in fact there should be no GC there in the usual sense.
Memory allocation while executing a request should be done in a
memory pool. Such a pool (a raw memory chunk) can be dropped as a whole at
the end of the request, without any dtors and the like.
Something like Apache memory pools.

I think we're stuck with using the GC for built-in features, i.e. dynamic
arrays and AAs. But this shouldn't amount to a tremendous percentage of
allocated space in server apps.
Sean

As far as I know the D GC can be replaced.
There are many GC theories, and I think most of them cannot be corrupted with
garbage (they work with working sets, aging and so on).
The problem is not a problem IMHO

Yes, you can exchange the GC. But at the moment we have this
implementation, a conservative one. And as a non-compiler-implementor I
cannot change the GC from conservative to precise, because the interface
lacks the reference information.
In several papers I read that it is not possible to make a GC that is
optimal for all applications.
That said, it would be a good thing to have an open standard for
integrating custom GC implementations. This can help D in various ways:
- Multiple implementations can show the advantages of each approach
- Each application can tune the GC it uses
- Special solutions for special cases are possible (e.g. realtime,
gaming, secure applications)
- The D community can contribute to the GC implementation work
- D can become a GC laboratory :)
The current interface only serves stop-the-world conservative GC
implementations. Other implementations require some kind of compiler
assistance, e.g. read/write barriers, information about the position of
references, synchronisation points, etc.
An interface meant to serve many possible GCs should support
- stop-the-world, incremental, concurrent
- copying, mark-sweep
- moving, non-moving
- generational GC
- ??? Building Objects out of blocks => no fragmentation ???
So here are a few initial thoughts about such a "GC integration
interface" - GCII:
Reference info for classes, structs:
The allocation function of the GC should receive not only the size of
memory that is required, but also a bitfield indicating which
words in this memory are references.
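To illustrate what such an allocation function could look like, here is a C sketch. `gcii_malloc`, `gcii_free`, `count_references` and the header layout are hypothetical, and a single 32-bit bitmap is assumed for brevity (larger objects would need a longer bitmap or a pointer to type info). The collector then visits only words whose bit is set, so integer or floating-point fields can no longer pin objects:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-allocation header: bit i of ref_bits set means word i
   of the object holds a reference. Limits this sketch to 32-word objects. */
typedef struct {
    size_t   words;
    uint32_t ref_bits;
} gc_block_header;

/* Hypothetical typed allocation: the compiler would pass the size AND the
   reference bitmap of the class or struct being allocated. */
static void *gcii_malloc(size_t words, uint32_t ref_bits)
{
    assert(words <= 32);
    gc_block_header *h = malloc(sizeof *h + words * sizeof(uintptr_t));
    if (h == NULL)
        return NULL;
    h->words = words;
    h->ref_bits = ref_bits;
    return h + 1;                       /* object payload follows the header */
}

static void gcii_free(void *payload)
{
    if (payload != NULL)
        free((gc_block_header *)payload - 1);
}

/* Precise visit of one object: counts the non-null reference fields.
   A real collector would mark the object each such w[i] points to. */
static size_t count_references(const void *payload)
{
    const gc_block_header *h = (const gc_block_header *)payload - 1;
    const uintptr_t *w = payload;
    size_t refs = 0;
    for (size_t i = 0; i < h->words; i++) {
        if (((h->ref_bits >> i) & 1u) && w[i] != 0)
            refs++;
    }
    return refs;
}
```

For a three-word object whose middle word is a reference, the compiler would pass the bitmap `1u << 1`; the other two words are never interpreted as pointers, whatever values they hold.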
Reference info for the stack:
Each stack frame begins with a bitfield containing reference information.
When the GC scans the stack, the frames have to be recognized.
This could be disabled for a conservative GC.
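One possible shape for this, sketched in C: each frame gets a small descriptor that links to the caller's descriptor (a technique usually called a shadow stack), which is one way to let the GC recognize frames. `frame_header` and `scan_stack` are hypothetical names, and a 32-slot bitmap per frame is assumed for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame descriptor the compiler would emit for each frame:
   a link to the caller's frame, the number of word slots, a bitmap saying
   which slots hold references, and the slots themselves. */
typedef struct frame_header {
    struct frame_header *caller;
    size_t               slot_count;   /* at most 32 in this sketch */
    uint32_t             ref_bits;
    uintptr_t           *slots;
} frame_header;

/* Walk the chain from the innermost frame outwards and hand every slot
   flagged as a reference to visit() (visit may be NULL to just count).
   Slots not flagged in the bitmap are never treated as pointers. */
static size_t scan_stack(const frame_header *top, void (*visit)(uintptr_t))
{
    size_t visited = 0;
    for (const frame_header *f = top; f != NULL; f = f->caller) {
        assert(f->slot_count <= 32);
        for (size_t i = 0; i < f->slot_count; i++) {
            if ((f->ref_bits >> i) & 1u) {
                if (visit != NULL)
                    visit(f->slots[i]);
                visited++;
            }
        }
    }
    return visited;
}
```

A conservative GC would ignore the bitmaps and scan every slot, which is why this information could be switched off for it.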
Read/Write Barrier:
Some GCs need to run some code each time a reference is overwritten
and/or when a reference is read.
A flexible way could be to give the compiler a function to use for read
and write accesses. These functions should always be inlined and
optimized, e.g.:
void ___ref_assign( void** trg, void* src ){ *trg = src; }  // store src into the reference slot trg
void* ___ref_read( void** src ){ return *src; }  // load the reference held in slot src
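To show why a collector would want such hooks, here is a C sketch of one common use: a generational write barrier that keeps a remembered set of old-to-young references, so a minor collection can treat them as roots without scanning the whole old generation. The generation layout, the names `ref_assign`/`ref_read` (adapted from the `___ref_assign`/`___ref_read` proposal) and the fixed-size set are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-generation heap. */
static void *young_gen[512];   /* young objects live here */
static void *old_gen[512];     /* old objects live here */

#define REMEMBERED_MAX 128
static void **remembered[REMEMBERED_MAX];  /* old slots pointing into young */
static size_t remembered_count;

static bool in_young(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)young_gen;
    return a >= lo && a < lo + sizeof young_gen;
}

/* Write barrier: every store of a reference goes through here. Besides the
   store itself it records old-to-young pointers (duplicates not filtered). */
static void ref_assign(void **slot, void *value)
{
    *slot = value;
    if (!in_young(slot) && value != NULL && in_young(value)) {
        assert(remembered_count < REMEMBERED_MAX);
        remembered[remembered_count++] = slot;
    }
}

/* Read barrier: a plain load here; a concurrent or moving collector could
   check or forward the reference at this point. */
static void *ref_read(void **slot)
{
    return *slot;
}
```

If the compiler routes every reference store through such a function and inlines it, the fast path (young-to-young or old-to-old stores) costs only a couple of comparisons.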
Does this make sense?
Please make additions.
Frank Benoit

As far as I know the D GC can be replaced.
There are many GC theories, and I think most of them cannot be corrupted with
garbage (they work with working sets, aging and so on).
The problem is not a problem IMHO

Yes, you can exchange the GC. But at the moment we have this
implementation, a conservative one. And as a non-compiler-implementor I
cannot change the GC from conservative to precise, because the interface
lacks the reference information.
In several papers I read that it is not possible to make a GC that is
optimal for all applications.
That said, it would be a good thing to have an open standard for
integrating custom GC implementations. This can help D in various ways:
- Multiple implementations can show the advantages of each approach
- Each application can tune the GC it uses
- Special solutions for special cases are possible (e.g. realtime,
gaming, secure applications)
- The D community can contribute to the GC implementation work
- D can become a GC laboratory :)
The current interface only serves stop-the-world conservative GC
implementations. Other implementations require some kind of compiler
assistance, e.g. read/write barriers, information about the position of
references, synchronisation points, etc.
An interface meant to serve many possible GCs should support
- stop-the-world, incremental, concurrent
- copying, mark-sweep
- moving, non-moving
- generational GC
- ??? Building Objects out of blocks => no fragmentation ???

Have you had a look at Sean's work in Ares? He's been addressing this
very issue.

As far as I know the D GC can be replaced.
There are many GC theories, and I think most of them cannot be corrupted with
garbage (they work with working sets, aging and so on).
The problem is not a problem IMHO

Yes, you can exchange the GC. But at the moment we have this
implementation, a conservative one. And as a non-compiler-implementor I
cannot change the GC from conservative to precise, because the interface
lacks the reference information.
In several papers I read that it is not possible to make a GC that is
optimal for all applications.
That said, it would be a good thing to have an open standard for
integrating custom GC implementations. This can help D in various ways:
- Multiple implementations can show the advantages of each approach
- Each application can tune the GC it uses
- Special solutions for special cases are possible (e.g. realtime,
gaming, secure applications)
- The D community can contribute to the GC implementation work
- D can become a GC laboratory :)
The current interface only serves stop-the-world conservative GC
implementations. Other implementations require some kind of compiler
assistance, e.g. read/write barriers, information about the position of
references, synchronisation points, etc.
An interface meant to serve many possible GCs should support
- stop-the-world, incremental, concurrent
- copying, mark-sweep
- moving, non-moving
- generational GC
- ??? Building Objects out of blocks => no fragmentation ???

Have you had a look at Sean's work in Ares? He's been addressing this
very issue.

The big missing piece at this point is a way to tell the GC what
portions of allocated memory may contain pointers. Some of this will
require improved RTTI, but some could be done now. I've been
considering adding an additional parameter or two to gc.malloc,
gc.calloc, and gc.realloc to pass this information. But since it would
also require modifications to the GC (with which I'm not entirely
familiar) I haven't done so yet.
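Roughly what such an extra parameter could look like, sketched in C: the `gcx_*` functions, the hidden header and the single `GCX_NO_SCAN` bit are hypothetical and are not the Phobos or Ares API. The design point worth noticing is that `realloc` has to carry the information too; otherwise a pointer-free block could silently become scannable (or the reverse) when it grows:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute bit: the block contains no pointers, so the
   collector may skip it while marking. */
enum { GCX_NO_SCAN = 1u };

/* Hidden header in front of each block; the union keeps the payload
   aligned for any type (max_align_t is C11). */
typedef union {
    struct { size_t size; unsigned attrs; } info;
    max_align_t align_;
} gcx_header;

static void *gcx_malloc(size_t size, unsigned attrs)
{
    gcx_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.attrs = attrs;
    return h + 1;
}

static void *gcx_calloc(size_t count, size_t size, unsigned attrs)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                               /* count * size overflows */
    void *p = gcx_malloc(count * size, attrs);
    if (p != NULL)
        memset(p, 0, count * size);
    return p;
}

/* Resize the block, keeping its contents; the caller states the
   attributes of the resized block so they are never lost. */
static void *gcx_realloc(void *p, size_t size, unsigned attrs)
{
    if (p == NULL)
        return gcx_malloc(size, attrs);
    gcx_header *h = realloc((gcx_header *)p - 1, sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.attrs = attrs;
    return h + 1;
}

static unsigned gcx_attrs(const void *p)
{
    assert(p != NULL);
    return ((const gcx_header *)p - 1)->info.attrs;
}

static void gcx_free(void *p)
{
    if (p != NULL)
        free((gcx_header *)p - 1);
}
```

An array of doubles (e.g. audio samples) would be allocated with `GCX_NO_SCAN`, while a table of object references would pass 0 and stay scannable.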
Sean