7 Answers

The garbage collector does scan the stack -- to see what things in the heap are currently being used (pointed to) by things on the stack.

It makes no sense for the garbage collector to consider collecting stack memory because the stack is not managed that way: Everything on the stack is considered to be "in use." And memory used by the stack is automatically reclaimed when you return from method calls. Memory management of stack space is so simple, cheap and easy that you wouldn't want garbage collection to be involved.
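As an illustrative sketch of "stack memory is reclaimed when you return": in CPython, reference counting frees an object the moment its last reference disappears, so a local behaves like stack-managed storage that dies with the call (the names `Local` and `use_local` are invented for this example):

```python
import weakref

class Local:
    """Stand-in for data that lives only for the duration of a call."""
    pass

def use_local():
    local = Local()            # conceptually on the "stack": owned by this activation
    return weakref.ref(local)  # watch the object without keeping it alive

ref = use_local()
# The activation is gone, so its local is gone too -- no sweep required.
assert ref() is None
```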

(There are systems, such as Smalltalk, where stack frames are first-class objects stored in the heap and garbage collected like all other objects. But that's not the popular approach these days. Java's JVM and Microsoft's CLR use the hardware stack and contiguous memory.)

+1 the stack is always fully reachable, so there's no sense in sweeping it
– ratchet freak Oct 7 '11 at 18:04


+1 thank you, took 4 posts to hit the right answer. I don't know why you had to say everything on the stack is "considered" to be in use; it is in use in at least as strong a sense as heap objects still in use are. But that's a real nitpick of a very good answer.
– psr Oct 7 '11 at 18:35

@psr he means that everything on the stack is strongly reachable and has no need to be collected until the method returns, but that (as with RAII) is already explicitly managed
– ratchet freak Oct 7 '11 at 19:15

@ratchetfreak - I know. And I just meant that the word "considered" probably isn't needed; it's OK to make a stronger statement without it.
– psr Oct 7 '11 at 20:07


@psr: I disagree. "Considered to be in use" is more correct both for stack and heap, for very important reasons. What you want is to discard what won't be used again; what you do is discard what is not reachable. You might well have reachable data which you won't ever need; when this data grows, you have a memory leak (yes, they are possible even in GC'ed languages, contrary to what many people think). And one might argue that stack leaks happen as well, the most common example being unneeded stack frames in tail-recursive programs run without tail call elimination (e.g. on the JVM).
– Blaisorblade Oct 8 '11 at 12:55

Turn your question around. The real motivating question is under what circumstances can we avoid the costs of garbage collection?

Well, first off, what are the costs of garbage collection? There are two main costs. First, you have to determine what is alive; that requires potentially a lot of work. Second, you have to compact the holes that are formed when you free something that was allocated between two things that are still alive. Those holes are wasteful. But compacting them is expensive too.
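To make the first cost concrete, here is a toy mark phase over a hand-built object graph (the dict-based "heap", the numeric object ids, and the `mark` function are all invented for illustration): determining what is alive means tracing every reference from the roots, which is why it is potentially a lot of work.

```python
# Simulated heap: object id -> list of ids it references.
heap = {
    1: [2],       # 1 points at 2
    2: [],
    3: [4],       # 3 and 4 point only at each other: a garbage cycle
    4: [3],
}
roots = [1]       # e.g. references found on the stack

def mark(roots, heap):
    """Trace from the roots; everything not marked is garbage."""
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj not in live:
            live.add(obj)
            worklist.extend(heap[obj])
    return live

assert mark(roots, heap) == {1, 2}   # 3 and 4 are unreachable despite their cycle
```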

How can we avoid these costs?

Clearly if you can find a storage usage pattern in which you never allocate something long-lived, then allocate something short-lived, then allocate something long-lived, you can eliminate the cost of holes. If you can guarantee that for some subset of your storage, every subsequent allocation is shorter-lived than the previous one in that storage then there will never be any holes in that storage.

But if we've solved the hole problem then we've solved the garbage collection problem too. Do you have something in that storage that is still alive? Yes. Was everything allocated before it longer-lived? Yes -- that assumption is how we eliminated the possibility of holes. Therefore all you need to do is say "is the most recent allocation alive?" and you know that everything is alive in that storage.

Do we have a set of storage allocations where we know that every subsequent allocation is shorter-lived than the previous allocation? Yes! Activation frames of methods are always destroyed in the opposite order that they were created because they are always shorter-lived than the activation which created them.

Therefore we can store activation frames on the stack and know that they never need to be collected. If there is any frame on the stack, the entire set of frames below it is longer-lived, so they don't need to be collected. And they will be destroyed in the opposite order that they were created. The cost of garbage collection is thus eliminated for activation frames.
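The argument above can be sketched as a toy bump allocator (the class and its names are invented for illustration): because frames die in strict LIFO order, "collecting" a frame is just resetting a pointer, with no scanning and no holes.

```python
class FrameStack:
    """Toy bump allocator: frames free in LIFO order, so reclamation is a pointer reset."""
    def __init__(self):
        self.memory = []       # simulated contiguous storage
        self.frame_marks = []  # saved tops of the stack, one per live frame

    def push_frame(self):
        self.frame_marks.append(len(self.memory))

    def alloc(self, value):
        self.memory.append(value)
        return len(self.memory) - 1

    def pop_frame(self):
        # Everything allocated since push_frame dies at once -- no scan, no holes.
        top = self.frame_marks.pop()
        del self.memory[top:]

s = FrameStack()
s.push_frame()
s.alloc("outer")
s.push_frame()
s.alloc("inner1")
s.alloc("inner2")
s.pop_frame()                  # the inner frame is reclaimed in O(1)
assert s.memory == ["outer"]
s.pop_frame()
assert s.memory == []
```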

That's why we have the temporary pool on the stack in the first place: because it is an easy way of implementing method activation without incurring a memory management penalty.

(Of course the cost of garbage collecting the memory referred to by references on the activation frames is still there.)

Now consider a control flow system in which activation frames are not destroyed in a predictable order. What happens if a short-lived activation can give rise to a long-lived activation? As you might imagine, in this world you can no longer use the stack to optimize away the need to collect activations. The set of activations can contain holes again.

C# 2.0 has this feature in the form of yield return. A method that does a yield return is going to be reactivated at a later time -- the next time that MoveNext is called -- and when that happens is not predictable. Therefore the information that would normally be on the stack for the activation frame of the iterator block is instead stored on the heap, where it is garbage collected when the enumerator is collected.
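Python generators behave analogously to C#'s yield return, and make the heap-allocated activation directly visible (`gi_frame` is a CPython implementation detail; the `counter` function is invented for this example):

```python
def counter():
    n = 0
    while True:
        yield n    # suspend: the locals survive in a heap-allocated frame object
        n += 1

it = counter()
assert next(it) == 0
assert next(it) == 1              # the local n survived between activations
assert it.gi_frame is not None    # the suspended frame still exists on the heap
```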

Similarly, the "async/await" feature coming in the next versions of C# and VB will allow you to create methods whose activations "yield" and "resume" at well-defined points during the action of the method. Since the activation frames are no longer created and destroyed in a predictable manner, all the information that used to be stored in the stack will have to be stored in the heap.
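Python's async/await shows the same shape: the coroutine's activation is an ordinary heap object that exists independently of the call stack (`cr_frame` is a CPython implementation detail; the `greet` function is invented for this example):

```python
import asyncio

async def greet():
    msg = "hello"
    await asyncio.sleep(0)   # suspension point: the activation leaves the call stack
    return msg

coro = greet()
assert coro.cr_frame is not None    # the activation already exists as a heap object
assert asyncio.run(coro) == "hello" # the local msg survived across the suspension
```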

It is just an accident of history that we happened to decide for a few decades that languages with activation frames that are created and destroyed in a strictly ordered manner were fashionable. Since modern languages increasingly lack this property, expect to see more and more languages that reify continuations onto the garbage-collected heap, rather than the stack.

The most obvious answer, and perhaps not the fullest, is that the heap is the location of instance data. By instance data, we mean the data representing the instances of classes, aka objects, that are created at run time. This data is inherently dynamic; the number of these objects, and thus the amount of memory they take up, is known only at runtime. There has to be some sort of recovery of this memory or long-running programs would consume all of their memory over time.

The memory consumed by class definitions, constants, and other static data structures is inherently unlikely to grow unchecked. Since there's only a single class definition in memory for any number of run-time instances of that class, it makes sense that this type of structure is not a threat to memory usage.

It's worth bearing in mind the reason why we have garbage collection: because sometimes it's difficult to know when to deallocate memory. You really only have this problem with the heap. Data allocated on the stack will be deallocated eventually, so there isn't really any need to do garbage collection there. Things in the data section are generally assumed to be allocated for the program's lifetime.

The sizes of those are predictable (constant, except for the stack, and the stack is typically limited to a few MB) and typically very small (at least compared to the hundreds of MB that large applications may allocate).

Dynamically allocated objects typically have a small time frame in which they are reachable. After that, there is no way they can be referenced ever again. Contrast that with entries in the data section, global variables, and such: Frequently, there's a piece of code that references them directly (think const char *foo() { return "foo"; }). Normally, code doesn't change, so the reference is there to stay and another reference will be created each time the function is invoked (which could be at any time as far as the computer knows - unless you solve the halting problem, that is). Thus you couldn't free most of that memory anyway, as it would always be reachable.
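A Python analogue of the C snippet above (the function name `foo` is simply mirrored from it): the literal is referenced by the function's code object itself, so it stays reachable for as long as the function exists and could never be freed.

```python
def foo():
    # Analogue of: const char *foo() { return "foo"; }
    return "foo"

# The literal lives in the function's constants table. The code references it
# directly, so a collector could never free it while foo exists.
assert "foo" in foo.__code__.co_consts
assert foo() is foo()   # CPython hands back the very same object on every call
```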

In many garbage-collected languages, everything that belongs to the program being run is heap-allocated. In Python, there simply isn't any data section and there are no stack-allocated values (there are the references that local variables are, and there's the call stack, but neither is a value in the same sense as an int in C). Every object is on the heap.

@JasonBaker: Interesting find! It doesn't have any effect, though. It's an implementation detail and restricted to builtin objects. That's not to mention that those objects aren't expected to ever be deallocated in the lifetime of the program anyway (and indeed aren't), and are also tiny in size (less than 32 bytes each, I'd guess).
– delnan Oct 7 '11 at 18:12

As a number of other responders have said, the stack is part of the root set, so it is scanned for references but not "collected", per se.

I just want to respond to some of the comments that imply that garbage on the stack doesn't matter; it does, because it may cause more garbage on the heap to be considered reachable. Conscientious VM and compiler writers either null out or otherwise exclude dead parts of the stack from scanning. IIRC, some VMs have tables mapping PC ranges to stack-slot-liveness bitmaps and others just null out the slots. I don't know what technique is currently preferred.
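The same idea is available to ordinary code: explicitly dropping a dead reference stops it pinning heap garbage (a CPython sketch with invented names; in Java one would assign null to the variable, which is exactly the "null out the slot" technique mentioned above):

```python
import weakref

class Payload:
    """Stand-in for a large heap object reachable from a stack slot."""
    pass

def work():
    data = Payload()
    ref = weakref.ref(data)
    del data   # "null out" the dead slot so the payload is no longer reachable
    # ...the rest of this (possibly long-running) function no longer pins it...
    return ref

assert work()() is None   # the payload was reclaimed once its slot was cleared
```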

One term used to describe this particular consideration is safe-for-space.

It would be interesting to know. My first thought is that nulling out slots is the most realistic approach. Traversing a tree of excluded areas may well take longer than just scanning through nulls. Obviously any attempt to compact the stack is fraught with peril! Making that work sounds like a mind-bending, error-prone process.
– Brian Knoblauch Oct 7 '11 at 20:31

@Brian, actually, thinking about it some more, for a typed VM you need something like that anyway, so you can determine which slots are references as opposed to integers, floats, etc. Also, regarding compacting the stack, see "CONS Should Not CONS Its Arguments" by Henry Baker.
– Ryan Culpepper Oct 7 '11 at 20:51

What is allocated on the stack? Local variables and return addresses (in C). When a function returns, its local variables are discarded. It is not necessary, and would even be detrimental, to sweep the stack.

Many dynamic languages, and also Java and C#, are implemented in a systems programming language, often C. You could say Java is implemented with C functions and uses C local variables, and therefore that Java's garbage collector does not need to sweep the stack.

There is an interesting exception: Chicken Scheme's garbage collector does sweep the stack (in a way), because its implementation uses the stack as the first-generation space of its garbage collector: see the Chicken Scheme design article on Wikipedia.