Garbage Collection in .NET - A Deeper Look for Beginners

Introduction

Garbage collection is the process of releasing the memory used by objects
that are no longer referenced. Different platforms and languages do this in
different ways. This article looks at how garbage collection is done in .NET.

Garbage Collection Basics

Almost every program uses resources such as database connections, file system
objects and so on. To use any of these resources, memory must be allocated to
represent them. Using a resource follows these steps:

1) Allocate a block of memory in the managed heap by using the new keyword.
(This emits the newobj instruction in the Microsoft Intermediate Language code
generated from C#, VB.NET, JScript .NET or any other .NET-compliant language.)

2) Use the constructor of the class to set the initial state of the object.

3) Use the resource by accessing the type’s members.

4) Finally, CLEAR THE MEMORY.

These steps look like a very simple process. But how many of us always
remember to release the memory block? C++ programmers will agree with me. C++
has a special member function called a destructor. It has the same name as the
class, prefixed with the ‘~’ (tilde) symbol. The system calls it automatically
when the object's lifetime ends: when a local object goes out of scope, or when
a heap-allocated object is explicitly deleted.

But how many times have programmers forgotten to release memory? And how many
times have they tried to access memory that had already been released?

Both are serious and common bugs: the first leads to memory leaks, the second
to memory corruption. Automatic memory management was introduced to overcome
them. Automatic memory management, or automatic garbage collection, is a
process by which the system itself releases the memory used by unwanted
objects (we call them garbage). Hurrah! Thanks to Microsoft's automatic
garbage collection mechanism.

Automatic Garbage Collection in .NET

When Microsoft planned its new-generation platform, .NET, along with the new
language C#, its first intention was to make a language that is friendly for
developers to learn and use, with a rich set of APIs to support end users as
well. So it put a great deal of thought into garbage collection and came up
with this model of automatic garbage collection in .NET.

They implemented the garbage collector as a separate thread that is always
running in the background. Some of us may think that running a separate thread
adds overhead. Yes, it does. That is why the garbage collector thread is given
the lowest priority. But when the system finds there is no space left in the
managed heap (the managed heap is simply a block of memory allocated for the
program at run time), the garbage collector thread is given REALTIME priority
(the highest priority in Windows) and collects all the unwanted objects.

How does Garbage collector locate Garbage

When a program is loaded into memory, a block of memory is allocated for that
particular program alone. This block of memory is called the managed heap in
the .NET world. It is used whenever an object has to be created in memory for
that particular program.

This memory is divided into three parts: Generation Zero, Generation One and
Generation Two.

Figure 1.1 Managed Heap Structure (Generation Zero, Generation One, Generation Two).

Ideally, Generation Zero is the smallest, Generation One is medium-sized and
Generation Two is the largest.

When we create an object with the new keyword in a high-level language, the
compiler simply emits newobj into the MSIL file. (newobj is a Microsoft
Intermediate Language instruction that creates a new instance of a type.) When
newobj executes, the system will:

1) Calculate the number of bytes required for the object or type to be loaded
into the managed heap.

2) Add the bytes required for the object’s overhead. Each object has two
overhead fields: a method table pointer and a SyncBlockIndex. On a 32-bit
system, each of these fields requires 32 bits, adding 8 bytes to each object.
On a 64-bit system, each is 64 bits, adding 16 bytes to each object.

3) Check that the bytes required to allocate the object are available in the
reserved region (committing storage if necessary). If the object fits, it is
allocated at the address pointed to by NextObjPtr. The type’s constructor is
called (passing NextObjPtr for the this parameter), and the newobj MSIL
instruction (or the new operator) returns the address of the object. Just
before the address is returned, NextObjPtr is advanced past the object so that
it indicates the address where the next object will be placed in the heap.

All of this happens at the Generation Zero level.
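The allocation steps above can be sketched as a toy bump-pointer allocator. This is a simplified Python model under stated assumptions, not the CLR's implementation: the class name ToyHeap, its capacity argument and the allocate method are hypothetical. Only the NextObjPtr idea and the 8-byte or 16-byte per-object overhead come from the text.

```python
class ToyHeap:
    """Toy model of bump-pointer allocation (hypothetical, not CLR code)."""

    def __init__(self, capacity, pointer_size=4):
        self.capacity = capacity      # bytes reserved for this toy Generation Zero
        self.next_obj_ptr = 0         # address where the next object will be placed
        # Two overhead fields per object (method table pointer + SyncBlockIndex),
        # each one pointer wide: 8 bytes on 32-bit, 16 bytes on 64-bit.
        self.overhead = 2 * pointer_size

    def allocate(self, field_bytes):
        """Return the address of the new object, or None if it does not fit."""
        total = field_bytes + self.overhead
        if self.next_obj_ptr + total > self.capacity:
            return None               # a real runtime would start a collection here
        address = self.next_obj_ptr
        self.next_obj_ptr += total    # advance past the object just allocated
        return address


heap = ToyHeap(capacity=64, pointer_size=4)
a = heap.allocate(12)   # 12 + 8 = 20 bytes, placed at address 0
b = heap.allocate(12)   # placed at address 20
c = heap.allocate(12)   # placed at address 40
d = heap.allocate(12)   # 60 + 20 > 64, so it does not fit
print(a, b, c, d)       # 0 20 40 None
```

Allocation is just a size check and a pointer increment, which is why the collector has to compact the heap afterwards: free space is only ever taken from the end.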

Figure 1.2 Allocating objects in the Managed Heap (objects A, B and C, in that order).

When Generation Zero is full and the program still wants to allocate memory
for more objects, the garbage collector is given REALTIME priority and comes
into the picture.

The garbage collector then checks all the objects in Generation Zero. If an
object’s scope and lifetime have ended, the system automatically marks it for
garbage collection.

Note:

At this stage the object is only marked, not collected. Only the garbage
collector collects the object and frees the memory.

The garbage collector examines all the objects in Generation Zero, starting
from the beginning. If it finds an object marked for garbage collection, it
simply removes that object from memory.

Here comes the important part. Refer to Figure 1.2 above. There are three
objects in the managed heap. Suppose A and C are not marked, but B has lost its
scope and lifetime, so B is marked for garbage collection. Object B will be
collected, and the managed heap will look like this.

Figure 1.3 Memory Structure after Sweep (A and C remain, with a gap where B was).

But remember that the system allocates new objects only at the end of the
heap; it does not look for free space in between. So it is the garbage
collector's job to compact the memory after collecting the objects, and it does
that too. The memory then looks as shown below.

Figure 1.4 Memory Structure after Compact (A and C adjacent, free space at the end).
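The sweep and compact steps for objects A, B and C can be sketched in a few lines. This is a toy Python model, not CLR code: the function name sweep_and_compact, the list-of-tuples heap layout and the 20-byte object sizes are assumptions made for illustration.

```python
def sweep_and_compact(heap, garbage):
    """Drop garbage objects and slide the survivors down to close the gaps.

    heap    -- list of (name, size) tuples in address order (assumed layout)
    garbage -- set of names marked for collection
    Returns the survivors as (name, new_address, size) and the new NextObjPtr.
    """
    compacted = []
    next_obj_ptr = 0
    for name, size in heap:
        if name in garbage:
            continue                                  # sweep: the object is removed
        compacted.append((name, next_obj_ptr, size))  # compact: survivor moves down
        next_obj_ptr += size
    return compacted, next_obj_ptr


heap = [("A", 20), ("B", 20), ("C", 20)]
survivors, next_obj_ptr = sweep_and_compact(heap, {"B"})
print(survivors)      # [('A', 0, 20), ('C', 20, 20)]
print(next_obj_ptr)   # 40
```

C moves from address 40 to address 20, so any reference to C has to be updated as well. A real collector does that fix-up as part of compaction; the sketch leaves it out.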

But the garbage collector does not stop there. It checks which objects
survived the sweep (collection). Those objects are moved to Generation One,
and Generation Zero is now empty and ready for new objects.

If Generation One does not have space for the objects coming from Generation
Zero, the same process that happened in Generation Zero happens in Generation
One as well. The same applies to Generation Two.
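The promotion of survivors from one generation to the next, as described above, can be modelled roughly as follows. This is a toy Python sketch, not the CLR's algorithm: the class ToyGenerations, the per-generation object-count capacities and the live set (standing in for "still referenced") are all assumptions.

```python
class ToyGenerations:
    """Toy model of generational promotion (hypothetical, not CLR code)."""

    def __init__(self, capacities=(2, 4, 8)):
        self.capacities = capacities   # max object count per generation (assumed)
        self.gens = [[], [], []]       # Generation Zero, One and Two

    def collect(self, gen, live):
        """Collect one generation; live is the set of still-referenced objects."""
        survivors = [obj for obj in self.gens[gen] if obj in live]
        if gen == 2:
            self.gens[2] = survivors   # oldest generation: survivors stay put
            return
        older = gen + 1
        if len(self.gens[older]) + len(survivors) > self.capacities[older]:
            self.collect(older, live)  # make room in the older generation first
        if len(self.gens[older]) + len(survivors) > self.capacities[older]:
            raise MemoryError("every generation is full of live objects")
        self.gens[older].extend(survivors)
        self.gens[gen] = []            # this generation is empty again

    def allocate(self, obj, live):
        if len(self.gens[0]) >= self.capacities[0]:
            self.collect(0, live)      # Generation Zero is full
        self.gens[0].append(obj)


heap = ToyGenerations()
live = {"a", "c", "d"}                 # "b" is never referenced again
for name in "abcde":
    heap.allocate(name, live)
print(heap.gens)                       # [['e'], ['a', 'c', 'd'], []]
```

Objects that survive a Generation Zero collection end up in Generation One, while short-lived objects such as "b" disappear without the older generations being examined.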

You may wonder: if all the generations are full of referenced objects and our
program still wants to allocate more objects, what happens? In that case an
OutOfMemoryException is thrown.

It is much clearer with an analogy. Someone who comes to Walmart in the morning tends to stay the whole day and leave in the evening, while someone who comes later leaves soon. So Walmart works out a parking scheme that keeps vehicles in that order, so that it is easy to park and to get back out.

.NET designs the GC the same way, allocating memory at the tail of the heap and keeping a generation counter for each object.
After each GC iteration it will swap memory, as my friend suggested.

There is no such thing as "DOTNET". Microsoft has a bunch of products and technologies that use the ".NET" moniker. You don't call the programming language CPLUSPLUS do you? Better still, this article isn't about .NET anyways, it is about the current versions of the Common Language Runtime. Marketing types don't know the difference, but around a programming forum like this it would be better to be more precise and refer to the CLR.

Your description of background threads and GC misses the fact that there are two quite different GCs available with the current CLR: a concurrent one that runs in a background thread and a synchronous one. The synchronous version is usually used in server environments; in particular, ASP.NET uses it.

Calculating the allocation size of a type does not happen at object instantiation time as you describe; it happens when the type is loaded.

'newobj' is not the only IL instruction that allocates memory from the heap, in particular don't forget about 'newarr'.

When allocation happens there is also an important special case for 'large' objects, this would merit discussion in any article on GC.

The marking process isn't about examining the objects on the heap. It happens by walking all the 'gc roots'. Roots are all the non-heap things that can reference objects, basically variables on the stack and registers. The GC tracks those and when it needs to collect it goes through all those roots and recursively marks everything they reference. After this process, anything not marked is subject to being collected.

It is also worth noting that finalizers mess with this process as well. If something is found to not be marked but has a finalizer that needs to be run, it isn't collected. It is added to the finalizer queue and will be collected during some later GC.
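To make the finalizer point concrete, here is a toy Python model of an unmarked object with a pending finalizer surviving one collection. It is only a sketch, not CLR code: the function name collect, the needs_finalization set and the plain list used as the finalizer queue are hypothetical, and the finalizer is assumed to run between the two collections.

```python
def collect(heap, reachable, needs_finalization, finalizer_queue):
    """Return the objects that survive this collection (toy model)."""
    survivors = []
    for obj in heap:
        if obj in reachable:
            survivors.append(obj)            # marked: stays on the heap
        elif obj in needs_finalization:
            needs_finalization.discard(obj)  # its finalizer is now scheduled
            finalizer_queue.append(obj)
            survivors.append(obj)            # kept until the finalizer has run
        # otherwise: unmarked with no pending finalizer, so it is reclaimed
    return survivors


heap = ["A", "B", "C"]
reachable = {"A"}
needs_finalization = {"B"}
finalizer_queue = []

heap = collect(heap, reachable, needs_finalization, finalizer_queue)
print(heap, finalizer_queue)   # ['A', 'B'] ['B']

finalizer_queue.clear()        # pretend the finalizer for B has now run
heap = collect(heap, reachable, needs_finalization, finalizer_queue)
print(heap)                    # ['A']
```

C is reclaimed in the first pass, but B needs two collections, which is the cost of giving an object a finalizer.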

Wow - I don't think I've ever had a reply to a posting from so far back. Glad to see someone pokes around in the old information.

GC roots are, as I said, any reference to an object on the managed heap that is not located in the managed heap itself. Offhand, there are three broad categories of these:

1) References on the stack, i.e. local variables, function parameters, fields of value types.
2) Physical machine registers, for both the active and the waiting threads.
3) Manually allocated and freed System.Runtime.InteropServices.GCHandle instances, which are used by unmanaged code to maintain a reference to managed objects.

The CLR virtual machine keeps a list of all of these 'roots', the code that does so is available in the SSCLI if you want a precise look.

When a GC is triggered the VM runs through that list, follows each root to the object it references, marks it, recursively examines all fields of that object and repeats this process for any that are object references.

This process means every object that can possibly be accessed by managed code has been marked. This means anything not marked is 'garbage' and may be reclaimed.
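The root traversal described above can be sketched as a small graph walk. This is a toy Python model, not the CLR's code: the function name mark, the dictionary-based object graph and the object names are assumptions, and it uses an explicit work list instead of recursion, which visits the same objects.

```python
def mark(roots, references):
    """Return the set of objects reachable from the roots.

    roots      -- names referenced from the stack, registers or handles
    references -- dict mapping each object to the objects its fields reference
    """
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue                              # already visited (handles cycles)
        marked.add(obj)
        pending.extend(references.get(obj, []))   # follow every object reference
    return marked


# Hypothetical object graph: D and E reference each other but no root reaches them.
references = {"A": ["C"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
reachable = mark(["A"], references)
garbage = set(references) - reachable
print(sorted(reachable))   # ['A', 'C']
print(sorted(garbage))     # ['B', 'D', 'E']
```

D and E keep each other referenced, yet both are garbage because no root leads to them. That is why marking starts from the roots rather than from the objects on the heap.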

There are lots of other fun GC issues that I'm ignoring here, but hopefully that addressed what you were asking about regarding gc root traversal.