Garbage Collector


As you all know, every program uses resources. These resources can be files, database resources, network connections, memory buffers, objects, and so forth. Using any of them requires memory to be allocated to represent the resource. The complete process consists of the following steps:

Allocate memory for the type that represents the resource by executing the newobj intermediate language (IL) instruction. This instruction is emitted when you use the new operator.

Initialize the memory to make the resource available. The resource's initial state is created by the type's constructor.

Use the resource by accessing type members.

Clean up the resource state.

Free the memory occupied by the resource. This task is performed by the garbage collector.

When memory has to be managed manually, these steps are the source of two major bugs. First, programmers often forget to free memory when it is no longer needed. Second, programmers try to access memory after it has been freed. These two kinds of bugs are among the worst that can reside in an application, because they make the application's behavior unpredictable and they are very difficult to track down. Resource management is a difficult task that distracts you, the programmer, from the main problems you have to solve.

This is the reason the garbage collector was created: it takes the burden of memory management off your shoulders. Keep in mind, however, that the garbage collector knows nothing about the resource that a type in memory represents, so it cannot clean up that resource properly.

You must perform that cleanup yourself by writing the corresponding code. There are two methods you can use to clean up a resource: Finalize and Dispose.

Most types in the .NET Framework, such as Int32, String, and ArrayList, do not require any resource cleanup. Other types, however, wrap an unmanaged resource such as a network connection, a database connection, or an icon. The Common Language Runtime (CLR) requires that all resources be allocated from the managed heap.

The managed heap resembles the C runtime heap, with one major difference: you never free objects from the managed heap. Objects are freed automatically when the application no longer needs them. Now consider what happens when a process is initialized. The CLR reserves a contiguous region of address space, and this region represents the managed heap.

The managed heap maintains a pointer into this address space; call it NextObjectPtr. This pointer indicates where in the managed heap the next object will be allocated. Initially, NextObjectPtr points to the base address of the reserved address space.

As your intuition tells you, the newobj instruction creates a new object. As I mentioned earlier, this instruction is emitted when you use the new keyword. The newobj instruction causes the CLR to perform the following steps:

Calculate the number of bytes required by the type's fields and by the fields of all its base types.

Add the bytes required for the object overhead. Each object has two overhead fields: a method table pointer and a SyncBlockIndex. On a 32-bit system each field is 32 bits, which adds 8 bytes to each object. On a 64-bit system each field is 64 bits, which adds 16 bytes to each object.

The CLR then checks whether the bytes required for the object are available in the reserved address space (the managed heap). If the object fits, it is allocated at the address pointed to by NextObjectPtr. The type's constructor is then called with NextObjectPtr passed as the this parameter, and the new operator returns the object's address. Finally, NextObjectPtr is advanced past the newly allocated object, so it indicates the address in the managed heap where the next object will be allocated.
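The allocation steps above can be sketched as a toy model in Python. This is a minimal illustration under stated assumptions, not the CLR's implementation. Addresses are plain integers, and the names ManagedHeap, newobj, and object_size are hypothetical. Alignment, padding, zeroing of memory, and any minimum object size are ignored. The sketch shows two things: the size calculation (fields plus two pointer-sized overhead fields) and how NextObjectPtr is simply bumped past each new object.

```python
from typing import Sequence


def object_size(field_sizes: Sequence[int], pointer_size: int) -> int:
    """Bytes for the type's fields and its base types' fields, plus the two
    overhead fields (method table pointer and SyncBlockIndex)."""
    return sum(field_sizes) + 2 * pointer_size


class ManagedHeap:
    """Hypothetical toy model of the reserved address space and NextObjectPtr."""

    def __init__(self, base_address: int, size: int, pointer_size: int = 4):
        self.limit = base_address + size       # end of the reserved region
        self.next_object_ptr = base_address    # initially at the base address
        self.pointer_size = pointer_size       # 4 on 32-bit, 8 on 64-bit

    def newobj(self, field_sizes: Sequence[int]) -> int:
        """Allocate an object at NextObjectPtr and return its address."""
        nbytes = object_size(field_sizes, self.pointer_size)
        if self.next_object_ptr + nbytes > self.limit:
            # The heap is full; this is where a garbage collection would run.
            raise MemoryError("managed heap is full")
        address = self.next_object_ptr
        # The constructor would run here, with `this` equal to `address`.
        self.next_object_ptr += nbytes         # bump past the new object
        return address


heap = ManagedHeap(base_address=0x1000, size=64, pointer_size=4)
a = heap.newobj([12, 4])  # 16 bytes of fields + 8 bytes overhead = 24 bytes
b = heap.newobj([8])      # 8 bytes of fields + 8 bytes overhead = 16 bytes
print(hex(a), hex(b), hex(heap.next_object_ptr))  # 0x1000 0x1018 0x1028
```

Allocation in this model is just an addition and a comparison against the end of the reserved region. The expensive work is deferred until the heap fills up.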

When an application calls the new operator to create a new object, there might not be enough space for it. The managed heap checks this by adding the number of bytes the object requires to the address in NextObjectPtr. If the result lies beyond the end of the address space, the managed heap is full and a garbage collection must be performed. The garbage collector checks whether the managed heap contains any objects the application no longer uses. If such objects exist, the memory they occupy can be reclaimed. If there is still not enough memory in the heap after the collection, the new operator throws an OutOfMemoryException.
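The out-of-space path can be added to the same kind of toy model. Again, this is a hypothetical sketch. The CollectingHeap class and the collect callback are illustrative names, and OutOfMemoryError is a stand-in for .NET's OutOfMemoryException. The stub collector simply pretends that 16 bytes of objects survive every collection, so the heap can be refilled from that point.

```python
from typing import Callable


class OutOfMemoryError(Exception):
    """Stand-in for .NET's OutOfMemoryException in this sketch."""


class CollectingHeap:
    """Hypothetical toy heap that collects when NextObjectPtr would overflow."""

    def __init__(self, size: int, collect: Callable[[], int]):
        self.limit = size
        self.next_object_ptr = 0
        # Hypothetical GC hook: compacts the heap and returns the new
        # NextObjectPtr (the address just past the surviving objects).
        self.collect = collect

    def new(self, nbytes: int) -> int:
        # Add the required bytes to NextObjectPtr; past the limit means full.
        if self.next_object_ptr + nbytes > self.limit:
            self.next_object_ptr = self.collect()
            if self.next_object_ptr + nbytes > self.limit:
                # Still no room, even after the garbage collection.
                raise OutOfMemoryError(f"cannot allocate {nbytes} bytes")
        address = self.next_object_ptr
        self.next_object_ptr += nbytes
        return address


# Stub collector: pretend 16 bytes of objects survive every collection.
heap = CollectingHeap(size=64, collect=lambda: 16)
first = heap.new(40)   # fits: allocated at address 0
second = heap.new(40)  # 40 + 40 > 64, so collect, then allocate at 16
try:
    heap.new(60)       # 16 + 60 > 64 even after collecting
except OutOfMemoryError as exc:
    print("OutOfMemory:", exc)
```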

An application has a set of roots. A root is a storage location that contains a pointer to a reference-type object. This pointer either refers to an object in the managed heap or is null.

All global and static reference-type variables are considered roots, and so is any reference-type local variable or parameter on a thread's stack. When the garbage collector starts running (it starts when generation 0 is full; the generation mechanism exists to improve performance, and I'm not going to discuss it in this article), it assumes that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refers to any object in the heap.

The garbage collector then iterates through all the roots and builds a graph of every object that can be reached. In the image, objects A and B are referenced directly by the roots, so they are added to the graph.

When the garbage collector reaches object C, it observes that this object references another object in the managed heap, object D, so D is added to the graph as well. The entire walk is performed recursively.
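The marking described above is a recursive graph traversal that starts from the roots. The sketch below models the heap as a dictionary from an object's name to the names of the objects it references. The roots reference A, B, and C directly, and C references D. E and F are hypothetical additions that reference only each other. A None entry plays the role of a null root. Objects already in the graph are skipped, which also keeps the recursion from looping forever on cycles such as E and F.

```python
from typing import Dict, Iterable, List, Optional, Set


def mark(roots: Iterable[Optional[str]], references: Dict[str, List[str]]) -> Set[str]:
    """Return the set of objects reachable from the roots."""
    reachable: Set[str] = set()

    def visit(obj: Optional[str]) -> None:
        if obj is None or obj in reachable:
            return  # null root, or object already in the graph
        reachable.add(obj)
        for child in references.get(obj, []):
            visit(child)  # follow this object's references recursively

    for root in roots:
        visit(root)
    return reachable


# Hypothetical heap: C references D; E and F reference only each other.
references = {"C": ["D"], "E": ["F"], "F": ["E"]}
live = mark(roots=["A", "B", "C", None], references=references)
print(sorted(live))  # ['A', 'B', 'C', 'D']
```

E and F are never reached even though they reference each other, so they count as garbage.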

Once the graph is complete, it contains every object reachable by your application. Any object that is not part of the graph is considered garbage. The garbage collector then walks the heap linearly, looking for contiguous blocks of garbage objects, which are now considered free space.
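The linear walk can be sketched as follows. The sketch assumes the heap is a list of (address, size, name) tuples sorted by address and packed with no gaps between objects, as bump allocation produces. The function find_free_blocks is a hypothetical name. It merges runs of adjacent garbage objects into contiguous free blocks.

```python
from typing import List, Set, Tuple

HeapObject = Tuple[int, int, str]  # (address, size, name)


def find_free_blocks(heap_objects: List[HeapObject], live: Set[str]) -> List[Tuple[int, int]]:
    """Return (start_address, size) for each contiguous run of garbage."""
    blocks: List[Tuple[int, int]] = []
    start = None  # start address of the current garbage run, if any
    end = 0       # address just past the last garbage object seen
    for address, size, name in heap_objects:
        if name in live:
            if start is not None:      # a garbage run just ended
                blocks.append((start, end - start))
                start = None
        else:
            if start is None:          # a garbage run begins here
                start = address
            end = address + size
    if start is not None:              # the heap ends with garbage
        blocks.append((start, end - start))
    return blocks


heap = [(0, 16, "A"), (16, 8, "X"), (24, 8, "Y"), (32, 16, "B"), (48, 24, "Z")]
print(find_free_blocks(heap, live={"A", "B"}))  # [(16, 16), (48, 24)]
```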

The garbage collector then shifts the non-garbage objects down in memory, using the memcpy function, to compact the heap. Moving the objects invalidates all existing pointers to them. The garbage collector therefore corrects those pointers, in the roots and in other objects, so that they refer to the objects' new locations. After the managed heap is compacted, NextObjectPtr points just past the last non-garbage object.
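Compaction and pointer fix-up can be modeled with a forwarding table that maps each live object's old address to its new one. This sketch uses hypothetical names and the same heap layout, a sorted list of (address, size, name) tuples. The pointers argument maps each pointer's owner (a root or a field of a live object) to the address it holds. Because only roots and live objects are considered, every non-null pointer targets a live object. A real collector copies the object's bytes; here the move is only recorded.

```python
from typing import Dict, List, Optional, Set, Tuple

HeapObject = Tuple[int, int, str]  # (address, size, name)


def compact(
    heap_objects: List[HeapObject],
    live: Set[str],
    pointers: Dict[str, Optional[int]],
    base: int = 0,
) -> Tuple[List[HeapObject], Dict[str, Optional[int]], int]:
    """Slide live objects toward the base and redirect every pointer.

    Returns (compacted heap, fixed-up pointers, new NextObjectPtr).
    """
    forwarding: Dict[int, int] = {}  # old address -> new address
    compacted: List[HeapObject] = []
    next_object_ptr = base
    for address, size, name in heap_objects:
        if name in live:
            # A real collector would copy the object's bytes here.
            forwarding[address] = next_object_ptr
            compacted.append((next_object_ptr, size, name))
            next_object_ptr += size
    # The old addresses are now invalid, so rewrite each pointer.
    fixed = {
        owner: None if addr is None else forwarding[addr]
        for owner, addr in pointers.items()
    }
    return compacted, fixed, next_object_ptr


heap = [(0, 16, "A"), (16, 8, "X"), (24, 16, "C"), (40, 8, "Y"), (48, 8, "D")]
pointers = {"root1": 0, "root2": 24, "C.field": 48, "root3": None}
compacted, fixed, next_ptr = compact(heap, {"A", "C", "D"}, pointers)
print(compacted)  # [(0, 16, 'A'), (16, 16, 'C'), (32, 8, 'D')]
print(fixed)      # {'root1': 0, 'root2': 16, 'C.field': 32, 'root3': None}
print(next_ptr)   # 40
```

After compaction, new objects are again allocated by bumping from the returned NextObjectPtr.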

Now that you have a general overview of how the garbage collector works, you can design your applications with it in mind. In the next article, I will talk in more detail about object generations, the Finalize and Dispose methods, and how those methods should be used.

About the Author

Michael Heliso

I have worked as a software developer for about 3 years. So far I have worked for several international software companies, using technologies and programming languages such as C/C++, LotusScript, the Lotus API, C#, ASP.NET, MS SQL, Oracle, and Domino Server. My main interest at this time is .NET technology (http://dotnetcaffe.net).

Comments

nice

Posted by kirants
on 09/13/2007 12:13pm

It would be good to see references from MSDN that explain how the garbage collector works and its algorithm.
Also, what you have explained seems to be an implementation detail that Microsoft could very well change in a future CLR, couldn't it?

Re:nice

Posted by Michael.Heliso
on 09/14/2007 03:31am

Yes, they might change it... but for now this is how it works. I always try to design my apps so that they work well with the GC.