Introduction

Polymorphism is by far the most important and widely used concept in object oriented programming. Some of the widely used technologies and libraries like COM, MFC etc. have polymorphism as their foundation. If you look at all the original design patterns, almost every pattern uses polymorphism in its structure.

In this article I hope to unveil the work done by the C++ compiler in implementing polymorphism. Some of the internals of C++ like virtual table, virtual table pointer etc. will also be touched upon while we implement polymorphism using the C language.

The Problem

I have taken a simple hierarchy of three classes to implement polymorphism.

The class diagram of the problem is as follows:

X is the base class which has three virtual functions – One, Two and Three. Class Y inherits from X publicly and overrides the function One. Class Z inherits publicly from Y and overrides the function Two. All classes have constructors and destructors and the destructors are declared as virtual. Class X contains a character array to store the name of the class which is inherited by the other classes. Each of the three classes has its own integer member variable.

Some C++ concepts

Here we will take a look at some of the C++ concepts that are not available in C and discuss how to implement the same in C.

Constructor and Destructor

C++ implementation

Constructors and destructors are special functions in C++ that are automatically called when an object is created and destroyed respectively. The new operator first allocates memory for the class and then calls the class’ constructor. Similarly, the delete operator first calls the class’ destructor and then de-allocates memory for the class.

Our implementation

In our C implementation, we will create two functions for each class – one for the constructor and the other for the destructor. The constructor function name will be the class name appended with _Ctor and the destructor function name will be the class name appended with _Dtor. For example, the constructor and destructor functions for the class X will be X_Ctor and X_Dtor respectively. We will call the constructor function just after allocating memory for the class using the malloc function and the destructor function just before de-allocating memory for the class using the free function.

Memory layout of an object

C++ implementation

When an object is instantiated, memory is allocated only for the data members of the class. Only a single copy of the member functions exist and they are shared by all the instances of the class.

Our implementation

To implement this behavior, we create all functions of all the classes as global functions. This includes the constructor and destructor functions mentioned in the previous section. The function names will be prefixed with the class name and an underscore. For example, the function Two belonging to class X will be named as X_Two. The data members of the class will be put in a structure as the class keyword is not available in C.

C++ implementation

Since only a single copy of the member functions exist in memory, to differentiate between the various class instances, these member functions have access to a this pointer, which holds the address of the instantiated object. The functions use this pointer to access the data members. Internally, the address in this pointer is passed to the member functions through the global ecx register of the microprocessor.

Our implementation

To implement this pointer, we declare a global integer variable called ECX. Before any function is called, this ECX variable will be set to the address of the memory allocated for the structure. All the functions expect the ECX variable to correctly point to the allocated memory. The functions use this ECX variable to access the members of the structure.

Virtual table and virtual table pointer

C++ implementation

Every class which has at least one virtual function, be it a virtual destructor, will have a virtual table associated with it. It is nothing but an array of function pointers. The virtual table for a class is populated with the addresses of the virtual functions declared in the class as well as the virtual functions inherited by the class. In the case of inherited virtual functions, only those that are not overridden in the class are taken into consideration.

The sequence of populating the virtual table is as follows:

Address of the virtual destructor is added.

Address of the virtual functions inherited from the base class in the order of declaration is added.

Addresses of the new virtual functions declared in the class are added in the order of declaration.

Our implementation

We will implement virtual functions as a global array of void pointers and populate the array with the address of the virtual functions of the class. In our problem there are three classes with virtual functions and so there will be three virtual tables.

The virtual table for class X will contain the address of its destructor and the addresses of its three virtual functions.

The virtual table for class Y will contain the address of its destructor and the addresses of the virtual functions inherited from class X. The address of the overridden function Y_One will replace the address of the inherited function X_One. Step number 4 explained in the above section is not done as there are no new virtual functions declared in class Y. If there were, it would have been appended to the end of the virtual table.

Similarly, the virtual table for class Z will contain the address of its destructor and the addresses of the virtual functions inherited from class Y. The address of the overridden function Z_Two will replace the address of the inherited function X_Two. Here again, step number 4 explained in the earlier section is not done.

C++ implementation

A virtual table is associated with a class using a virtual table pointer or vptr. The vptr is a pointer which is the very first data member of a class. This vptr will hold the address of the virtual table of the class thereby associating the class with its virtual table. This association or initialization of vptr is done in the constructor of the class.

Our implementation

In each of the structures that we are going to implement, the first member will be an integer pointer which will act as the virtual table pointer or vptr. In the constructor function of each class, the vptr is initialized with the address of its respective virtual table.

Inheritance

C++ implementation

When a new class is derived by inheriting from a base class, the derived class inherits all the data members of the base class. Even though, by concept, the private data members are not inherited, internally the derived class will contain even the private data members. But access to these members is not allowed by the compiler. Since there are no private data members in our problem, it is not a concern for us.

Our implementation

To implement inheritance we will create three separate structures. Structure X will contain a vptr as the first member followed by the character array to store the class name followed by an integer variable ‘x’. Structure Y will contain all the members of structure X in the same order as it is in structure X followed by an integer variable ‘y’. Similarly structure Z will contain all the members of structure Y in the same order as it is in structure Y followed by an integer variable ‘z’.

C++ implementation

Another concept associated with inheritance is with respect to the constructor and destructor. The constructor of the derived class should call the constructor of the base class before doing any initialization and the destructor of the derived class must call the destructor of the base class after doing its cleanup activities. In the case of argument constructors, where it expects some parameters to be passed to it, the user has to explicitly make the base class constructor call. Otherwise, it is done implicitly by the compiler.

Our implementation

Since class Z is derived from class Y which in turn is derived from class X, the constructor and destructor of class Z must call the constructor and destructor of class Y respectively and the constructor and destructor of class Y must call the constructor and destructor of class X respectively. Z_Ctor calls Y_Ctor in the very beginning of the function and Y_Ctor calls X_Ctor in the very beginning of the function. Likewise, Z_Dtor calls Y_Dtor at the very end of the function and Y_Dtor calls X_Dtor at the very end of the function.

Here is the summarized list of all the C++ concepts that we need to implement ourselves in order to implement polymorphism:

Constructor and destructor

Class member functions

Access to this pointer in member functions

Virtual table and virtual table pointer

Inheritance

In the compiler’s shoes

Now we have to do all the dirty work that the C++ compiler does to implement polymorphism. For this, we shall build the things we need to implement keeping in mind all the things that the compiler knows from the C++ code that we have written.

Constructor

C++ implementation

The code snippet shows the memory allocation for an object of class Z using the new operator. The constructor of the class Z is automatically called after memory allocation:

Our implementation

In the above C++ implementation, the compiler knows that the class to be instantiated is class Z. So it gets the size of the class Z using the sizeof operator and internally calls the malloc function or the HeapAlloc API to allocate the required amount of memory. This is one reason why overloading of the sizeof operator is not allowed in C++. After the memory is allocated, the constructor is called. For this, the compiler knows that the name of the constructor is the same as the class name. So the function with the same name as that of the class is called by the compiler.

In our implementation the constructor of class Z is named Z_Ctor and so we need to call this function after allocating the required amount of memory for class Z. The code snippet shows our C implementation:

C++ implementation

In C++, the derived class constructor calls the constructor of the base class before doing any initialization. As explained earlier, in the case of argument constructors, the base class constructor has to be explicitly called from the derived class constructor. Since we have default constructors in our code, this call is done automatically by the compiler.

Here, the compiler knows that class Z inherits from class Y and class Y inherits from class X. So it adds code to the constructors of the derived classes to call its immediate base class constructors.

Our implementation

In our implementation, the first executable line in the derived class constructor must be the call to the base class constructor. So Z_Ctor must call Y_Ctor and Y_Ctor must call X_Ctor. Given below is the code snippet from our implementation:

Destructor

C++ implementation

The destructor is automatically called when memory for an object is deallocated, the following is the code snippet where the destructor gets called.

<PRE lang=c++>delete pClass;

Here the compiler knows the class to which pClass belongs and also knows the name of the destructor, which is the same as the class name prefixed with a tilde (~). So the compiler first internally calls the destructor and then deallocates memory for the object using the free function or the HeapFree API.

There is also a situation where the destructor can be virtual, like in our case. In such cases, the address of the destructor function will be taken from the virtual table and then called. This is a good feature because the compiler only knows the type of the object on which the delete operator is used. It does not know to which type it actually points to.

For example, if memory is allocated as X* pClass = new Z and deallocated as delete pClass, here pClass is a pointer of type X but its vptr points to the virtual table of class Z. So if the destructor is not virtual, X_Dtor will be called when memory is freed, but if the destructor is virtual, then its address will be taken from the virtual table of class Z, which contains the address of Z_Dtor, which is the correct destructor to call. Since derived class destructors call its immediate base class destructors, the memory deallocation will be complete.

Here, the C++ compiler knows that the destructor is virtual and so the address of the destructor is taken from the virtual table. Another thing that the compiler knows is that the virtual pointer or vptr is the first data member of the class. The compiler also knows that the address of the destructor is the first element in the virtual table.

Our implementation

Keeping the above three points that the C++ compiler knows and assuming that pClass is a pointer of class X pointing to the memory allocated for class Z, we implement the code as shown below:

Two integer pointers representing the virtual pointer and the virtual table are created. A function pointer is also created to hold the address of the virtual destructor.

pClass holds the address of the object of class Z. To get the first 4 bytes of the object, which is the vptr, we typecast pClass to int*. Now that we have the vptr of the object, we can get its contents, which is nothing but the address of the virtual table, by dereferencing vptr using the ‘*’ operator. The first element of the virtual table contains the function address of the destructor function to be called. To get the address stored in the first element of the virtual table, we again dereference vtbl using the ‘*’ operator and typecast it as a function pointer.

The destructor is then called using the function pointer and then the memory is deallocated using the free function.

C++ implementation

In C++, the derived class destructor calls the destructor of the base class after doing all its cleanup activities. This is done automatically by the compiler.

Here again, the compiler knows that the class Z inherits from class Y and class Y inherits from class X. So it adds code to the destructors of the derived classes to call its immediate base class destructors.

Our implementation

In our implementation, the last line in the derived class destructor will be a call to the base class destructor. So Z_Dtor must call Y_Dtor and Y_Dtor must call X_Dtor. Given below is the code snippet from our implementation:

Class member functions

C++ implementation

All instances of a particular class share a single copy of the member functions of that class. Memory is allocated only for the data members each time an instance of a class is created. The member functions use a special this pointer to access the data members of each separate instance of the class. The this pointer holds the address of a particular instance of the class. Internally the address is passed to the functions through the ecx register of the microprocessor.

Our implementation

Functions are not allowed inside structures in the C language. We will create the member functions of the classes as global functions. The names of the functions will be prefixed with the class name followed by an underscore. For example, we will name the function One belonging to class X as X_One and function One belonging to class Y as Y_One.

To create an instance, we allocate memory for a structure using malloc. The above mentioned functions need to know the address of the allocated memory so that it can access the data members of the structure.

We will create a global integer variable called ECX to pass in the address of the allocated memory for the structure. This variable will be initialized with the address of the allocated memory as shown below:

The ECX variable will be used to access the data members of the structures from the functions including the constructor and the destructor. As an example, let’s look at the memory layout of the structure Z and see how to access its members using the ECX variable.

Offset

Member

ECX

int* vptr

ECX + 4

char classname[8]

ECX + 12

int x

ECX + 16

int y

ECX + 20

int z

Given below is the code snippet from the function Z_Two showing how to access the data members of the structure Z. Here ECX will be holding the starting address of the memory allocated to the structure Z.

Virtual table

C++ implementation

As explained earlier, every class having at least one virtual function, either defined in the class or inherited from another class, will have a virtual table associated with it. In our example, all the three classes have virtual tables. The following table shows the virtual table layout of all the three classes.

class X

class Y

class Z

X::Destructor

Y::Destructor

Z::Destructor

X::One

Y::One

Y::One

X::Two

X::Two

Z::Two

X::Three

X::Three

X::Three

Our implementation

To create the above mentioned virtual table layout, we will create three arrays of void pointers, one for each class.

For class X, the compiler knows that it doesn’t have a base class and so no functions are inherited, the destructor is declared as virtual and so its address must be the first element of the virtual table and there are three virtual functions declared. With this knowledge we will create the virtual table for class X as follows:

<PRE lang=c++>void* X_vtable[] =
{
X_Dtor,
X_One,
X_Two,
X_Three
};

For class Y, the compiler knows that it inherits from class X, its destructor is also virtual since the base class destructor is virtual, it inherits three virtual functions from class X and it overrides the function One since it is having the same signature as that declared in the base class. And so we create the virtual table for class Y as follows:

<PRE lang=c++>void* Y_vtable[] =
{
Y_Dtor,
Y_One,
X_Two,
X_Three
};

For class Z, the compiler knows that it inherits from class Y, its destructor is virtual since the base class destructor is virtual, it inherits three virtual functions from class Y and it overrides the function Two since it is having the same signature as that declared in the base class. The virtual table for class Z will look like the following:

<PRE lang=c++>void* Z_vtable[] =
{
Z_Dtor,
Y_One,
Z_Two,
X_Three
};

Virtual pointer

C++ implementation

In C++, the virtual pointer of a class is initialized in its constructor. The compiler knows that the virtual pointer is the first data member of the class and it has already created the virtual table as explained earlier. So initializing the virtual pointer is a simple task of assigning the pointer to the address of the virtual table. The same is done in the destructor also.

Our implementation

In our implementation, we have already created the virtual table and we have an integer pointer in the class to represent the virtual pointer. So all we have to do is to initialize the integer pointer to the address of the virtual table that we have created. The following code snippet shows the initialization of vptr for the class Z. The virtual table of class Z is shown in the above section.

In the above code snippet, ECX contains the address of the class. Since the first data member of a class is the virtual pointer, ECX will be pointing to the virtual pointer.

Inheritance

C++ implementation

C++ uses inheritance to reuse an existing class. The memory layout in our case is as shown in the figure below:

Class X has three data members, the virtual pointer being the first member. Class Y inherits from class X. This means that class Y contains all the data members of class X followed by one of its own data members, totaling four data members. Here again the virtual pointer is the first data member. Similarly, class Z inherits from class Y, meaning that class Z will first contain all the data members of class Y followed by one of its own. So it will have a total of five data members, where the virtual pointer is the first member.

Our implementation

The C language provides no such support for reusing an existing structure. We will re-declare the variables of class X at the beginning of class Y and the variables of class Y at the beginning of class Z to implement inheritance. The code snippet below shows the structures that we have created:

First, memory is allocated for the structure Z using the malloc function and its address is stored in the pClass pointer. The pClass pointer will be used to get the virtual pointer and then the virtual table of the class so that the functions in the virtual table can be called.

The same allocated address is also stored in the global ECX variable. This variable will be used as the base address in the functions that will be called to access the individual data members of the structure.

The last step is to call the constructor function of structure Z. From within the constructor function of structure Z we first call the constructor function of structure Y and from within the constructor function of structure Y we first call the constructor function of structure X. So first X_Ctor is executed and then Y_Ctor is executed and lastly Z_Ctor is executed.

Here the same thing as that in X_Ctor happens, the only difference being, first X_Ctor is called before taking the reference and initializing the members. Here the data member ‘x’ is not initialized. ‘x’ is initialized only in X_Ctor.

The contents of virtual table are taken by offsetting the vtbl variable and are stored into function pointers. The contents of the virtual table of structure Z, as explained earlier, are the addresses of Z_Dtor, Y_One, Z_Two and X_Three in that order.

<PRE lang=c++>One();
Two();
Three();

The functions One, Two and Three are called in that order using the function pointers into which the respective function addresses are stored from the virtual table.

<PRE lang=c++>Dtor();
free(pClass);

Before the memory is deallocated, the destructor function is called using the function pointer, which is initialized with the address of the destructor function taken from the virtual table. After that, the memory is deallocated using the free function.

From within the destructor function of structure Z we call the destructor function of structure Y at the end and from within the destructor function of structure Y we call the destructor function of structure X at the end. So first X_Dtor is executed and then Y_Dtor is executed and lastly Z_Dtor is executed.

First the reference to the virtual pointer is taken by offsetting the ECX variable. Then the virtual pointer is reinitialized. The virtual pointer reinitialization is done in the destructor so that if any calls to virtual functions are made in the destructor, the code doesn’t break and the correct functions are called.

Conclusion

The C++ compiler does a lot of work to implement polymorphism. What we saw was just one feature of OOP that a C++ compiler has to support. There are many more things like access specifiers, templates etc. which again take a lot of work to do on the part of the compiler. What we have seen now is that it is possible in one way or another to do OOP in a language that does not directly support it. This was one way of implementing polymorphism, but compilers could take an entirely different approach altogether with a lot of optimization thrown in.

Share

About the Author

Santosh works as a Technical Architect in God's own Country, Kerala, India. He has been involved with C/C++ since 1996. Started with a few game clones and then went on to commercial software. He started his stint with software training and then went on to professional software development, design and architecture. Unix and C were his favorite in his early days but later grew very fond of Windows especially with the release of Windows NT 4.0 and Visual C++ 6.0. Technologies like COM and .Net fascinate him. He still lurks around Unix once in a while.

Music and the guitar are his second favorites and he manages to impress his friends with these skills when there are nobody better than him around. He is a patient and fun loving character who does not think too much about the future. He hates wasting time and so is working hard (in his dreams) to perfect instant transportation and time travel.

Oh! Yes. He loves Superman. Always did and always will. He would love to become a Superman and rid the world of all evil.

He used to be a Microsoft Visual C++ MVP (October 2009 - September 2013)