Using Virtual Functions on the XMT

Greg Mackey

1. Introduction

This experiment measures the cost of using virtual functions on an XMT. Attempts to use virtual functions at Sandia several years ago resulted in poor performance, so this experiment uses a simple example to determine whether they are a viable option on the XMT. The code for this experiment is virtual_test.cpp. The canal output from compiling this code on an XMT 1 with software release 1.5 is virtual_test.canal.

2. The Classes

Polygon: children add new interface, but don't modify the parent's
  Square
  Triangle

VirtualPolygon: children modify the parent's virtual function
  VirtualSquare
  VirtualTriangle

The functions area() and no_inline_area() are defined in the child classes while height_times_width() and set_dimensions() are defined in the parent classes.
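The two hierarchies described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the contents of virtual_test.cpp: only the class and function names come from the text, while the data members, the area formulas, and the parent's default area() are assumptions. The no_inline_area() functions are omitted for brevity.

```cpp
#include <cassert>

// Non-virtual hierarchy: the children add area() and leave the parent's
// interface untouched. Members and formulas are assumed for illustration.
class Polygon {
public:
  Polygon() : height(0.0), width(0.0) {}
  void set_dimensions(double h, double w) { height = h; width = w; }
  double height_times_width() const { return height * width; }

protected:
  double height;
  double width;
};

class Square : public Polygon {
public:
  double area() const { return height * width; }
};

class Triangle : public Polygon {
public:
  double area() const { return 0.5 * height * width; }
};

// Virtual hierarchy: the children override the parent's virtual area().
class VirtualPolygon {
public:
  VirtualPolygon() : height(0.0), width(0.0) {}
  virtual ~VirtualPolygon() {}
  void set_dimensions(double h, double w) { height = h; width = w; }
  double height_times_width() const { return height * width; }
  virtual double area() const { return 0.0; }  // assumed default

protected:
  double height;
  double width;
};

class VirtualSquare : public VirtualPolygon {
public:
  virtual double area() const { return height * width; }
};

class VirtualTriangle : public VirtualPolygon {
public:
  virtual double area() const { return 0.5 * height * width; }
};
```

In this sketch, calling area() through a Polygon pointer does not compile without a cast, whereas calling it through a VirtualPolygon pointer selects the child's version at run time.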

3. Test Sections

Square Array

Operations are performed on an array of Square.

This is the baseline.

Square Pointer Array

Operations are performed on an array of pointers to Square.

This shows the cost of an array of pointers to objects as compared to an array of objects as in the Square Array section.

Polygon Array Normal Inheritance

Operations are performed on an array of pointers to Polygon.

This shows the cost of an array of pointers to parent objects as compared to an array of pointers to child objects as in the Square Pointer Array section.

Polygon Array Virtual Inheritance

Operations are performed on an array of pointers to VirtualPolygon.

This shows the cost of an array of pointers to parent objects with virtual functions as compared to an array of pointers to parent objects that don't have virtual functions as in the Polygon Array Normal Inheritance section.

The array size is the same for all sections. In the Polygon sections, the array is initialized with alternating Square and Triangle objects. Also in these sections, Square / Triangle Init and Square / Triangle area() operate on only the Squares or only the Triangles, so each operates on half the elements in the array. Additionally, Square / Triangle area() in the Normal Inheritance section statically casts each pointer to the child type before calling area().
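The layout of the Polygon Array Normal Inheritance section can be illustrated with a short sketch. It is only an approximation under stated assumptions: the helper names make_polygons, sum_square_areas, sum_triangle_areas, and destroy_polygons are hypothetical, as are the class bodies and the choice of placing Squares at even indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-ins for the classes named in the text; bodies are assumed.
struct Polygon {
  double height, width;
  Polygon() : height(0.0), width(0.0) {}
  void set_dimensions(double h, double w) { height = h; width = w; }
};
struct Square : Polygon {
  double area() const { return height * width; }
};
struct Triangle : Polygon {
  double area() const { return 0.5 * height * width; }
};

// Fill the array with alternating Square (even index) and Triangle (odd index).
std::vector<Polygon*> make_polygons(std::size_t n) {
  std::vector<Polygon*> polys(n);
  for (std::size_t i = 0; i < n; ++i) {
    if (i % 2 == 0) polys[i] = new Square;
    else polys[i] = new Triangle;
    polys[i]->set_dimensions(2.0, 3.0);
  }
  return polys;
}

// The Square pass touches only half the array and casts before area().
double sum_square_areas(const std::vector<Polygon*>& polys) {
  double total = 0.0;
  for (std::size_t i = 0; i < polys.size(); i += 2)
    total += static_cast<Square*>(polys[i])->area();
  return total;
}

// The Triangle pass touches the other half.
double sum_triangle_areas(const std::vector<Polygon*>& polys) {
  double total = 0.0;
  for (std::size_t i = 1; i < polys.size(); i += 2)
    total += static_cast<Triangle*>(polys[i])->area();
  return total;
}

// Polygon has no virtual destructor, so delete through the child type.
void destroy_polygons(std::vector<Polygon*>& polys) {
  for (std::size_t i = 0; i < polys.size(); ++i) {
    if (i % 2 == 0) delete static_cast<Square*>(polys[i]);
    else delete static_cast<Triangle*>(polys[i]);
  }
  polys.clear();
}
```

The static_cast is safe here only because the index parity records which child type each element holds; the virtual version needs no such bookkeeping.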

4. Test Results and Discussion

The tests use a test size of 1,000,000. I give results for only a single run of the tests. The numbers vary slightly from run to run, but every run supports the same broad conclusions about the costs of the various features.

4.1 Compiled Using gcc 4.2 with -O2 on a Mac Pro

The following results give a baseline for the cost of the various features with a good modern compiler on a serial system. Comparing the Square Pointer Array results to the Square Array results, we see that pointer indirection increases initialization times by about 5x and destruction times by about 10x. Pointer indirection adds around 30% to the cost of calling functions on the classes.

Comparing the Polygon Array Normal Inheritance results to the Square Pointer Array results, we see that calling parent functions on an array of child objects stored as parent object pointers adds no cost to calling the functions. Calling child functions when statically casting the pointers to child types also adds no cost to calling the functions.

Comparing the Polygon Array Virtual Inheritance results to the Polygon Array Normal Inheritance results, we see that initialization and destruction times are somewhat higher, which is probably due to having to deal with the virtual table. Calling any parent or child function on an array of child objects stored as parent object pointers costs 50% to 100% more. Statically casting the parent objects to child objects and calling the child's implementation of the virtual function doesn't reduce the cost of calling the function. These results for virtual functions are probably due to how the compiler handles objects with virtual functions.

4.2 Compiled on an XMT 1 using 1.5 Software

The following results show the costs of the various features with the XMT 1.5 compiler on an XMT 1. Comparing the Square Pointer Array results to the Square Array results, we see that pointer indirection increases initialization times by about 26x and destruction times from nothing to more than half the cost of initialization. These increases arise mostly because allocating and deallocating memory on the XMT is expensive. Pointer indirection adds around 50% to the cost of calling functions on the classes.

Comparing the Polygon Array Normal Inheritance results to the Square Pointer Array results, we see that calling parent functions on an array of child objects stored as parent object pointers adds no cost to calling the functions. Calling child functions when statically casting the pointers to child types also adds no cost to calling the functions. Comparing the inlined to the not inlined versions of Square / Triangle area(), we see that in this case not inlining the function roughly doubles its cost. For the not inlined version, the user must also use the assert parallel pragma. The compiler won't parallelize the loop otherwise because it cannot tell whether the function has side effects.
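A loop of the kind described here might look like the sketch below. The struct layout, the function compute_areas, and the output array are assumptions made for illustration; the directive is written in the `#pragma mta assert parallel` form used by the XMT compiler and is guarded by `__MTA__` (assumed to be the macro that compiler predefines) so that the same source builds with other compilers.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for Square; only the name no_inline_area() is from the text.
struct Square {
  double height;
  double width;
  double no_inline_area() const;
};

// Defined outside the class body to stand for a call that is not inlined.
// (A real test would also need to stop the optimizer from inlining it.)
double Square::no_inline_area() const { return height * width; }

void compute_areas(const Square* squares, double* areas, std::size_t n) {
  // The programmer asserts that the iterations are independent, since the
  // compiler cannot prove the opaque call is free of side effects.
#ifdef __MTA__
#pragma mta assert parallel
#endif
  for (std::size_t i = 0; i < n; ++i) {
    areas[i] = squares[i].no_inline_area();
  }
}
```

The assertion is a promise by the programmer rather than something the compiler checks, so it is only appropriate when each iteration writes to distinct memory, as it does here.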

Comparing the Polygon Array Virtual Inheritance results to the Polygon Array Normal Inheritance results, we see that the initialization time is a little higher, which is probably due to having to deal with the virtual table. Non-virtual functions have the same cost whether or not the object contains a virtual function. The cost of calling area() in the Polygon Array Normal Inheritance section is the sum of the Square and Triangle area times, as each piece covers half the array. The virtual function area() costs about 5x as much as its non-virtual counterpart. The user must use the assert parallel pragma to get the compiler to parallelize the loop containing the virtual function. Statically casting the parent objects to child objects and calling the child's implementation of the virtual function doesn't reduce the cost of calling the function, and the user must still use the assert parallel pragma to get the compiler to parallelize the loop.
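One language-level reason a static cast may not help is that a call to a virtual function through a pointer is still dispatched dynamically after the cast, unless the compiler can prove the object's exact type. The sketch below illustrates that rule only; it says nothing about what the XMT compiler actually generates. ScaledSquare and the two helper functions are hypothetical and exist solely to make the dispatch observable.

```cpp
#include <cassert>

struct VirtualPolygon {
  double height, width;
  VirtualPolygon(double h, double w) : height(h), width(w) {}
  virtual ~VirtualPolygon() {}
  virtual double area() const { return 0.0; }  // assumed default
};

struct VirtualSquare : VirtualPolygon {
  VirtualSquare(double h, double w) : VirtualPolygon(h, w) {}
  virtual double area() const { return height * width; }
};

// Hypothetical subclass, not part of the experiment.
struct ScaledSquare : VirtualSquare {
  ScaledSquare(double h, double w) : VirtualSquare(h, w) {}
  virtual double area() const { return 2.0 * height * width; }
};

// The cast changes the static type only; the call is still virtual.
double area_after_cast(VirtualPolygon* p) {
  return static_cast<VirtualSquare*>(p)->area();
}

// A qualified call names one implementation and bypasses virtual dispatch.
double area_qualified(VirtualPolygon* p) {
  return static_cast<VirtualSquare*>(p)->VirtualSquare::area();
}
```

For a ScaledSquare, area_after_cast returns the ScaledSquare result while area_qualified returns the VirtualSquare result, which shows that only the qualified form fixes the target at compile time. Whether a qualified call would recover the non-virtual cost on the XMT was not measured in this experiment.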

Comparing the 1 processor run to the 10 processor run, we see that all of the loops scale well.

5. Conclusions

There is a sizable cost for using virtual functions on an XMT, but they can be used if necessary. Executing a virtual function took about five times as long as executing a non-virtual function, but virtual functions can be executed in parallel with an assert parallel pragma. I suspect that part of the performance hit is due to the compiler not being able to inline virtual functions and part is due to the extra operations that decide which version of the virtual function to execute.