
Ever wanted to compare different JIT engines? Now that is possible!

Runtime profiling is a handy tool when we want to measure the runtime cost of individual functions, identify the performance bottlenecks of the code, or track the progress of a project over time. However, most profiling mechanisms are tied to a specific project, which makes it impossible to compare different solutions. I have developed a simple library that can easily be added to any existing JIT engine. The attached tgz contains patches for two widespread JIT engines: WebKit JavaScriptCore and Tamarin NanoJIT.

As I mentioned before, an in-depth comparison of different engines is difficult because they take different implementation approaches. However, script and compiled languages have one thing in common: all code blocks belong to function bodies, and even the program body is just a special main function. Hence profiling at the function level is possible for every script execution engine, which makes this level the primary target of our profiling library.

The library is designed to be as platform independent as possible: it depends only on a few standard C headers and stdlib. It also has a simple C++ interface, which makes it suitable for a wide range of applications. To enable the library, the project must define the ENABLE_PROFILE_ARM macro. All of the functions below belong to the ARMProfiler namespace.
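One way a project might hook into this switch is sketched below. It assumes hypothetical wrapper macros (PROFILE_START, PROFILE_END, PROFILE_STOP) and stub bodies in place of the real library, so that the profiling calls compile away entirely when ENABLE_PROFILE_ARM is not defined. The value given to the StopSampling constant (described below) is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Defined here only so the sketch is self-contained; a real build would
// normally pass it on the command line (e.g. -DENABLE_PROFILE_ARM).
#define ENABLE_PROFILE_ARM

#ifdef ENABLE_PROFILE_ARM
namespace ARMProfiler {
// Assumed value; the real library defines its own StopSampling constant.
const uint32_t StopSampling = UINT32_MAX;
// Stub bodies standing in for the real library: they only count calls
// and track how many functions are currently being sampled.
int calls = 0;
int depth = 0;
void startSampling(uint32_t) { ++calls; ++depth; }
void endSampling(uint32_t index) {
    ++calls;
    if (index == StopSampling) {
        assert(depth > 0 && "StopSampling without a matching startSampling");
        --depth;
    }
}
}  // namespace ARMProfiler
// Hypothetical wrapper macros (not part of the library): forward to the
// profiler when it is enabled...
#define PROFILE_START(i) ARMProfiler::startSampling(i)
#define PROFILE_END(i) ARMProfiler::endSampling(i)
#define PROFILE_STOP() ARMProfiler::endSampling(ARMProfiler::StopSampling)
#else
// ...and expand to no-ops otherwise, so a disabled build pays nothing.
#define PROFILE_START(i) ((void)0)
#define PROFILE_END(i) ((void)0)
#define PROFILE_STOP() ((void)0)
#endif

// Example call sites inside an engine; the indices are placeholders.
void interpretFunction() {
    PROFILE_START(0);
    // ... first part of the function body ...
    PROFILE_END(1);
    // ... second part of the function body ...
    PROFILE_STOP();
}
```

Removing the #define line turns every PROFILE_* use into an empty statement, so the engine source does not need any other changes to build without profiling.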

addProfileEvent creates a new profile record and returns its unique index. This index is the key that must be passed to the following sampling functions:

void startSampling(uint32_t index);
void endSampling(uint32_t index);

To gather the total runtime of all functions, the library tracks the state of the call stack. startSampling should be called when the execution flow enters a function; its index refers to the first profile record of the function. We can split the function body into as many parts as we like, and the library measures the runtime of each of these parts individually. However, we do not recommend dividing it into very small fragments, since the timer resolution on ARM is coarse (10 ms). endSampling updates the profiling data of the record selected by the previous call and resets the counters for the next profile record, given by its index argument. Before leaving a function, endSampling must be called with the predefined StopSampling constant to update the call stack (this pops the last function entry).
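To make the call-stack behaviour concrete, here is a small self-contained mock of the semantics described above. It is not the ARMProfiler code: the StopSampling value, the record layout, the fake clock (advanced by hand so the example is deterministic) and the choice to pause a caller's record while a callee runs are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mock of the described semantics; every detail here is an assumption,
// not the ARMProfiler implementation.
namespace MockProfiler {

const uint32_t StopSampling = UINT32_MAX;  // assumed sentinel value

struct ProfileRecord {
    std::string name;
    uint64_t ticks;  // accumulated (exclusive) time
};

struct Frame {
    uint32_t current;  // record currently being charged
    uint64_t since;    // time when that record started being charged
};

uint64_t fakeNow = 0;  // stands in for a real timer; advanced by hand
std::vector<ProfileRecord> records;
std::vector<Frame> callStack;

// Create a record and return its index (the key for the sampling calls).
uint32_t addProfileEvent(const std::string& name) {
    records.push_back(ProfileRecord{name, 0});
    return static_cast<uint32_t>(records.size() - 1);
}

// Charge the time elapsed since the top frame last started to its record.
static void flushTop() {
    Frame& top = callStack.back();
    records[top.current].ticks += fakeNow - top.since;
    top.since = fakeNow;
}

// Entering a function: pause the caller and start charging `index`.
void startSampling(uint32_t index) {
    if (!callStack.empty()) flushTop();
    callStack.push_back(Frame{index, fakeNow});
}

// Close the current record and switch to `index`, or pop the frame when
// `index` is StopSampling and resume charging the caller.
void endSampling(uint32_t index) {
    assert(!callStack.empty() && "endSampling outside of a sampled function");
    flushTop();
    if (index == StopSampling) {
        callStack.pop_back();
        if (!callStack.empty()) callStack.back().since = fakeNow;
    } else {
        callStack.back().current = index;
    }
}

}  // namespace MockProfiler
```

For example, if a function f is entered at t=0, switches to its second record at t=10, calls g at t=15, g returns at t=45 and f returns at t=50, the mock charges 10 ticks to f's first record, 10 to its second (5 before and 5 after the call) and 30 to g. Whether the real library reports such exclusive times or inclusive ones is a separate design question that this sketch does not settle.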

It may seem surprising that the index argument of endSampling refers to the next profile record. In our experience, it is easier to insert a single profiling call before each machine-code template than to insert calls before all possible branches.
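The toy emitter below sketches that placement strategy. Templates are represented as lists of pseudo-instruction strings and the emitted profiling calls as text; the names emitFunction and Template, the output format, and the single return at the end of the function are hypothetical, and a real JIT would emit ARM machine code instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A template is a list of pseudo-instructions (hypothetical representation).
using Template = std::vector<std::string>;

// Emit a function body, placing one profiling call before each template.
std::vector<std::string> emitFunction(const std::vector<Template>& templates,
                                      uint32_t firstRecord) {
    assert(!templates.empty() && "a function body needs at least one template");
    std::vector<std::string> code;
    for (std::size_t t = 0; t < templates.size(); ++t) {
        uint32_t record = firstRecord + static_cast<uint32_t>(t);
        // The call placed *before* the template names the record that the
        // template belongs to; nothing is needed after its branches.
        code.push_back(t == 0 ? "call startSampling(" + std::to_string(record) + ")"
                              : "call endSampling(" + std::to_string(record) + ")");
        code.insert(code.end(), templates[t].begin(), templates[t].end());
    }
    // Assumed single exit: pop the call stack right before returning.
    code.push_back("call endSampling(StopSampling)");
    code.push_back("ret");
    return code;
}
```

Because each record switch sits at the start of its template, whichever branch enters the template also executes the switch, provided branch targets point at the profiling call rather than past it.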

A typical sequence for a function body (assuming there is only one return instruction at the end of the body) is the following:
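A minimal sketch of such a sequence, assuming logging stubs in place of the real ARMProfiler functions (ARMProfilerStub, the StopSampling value and the record indices are hypothetical) and a function body split into three records:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the ARMProfiler calls that only log what they receive, so
// the order of the hooks is visible. Not the real library.
namespace ARMProfilerStub {
const uint32_t StopSampling = UINT32_MAX;  // assumed sentinel value
std::vector<std::string> trace;
void startSampling(uint32_t index) {
    trace.push_back("start " + std::to_string(index));
}
void endSampling(uint32_t index) {
    assert(!trace.empty() && "endSampling before startSampling");
    trace.push_back(index == StopSampling ? std::string("stop")
                                          : "end " + std::to_string(index));
}
}  // namespace ARMProfilerStub

// Indices that addProfileEvent would have returned for this function
// (hard-coded here; hypothetical).
const uint32_t kPrologue = 0, kLoop = 1, kEpilogue = 2;

int sumTo(int n) {
    using namespace ARMProfilerStub;
    startSampling(kPrologue);   // entering: start charging record 0
    int sum = 0;
    endSampling(kLoop);         // close record 0, start record 1
    for (int i = 1; i <= n; ++i) sum += i;
    endSampling(kEpilogue);     // close record 1, start record 2
    int result = sum;
    endSampling(StopSampling);  // close record 2 and pop the call stack
    return result;              // the single return at the end of the body
}
```

Calling sumTo(4) returns 10 and leaves the trace "start 0", "end 1", "end 2", "stop".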

The functions above can be called from both interpreters and JIT-compiled code. For JIT compilers, the call itself must not affect the state of the code generator. The following ARM instruction sequence preserves all necessary registers, so the call is invisible to the code generator.