Just another Blog.mozilla.com weblog

Last August I asked the question “How fast can CFI/EXIDX-based stack unwinding be?” At the time I was experimenting with native unwinding using our in-tree copy of Breakpad, but was getting dismal performance. That posting observed that Breakpad’s CFI unwinder is around 30 times slower than Valgrind’s CFI unwinder, and looked in detail at the reasons for this slowness.

Based on that analysis, I wrote a new lightweight unwinder library. LUL, as it became known, is aimed directly at doing unwinding for profiling. It is fast, robust and fairly accurate, and it is designed to allow a pool of worker threads to do the unwinding, if that’s a direction we want to go. It is also set up to facilitate the space-saving schemes discussed in “How compactly can CFI/EXIDX stack unwinding info be represented?”, although those have not been implemented yet. LUL stores unwind information in a simple, quick-to-use format, which could conceivably be generated by the JavaScript JITs, so as to allow transparent unwinding through JavaScript as well as C++.
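For readers unfamiliar with CFI-style unwinding, here is a minimal C++ sketch of the general idea behind a simple, quick-to-use rule format. It is not LUL’s actual code or API: the names `RuleSet`, `FindRule`, `ReadWord`, `UnwindOneFrame` and `DemoWalk`, the field layout, and the x86_64-style frame shapes are all hypothetical. Each rule covers a range of code addresses and says how to compute the CFA (the caller’s stack pointer at the call site) from the current SP or FP, and where the return address and saved FP sit relative to the CFA. Finding the rule for a PC is a binary search over rules sorted by address, so each frame costs one lookup plus at most two memory reads. Real Dwarf CFI can express more general rules (for example, arbitrary Dwarf expressions), which this sketch deliberately ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical unwind rule for code addresses [lo, hi]:
//   CFA            = (cfaFromFP ? FP : SP) + cfaOffset
//   return address = word at CFA + raOffset
//   caller's FP    = word at CFA + fpOffset if fpSaved, else unchanged
struct RuleSet {
  uintptr_t lo, hi;
  bool cfaFromFP;
  intptr_t cfaOffset;
  intptr_t raOffset;
  bool fpSaved;
  intptr_t fpOffset;
};

struct Regs {
  uintptr_t pc, sp, fp;
};

// Binary search over rules that are sorted by lo and non-overlapping.
const RuleSet* FindRule(const std::vector<RuleSet>& rules, uintptr_t pc) {
  auto it = std::upper_bound(
      rules.begin(), rules.end(), pc,
      [](uintptr_t addr, const RuleSet& r) { return addr < r.lo; });
  if (it == rules.begin()) return nullptr;  // pc is below every rule
  --it;                                     // last rule with lo <= pc
  return pc <= it->hi ? &*it : nullptr;
}

// Reads one word, refusing any address outside [stackLo, stackHi).
bool ReadWord(uintptr_t addr, uintptr_t stackLo, uintptr_t stackHi,
              uintptr_t* out) {
  if (addr < stackLo || addr >= stackHi ||
      stackHi - addr < sizeof(uintptr_t))
    return false;
  std::memcpy(out, reinterpret_cast<const void*>(addr), sizeof(uintptr_t));
  return true;
}

// Replaces *regs with the caller's registers; returning false ends the walk.
bool UnwindOneFrame(const std::vector<RuleSet>& rules, Regs* regs,
                    uintptr_t stackLo, uintptr_t stackHi) {
  assert(regs != nullptr);
  const RuleSet* r = FindRule(rules, regs->pc);
  if (!r) return false;  // no unwind info covers this PC
  uintptr_t cfa = (r->cfaFromFP ? regs->fp : regs->sp) + r->cfaOffset;
  uintptr_t ra = 0, fp = regs->fp;
  if (!ReadWord(cfa + r->raOffset, stackLo, stackHi, &ra)) return false;
  if (r->fpSaved && !ReadWord(cfa + r->fpOffset, stackLo, stackHi, &fp))
    return false;
  regs->pc = ra;
  regs->sp = cfa;  // after the return, the caller's SP is the CFA
  regs->fp = fp;
  return true;
}

// Walks a fake two-frame stack and returns the PC of every frame found.
std::vector<uintptr_t> DemoWalk() {
  const intptr_t W = sizeof(uintptr_t);
  std::vector<RuleSet> rules = {
      // Frameless leaf: CFA = SP + W, return address just below the CFA.
      {0x1000, 0x10ff, false, W, -W, false, 0},
      // Frame-pointer function: CFA = FP + 2W, return address at
      // CFA - W, saved FP at CFA - 2W.
      {0x2000, 0x20ff, true, 2 * W, -W, true, -2 * W},
  };
  uintptr_t stack[8] = {};
  stack[0] = 0x2010;  // leaf's return address into the caller
  stack[2] = 0;       // caller's saved FP (outermost frame, so none)
  stack[3] = 0x3020;  // caller's return address, into code with no rule
  uintptr_t lo = reinterpret_cast<uintptr_t>(&stack[0]);
  uintptr_t hi = lo + sizeof(stack);
  Regs regs = {0x1008, lo, lo + 2 * W};  // leaf running; FP is the caller's
  std::vector<uintptr_t> pcs = {regs.pc};
  // The frame cap guards against cycles caused by bad unwind info.
  while (pcs.size() < 64 && UnwindOneFrame(rules, &regs, lo, hi))
    pcs.push_back(regs.pc);
  return pcs;
}
```

Calling `DemoWalk()` should return the PCs 0x1008, 0x2010 and 0x3020: the walk stops at 0x3020 because no rule covers that address. Bounds-checking every stack read, as `ReadWord` does, means a bad rule or a corrupted frame ends the walk instead of crashing the profiler.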

LUL has been integrated into the SPS profiler, and landed a couple of weeks back.

It currently provides unwinding on x86_64-linux, x86_32-linux and arm-android, using the Dwarf CFI and ARM EXIDX unwind formats. Unwinding by stack scanning is also supported, although that should rarely be needed. LUL is very substantially faster than the Breakpad unwinder: doing 1000 unwinds per second, each from the leaf frame all the way back to XRE_Main(), costs about 40% of a 1.2 GHz Cortex A9.
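For a sense of scale, those figures can be turned into a per-unwind budget. Treating “40% of a 1.2 GHz core” as the CPU time consumed, and counting one unwind as one complete walk from a leaf frame to XRE_Main():

$$
\frac{0.40 \times 1.2 \times 10^{9}\ \text{cycles/s}}{1000\ \text{unwinds/s}} = 4.8 \times 10^{5}\ \text{cycles per unwind}
$$

In time, that is 0.40 s of CPU per second spread over 1000 unwinds, or about 400 µs per unwind. Since this is a whole-stack figure, the cost per frame depends on how deep the stacks are, which is not stated here.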

To use LUL, build with --enable-profiling --enable-optimize="-g -O2". I then start the desktop builds with the following environment variable settings:

What next for LUL? I’d like to implement the space-saving schemes mentioned earlier. But more important, it would be nice to have developers using the SPS/LUL combination, so as to give real-use feedback. That will help to move it forward in the most immediately useful direction.