Note that this is how it's done for other cores, including AArch64. All the comments are given for the x86 internal_clone implementation above (they basically say that this hairy code is inspired by the Glibc clone implementation).

Oh, things are getting more complicated... I see many test failures with host GCC 4.8 because the fast unwinder fails to get backtraces for mallocs. This happens because __interceptor_malloc, compiled by GCC 4.8, doesn't properly save the frame pointer register:

So, it seems that LSan's operability depends on what compiler is used to build it :(.

It is a known issue [1] [2], and in fact it is a little more complicated: it depends on the flags used as well.
If you mix Thumb and ARM code, the fast unwinder won't work. I am not sure which option would be better,
but I am afraid that to get a reliable stack trace on ARM (with mixed objects and functions built with different compilers)
we will need to use an external unwinder (maybe libunwind). But that will have the side effect of making
backtraces way slower.
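To make the failure mode concrete, here is a minimal sketch of a frame-pointer walk over a simulated stack. It is not the code in sanitizer_stacktrace.cc; the `Frame` layout and the `WalkFramePointers` helper are hypothetical names used only for illustration. It shows why a single function that does not save a proper frame pointer cuts the backtrace short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of what a frame-pointer unwinder reads at each frame:
// the caller's saved frame pointer and the return address.
struct Frame {
  const Frame *caller_fp;    // saved frame pointer of the caller (or garbage)
  std::uintptr_t return_pc;  // return address into the caller
};

// Follows the saved-frame-pointer chain starting at `fp`.
// The walk stops as soon as a frame pointer is null, falls outside
// [stack_lo, stack_hi), or does not move strictly upwards (the stack grows
// down, so a caller's frame must live at a higher address).
std::vector<std::uintptr_t> WalkFramePointers(const Frame *fp,
                                              const Frame *stack_lo,
                                              const Frame *stack_hi,
                                              std::size_t max_depth) {
  const auto lo = reinterpret_cast<std::uintptr_t>(stack_lo);
  const auto hi = reinterpret_cast<std::uintptr_t>(stack_hi);
  std::vector<std::uintptr_t> pcs;
  std::uintptr_t prev = 0;
  while (fp != nullptr && pcs.size() < max_depth) {
    const auto cur = reinterpret_cast<std::uintptr_t>(fp);
    if (cur < lo || cur >= hi || cur <= prev) break;  // chain is broken
    pcs.push_back(fp->return_pc);
    prev = cur;
    fp = fp->caller_fp;
  }
  return pcs;
}
```

With a well-formed chain the walk returns one PC per frame. If the innermost function (the malloc interceptor in this discussion) leaves something other than the caller's frame pointer in the slot, the walk stops after the first entry, which matches the truncated malloc backtraces described above.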

Right, I thought it affected only Thumb (which might be mixed with ARM), but it seems that even ARM mode is affected, depending on the GCC version. This complicates enabling tests on ARM targets, but still, it would be nice to have LSan available on ARM.

My only concern is what kind of information users will see when LSan is used with different libraries and systems.
We can constrain testing to a working environment, but that does not help when real use cases won't get meaningful
stack traces.

Other projects usually rely on DWARF CFI to get ARM stack traces (valgrind, glibc), and I think we might need to actually
use it or rely on external libraries for ARM (libunwind, etc.). Valgrind even tries to add some instruction scanning
to get around the Thumb issues (coregrind/m_stacktrace.c), but it even states that this results in bad output.

One option would be to enable lsan as is and try to improve sanitizer stacktrace on arm
(lib/sanitizer_common/sanitizer_stacktrace.cc).

Yeah, users will need to compile all their code with -marm -fno-omit-frame-pointer to get stack traces (ASan would benefit from this too), or use the slow unwinder with a small context (e.g. 5), but this would significantly slow down the application under test.

Perhaps we can enable the slow unwinder for LSan tests on ARM (compile with -funwind-tables and run with 'fast_unwind_on_malloc=false')? This would make almost all tests pass, except for a few (e.g. use_tls_pthread_specific_dynamic.cc) where the slow unwinder might be unable to get the caller PC (e.g. from libpthread.so).

Updating. I've enabled the slow unwinder for LSan tests on ARM to make them pass regardless of ARM/Thumb mode and of the compiler that was used to build the LSan runtime.

That's pretty bad.
Yes, the tests will pass, but the tool will be utterly useless in real life. Slow unwinder is just too slow.
And setting fast_unwind_on_malloc=false in the test config means we are not testing what the users are going to use.

I agree. The ARM ABI is pretty bad: in Thumb mode we can't use the fast unwinder for the known reasons, and I really don't know how we can reliably run LSan tests on ARM without enabling the slow unwinder. In any case, I cannot push these changes, since you are the code owner after all, but even with the slow unwinder we got pretty good results on our distribution. Is there any way we can enable LSan for ARM?

Disabling the slow unwinder in tests.
Still, if LSan is compiled in Thumb mode, the whole LSan test suite will fail. Can we somehow disable it on Thumb targets? E.g. check for the '-mthumb' flag in the lit configs and enable/disable tests accordingly?

Going back to the issue with __interceptor_malloc not saving FP when built with GCC. In clang, a function using __builtin_frame_address(0) automatically gets a proper frame pointer. As far as I can see, arm-gcc in the Android NDK (version 4.9) behaves the same way on a simple test case. Is that some new optimization where the frame address is computed on the fly?
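A simple test case of the kind mentioned above might look like the following sketch. The function name `CurrentFrameAddress` is hypothetical; `__builtin_frame_address` is the GCC/Clang builtin. Whether the prologue actually sets up and saves a frame pointer depends on the compiler, version and flags, so the interesting part is the disassembly of this function rather than the value it returns.

```cpp
#include <cstdint>

// Kept out of line so the function has a frame of its own to report.
// Compile for the target in question and inspect the generated prologue
// to see whether the frame pointer register is saved and set up.
__attribute__((noinline)) std::uintptr_t CurrentFrameAddress() {
  return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
}
```

Comparing the output of the two compilers for this one function (for example with objdump -d) is a quick way to check whether the frame address is taken from a saved frame pointer or computed some other way.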
