What Alfonso Sanchez-Beato's blog talks about

In chapter 5 of his mind-blowing “The Road to Reality”, Penrose devotes a section to complex powers, that is, to the solutions to

$$w^z~~~\text{with}~~~w,z \in \mathbb{C}$$

In this post I develop a bit more what he exposes and explore what the solutions look like with the help of some simple Python scripts. The scripts can be found in this github repo, and all the figures in this post can be replicated by running

The scripts make use of numpy and matplotlib, so make sure those are installed before running them.

Now, let’s develop the math behind this. The values for \(w^z\) can be found by using the exponential function as

$$w^z=e^{z\log{w}}=e^{z~\text{Log}~w}e^{2\pi nzi}$$

In this equation, “log” is the complex natural logarithm multi-valued function, while “Log” is one of its branches, concretely the principal value, whose imaginary part lies in the interval \((−\pi, \pi]\). In the equation we reflect the fact that \(\log{w}=\text{Log}~w + 2\pi ni\) with \(n \in \mathbb{Z}\). This shows the remarkable fact that, in the general case, we have infinite solutions for the equation. For the rest of the discussion we will separate \(w^z\) as follows:

$$w^z=e^{z~\text{Log}~w}e^{2\pi nzi}=C \cdot F_n$$

with constant \(C=e^{z~\text{Log}~w}\) and the rest being the sequence \(F_n=e^{2\pi nzi}\). Being \(C\) a complex constant that multiplies \(F_n\), the only influence it has is to rotate and scale equally all solutions. Noticeably, \(w\) appears only in this constant, which shows us that the \(z\) values are what is really determinant for the number and general shape of the solutions. Therefore, we will concentrate in analyzing the behavior of \(F_n\), by seeing what solutions we can find when we restrict \(z\) to different domains.

Starting by restricting \(z\) to integers (\(z \in \mathbb{Z}\)), it is easy to see that there is only one resulting solution in this case, as the factor \(F_n=e^{2\pi nzi}=1\) in this case (it just rotates the solution \(2\pi\) radians an integer number of times, leaving it unmodified). As expected, a complex number to an integer power has only one solution.

If we let \(z\) be a rational number (\(z=p/q\), being \(p\) and \(q\) integers chosen so we have the canonical form), we obtain

$$F_n=e^{2\pi\frac{pn}{q} i}$$

which makes the sequence \(F_n\) periodic with period \(q\), that is, there are \(q\) solutions for the equation. So we have two solutions for \(w^{1/2}\), three for \(w^{1/3}\), etc., as expected as that is the number of solutions for square roots, cube roots and so on. The values will be the vertex of a regular polygon in the complex plane. For instance, in figure 1 the solutions for \(2^{1/5}\) are displayed.

Fig. 1: The five solutions to \(2^{1/5}\)

If \(z\) is real, \(e^{2\pi nzi}\) is not periodic anymore has infinite solutions in the unit circle, and therefore \(w^z\) has infinite values that lie on a circle of radius \(|C|\).

In the more general case, \(z \in \mathbb{C}\), that is, \(z=a+bi\) being \(a\) and \(b\) real numbers, and we have

$$F_n=e^{-2\pi bn}e^{2\pi ani}.$$

There is now a scaling factor, \(e^{-2\pi bn}\) that makes the module of the solutions vary with \(n\), scattering them across the complex plane, while \(e^{2\pi ani}\) rotates them as \(n\) changes. The result is an infinite number of solutions for \(w^z\) that lie in an equiangular spiral in the complex plane. The spiral can be seen if we change the domain of \(F\) to \(\mathbb{R}\), this is

$$F(t)=e^{-2\pi bt}e^{2\pi ati}~~~\text{with}~~~t \in \mathbb{R}.$$

In figure 2 we can see one example which shows some solutions to \(2^{0.4-0.1i}\), plus the spiral that passes over them.

Fig. 2: Roots and spiral for \(2^{0.4-0.1i}\)

In fact, in Penrose’s book it is stated that these values are found in the intersection of two equiangular spirals, although he leaves finding them as an exercise for the reader (problem 5.9).

Let’s see then if we can find more spirals that cross these points. We are searching for functions that have the same value as \(F(t)\) when \(t\) is an integer. We can easily verify that the family of functions

$$F_k'(t)=F(t)e^{2\pi kti}~~~\text{with}~~~k \in \mathbb{Z}$$

are compatible with this restriction, as \(e^{2\pi kti}=1\) in that case (integer \(t\)). Figures 3 and 4 represent again some solutions to \(2^{0.4-0.1i}\), \(F(t)\) (which is the same as the spiral for \(k=0\)), plus the spirals for \(k=-1\) and \(k=1\) respectively. We can see there that the solutions lie in the intersection of two spirals indeed.

Fig. 3: Roots for \(2^{0.4-0.1i}\) plus spirals for k=0 and k=-1

Fig. 4: Roots for \(2^{0.4-0.1i}\) plus spirals for k=0 and k=1

If we superpose these 3 spirals, the ones for \(k=1\) and \(k=-1\) cross also in places different to the complex powers, as can be seen in figure 5. But, if we choose two consecutive numbers for \(k\), the two spirals will cross only in the solutions to \(w^z\). See, for instance, figure 6 where the spirals for \(k=\{-2,-1\}\) are plotted. We see that any pair of such spirals fulfills Penrose’s description.

Fig. 5: Roots for \(2^{0.4-0.1i}\) plus spirals for k=-1,0,1

Fig. 6: Roots for \(2^{0.4-0.1i}\) plus spirals for k=-1,-2

In general, the number of places at which two spirals cross depends on the difference between their \(k\)-number. If we have, say, \(F_k’\) and \(F_l’\) with \(k>l\), they will cross when

That is, they will cross when \(t\) is an integer (at the solutions to \(w^z\)) and also at \(k-l-1\) points between consecutive solutions.

Let’s see now another interesting special case: when \(z=bi\), that is, it is pure imaginary. In this case, \(e^{2\pi ati}\) is \(1\), and there is no turn in the complex plane when \(t\) grows. We end up with the spiral \(F(t)\) degenerating to a half-line that starts at the origin (which is reached when \(t=\infty\) if \(b>0\)). This can be appreciated in figure 7, where the line and the spirals for \(k=-1\) and \(k=1\) are plotted for \(20^{0.1i}\). The two spirals are mirrored around the half-line.

Fig. 7: Roots for \(10^{0.1i}\), \(F(t)\), and spirals for k=-1,1

Digging more into this case, it turns out that a pure imaginary number to a pure imaginary power can produce a real result. For instance, for \(i^{0.1i}\), we see in figure 8 that the roots are in the half-positive real line.

Fig. 8: Roots for \(i^{0.1i}\), \(F(t)\), and spirals for k=-1,1

That something like this can produce real numbers is a curiosity that has intrigued historically mathematicians (\(i^i\) has real values too!). And with this I finish the post. It is really amusing to start playing with the values of \(w\) and \(z\), if you want to do so you can use the python scripts I pointed to in the beginning of the post. I hope you enjoyed the post as much as I did writing it.

In the conclusions to my last post, “Modifying System Call Arguments With ptrace”, I mentioned that one of the main drawbacks of the explained approach for modifying system call arguments was that there is a process switch for each system call performed by the tracee. I also suggested a possible approach to overcome that issue using ptrace jointly with seccomp, with the later making sure the tracer gets only the system calls we are interested in. In this post I develop this idea further and show how this can be achieved.

For this, I have created a little example that can be found in github, along the example used in the previous post. The main idea is to use seccomp with a Berkeley Packet Filter (BPF) that will specify the conditions under which the tracer gets interrupted.

Now we will go through the source code, with emphasis on the parts that differ from the original example. Skipping the include directives and the forward declarations we get to main():

The main change here when compared to the original code is the set-up of a BPF in the tracee, right after performing the call to fork(). BPFs have an intimidating syntax at first glance, but once you grasp the basic concepts behind they are actually quite easy to read. BPFs are defined as a sort of virtual machine (VM) which has one data register or accumulator, one index register, and an implicit program counter (PC). Its “assembly” instructions are defined as a structure with format:

struct sock_filter {
u_short code;
u_char jt;
u_char jf;
u_long k;
};

There are codes (opcodes) for loading into the accumulator, jumping, and so on. jt and jf are increments on the program counter that are used in jump instructions, while k is an auxiliary value which usage depends on the code number.

BPFs have an addressable space with data that is in the networking case a packet datagram, and for seccomp the following structure:

So basically what BPFs do in seccomp is to operate on this data, and return a value that tells the kernel what to do next: allow the process to perform the call (SECCOMP_RET_ALLOW), kill it (SECCOMP_RET_KILL), or other options as specified in the seccomp man page.

As can be seen, struct seccomp_data contains more than enough information for our purposes: we can filter based on the system call number and on the arguments.

With all this information we can look now at the filter definition. BPFs filters are defined as an array of sock_filter structures, where each entry is a BPF instruction. In our case we have

BPF_STMT and BPF_JUMP are a couple of simple macros that fill the sock_filter structure. They differ in the arguments, which include jumping offsets in BPF_JUMP. The first argument is in both cases the “opcode”, which is built with macros as a mnemonics help: for instance the first one is for loading into the accumulator (BPF_LD) a word (BPF_W) using absolute addressing (BPF_ABS). More about this can be read here, for instance.

Analysing now in more detail the filter, the first instruction is asking the VM to load the call number, nr, to the accumulator. The second one compares that to the number for the open syscall, and asks the VM to not modify the counter if they are equal (PC+0), so the third instruction is run, or jump to PC+1 otherwise, which would be the 4th instruction (when executing this instruction the PC points already to the 3rd instruction). So if this is an open syscall we return SECCOMP_RET_TRACE, which will invoke the tracer, otherwise we return SECCOMP_RET_ALLOW, which will let the tracee run the syscall without further impediment.

Moving forward, the first call to prctl sets PR_SET_NO_NEW_PRIVS, which impedes child processes to have more privileges than those of the parent. This is needed to make the following call to prctl, which sets the seccomp filter using the PR_SET_SECCOMP option, succeed even when not being root. After that, we call execvp() as in the ptrace-only example.

Switching to what the parent does, we see that changes are very few. In main(), we set the PTRACE_O_TRACESECCOMP option, that makes the tracee stop when a filter returns SECCOMP_RET_TRACE and signals the event to the tracer. The other change in this function is that we do not need to set anymore PTRACE_O_TRACESYSGOOD, as we are being interrupted by seccomp, not because of system calls.

we see here that now we invoke wait_for_open() only once. Differently to when we are tracing each syscall, which interrupted the tracer before and after the execution of the syscall, seccomp will interrupt us only before the call is processed. We also add here a trace for demonstration purposes.

Here we use PTRACE_CONT instead of PTRACE_SYSCALL. We get interrupted every time there is a match in the BPF as we have set the PTRACE_O_TRACESECCOMP option, and we let the tracer run until that happens. The other change here, besides a trace, is how we check if we have received the event we are interested in, as obviously the status word is different. The details can be seen in ptrace’s man page. Note also that we could actually avoid the test for __NR_open as the BPF will interrupt us only for open syscalls.

The rest of the code, which is the part that actually changes the argument to the open syscall is exactly the same. Now, let’s check if this works as advertised:

It does indeed! Note that traces show that the tracer gets interrupted only by the open syscall (besides an initial trap and when the child exits). If we added the same traces to the ptrace-only program we would see many more calls.

Finally, a word of caution regarding call numbers: in this post and in the previous one we are assuming an x86-64 architecture, so the programs would need to be adapted if we want to use it in different archs. There is also an important catch here: we are implicitly assuming that the child process that gets run by the execvp() call is also x86-64, as we are filtering by using the syscall number for that arch. This implies that this will not work in the case that the child program is compiled for i386. To make this example work properly also in that case, we must check the architecture in the BPF, by looking at “arch” in seccomp_data, and use the appropriate syscall number in each case. We would also need to check the arch before looking at the tracee registers, see an example on how to do this here (alternatively we could make the BPF return this data in the SECCOMP_RET_DATA bits of its return value, which can be retrieved by the tracer via PTRACE_GETEVENTMSG). Needless to say, for arm64/32 we would have similar issues.

Occasionally I find myself processing input data which arrives as a stream, like data from files or from a socket, but that has a known structure that can be modeled with C types. For instance, let’s say we are receiving from a socket a parcel that consists on a header of one byte, and a payload that is an integer. A naive way to handle this is the following (simplified for readability) code snippet:

Here we get the raw data with a read() call, we read the first byte, then we read an integer by taking a pointer to the second read byte and casting it to a pointer to an integer. (for this example we are assuming that the integer inserted in the stream has the same size and endianness as the CPU ones).

There is a big issue with this: the cast to int *, which is undefined behavior according to the C standard 1. And it is because things can go wrong in at least two ways, first due to pointer aliasing rules, second due to type alignment.

Strict pointer aliasing tells the compiler that it can assume that pointers to different types point to different places in memory. This allows some optimizations, like reordering. Therefore, we could be in trouble if, say, we take &buff[1] into a char * and use it to write a byte in that location, as reordering could hit us. So just do not do that. Let’s also hope that we have a compiler that is not completely insane and does not move our reading by int pointer before the read() system call. We could also disable strict aliasing if we are using GCC with option -fno-strict-aliasing, which by the way is something that the Linux kernel does. At any rate, this is a complex subject and I will not dig into it this time.

We will concentrate in this article on how to solve the other problem, that is, how to access safely types that are not stored in memory in their natural alignment.

The C Standard-Compliant Solution

Before moving further, keep in mind that it is always possible to be strictly compliant with the standard and access safely memory without breaking language rules or using compiler or machine specific tricks. In the example, we could retrieve vint by doing

vint = buff[1] + (buff[2] << 8)
+ (buff[3] << 16) + (buff[4] << 24);

(supposing stored data is little endian).

The issue here is performance: we are implicitly transforming four bytes to integers, then we have to bit-shift three of them, and finally we have to add them up. Note however that this is what we want if data and CPU have different endianness.

Doing Unaligned Memory Accesses

In all machine architectures there is a natural alignment for the different data types. This alignment is usually the size of the types, for instance in 32 bits architectures the alignment for integers is 4, for doubles it is 8, etc. If instances of these types are not stored in memory positions that are multiple of their alignment, we are talking about unaligned access. If we try to access unaligned data either of these can happen:

The hardware let’s us access it – but always at a performance penalty.

An exception is triggered by the CPU. This type of exception is called bus error 2.

We might be willing to accept the performance penalty 3, which is mitigated by CPU caches and not that noticeable in certain architectures like x86-64 , but we certainly do not want our program to crash. How possible is this? To be honest it is not something I have seen that often. Therefore, as a first analysis step, I checked how easy it was to get bus errors. To do so, I created the following C++ program, access1.cpp (I could not resist to use templates here to reduce the code size):

The program allocates memory using new char[], which as malloc() in C is guaranteed to allocate memory with the same alignment as the strictest fundamental type. After zeroing the memory, we access mem and mem + 1 by casting to different pointer types, knowing that the second address is odd, and therefore unaligned except for char * access.

Now we get what we were searching for, a legitimate bus error, in this case when accessing a long long from an unaligned address. Commenting out the offending line and letting the program run further showed the error also when accessing a long double from mem + 1.

Fixing Unaligned Memory Accesses

After proving that this could be a real problem, at least for some architectures, I tried to find a solution that would let me do unaligned memory accesses in the most generic way. I could not find anything safe that was strictly following the C standard. However, all C/C++ compilers have ways to define packed structures, and that came to the rescue.

Packed structures are intended to minimize the padding that is introduced by alignment needed by the structure members. They are used when minimizing storage is a big concern. But what is interesting for us is that its members can be unaligned due to the packing, so dereferencing them must take that into account. Therefore, if we are accessing a type in a CPU that does not support unaligned access for that type the compiler must synthesize code that handles this transparently from the point of view of the C program.

To test that this worked as expected, I wrote access2.cpp, which uses GCC attribute __packed__ to define a packed structure:

No bus errors anymore! It worked as expected. To gain some understanding of what was happening behind the curtains, I disassembled the generated ARM binaries. For both access1 and access2, the same instruction was being used when I was getting a value after casting to int: LDR, which unsurprisingly loads a 32-bit word into a register. But for the long long, I found that access1 was using LDRD, which loads double words (8 bytes) from memory, while access2 was using two LDR instructions instead.

This all made a lot of sense, as ARM states that LDR supports access to unaligned data, while LDRD does not 5. Indeed the later is faster, but has this restriction. It was also good to check that there was no penalty for using the packed structure for integers: GCC does a good job to discriminate when the CPU really needs to handle differently unaligned accesses.

GCC cast-align Warning

GCC has a warning that can help to identify points in the code when we might be accessing unaligned data, which is activated with -Wcast-align. It is not part of the warnings that are activated by options -Wall or -Wextra, so we will have to add it explicitly if we want it. The warning is only triggered when compiling for architectures that do not support unaligned access for all types, so you will not see it if compiling only for x86.

Conclusion

The moral of this post is that you need to be very careful when casting pointers to a type different to the original one 6. When you need to do that, think about alignment issues first, and also think on your target architectures. There are programs that we want to run on more than one CPU type and too many times we only test in our reference.

Unfortunately the C standard does not give us a standard way of doing efficient access to unaligned data, but most if not all compilers seem to provide ways to do this. If we are using GCC, __attribute__((__packed__)) can help us when we might be doing unaligned accesses. The ARM compiler has a __packed attribute for pointers 7, and I am sure other compilers provide similar machinery. I also recommend to activate -Wcast-align if using GCC, which makes easier to spot alignment issues.

Finally, a word of caution: in most cases you should not do this type of casts. Some times you can define structures and read directly data onto them, some times you can use unions. Bear in mind always the strict pointer aliasing rules, which can hit back. To summarize, think twice before using the sort of trick showed in the post, and use them only when really needed.

So there I was. I did have to use a proprietary library, for which I had no sources and no real hope of support from the creators. I built my program against it, I ran it, and I got a segmentation fault. An exception that seemed to happen inside that insidious library, which was of course stripped of all debugging information. I scratched my head, changed my code, checked traces, tried valgrind, strace, and other debugging tools, but found no obvious error. Finally, I assumed that I had to dig deeper and do some serious debugging of the library’s assembly code with gdb. The rest of the post is dedicated to the steps I followed to find out what was happening inside the wily proprietary library that we will call libProprietary. Prerequisites for this article are some knowledge of gdb and ARM architecture.

Some background on the task I was doing: I am a Canonical employee that works as developer for Ubuntu for Phones. In most, if not all, phones, the BSP code is not 100% open and we have to use proprietary libraries built for Android. Therefore, these libraries use bionic, Android’s libc implementation. As we want to call them inside binaries compiled with glibc, we resort to libhybris, an ingenious library that is able to load and call libraries compiled against bionic while the rest of the process uses glibc. This will turn out to be critical in this debugging. Note also that we are debugging ARM 32-bits binaries here.

The Debugging Session

To start, I made sure I had installed glibc and other libraries symbols and started to debug by using gdb in the usual way:

We can see here that we get the promised crash. I execute a couple of gdb commands after that to see the backtrace and part of the process address space that will be of interest in the following discussion. The backtrace shows that a segment violation happened when the CPU tried to execute instructions in address zero, and we can see by checking the process mappings that the previous frame lives inside the text segment of libProprietary.so. There is no backtrace beyond that point, but that should come as no surprise as there is no DWARF information in libProprietary, and also noting that usage of frame pointer is optimized away quite commonly these days.

After this I tried to get a bit more information on the CPU state when the crash happened:

Hmm, we are starting to see weird things here. First, in 0xf520bd02 (which probably has been executed little before the crash) we get an unconditional branch to some point in the thread stack (see mappings in previous figure). Second, the instruction in 0xf520bd06 (which should be executed after returning from the procedure that provokes the crash) would load into the pc (program counter) an address that is not mapped: we saw that the first mapped address is 0x10000 in the previous figure. The movw instruction has also a “pl” suffix that makes the instruction execute only when the operand is positive or zero… which is obviously unnecessary as 0x48c4 is encoded in the instruction.

I resorted to doing objdump -d libProprietary.so to disassemble the library and compare with gdb output. objdump shows, in that part of the file (subtracting the library load address gives us the offset inside the file: 0xf520bd02-0xf51f6000=0x15d02):

which is completely different from what gdb shows! What is happening here? Taking a look at addresses for both code chunks, we see that instructions are always 4 bytes in gdb output, while they are 2 or 4 in objdump‘s. Well, you have guessed, don’t you? We are seeing “normal” ARM instructions in gdb, while objdump is decoding THUMB-2 instructions. Certainly objdump seems to be right here as the output is more sensible: we have a call to an executable part of the process space in 0x15d02 (it is resolved to a known function, __android_log_print), and the following instructions seems like a normal function epilogue in ARM: a return value is stored in r0, the sp (stack pointer) is incremented (we are freeing space in the stack), and we restore registers.

If we get back to the register values, we see that cpsr (current program status register [1]) does not have the T bit set, so gdb thinks we are using ARM instructions. We can change this by doing

Ok, much better now [2]. The thumb bit in cpsr is determined by last bx/blx call: if the address is odd, the procedure to which we are calling contains THUMB instructions, otherwise they are ARM (a good reference for these instructions is [3]). In this case, after an exception the CPU moves to arm mode, and gdb is unable to know which is the right mode when disassembling. We can search for hints on which parts of the code are arm/thumb by looking at the values in registers used by bx/blx, or by looking at the lr (link register): we can see above that the value after the crash was 0xf520bd07, which is odd and indicates that 0xf520bd06 contains a thumb instruction. However, for some reason gdb is not able to take advantage of this information.

Of course this problem does not happen if we have debugging information: in that case we have special symbols that let gdb know if the section where the code is contains thumb instructions or not [4]. As those are not found, gdb uses the cpsr value. Here objdump seems to have better heuristics though.

After solving this issue with instruction decoding, I started to debug __android_log_print to check what was happening there, as it looked like the crash was happening in that call. I spent quite a lot of time there, but found nothing. All looked fine, and I started to despair. Until I inserted a breakpoint in address 0xf520bd06, right after the call to __android_log_print, run the program… and it stopped at that address, no crash happened. I started to execute the program instruction by instruction after that:

Something was apparently wrong with instruction ldmia, which restores registers, including the pc, from the stack. I took a look at the stack in that moment (taking into account that ldmia had already modified the sp after restoring 9 registers == 36 bytes):

All zeros! At this point it is clear that this is the real point where the crash is happening, as we are loading 0 into the pc. This looked clearly like a stack corruption issue.

But, before moving forward, why are we getting a wrong backtrace from gdb? Well, gdb is seeing a corrupted stack, so it is not able to unwind it. It would not be able to unwind it even if having full debug information. The only hint it has is the lr. This register contains the return address after execution of a bl/blx instruction [3]. If the called procedure is non-leaf, it is saved in the prologue, and restored in the epilogue, because it gets overwritten when branching to other procedures. In this case, it is restored on the pc and sometimes it is also saved back in the lr, depending on whether we have arm-thumb interworking built in the procedure or not [5]. It is not overwritten if we have a leaf procedure (as there are no procedure calls inside these).

As gdb has no additional information, it uses the lr to build the backtrace, assuming we are in a leaf procedure. However this is not true and the backtrace turns out to be wrong. Nonetheless, this information was not completely useless: lr was pointing to the instruction right after the last bl/blx instruction that was executed, which was not that far away from the real point where the program was crashing. This happened because fortunately __android_log_print has interworking code and restores the lr, otherwise the value of lr could have been from a point much far away from the point where the real crash happens. Believe or not, but it could have been even worse!

Having now a clear idea of where and why the crash was happening, things accelerated. The procedure where the crash happened, as disassembled by objdump, was (I include here only the more relevant parts of the code)

The addresses where this code is loaded can be easily computed by adding 0xf51f6000 to the file offsets shown in the first column. We see that a few calls to different external functions [6] are performed by ProprietaryProcedure, which is itself an exported symbol.

I restarted the debug session, added a breakpoint at the start of ProprietaryProcedure, right after stmdb saves the state, and checked the stack values:

We can see that the stack contains something, including a return address that looks valid (0xf75b4491). Note also that the procedure must never touch this part of the stack, as it belongs to the caller of ProprietaryProcedure.

Now it is a simply a matter of bisecting the code between the beginning and the end of ProprietaryProcedure to find out where we are clobbering the stack. I will save you of developing here this tedious process. Instead, I will just show, that, in the end, it turned out that the call to sigemptyset() is the culprit [7]:

Note here that I am printing the part of the stack not reserved by the function (0xf49dde4c is the value of the sp before execution of the line at offset 0x15b20, see the code).

What is going wrong here? Now, remember that at the beginning of the article I mentioned that we were using libhybris. libProprietary assumes a bionic environment, and the libc functions it calls are from bionic’s libc. However, libhybris has hooks for some bionic functions: for them bionic is not called, instead the hook is invoked. libhybris does this to avoid conflicts between bionic and glibc: for instance having two allocators fighting for process address space is a recipe for disaster, so malloc() and related functions are hooked and the hooks call in the end the glibc implementation. Signals related functions were hooked too, including sigemptyset(), and in this case the hook simply called glibc implementation.

I looked at glibc and bionic implementations, in both cases sigemptyset() is a very simple utility function that clears with memset() a sigset_t variable. All pointed to different definitions of sigset_t depending on the library. Definition turned out to be a bit messy when looking at the code as it depended on build time definitions, so I resorted to gdb to print the type. For a executable compiled for glibc, I saw

(gdb) ptype sigset_t
type = struct {
unsigned long __val[32];
}

and for one using bionic

(gdb) ptype sigset_t
type = unsigned long

This finally confirms where the bug is, and explains it: we are overwriting the stack because libProprietary reserves in the stack memory for bionic’s sigset_t, while we are using glibc’s sigemptyset(), which uses a different definition for it. As this definition is much bigger, the stack gets overwritten after the call to memset(). And we get the crash later when trying to restore registers when the function returns.

After knowing this, the solution was simple: I removed the libhybris hooks for signal functions, recompiled it, and… all worked just fine, no crashes anymore!

However, this is not the final solution: as signals are shared resources, it makes sense to hook them in libhybris. But, to do it properly, the hooks have to translate types between bionic in glibc, thing that we were not doing (we were simply calling glibc implementation). That, however, is “just work”.

Of course I wondered why the heck a library that is kind of generic needs to mess around with signals, but hey, that is not my fault ;-).

Conclusions

I can say I learned several things while debugging this:

Not having the sources is terrible for debugging (well, I already knew this). Unfortunately not open sourcing the code is still a standard practice in part of the industry.

The most interesting technical bit here is IMHO that we need to be very cautious with the backtrace that debuggers shows after a crash. If you start to see things that do not make sense it is possible that registers or stack have been messed up and the real crash happens elsewhere. Bear in mind that the very first thing to do when a program crashes is to make sure that we know the exact point where that happens.

We have to be careful in ARM when disassembling, because if there is no debug information we could be seeing the wrong instruction set. We can check evenness of addresses used by bx/blx and of the lr to make sure we are in the right mode.

Some times taking a look at assembly code can help us when debugging, even when we have the sources. Note that if I had had the C sources I would have seen the crash happening right when returning from a function, and it might not have been that immediate to find out that the stack was messed up. The assembly clearly pointed to an overwritten stack.

Finally, I personally learned some bits of ARM architecture that I did not know, which was great.

Well, this is it. I hope you enjoyed the (lengthy, I know) article. Thanks for your reading!