In Part 11, we spent some time clarifying
mechanisms we had previously glossed over: how variables and functions from
other ELF objects were accessed at runtime.

We saw that doing so “proper” required the cooperation of the compiler, the
assembler, the linker, and the dynamic loader. We also learned that the
mechanism for functions was actually quite complicated! And sorta clever!

And finally, we ignored all the cleverness and “made things work” with a
three-line change, adding support for both GlobDat and JumpSlot
relocations.

We're not done with relocations yet, of course - but I think we've earned
ourselves a little break. There are plenty of other things we've been ignoring
so far!

For example… how are command-line arguments passed to an executable?

Cool bear's hot tip

Ooh, ooh, that's easy!

The main() function gets an int argc argument, and a char **argv
argument!

Ah, of course cool bear. One little problem though… we have no main.

// in `elk/samples/chimera/chimera.c`
void _start(void) {
    // (cut)
}

Remember, since we're staying away from libc, we have to come up with our
own entry point - named _start by convention. It takes no arguments, and
returns nothing - in fact, it never returns.

So, let's see… if we map those to the calling convention for the System V AMD64 ABI, for
“INTEGER” class arguments (works for both int arguments and pointers), this is what our
registers and stack look like right before calling __libc_start_main:

Cool bear's hot tip

Throughout this whole article, whenever we write %rax, we mean “the register named rax”.

It's a bit confusing, because in GDB, you can print registers with the syntax $rax,
and when looking at disassembly in Intel syntax, registers are just written rax.

So this push is just there to maintain 16-byte alignment because another 8 bytes
are pushed before calling __libc_start_main.

push rsp

This sets up stack_end. I'm assuming glibc uses that to set up some sort of stack
smashing protection. An assumption
that would be very easy to verify for anyone who, unlike me, is willing to dive back
into glibc's source code at this point in time.

It's possible to guess what's going wrong just from this picture. And
I'm going to give you a chance to guess! To avoid spoilers, I'll let
cool bear tell you about another bug in echidna I wasn't sure I was
even going to mention.

Cool bear's hot tip

Story time!

When amos was prototyping echidna, everything worked fine… for a while.
Then he tried it in release mode, and all hell broke loose. The GDB session
above shows one legitimate problem that was relatively easy to fix, but
then there was another problem.

At that point in the code, there was a struct with two u64 fields, like so:

struct S {
    a: u64,
    b: u64,
}

And it was dereferenced, moved around and the like. It being a 128-bit wide
type, LLVM thought it'd be smart and use the xmm0 register, so it could
be moved in one fell swoop.

But it was generating the movdqa instruction, like so:

movdqa XMMWORD PTR [rsp],xmm0

…but by that point, %rsp wasn't 16-byte-aligned, only 8-byte-aligned.
And the a in movdqa stands for “aligned”. So it segfaulted. (That's
a segfault you don't see often!)
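To make the mismatch concrete, here's a small sketch (not code from elk or echidna) that asks the compiler for the layout of that struct. It assumes a 64-bit target such as x86-64, and `s_layout` and `is_aligned_to` are helper names made up for this example. The point it illustrates: the type itself only asks for 8-byte alignment, so the 16-byte requirement comes from the instruction, not from `S`.

```rust
use std::mem::{align_of, size_of};

// Same shape as the struct from the story. `repr(C)` pins the layout
// so the numbers below don't depend on field reordering.
#[repr(C)]
struct S {
    a: u64,
    b: u64,
}

/// Returns (size, alignment) of `S` in bytes.
/// On x86-64 (an assumption of this sketch) that is (16, 8).
fn s_layout() -> (usize, usize) {
    (size_of::<S>(), align_of::<S>())
}

/// True if `addr` is a multiple of `align`.
fn is_aligned_to(addr: usize, align: usize) -> bool {
    addr % align == 0
}
```

An address like `0x1008` is perfectly fine for an `S` as far as the type is concerned (`is_aligned_to(0x1008, 8)` holds), but it is not a valid operand for an aligned 16-byte store (`is_aligned_to(0x1008, 16)` does not hold).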

So amos went fishing with GDB. %rsp was 16-byte-aligned at the beginning of
_start (as expected), it was 16-byte-aligned at the beginning of main…
but it wasn't aligned right before the movdqa.

As it turns out, amos had misunderstood the System V AMD64 ABI.

_start was doing this:

_start:
    mov rsi, rsp
    jmp main

…which is wrong. You see, main expects to be called, not just jumped to.
And call pushes the address to return to onto the stack.

So function prologues (generated by LLVM for every Rust function) actually expect
%rsp to be unaligned, and compensate when allocating local storage: they
reserve 8+16*n bytes, which re-aligns %rsp.

TL;DR - even if our main is never supposed to return, we should call it.
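If the arithmetic feels slippery, here is a tiny model of it. This is only a sketch: the function names are invented for this example, the starting value of `%rsp` is arbitrary (just 16-byte-aligned), and a real prologue does more than subtract a constant. It compares what a prologue reserving 8+16*n bytes ends up with after a `call` versus after a `jmp`.

```rust
/// Stack alignment the System V AMD64 ABI expects at a call site.
const STACK_ALIGN: u64 = 16;

/// `call` pushes an 8-byte return address, so %rsp drops by 8.
fn after_call(rsp: u64) -> u64 {
    rsp - 8
}

/// `jmp` does not touch the stack at all.
fn after_jmp(rsp: u64) -> u64 {
    rsp
}

/// A prologue that reserves 8 + 16*n bytes of local storage.
fn after_prologue(rsp: u64, n: u64) -> u64 {
    rsp - (8 + 16 * n)
}

/// True if `rsp` is a multiple of 16.
fn is_aligned(rsp: u64) -> bool {
    rsp % STACK_ALIGN == 0
}
```

Starting from an aligned `%rsp`, `after_prologue(after_call(rsp), n)` is aligned for any `n`, because 8 + 8 + 16*n is a multiple of 16. `after_prologue(after_jmp(rsp), n)` is always off by 8, which is exactly the situation the movdqa ran into.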

Did you figure out the problem?

In debug builds, naive code is generated, and the stack is used for
everything, including all local variables, temporaries, etc:

Next, we'll do a little spring cleanup - right now, main.rs takes care of
the whole startup process. How about we move that to process.rs?

// in `elk/src/process.rs`

use std::ffi::CString;

// This struct has a lifetime, because it takes a reference to an `Object` - so
// it's only "valid" for as long as the `Object` itself lives.
pub struct StartOptions<'a> {
    pub exec: &'a Object,
    pub args: Vec<CString>,
    pub env: Vec<CString>,
    pub auxv: Vec<Auxv>,
}

We'll be passing these options whenever we want to start a process with elk.
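Since `args` and `env` are both `Vec<CString>`, something has to turn ordinary Rust strings into NUL-terminated ones. Here is one way that conversion could look. It is a sketch, not elk's actual code, and `to_cstrings` is a name invented for this example. It relies on `CString::new`, which refuses any input containing an interior NUL byte.

```rust
use std::ffi::{CString, NulError};

/// Converts a list of string-like items into NUL-terminated C strings.
/// Fails on the first item that contains an interior NUL byte.
fn to_cstrings<I, S>(items: I) -> Result<Vec<CString>, NulError>
where
    I: IntoIterator<Item = S>,
    S: Into<Vec<u8>>,
{
    // Collecting an iterator of `Result`s into a `Result<Vec<_>, _>`
    // stops at the first error.
    items.into_iter().map(CString::new).collect()
}
```

With something like this, `to_cstrings(std::env::args().skip(1))` would be one plausible way to fill `args`, assuming we want to forward elk's own arguments minus the program name.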

Remember that elk is just a regular ELF program. It gets started much the
same way echidna is - at some point, the Linux kernel puts auxiliary
vectors on the stack, then hands off control to libc.

libc stashes those somewhere, and getauxval (not a syscall) is the way to
get them back, way, way later, when there's a lot of other stuff on top of
the stack.

This way of getting the auxiliary vectors is actually much simpler and what a
regular person is likely to do. And I mean regular not as a derogatory term,
but as “someone who isn't actively trying - despite repeated warnings from their
friends - to make a dynamic linker”.
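To get a feel for what those vectors actually are, here is a sketch that decodes their raw form. Note that it does not call getauxval. Instead it parses the in-memory layout used on 64-bit Linux: a sequence of (type, value) pairs of 8-byte words, terminated by an entry of type `AT_NULL`. That is also the format Linux exposes in `/proc/self/auxv`. The function names are made up for this example, and the 64-bit word size is an assumption.

```rust
/// Marks the end of the auxiliary vector.
const AT_NULL: u64 = 0;
/// System page size.
const AT_PAGESZ: u64 = 6;

/// Reads one native-endian u64 from the first 8 bytes of `b`.
fn read_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_ne_bytes(buf)
}

/// Decodes (type, value) pairs until AT_NULL or the end of the input.
/// Assumes 64-bit entries, i.e. 16 bytes per pair.
fn parse_auxv(bytes: &[u8]) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    for pair in bytes.chunks_exact(16) {
        let typ = read_u64(&pair[..8]);
        let val = read_u64(&pair[8..]);
        if typ == AT_NULL {
            break;
        }
        out.push((typ, val));
    }
    out
}

/// Finds the value of the first entry with the given type, if any.
fn lookup(entries: &[(u64, u64)], typ: u64) -> Option<u64> {
    entries.iter().find(|(t, _)| *t == typ).map(|(_, v)| *v)
}
```

On a Linux machine, feeding it the result of `std::fs::read("/proc/self/auxv")` and then calling `lookup(&entries, AT_PAGESZ)` should give back the page size, the same value getauxval would hand you for that type.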