Playing with ptrace, Part I

Have you ever wondered how system calls
can be intercepted? Have you ever tried fooling the kernel by
changing system call arguments? Have you ever wondered how
debuggers stop a running process and let you take control of the
process?

If you are thinking of using complex kernel programming to
accomplish tasks, think again. Linux provides an elegant mechanism
to achieve all of these things: the ptrace (Process Trace) system
call. ptrace provides a mechanism
by which a parent process may observe and control the execution of
another process. The parent can examine and change the traced
process's core image and registers. ptrace is used primarily to
implement breakpoint debugging and system call tracing.

In this article, we learn how to intercept a system call and
change its arguments. In Part II of the article we will study
advanced techniques—setting breakpoints and injecting code into a
running program. We will peek into the child process's registers and
data segment and modify the contents. We will also describe a way
to inject code so that a stopped process can be made to execute
arbitrary instructions.

Basics

Operating systems offer services through a standard mechanism
called system calls. System calls provide a standard API for
accessing the underlying hardware and low-level services, such as
filesystems. When a process wants to invoke a system call, it puts
the call's arguments in registers and raises software interrupt
0x80. This software interrupt is the gate into kernel mode, and
the kernel executes the system call after examining the
arguments.

On the i386 architecture (all the code in this article is
i386-specific), the system call number is put in the register %eax.
The arguments to this system call are put into registers %ebx,
%ecx, %edx, %esi and %edi, in that order. For example, the
call:

write(2, "Hello", 5)

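would translate roughly into

movl $4, %eax        # system call number for write
movl $2, %ebx        # first argument: file descriptor 2 (stderr)
movl $hello, %ecx    # second argument: address of the string
movl $5, %edx        # third argument: number of bytes to write
int $0x80            # trap into the kernel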

where $hello points to a literal string “Hello”.
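
The same convention is what the C library's generic syscall()
wrapper hides. As a minimal sketch (the helper name raw_write is
hypothetical and not part of the article), the call above can be issued
by number from C, leaving the register loading and the trap to libc:

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write: the system call number (4 on i386) */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: invoke write by system call number. */
long raw_write(int fd, const void *buf, size_t len)
{
    /* libc places SYS_write and the three arguments in the registers
     * the architecture expects, then enters the kernel. */
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(2, "Hello", 5) should write Hello to standard
error and return the number of bytes written.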

So where does ptrace come into the picture? Before executing the
system call, the kernel checks whether the process is being traced.
If it is, the kernel stops the process and gives control to the
tracing process so it can examine and modify the traced process's
registers.

Let's clarify this explanation with an example of how the
process works:
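
What follows is a minimal sketch of such a tracer, not necessarily the
original listing. The helper name first_syscall is hypothetical, and
the x86-64 branch is an assumption added so the sketch also builds on
64-bit machines; the article itself only covers i386 and ORIG_EAX.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__i386__) || defined(__x86_64__)
#include <sys/reg.h>          /* ORIG_EAX / ORIG_RAX register indices */
#endif

/* Byte offset of the saved system call number in the USER area. */
#if defined(__i386__)
#define SYSCALL_NR_OFFSET (sizeof(long) * ORIG_EAX)
#elif defined(__x86_64__)     /* assumption: not covered by the article */
#define SYSCALL_NR_OFFSET (sizeof(long) * ORIG_RAX)
#endif

/* Hypothetical helper: run `path` under ptrace and return the system
 * call number saved at the child's first stop, or -1 on failure. */
long first_syscall(const char *path)
{
#ifndef SYSCALL_NR_OFFSET
    (void)path;
    return -1;                          /* unsupported architecture */
#else
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: ask to be traced, then become the target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
            execl(path, path, (char *)NULL);
        _exit(127);                     /* ptrace or exec failed */
    }

    int status;
    /* Parent: wait until the kernel stops the child at execve. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                      /* child exited without stopping */

    long nr = ptrace(PTRACE_PEEKUSER, child,
                     (void *)SYSCALL_NR_OFFSET, NULL);

    ptrace(PTRACE_CONT, child, NULL, NULL);  /* let the child run on */
    waitpid(child, &status, 0);              /* reap it */
    return nr;
#endif
}
```

A caller might print the result with something like
printf("The child made a system call %ld\n", first_syscall("/bin/ls")).
The execve system call is number 11 on i386 and 59 on x86-64.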

When run, the example reports the number of the system call the child
made, 11, along with the output of ls. System call number 11 is
execve, and it's the first system call executed by the child. For
reference, system call numbers can be found in
/usr/include/asm/unistd.h.

As you can see in the example, a process forks a child, and
the child executes the program we want to trace. Before running
exec, the child calls ptrace with PTRACE_TRACEME as the first
argument. This tells the kernel that the process is being traced,
so when the child executes the execve system call, the kernel stops
it and hands control to its parent. The parent waits for
notification from the kernel with a wait() call. Then the
parent can check the arguments of the system call or do other
things, such as looking into the registers.

When the system call occurs, the kernel saves the original
contents of the eax register, which contains the system call
number. We can read this value from the child's USER area by calling
ptrace with the first argument PTRACE_PEEKUSER, as in the example
above.

After we are done examining the system call, the parent lets the
child continue by calling ptrace with the first argument PTRACE_CONT,
which lets the system call proceed.

ptrace takes four arguments: a request, the process ID of the traced
process, an address and a data value. The first argument, request,
determines the behaviour of ptrace and how the other arguments are
used. Its value should be one of
PTRACE_TRACEME, PTRACE_PEEKTEXT, PTRACE_PEEKDATA, PTRACE_PEEKUSER,
PTRACE_POKETEXT, PTRACE_POKEDATA, PTRACE_POKEUSER, PTRACE_GETREGS,
PTRACE_GETFPREGS, PTRACE_SETREGS, PTRACE_SETFPREGS, PTRACE_CONT,
PTRACE_SYSCALL, PTRACE_SINGLESTEP, PTRACE_DETACH. The significance
of each of these requests will be explained in the rest of the
article.

Reading System Call Parameters

By calling ptrace with PTRACE_PEEKUSER as the first argument,
we can examine the contents of the USER area, where register
contents and other information are stored. The kernel stores the
contents of registers in this area for the parent process to
examine through ptrace.
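
A minimal sketch of a tracer that reads write's parameters this way
is given below. It is an illustration, not the original listing: the
names trace_writes, peek_reg and SLOT_* are hypothetical, the x86-64
register mapping is an added assumption, and the entry/exit toggle
assumes that no signals reach the child while it is traced.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>      /* SYS_write */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__i386__) || defined(__x86_64__)
#include <sys/reg.h>          /* register indices into the USER area */
#endif

/* i386 names follow the article; the x86-64 mapping is an assumption. */
#if defined(__i386__)
enum { SLOT_NR = ORIG_EAX, SLOT_A1 = EBX, SLOT_A2 = ECX,
       SLOT_A3 = EDX, SLOT_RET = EAX };
#define CAN_TRACE 1
#elif defined(__x86_64__)
enum { SLOT_NR = ORIG_RAX, SLOT_A1 = RDI, SLOT_A2 = RSI,
       SLOT_A3 = RDX, SLOT_RET = RAX };
#define CAN_TRACE 1
#else
enum { SLOT_NR, SLOT_A1, SLOT_A2, SLOT_A3, SLOT_RET };  /* placeholders */
#define CAN_TRACE 0
#endif

/* Read one register-sized slot from the child's USER area. */
static long peek_reg(pid_t child, int slot)
{
    return ptrace(PTRACE_PEEKUSER, child,
                  (void *)(sizeof(long) * slot), NULL);
}

/* Hypothetical helper: run `path`, report every write system call and
 * return how many were seen, or -1 if tracing could not be set up. */
int trace_writes(const char *path)
{
    if (!CAN_TRACE)
        return -1;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
            execl(path, path, (char *)NULL);
        _exit(127);                       /* ptrace or exec failed */
    }

    int status, count = 0, entering = 1;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                        /* no stop at execve */

    for (;;) {
        /* Resume until the next system call entry or exit. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        if (waitpid(child, &status, 0) < 0 ||
            WIFEXITED(status) || WIFSIGNALED(status))
            break;                        /* child is gone */
        if (peek_reg(child, SLOT_NR) == SYS_write) {
            if (entering) {
                count++;
                printf("write called with %ld, %#lx, %ld\n",
                       peek_reg(child, SLOT_A1),
                       (unsigned long)peek_reg(child, SLOT_A2),
                       peek_reg(child, SLOT_A3));
            } else {
                printf("write returned %ld\n", peek_reg(child, SLOT_RET));
            }
        }
        entering = !entering;             /* stops alternate entry/exit */
    }
    return count;
}
```

Calling trace_writes("/bin/ls") would report each write made by ls
together with its return value.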

Here we are tracing write system calls, and ls makes three of
them. The call to ptrace with a first argument of PTRACE_SYSCALL
makes the kernel stop the child process whenever it enters or exits
a system call. It's equivalent to doing a PTRACE_CONT and stopping
at the next system call entry or exit.

In the previous example, we used PTRACE_PEEKUSER to look into
the arguments of the write system call. When a system call returns,
the return value is placed in %eax, and it can be read as shown in
that example.

The status variable in the wait call is used to check whether
the child has exited. This is the typical way to check whether the
child has been stopped by ptrace or was able to exit. For more
details on macros like WIFEXITED, see the wait(2) man page.

Reading Register Values

If you want to read several register values at the time of a syscall
entry or exit, reading them one at a time as shown above can be
cumbersome. Calling ptrace with a first argument of PTRACE_GETREGS
retrieves all the registers in a single call.
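
As a hedged illustration, the sketch below fetches the whole register
set with one PTRACE_GETREGS call into a struct user_regs_struct. The
helper names read_syscall_regs and first_syscall_getregs are
hypothetical, and the x86-64 field names are an assumption added
alongside the i386 ones that the article is about.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>         /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fill out[0..3] with the system call number and
 * its first three arguments, using a single PTRACE_GETREGS call. */
int read_syscall_regs(pid_t child, long out[4])
{
#if defined(__i386__)
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) != 0)
        return -1;
    out[0] = regs.orig_eax;
    out[1] = regs.ebx;
    out[2] = regs.ecx;
    out[3] = regs.edx;
    return 0;
#elif defined(__x86_64__)     /* assumption: not covered by the article */
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) != 0)
        return -1;
    out[0] = (long)regs.orig_rax;
    out[1] = (long)regs.rdi;
    out[2] = (long)regs.rsi;
    out[3] = (long)regs.rdx;
    return 0;
#else
    (void)child;
    (void)out;
    return -1;                /* unsupported architecture */
#endif
}

/* Hypothetical driver: trace `path` and return the system call number
 * found in the registers at the child's first stop, or -1 on failure. */
long first_syscall_getregs(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
            execl(path, path, (char *)NULL);
        _exit(127);           /* ptrace or exec failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    long regs[4];
    long nr = read_syscall_regs(child, regs) == 0 ? regs[0] : -1;

    ptrace(PTRACE_CONT, child, NULL, NULL);
    waitpid(child, &status, 0);
    return nr;
}
```

Inside a PTRACE_SYSCALL loop, read_syscall_regs could replace the
separate PTRACE_PEEKUSER calls, one per register.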

An example that changes a system call's arguments makes use of all
the concepts previously discussed, plus a few more. It uses calls to
ptrace with PTRACE_POKEDATA to change data values in the child.
PTRACE_POKEDATA takes the same arguments as PTRACE_PEEKDATA, except
that it writes a word into the child's memory, such as the data that
the child passes in arguments to the system call, whereas
PTRACE_PEEKDATA only reads one.
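
Both requests move one word at a time, so copying a buffer takes a
loop. The sketch below is a simplified illustration rather than the
article's listing: all function names are hypothetical, and to stay
short the child stops itself with SIGSTOP instead of being caught at
a write call. The parent then reverses a string in the child's
memory.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy len bytes from address addr in the child into buf. */
int peek_bytes(pid_t child, unsigned long addr, char *buf, size_t len)
{
    for (size_t done = 0; done < len; ) {
        errno = 0;            /* -1 is also a valid word, so check errno */
        long word = ptrace(PTRACE_PEEKDATA, child, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;
        size_t n = len - done < sizeof word ? len - done : sizeof word;
        memcpy(buf + done, &word, n);
        done += n;
    }
    return 0;
}

/* Copy len bytes from buf to address addr in the child. */
int poke_bytes(pid_t child, unsigned long addr, const char *buf, size_t len)
{
    for (size_t done = 0; done < len; ) {
        long word = 0;
        size_t n = len - done < sizeof word ? len - done : sizeof word;
        if (n < sizeof word) {
            /* Partial last word: read it first so the bytes that
             * follow the buffer in the child are preserved. */
            errno = 0;
            word = ptrace(PTRACE_PEEKDATA, child, (void *)(addr + done), NULL);
            if (word == -1 && errno != 0)
                return -1;
        }
        memcpy(&word, buf + done, n);
        if (ptrace(PTRACE_POKEDATA, child, (void *)(addr + done), (void *)word) != 0)
            return -1;
        done += n;
    }
    return 0;
}

/* Reverse len bytes in place. */
void reverse_bytes(char *s, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }
}

/* Demo: returns 0 if the child saw its string reversed, 1 if it did
 * not, and -1 if tracing could not be set up. */
int demo_reverse_in_child(void)
{
    /* volatile: the parent changes the child's copy behind its back.
     * After fork, the string sits at the same address in both. */
    static volatile char msg[] = "Hello";
    size_t len = sizeof msg - 1;
    unsigned long addr = (unsigned long)msg;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(2);
        raise(SIGSTOP);       /* stop here until the parent continues us */
        _exit(msg[0] == 'o' && msg[4] == 'H' ? 0 : 1);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    char copy[sizeof msg];
    int rc = peek_bytes(child, addr, copy, len);
    if (rc == 0) {
        reverse_bytes(copy, len);
        rc = poke_bytes(child, addr, copy, len);
    }
    ptrace(PTRACE_CONT, child, NULL, NULL);   /* resume, dropping SIGSTOP */
    waitpid(child, &status, 0);
    if (rc != 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

To change the output of a traced program instead, the same
peek_bytes, reverse_bytes and poke_bytes sequence would be applied at
a write entry stop, using the buffer address and length taken from the
system call's second and third arguments.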

Single-Stepping

ptrace provides features to
single-step through the child's code. A call to
ptrace(PTRACE_SINGLESTEP, ...) tells the kernel to resume the child,
stop it again after one instruction and let the parent take control;
calling it repeatedly steps through the code instruction by
instruction. The following example shows a way of reading the
instruction being executed when a system call is made. I have created
a small dummy executable so you can understand what is happening
without bothering with the calls made by libc.

Here's the listing for dummy1.s. It's written in assembly
language and compiled as gcc -o dummy1 dummy1.s:

You might have to look at Intel's manuals to make sense out of
those instruction bytes. Using single stepping for more complex
processes, such as setting breakpoints, requires careful design and
more complex code.
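
The single-stepping loop described in this section can be sketched as
follows. This is an illustration under assumptions rather than the
original listing: the helper name single_step and its max_steps limit
are hypothetical, the x86-64 branch is an addition to the article's
i386 code, and stepping starts right after execve instead of at a
write call.

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>         /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-step `path` for at most max_steps
 * instructions, printing the instruction pointer and the word stored
 * there. Returns the number of steps taken, or -1 if tracing could
 * not be started. */
long single_step(const char *path, long max_steps)
{
#if defined(__i386__) || defined(__x86_64__)
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
            execl(path, path, (char *)NULL);
        _exit(127);                       /* ptrace or exec failed */
    }

    int status;
    long steps = 0;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                        /* no stop at execve */

    while (steps < max_steps) {
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) != 0)
            break;
#if defined(__i386__)
        unsigned long ip = (unsigned long)regs.eip;
#else                                     /* x86-64: an assumption */
        unsigned long ip = (unsigned long)regs.rip;
#endif
        /* Fetch the word at the instruction pointer: the bytes of the
         * instruction that is about to run. */
        long ins = ptrace(PTRACE_PEEKTEXT, child, (void *)ip, NULL);
        printf("IP: %#lx  instruction word: %#lx\n", ip, (unsigned long)ins);

        /* Run exactly one instruction, then stop again. */
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) != 0)
            break;
        if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
            return steps;                 /* child finished on its own */
        steps++;
    }

    kill(child, SIGKILL);                 /* done stepping: end the child */
    waitpid(child, &status, 0);
    return steps;
#else
    (void)path;
    (void)max_steps;
    return -1;                            /* unsupported architecture */
#endif
}
```

The limit matters in practice: even a tiny dynamically linked program
executes a very large number of instructions in the loader before it
reaches its own code, and every step costs a round trip through the
kernel.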

In Part II, we will see how breakpoints can be inserted and
code can be injected into a running program.

Pradeep Padala
is currently working on his Master's degree at the University of
Florida. His research interests include Grid and distributed
systems. He can be reached via e-mail at
p_padala@yahoo.com
or through his web site
(www.cise.ufl.edu/~ppadala).