Playing with ptrace, Part I

Using ptrace allows you to set up system call interception and modification at the user level.

Have you ever wondered how system calls
can be intercepted? Have you ever tried fooling the kernel by
changing system call arguments? Have you ever wondered how
debuggers stop a running process and let you take control of the
process?

If you are thinking of using complex kernel programming to
accomplish these tasks, think again. Linux provides an elegant mechanism
to achieve all of them: the ptrace (Process Trace) system
call. ptrace lets a parent process observe and control the execution of
another process. The parent can examine and change the child's core image and
registers. ptrace is used primarily to implement breakpoint debugging
and system call tracing.

In this article, we learn how to intercept a system call and
change its arguments. In Part II of the article we will study
advanced techniques—setting breakpoints and injecting code into a
running program. We will peek into the child process' registers and
data segment and modify the contents. We will also describe a way
to inject code so the process can be stopped and execute arbitrary
instructions.

Basics

Operating systems offer services through a standard mechanism
called system calls. They provide a standard API for accessing the
underlying hardware and low-level services, such as
filesystems. When a process wants to invoke a system call, it puts
the system call's arguments in registers and raises software interrupt
0x80. This software interrupt is like a gate to kernel mode, and
the kernel executes the system call after examining the
arguments.

On the i386 architecture (all the code in this article is
i386-specific), the system call number is put in the register %eax.
The arguments to this system call are put into registers %ebx,
%ecx, %edx, %esi and %edi, in that order. For example, the
call:

write(2, "Hello", 5)

would translate roughly into

movl $4, %eax
movl $2, %ebx
movl $hello,%ecx
movl $5, %edx
int $0x80

where $hello points to a literal string “Hello”.
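
The same request can be made from C by system call number, without going through the libc write() wrapper. This is a minimal sketch, assuming glibc's syscall() function; raw_write is a hypothetical name used only for illustration. syscall() loads the number and the arguments into whatever registers the current architecture expects, so it is not tied to i386.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issue the write system call by number instead of
 * calling the libc write() wrapper. Returns the number of bytes written,
 * or -1 with errno set on failure. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(2, "Hello", 5) corresponds to the write(2, "Hello", 5) call above.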

So where does ptrace come into the picture? Before executing the
system call, the kernel checks whether the process is being traced.
If it is, the kernel stops the process and gives control to the
tracing process so it can examine and modify the traced process'
registers.

Let's clarify this explanation with an example of how the
process works:
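
A minimal sketch of such a tracer follows. It is written from the description in the surrounding text (fork, PTRACE_TRACEME, exec, wait, PTRACE_PEEKUSER, PTRACE_CONT), so treat it as an illustration rather than a definitive listing. The helper names first_syscall and is_execve are hypothetical, and the offset macro is an assumption that lets the sketch build on both i386 and x86-64; i386-only code can pass 4 * ORIG_EAX instead.

```c
#include <stddef.h>        /* offsetof */
#include <sys/ptrace.h>
#include <sys/syscall.h>   /* SYS_execve */
#include <sys/types.h>
#include <sys/user.h>      /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Offset of the saved system call number in the child's USER area.
 * Assumption: x86 only. */
#if defined(__x86_64__)
#define SYSCALL_NR_OFFSET offsetof(struct user_regs_struct, orig_rax)
#elif defined(__i386__)
#define SYSCALL_NR_OFFSET offsetof(struct user_regs_struct, orig_eax)
#endif

/* Hypothetical helper: is this the execve system call number? */
static int is_execve(long nr)
{
    return nr == SYS_execve;
}

/* Hypothetical helper: run `path` under ptrace and return the system call
 * number saved at the child's first stop, or -1 on failure. */
static long first_syscall(const char *path)
{
#ifndef SYSCALL_NR_OFFSET
    (void)path;
    return -1;                               /* architecture not covered here */
#else
    int status;
    long nr;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* ask to be traced by the parent */
        execl(path, path, (char *)NULL);         /* child stops once exec succeeds */
        _exit(127);                              /* only reached if exec failed */
    }
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;                               /* child never stopped for us */
    nr = ptrace(PTRACE_PEEKUSER, child, (void *)SYSCALL_NR_OFFSET, NULL);
    ptrace(PTRACE_CONT, child, NULL, NULL);      /* let the child run on */
    /* Reap the child, passing along any signal that stops it meanwhile. */
    while (waitpid(child, &status, 0) > 0 && WIFSTOPPED(status))
        ptrace(PTRACE_CONT, child, NULL, (void *)(long)WSTOPSIG(status));
    return nr;
#endif
}
```

Calling first_syscall("/bin/ls") returns 11 on i386. On x86-64 it returns 59, which is the execve number on that architecture.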

When run, the example reports that the child made system call 11,
along with the output of ls. System call number 11 is execve, and
it's the first system call executed by the child. For reference,
system call numbers can be found in /usr/include/asm/unistd.h.

As you can see in the example, a process forks a child and
the child executes the program we want to trace. Before running
exec, the child calls ptrace with the first
argument equal to PTRACE_TRACEME. This tells the kernel that the
process is being traced, and when the child executes the execve
system call, it hands over control to its parent. The parent waits
for notification from the kernel with a wait() call. Then the
parent can check the arguments of the system call or do other
things, such as looking into the registers.

When the system call occurs, the kernel saves the original
contents of the eax register, which contains the system call
number. We can read this value from the child's USER area by calling
ptrace with the first argument PTRACE_PEEKUSER, as shown
above.

After we are done examining the system call, the parent lets the
child continue by calling ptrace with the first argument PTRACE_CONT,
which lets the system call proceed.

ptrace takes four arguments: request, pid, addr and data. The first
argument, request, determines the behaviour of ptrace and how the
other arguments are used. The value of request should be one of
PTRACE_TRACEME, PTRACE_PEEKTEXT, PTRACE_PEEKDATA, PTRACE_PEEKUSER,
PTRACE_POKETEXT, PTRACE_POKEDATA, PTRACE_POKEUSER, PTRACE_GETREGS,
PTRACE_GETFPREGS, PTRACE_SETREGS, PTRACE_SETFPREGS, PTRACE_CONT,
PTRACE_SYSCALL, PTRACE_SINGLESTEP, PTRACE_DETACH. The significance
of each of these requests will be explained in the rest of the
article.
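
As a small taste of one of these requests, PTRACE_SYSCALL restarts the child like PTRACE_CONT but stops it again at the next entry to or exit from a system call. The sketch below counts those stops for a traced program. count_syscall_stops is a hypothetical name, and the sketch assumes the child receives no signals while it runs, because any signal stop would be counted and the signal dropped.

```c
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` under ptrace and count how many times it
 * stops at a system call entry or exit. Returns -1 on failure. */
static long count_syscall_stops(const char *path)
{
    int status;
    long stops = 0;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl(path, path, (char *)NULL);
        _exit(127);                              /* only reached if exec failed */
    }
    /* The first stop is the one reported when exec completes. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    for (;;) {
        /* Resume until the next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1)
            return -1;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                               /* child is gone */
        stops++;
    }
    return stops;
}
```

Each completed system call contributes two stops, one on entry and one on exit, so the result is roughly twice the number of system calls the program made.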

Karan Verma asked how to put a breakpoint at a particular line number. This is exactly what a debugger does.

There are many nuances and details, but I'll simplify this to the bare minimum:

1) Find where the program is loaded in memory.
2) Read the executable and locate the debug data which corresponds to the source file containing the line at which you want to place the breakpoint.
3) Interpret the line number table in the debug data to locate the address which corresponds to the desired line.
4) Adjust the address from (3) to make it correspond with (1) if necessary.
5) Copy and save the original instruction at the breakpoint address.
6) Write the breakpoint instruction at the breakpoint address.

Naturally, this is usually done when the child process is stopped. After inserting the breakpoint, the child is allowed to run.

When the breakpoint is executed, the child process will stop and the parent will receive a signal. The parent process needs to replace the breakpoint instruction with the original instruction before it can allow the child process to continue.

GDB and LLDB are open source debuggers which give examples of how this is done. Reading GDB is not for the faint of heart -- there's a lot of complexity in handling many different object file formats and many different target architectures.

ptrace() provides access to the memory and registers of a child process. It doesn't tell you how that memory is organized.

Information on how the stack is organized is usually contained in the ABI (Application Binary Interface) for each processor. The Call Frame Information (CFI) in the DWARF debugging information (see dwarfstd.org) describes where each call has saved registers.

If you want to write a routine which walks the call stack, I suggest that you start with one which will walk the stack in the current process and later convert it to accessing a child process. The first step is to find the start of the current stack frame, then find the previous frame.
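
When writing such a routine, it helps to have a known-good answer for the current process to compare against. The sketch below does not walk the frames by hand; it swaps in glibc's backtrace() from <execinfo.h>, which does the walk itself. That it is available is an assumption about the C library in use, and call_depth is a hypothetical name.

```c
#include <execinfo.h>   /* backtrace(): a glibc extension */

#define MAX_FRAMES 64

/* Hypothetical helper: number of return addresses found on the current
 * call stack, capped at MAX_FRAMES. */
static int call_depth(void)
{
    void *frames[MAX_FRAMES];

    /* backtrace() fills frames[] with return addresses, innermost first. */
    return backtrace(frames, MAX_FRAMES);
}
```

A hand-written walker for the current process should find the same return addresses in the same order before it is converted to read a child's stack through ptrace.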

This article helps me a lot. I'm trying to create a line coverage tool using ptrace.

One problem is that ptrace only handles a single thread, and I don't know how to deal with multithreaded applications.

I set the option to catch clone events, which helps me find the PIDs of all the LWPs.
I also tried to make all threads continue, but it seems only the child thread runs on until it sleeps. The continue command has no effect on the parent thread; /proc//status shows it in "tracing stop".

Could you tell me how to make all threads continue? (I have restored the breakpoint in runtime memory.)

I've used strace for debugging for a couple of years and never knew this was what it used; I'm hoping to create a simulator using ptrace() soon for automatic integration testing of an embedded project and your article is a great start to see some real code :-)

Please be aware that putdata() contains a serious mistake. If len is not a multiple of four, then putdata() should read the final long value, replace one, two, or three bytes, and then write the value. This mistake causes the second example in Part II to seg fault.
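
A sketch of the fix described above: full words are written directly, and a final partial word is read, patched, and written back. This is not the article's putdata(); put_bytes and merge_partial_word are hypothetical names, and the word size is taken as sizeof(long), which is four bytes on i386 and eight on x86-64.

```c
#include <errno.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Overwrite the first n bytes (n < sizeof(long)) of an existing word with
 * bytes from src and leave the remaining bytes untouched. */
static long merge_partial_word(long existing, const char *src, size_t n)
{
    memcpy(&existing, src, n);
    return existing;
}

/* Hypothetical replacement for putdata(): copy len bytes from src into the
 * stopped child's memory at addr. Returns 0 on success, -1 on failure. */
static int put_bytes(pid_t child, long addr, const char *src, size_t len)
{
    size_t off = 0;
    long word;

    /* Whole words can be written as they are. */
    for (; len - off >= sizeof(long); off += sizeof(long)) {
        memcpy(&word, src + off, sizeof(long));
        if (ptrace(PTRACE_POKEDATA, child, (void *)(addr + (long)off),
                   (void *)word) == -1)
            return -1;
    }
    /* Trailing bytes: read the word, replace only those bytes, write it back. */
    if (off < len) {
        errno = 0;                               /* -1 can be a valid word */
        word = ptrace(PTRACE_PEEKDATA, child, (void *)(addr + (long)off), NULL);
        if (word == -1 && errno != 0)
            return -1;
        word = merge_partial_word(word, src + off, len - off);
        if (ptrace(PTRACE_POKEDATA, child, (void *)(addr + (long)off),
                   (void *)word) == -1)
            return -1;
    }
    return 0;
}
```

Without the read step, the bytes after the end of the data would be overwritten with whatever happened to be in the local buffer.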

I'm posting this hoping the next fellow who encounters this gotcha can save a little time...

ptrace only works in the base thread of the parent process. ptrace(PTRACE_CONT, pid) will fail with ESRCH (process not found) if issued in a child thread on Linux.

If you are thinking of using a debugger thread to watch each child thread, give it up. It won't work. And unless you find this message or have a sudden epiphany, you are liable to spend a great deal of time bashing your poor head against the wall.
