Injecting the Python Interpreter Via GDB

Recently I was evaluating the security of an application sandbox and I needed a way to inject some kind of interface into the sandboxed application in order to explore the possibilities available from that context. The main objective was to be able to easily explore file and system call access to determine what was allowed/denied. I decided the most suitable interface I could use for this exploration would be the Python interactive shell.

The first step I needed to take was to get the Python library (libpython) loaded into the address space of the target application. The easiest way that I could think to do this was to utilize the call command in the GNU Debugger (GDB). GDB’s call command performs a debuggee procedure call by setting up a temporary frame on the stack of the currently selected thread in the debuggee and controlling the startup state. Since GDB already performs the necessary steps, I could take advantage of this by issuing the command:

call (int)dlopen("/usr/lib/libpython2.7.dylib")

This causes GDB to call the dynamic linker function dlopen() inside the debuggee to load a shared library. Because I was testing this on Mac OS X, I specified the default path to the libpython dynamic library. On other UNIX-based platforms this path would point to an .so library. On Windows, libpython is a DLL.

The same functionality can be achieved on Windows using the LoadLibrary() function instead of dlopen(). WinDbg also supports debuggee procedure calls with the .call command. Obviously, for the call to dlopen() to succeed, the platform you are using must have debug symbols for the libraries you are calling into. Otherwise, GDB does not know the address of the function. Depending on the situation, it might be useful to use the dynamic linker’s configuration variables to force the load of a debug build of the necessary libraries.

Now that I had the Python interpreter embedded into the address space of the target application, the next step was to get my own Python code executing. Again, this could be easily achieved from GDB using the call command. When you are embedding Python in any application, the first step is to call Py_Initialize(). The Py_Initialize() function creates the built-in modules in the Python namespace and sets up the appropriate module search paths. The call to this function from GDB looks exactly like you would expect:

call (int)Py_Initialize()

With the Python namespace established, the only thing left to do is to execute a Python string. One easy method of doing this is to use the PyRun_SimpleString() function. This function takes a character string argument and compiles and executes it as Python code.
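The behavior of this function can be tried out without a debugger. The following is a minimal sketch, not the GDB session from this post: it assumes Python 3 and a build that exports the C API to ctypes.pythonapi, and it calls PyRun_SimpleStringFlags(), the function that PyRun_SimpleString() wraps with a NULL flags argument. The variable name injected_pid is made up for illustration.

```python
import ctypes
import os

# PyRun_SimpleString(s) is a thin wrapper for PyRun_SimpleStringFlags(s, NULL).
run_string = ctypes.pythonapi.PyRun_SimpleStringFlags
run_string.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
run_string.restype = ctypes.c_int

# The string is compiled and executed in the __main__ module's namespace.
# injected_pid is a hypothetical name used only for this demonstration.
status = run_string(b"import os; injected_pid = os.getpid()", None)

import __main__
print("status:", status)  # 0 means success, -1 means an exception was raised
print("pid seen by injected code:", __main__.injected_pid)
print("pid of this process:", os.getpid())
```

The two process IDs should match, which is the same check performed from GDB against the debuggee.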

We can verify that PyRun_SimpleString() has been successful by calling os.getpid() in Python and comparing the result with the process ID (pid) reported by a bash shell debuggee ($$).

While this worked great for a console-based application where we are directly in control of stdin/stdout, it’s less than ideal for a GUI-based application or a server. I decided the solution was to create a Python bindshell similar to what would be used in an exploitation scenario.

For the most part the code was fairly straightforward, although there were a few ugly hacks involved. I have included the entire code listing in the appendix to this post below; however, I will briefly run through how I went about this.

Basically, I used some pretty standard socket server code built on the socket library: it calls bind(), listen() and accept() to bind a socket to a port, listen for a client to connect, and generate a new handle for the newly connected client.
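As a rough illustration of that sequence (a sketch, not the listing from the appendix), the function names open_listener and accept_one are hypothetical, and binding to 127.0.0.1 is an assumption made here to keep the example local. Port 55555 is the one used later in this post.

```python
import socket

def open_listener(port=55555, host="127.0.0.1"):
    """Create a TCP socket, bind it to (host, port) and start listening."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick re-binding of the port between runs.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server

def accept_one(server):
    """Block until a client connects and return the new client socket."""
    conn, _addr = server.accept()
    return conn
```

A caller would run conn = accept_one(open_listener()) and then hand conn to whatever provides the shell.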

Once the socket was set up, the next step was to use the code module to provide a nice Python shell. A typical invocation of its interact() function simply reads from stdin and writes to stdout; however, a different approach was needed since I had to bind this to a socket. The interact() function accepts an argument, readfunc, which allows us to specify a function for reading the user input rather than just using raw_input (the default). I created a function that reads from the socket (after converting it to a file using the makefile() method) using the readline() method.

Doing this caused my Python bindshell to half work. Input was read from the socket; however, the output from the executing code would still go to the debuggee’s stdout rather than to the client. To work around this I needed to redirect writes to stdout through the socket. Because my interpreter was in the address space of another program that I was inspecting, I did not want to simply dup2() the socket over stdout, as doing this would redirect the debuggee’s output through the socket too. Instead, I had to modify the stdout object in the Python namespace (sys.stdout) to write to the socket.

Unfortunately this was not a simple case of swapping the stdout object with the socket object, because the socket object does not have a write() method, which is what the code module uses to perform output. At first I tried using the file handle that I created with the makefile() method of the socket; however, this did not end up sending the output to the client. To combat this I created my own class (wsocket) that wrapped the socket object and provided a write() method that passed the data through to the send() method of the socket. I then replaced the stdout object in the Python namespace with the wsocket instance. This caused the bindshell to work as expected when I connected to the shell on port 55555 using the netcat (nc) command.
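The approach can be sketched roughly as follows. This is not the listing from the appendix: the class name WSocket, the function name serve_shell and the flush() method are choices made for this example, it uses sendall() rather than send() so that partial writes are not lost, and it is written so that it also runs on Python 3.

```python
import code
import sys

class WSocket(object):
    """Wrap a socket so it offers the write() method that print expects."""
    def __init__(self, sock):
        self.sock = sock

    def write(self, data):
        if not isinstance(data, bytes):
            data = data.encode("utf-8")
        self.sock.sendall(data)

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush

def serve_shell(conn):
    """Run an interactive Python shell over an already connected socket."""
    reader = conn.makefile("r")
    out = WSocket(conn)

    def readfunc(prompt):
        # Send the prompt to the client, then read one line of input.
        out.write(prompt)
        line = reader.readline()
        if not line:
            raise EOFError  # client closed the connection
        return line.rstrip("\r\n")

    saved_stdout = sys.stdout
    sys.stdout = out  # only the Python-level stdout object is replaced
    try:
        code.interact(banner="", readfunc=readfunc, local={})
    finally:
        sys.stdout = saved_stdout
        reader.close()
```

Only sys.stdout is replaced here, so tracebacks, which the code module writes to sys.stderr, would still appear on the debuggee's side.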

Now that I had a working Python bindshell, I needed to inject it using the GDB commands from earlier. I decided that, rather than implementing a whole series of calls to the Python framework, I would simply encode the entire program into a single line and use PyRun_SimpleString() to execute it.

I made a tiny Python program using the base64 module, which reads the whole file in and base64-encodes it (shown below):

import base64
print base64.b64encode(open("raw","r").read())

In order to decode and execute this at runtime, I needed to use the following code.
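A minimal sketch of such a decode-and-execute step is given here. It is not the original listing: the function name run_encoded is hypothetical, and the source is compiled explicitly because runcode() is documented as taking a code object.

```python
import base64
import code

def run_encoded(encoded_source):
    """Decode base64 text back to Python source and execute it."""
    source = base64.b64decode(encoded_source).decode("ascii")
    interpreter = code.InteractiveInterpreter()
    # runcode() executes a code object in the interpreter's own namespace.
    interpreter.runcode(compile(source, "<injected>", "exec"))
    return interpreter.locals

# Round trip with a tiny stand-in payload instead of the real bindshell.
namespace = run_encoded(base64.b64encode(b"answer = 6 * 7"))
print(namespace["answer"])
```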

This code simply takes the base64-encoded output, decodes it back to ASCII text, and passes it to the runcode() method of the InteractiveInterpreter to be executed. With this in mind the final command that I sent to GDB is:

This worked perfectly and allowed me to easily inject the interpreter; however, it left the GDB session blocked and unusable. I decided it would be more useful to have a working GDB session, which would make it possible to explore the address space while using ctypes from the Python shell to create API stubs for executing functionality in the debuggee. The problem, though, was that if I pressed ^C (Control-C) to interrupt the execution of the call command, GDB shut down the Python code that was executing. To get around this, I created a thread inside my Python bindshell and executed the interactive interpreter inside it.
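Moving the shell onto its own thread might look roughly like this. It is a sketch under assumptions: the helper name run_in_background is hypothetical, and marking the thread as a daemon is a choice made here so that it cannot keep a process alive at exit.

```python
import threading

def run_in_background(target, *args):
    """Start target(*args) on a separate thread and return the thread."""
    worker = threading.Thread(target=target, args=args)
    worker.daemon = True  # assumption: do not block interpreter shutdown
    worker.start()
    return worker
```

The function that accepts a client and runs the interactive interpreter would be passed as target, so the injected call no longer depends on the frame that GDB set up.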

I then issued the following command in GDB:

set unwindonsignal

The GDB documentation describes this setting as follows: “Set unwinding of the stack if a signal is received while in a function that gdb called in the program being debugged. If set to on, gdb unwinds the stack it created for the call and restores the context to what it was before the call. If set to off (the default), gdb stops in the frame where the signal was received.”

The result of this command was that when I interrupted the running command in GDB, the stack unwound and GDB continued as normal. However, because the bindshell was now running in its own thread, it resumed execution when the continue command was given.

At this stage I thought that I was done. However, when I injected my bindshell into the target app I received an error when loading modules such as base64 or ctypes from the lib-dynload/ directory. After much searching I discovered an obscure bug report on the Python site, http://bugs.python.org/issue4434. The poster notes that he was provided with a workaround:

“I have been given the following workaround: in mylib.c, before PyInitialize() I can call dlopen(“libpython2.5.so”, RTLD_LAZY | RTLD_GLOBAL);”

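After reading this I modified my dlopen() call to include the flag value 9, which is the result of the operation RTLD_LAZY | RTLD_GLOBAL on Mac OS X (RTLD_LAZY is 0x1 and RTLD_GLOBAL is 0x8 there; the numeric values differ on other platforms). This left me with the following: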

call (int)dlopen("/usr/lib/libpython.dylib",9)

After I made this modification it solved the above problem and the code worked fine.
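Because the numeric value of the flags depends on the platform, it can be worth computing it on the target system before typing it into GDB. A small sketch, assuming Python 3.3 or later on a UNIX-like system, where the os module exposes these constants:

```python
import os

# Combine the same two flags named in the bug report.
flags = os.RTLD_LAZY | os.RTLD_GLOBAL
print("RTLD_LAZY   = 0x%x" % os.RTLD_LAZY)
print("RTLD_GLOBAL = 0x%x" % os.RTLD_GLOBAL)
print("RTLD_LAZY | RTLD_GLOBAL = %d (0x%x)" % (flags, flags))
```

On Mac OS X this yields the 9 used above, while on Linux with glibc, where RTLD_GLOBAL is 0x100, the result is 257.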

To make it more convenient for me to run the code again, I created a GDB command to house the finished statements. The syntax for this is pretty simple. I created the pyject command, which takes no arguments and will simply run the commands we discussed thus far. Running the command results in the Python interpreter being injected and executed, and our Python bindshell binding to port 55555.
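The gdbinit itself is not reproduced here, so the following is only a guess at its shape: a Python helper that base64-encodes a payload and emits a GDB define block. The function name build_pyject, the order of the statements, and the exact Python one-liner passed to PyRun_SimpleString() are all assumptions. The library path and flag value are the Mac OS X ones used above.

```python
import base64

def build_pyject(source_bytes, libpython="/usr/lib/libpython.dylib", flags=9):
    """Return the text of a GDB user-defined command named pyject."""
    encoded = base64.b64encode(source_bytes).decode("ascii")
    # Hypothetical one-liner: decode the payload and run it (Python 2.7 target).
    runner = ("import base64,code;"
              "code.InteractiveInterpreter().runcode("
              "base64.b64decode('%s'))" % encoded)
    lines = [
        "define pyject",
        "set unwindonsignal on",
        'call (int)dlopen("%s",%d)' % (libpython, flags),
        "call (int)Py_Initialize()",
        'call (int)PyRun_SimpleString("%s")' % runner,
        "end",
    ]
    return "\n".join(lines) + "\n"

print(build_pyject(b"print 'hello from the debuggee'"))
```

The base64 alphabet contains no quote characters, so the encoded payload can sit inside the nested string literals without escaping.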

There are many uses for this functionality; the sandbox example is just one of them. As I mentioned previously, it’s possible to craft ctypes stubs from Python to call into the APIs of the debugged process. This can be very useful for confirming reverse engineering findings, as well as providing a strong basis for exploring unknown software. There have been several published tools for injecting Python into a process in the past; the reason I worked on this, however, is that it doesn’t require an additional tool to inject, and the technique works on multiple platforms.
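As a small illustration of such a stub (a sketch, assuming a UNIX-like platform), passing None to ctypes.CDLL returns a handle to the symbols already loaded in the current process, which inside an injected interpreter means the debuggee itself. getpid() stands in here for whatever function is being explored.

```python
import ctypes
import os

# dlopen(NULL): a handle covering everything already loaded in this process.
process = ctypes.CDLL(None)

# Declare the prototype so ctypes converts arguments and results correctly.
getpid = process.getpid
getpid.argtypes = []
getpid.restype = ctypes.c_int

print("getpid() via ctypes:", getpid())
print("os.getpid():        ", os.getpid())
```

The same pattern applies to functions exported by the target application, provided their prototypes are known from reverse engineering.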

To use the gdbinit on a UNIX platform, copy it into your home directory as .gdbinit.
