
Extending GlusterFS with Python

Are you a Python programmer who wishes your storage could do more for you?
Here's an easy way to add functionality to a real distributed filesystem, in
your favorite language.

Programming languages are usually not good neighbors. Even mixing
languages as closely related as C and C++ often can lead to a morass of
conflicting conventions with respect to symbol names, initialization
orders and memory management strategies. As the distance between
languages increases, the difficulty of integrating them increases as
well. This is particularly true when attempting to mix compiled and
interpreted languages. Most interpreted languages have ways to call
functions and access symbols in compiled libraries, but these facilities
often are far from convenient, and calling back the other way—from
compiled code to interpreted—is less convenient still. Integration
between interpreted languages is even less feasible—the one notable
exception being the several languages that share the Java Virtual Machine
(JVM). Interoperability between interpreted languages using different
virtual machines usually is limited to message passing between separate
processes.

In this context, Python's facilities for integrating with code written
in other languages are like a breath of fresh air. One option is
Jython, which exists quite comfortably within the aforementioned JVM
ecosystem. For integration with compiled code, Python offers not one
but two methods. The first is the "extension API", which
allows you to write Python modules in C. ("C" is used here as shorthand
for any compiled code that adheres to the initialization and calling
conventions originally defined for C.) Using this interface, it is
possible to create compiled modules that offer the full functionality of
native Python modules with the full performance of compiled code. There
are even projects, such as Cython, that generate most of the necessary
boilerplate for you.

The Python ctypes module offers an even more convenient option for
integration with compiled code, with only a very small decrease in
functionality. Using ctypes, Python code can call functions and access
symbols even in C libraries whose authors never thought about Python
at all. Python programmers also can use ctypes to interpret C data
structures (overlapping somewhat with the functionality provided by the
struct module) and even define Python callbacks that can be passed to
C functions. Although it is not possible to do absolutely everything with
ctypes that you can do with the extension interface, combining the two
approaches can lead to very powerful results.
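As a small illustration of two of the uses just described (calling a function in a C library and interpreting a C data structure, cross-checked against the struct module), here is a sketch that assumes a Linux-like system where ctypes can locate the C library. Pair is a hypothetical structure invented for the example.

```python
import ctypes
import ctypes.util
import struct

# Load the C library; CDLL(None) falls back to symbols already in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Calling a C function that was never written with Python in mind.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
length = libc.strlen(b"glusterfs")

# Interpreting a C structure: a hypothetical `struct pair { int32_t a, b; }`.
class Pair(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_int32)]

raw = struct.pack("=ii", 7, -3)       # bytes laid out as C would store them
pair = Pair.from_buffer_copy(raw)     # typed ctypes view of a copy of those bytes
roundtrip = struct.unpack("=ii", bytes(pair))  # struct reads the same layout

print(length, pair.a, pair.b, roundtrip)  # 9 7 -3 (7, -3)
```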

As a case study in combining Python code with an existing compiled
program or language, this article focuses on the implementation of
a Python "translator" interface for GlusterFS. GlusterFS is a modern
distributed filesystem based on the principle of horizontal scaling—adding capacity or performance to a system by adding more servers based on
commodity hardware instead of having to pay an ever-increasing premium
to make existing servers more powerful. Development is sponsored by
Red Hat, but it's completely open source, so anyone can contribute. In
addition to horizontal scaling, another core principle of GlusterFS
is modularity. Most of the functionality within GlusterFS actually
is provided by translators—so called because they translate I/O calls
(such as read or write) coming from the user into the same or other calls
that are passed on toward storage. These calls are passed from one
translator to another, arranged in an arbitrarily complex hierarchy,
until eventually the lowest-level calls are executed on servers' local
filesystems. I refer to this translator interface as TXAPI for brevity,
even though that's not an official term. TXAPI has been used to implement
internal GlusterFS functionality, such as replication and caching, and
also external functionality, such as on-disk encryption.
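To make the idea of a translator hierarchy concrete, here is a toy model in plain Python. It is not TXAPI: the class names (Xlator, Posix, Replicate, Trace), the synchronous method calls and the two-operation interface are all invented for illustration. It shows the two behaviors described above: a translator either passes a call through unchanged or translates it into other calls (here, one write fanned out to two replicas) on its way toward storage.

```python
class Xlator:
    """Hypothetical base class: by default, pass each call to the first child."""
    def __init__(self, *children):
        self.children = list(children)

    def write(self, path, data):
        return self.children[0].write(path, data)

    def read(self, path):
        return self.children[0].read(path)


class Posix(Xlator):
    """Bottom of the stack, standing in for a server's local filesystem."""
    def __init__(self):
        super().__init__()
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]


class Replicate(Xlator):
    """Translates one write into a write on every child; reads use the first."""
    def write(self, path, data):
        results = [child.write(path, data) for child in self.children]
        return results[0]


class Trace(Xlator):
    """Passes calls through unchanged but records which ones it saw."""
    def __init__(self, *children):
        super().__init__(*children)
        self.log = []

    def write(self, path, data):
        self.log.append(("write", path))
        return super().write(path, data)

    def read(self, path):
        self.log.append(("read", path))
        return super().read(path)


brick_a, brick_b = Posix(), Posix()
top = Trace(Replicate(brick_a, brick_b))
top.write("/index.php", b"<?php")
data = top.read("/index.php")
print(data)      # b'<?php'
print(top.log)   # [('write', '/index.php'), ('read', '/index.php')]
```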

This article is not primarily about GlusterFS, however. Even though
I use GlusterFS to illustrate techniques for integrating Python and
C code and show results to illustrate the potential benefits of such
integration, most of the techniques are equally applicable to other
programs with a similar set of characteristics. Those characteristics
include a C "top level" calling into Python instead of the other way
around, a fundamentally multithreaded execution model, and the presence
of a well-defined plugin interface (TXAPI) that makes extensive use of
callbacks in both directions.

The fact that GlusterFS is primarily a C program—filesystems are, after
all, system software—means that you can't use ctypes for everything. To
bootstrap your integration, you need to use Python's "embedding
API", which
is a close cousin of the previously mentioned extension API and allows
C code to call into the Python interpreter. You need to invoke this API
at least once to create an interpreter and invoke an initialization
function in a Python module. For this purpose, you use a single C-based
"meta translator" that can be loaded just like translators always have
been. This translator is called glupy, from GLUster and PYthon. (The
preferred pronunciation is "gloopy" even though
"glup-pie" might make
more sense given those origins.) Most of what glupy does is provide the
generic embedding-API glue to load the actual Python translator, which is
specified as an option. This loading is a fairly simple matter of calling
PyImport_Import to load the module, followed by
PyObject_CallObject to call its initialization function.
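The two embedding-API calls named above can be exercised without a C compiler by driving them from Python through ctypes.pythonapi, which exposes the interpreter's own C API. This is only a demonstration device: glupy makes these calls from C, and the module name demo_xlator and its xl_init function are hypothetical stand-ins for the user's translator module. PyObject_GetAttrString is added here to look up the init function by name; error handling is omitted.

```python
import ctypes
import sys
import types

api = ctypes.pythonapi  # a PyDLL: the GIL stays held during these calls

# Declare the C signatures. Each call returns a new reference, which ctypes
# takes ownership of when restype is py_object.
api.PyUnicode_FromString.argtypes = [ctypes.c_char_p]
api.PyUnicode_FromString.restype = ctypes.py_object
api.PyImport_Import.argtypes = [ctypes.py_object]
api.PyImport_Import.restype = ctypes.py_object
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object

# Hypothetical translator module, registered in sys.modules so the demo
# needs no file on disk; a real deployment would import a .py file.
demo = types.ModuleType("demo_xlator")

def xl_init(handle):
    demo.handle = handle   # a real init would register callbacks here
    return 0

demo.xl_init = xl_init
sys.modules["demo_xlator"] = demo

name = api.PyUnicode_FromString(b"demo_xlator")
module = api.PyImport_Import(name)                      # load the module
init = api.PyObject_GetAttrString(module, b"xl_init")   # find its init function
result = api.PyObject_CallObject(init, (12345,))        # call init(12345)
print(result, module.handle)  # 0 12345
```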

The user's Python init function is then responsible
for registering the TXAPI callbacks it will need later, in addition to
performing its own domain-specific initialization. Glupy also includes a
Python/ctypes module that encapsulates the GlusterFS types and some
functions that glupy users can invoke (the example translator reaches
these through a handle called "dl").

At this point, you reach a fork in the road. If you're already using the
embedding API, why not continue using it for almost everything? In
this approach, a glupy dispatch function would use
Py_BuildValue to
construct an argument list and then use
PyObject_CallObject to call the
appropriate Python function/method from a table. This is pretty tedious
code to write by hand, but much of the process could be automated. The
bigger problem with this approach is that TXAPI involves many pointers
to GlusterFS-specific structures, which must be passed through the
embedding API as opaque integers. The Python code receiving such a value
must then explicitly call the from_address method of a ctypes type to turn
it into a usable Python object. Clutter within glupy itself is not a problem, but clutter within
glupy users' code makes this approach less appealing.
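A minimal sketch of the clutter in question, assuming a hypothetical Loc structure rather than any real GlusterFS type: the C side hands over only an address as a plain integer, and every Python handler must rebuild a typed view with from_address before it can use the data.

```python
import ctypes

# Hypothetical stand-in for a GlusterFS structure; real layouts differ.
class Loc(ctypes.Structure):
    _fields_ = [("path", ctypes.c_char_p),
                ("ino", ctypes.c_uint64)]

# "C side" (simulated): a structure living in C-managed memory.
loc = Loc(b"/var/www/include/config.php", 42)
opaque = ctypes.addressof(loc)   # all the Python side would receive: an int

# Python side: each handler repeats this conversion by hand.
def lookup_handler(loc_addr):
    view = Loc.from_address(loc_addr)   # no copy: aliases the original memory
    return view.path.decode(), view.ino

print(lookup_handler(opaque))  # ('/var/www/include/config.php', 42)
loc.ino = 43                   # changing the "C" memory...
print(lookup_handler(opaque))  # ...is visible through the rebuilt view
```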

The approach actually used in glupy involves less C code and more
Python code, with a greater emphasis on ctypes. In this approach,
the user's Python code is presented not as Python functions but as C
functions, using ctypes to define function types that then can be used
as decorators. Unfortunately, details of the platform-specific foreign
function interfaces used by ctypes to implement such a callback mean
that there's no way to get the actual function pointer as it's seen by
C code other than by actually passing it to a C function. Accordingly,
you pass the Python callback object to a glupy registration function
that can see the result of this conversion. For each type of operation,
there are two corresponding registration functions: one for the dispatch
function that initiates the operation and one for the callback that
handles completion. The glupy meta-translator then stores pointers to
the registered functions in a table for fast access later. One side
effect of this approach is that glupy functions are strongly typed. This
might seem rather un-Pythonic, but TXAPI itself is strongly typed, and
the consequences of mixing types could be a hung filesystem, so this
seems like a reasonable safety measure. Although this might all seem rather
complicated, the net result is Python code that's relatively free of
type-conversion clutter and requires very little initialization code. For
instance, the init function of the example translator used in the rest of
this article simply registers a dispatch function and a completion
callback for each of the two types of operations it handles.

The next problem to solve is multithreading. The Python
interpreter still is essentially single-threaded, so C code that calls
into Python must be sure to take the Global Interpreter Lock and do other
things to keep the interpreter sane. Fortunately, current versions of
Python make this much easier than it used to be. The first thing you
need to do is enable multithreading by calling PyEval_InitThreads
after Py_Initialize. (Since Python 3.7, Py_Initialize does this itself,
and PyEval_InitThreads has been deprecated since Python 3.9.) What a
surprising number of people seem to miss, even though it's fairly well
documented, is that part of what PyEval_InitThreads does is acquire the
Global Interpreter Lock on behalf of the calling thread. This lock must
be released explicitly at the end of initialization, or else any other
code that tries to acquire it will deadlock. In glupy, those later
acquisitions happen implicitly inside PyGILState_Ensure, which is the
recommended way to set up interpreter state before calling into Python
from multithreaded C code. Each glupy dispatch function and callback
calls PyGILState_Ensure before invoking its Python function and makes a
matching call to PyGILState_Release after that function returns.
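ctypes itself follows the same discipline, which makes for a runnable illustration. A CDLL function call releases the GIL while the C code runs, and when that C code invokes a CFUNCTYPE callback, ctypes acquires the GIL (via PyGILState_Ensure) before running the Python function and releases it afterward. The sketch below assumes a Linux-like system where the C library's qsort is reachable; it sorts three arrays concurrently from separate threads, each calling back into one shared Python comparator.

```python
import ctypes
import ctypes.util
import threading

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

calls = 0
calls_lock = threading.Lock()

@CMPFUNC
def compare(a, b):
    # Runs with the GIL held: ctypes' callback thunk acquires it on entry
    # and releases it on exit, much as glupy's C wrappers do.
    global calls
    with calls_lock:
        calls += 1
    return (a[0] > b[0]) - (a[0] < b[0])

def worker(values, out, i):
    arr = (ctypes.c_int * len(values))(*values)
    # The GIL is released for the duration of this foreign call.
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), compare)
    out[i] = list(arr)

data = [[9, 3, 7, 1], [5, 8, 2, 6], [4, 0, 11, 10]]
results = [None] * len(data)
threads = [threading.Thread(target=worker, args=(d, results, i))
           for i, d in enumerate(data)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [[1, 3, 7, 9], [2, 5, 6, 8], [0, 4, 10, 11]]
```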

Before moving on from what's inside glupy to what glupy code looks
like, you need to know what this example glupy-based translator
actually does. The problem this example tries to solve is
one that occurs frequently when using GlusterFS to store the code for
PHP Web applications. Often, such applications try to load literally
hundreds of include files every time a page is requested. Each include
file might exist in any of several include directories along a search
path. GlusterFS caches information about "positive lookups"
(that is, those that
succeeded) but not about "negative lookups" (which failed).

Although this
behavior makes sense for many applications, the performance impact for
many PHP applications can be severe. Without negative-lookup caching,
you're likely to search half of those directories in vain before finding
the one that contains each include file, every time the including page
is requested. (This pattern does occur in other environments as well,
including Python Web applications, but common PHP frameworks cause those
applications to be hit the hardest.) Just as the effects are severe,
the benefits of adding a negative-lookup cache can be significant. For
example, a C version of such a translator decreased average include-search
times nearly seven-fold. What could a Python version do?
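The caching idea itself is simple enough to model in a few lines. The sketch below is a hypothetical, single-process model, not the example translator's code: NegativeLookupCache remembers (directory, name) pairs known to be missing, and find_include walks an include path the way a PHP application would, counting only the lookups that would have to reach storage. With three includes that all live in the last of four directories, two page requests cost 24 lookups without the cache and 15 with it. A real cache must also drop entries when a file is created, which invalidate stands in for.

```python
class NegativeLookupCache:
    """Remembers (directory, name) pairs known not to exist (hypothetical)."""
    def __init__(self):
        self._misses = set()

    def known_missing(self, directory, name):
        return (directory, name) in self._misses

    def record_miss(self, directory, name):
        self._misses.add((directory, name))

    def invalidate(self, directory, name):
        # Must be called when `name` is created in `directory`.
        self._misses.discard((directory, name))


def find_include(name, search_path, files, cache, stats):
    """Search the include path in order; count lookups that reach storage."""
    for directory in search_path:
        if cache is not None and cache.known_missing(directory, name):
            continue                      # answered without touching storage
        stats["lookups"] += 1             # a real (slow) lookup
        if (directory, name) in files:
            return f"{directory}/{name}"
        if cache is not None:
            cache.record_miss(directory, name)
    return None


path = ["/app/lib", "/app/vendor", "/usr/share/php", "/app/includes"]
includes = ["db.php", "auth.php", "view.php"]
files = {("/app/includes", n) for n in includes}

def run(cache, requests=2):
    stats = {"lookups": 0}
    for _ in range(requests):             # each page request loads every include
        for inc in includes:
            find_include(inc, path, files, cache, stats)
    return stats["lookups"]

print(run(None), run(NegativeLookupCache()))  # 24 15
```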

The heart of the example is its lookup_fop function, which is called
whenever a file is looked up. Entry to this function represents a
transition from C to Python, while its return represents a transition
back to C. Calls through the "dl" object (a handle to the C dynamic
library that supports glupy) also suspend the Python interpreter while
they run. The Python decorator syntax hides most of the function-type
details, and the function contains almost no type-conversion code. Most
of what's there is domain-specific code, not boilerplate required by the
infrastructure.

In the top half of this function, you simply check the cache to see
if you already know the requested file won't be there. If the cache
check succeeds, the lookup fails immediately, and you "unwind" the
translator stack to report that fact. As with the registration functions,
each operation type has its own specific wind (call downward) and unwind
(return upward) functions as well. This represents a temporary return
from the "Python world" to the "C world", and it's worth noting that
these transitions between worlds might occur seamlessly many times while
processing a single request. In particular, a common GlusterFS translator
idiom is for a completion callback on one request to initiate the next,
and if that request completes immediately (as done here), then you can
have multiple requests and completions all on the stack at once.
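The stacking effect is easy to reproduce with a hypothetical lower layer whose wind function completes each request synchronously, calling the completion callback before returning. Because each completion starts the next request, all three completions end up on the Python call stack at once, and the corresponding wind calls return only after the last one finishes. The names wind and chain are invented for this sketch.

```python
active = 0       # completions currently on the stack
max_active = 0
log = []

def wind(request, callback):
    # Hypothetical lower layer that completes synchronously.
    log.append(f"wind {request}")
    callback(request)
    log.append(f"wind {request} returned")

def chain(last):
    def on_complete(request):
        global active, max_active
        active += 1
        max_active = max(max_active, active)
        log.append(f"complete {request}")
        if request < last:
            wind(request + 1, on_complete)   # a completion starts the next request
        active -= 1
    return on_complete

wind(1, chain(last=3))
print(max_active)  # 3
print(log)
```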

Returning to the code, if you do not find an entry in the cache (and you
already know it must not be in the standard positive-lookup cache or
else you wouldn't even have been called), you pass the request on to the
next translator using wind_lookup. When that next translator is done, it
returns control (through the glupy meta-translator) to
lookup_cbk. Here
you retrieve your request context, conveniently stashed in a dictionary
for you by lookup_fop, and use it to update the cache according to
whether the file was found.
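Putting the pieces together, the flow described above might look roughly like the following sketch. It is not the example translator's actual code: the ctypes decorators are omitted so that it runs without GlusterFS, the (frame, parent, name) signatures are simplified, and FakeDL is a hypothetical stand-in for the dl handle, whose wind_lookup completes synchronously and whose unwind_lookup merely records what would be reported upward.

```python
import errno

class FakeDL:
    """Hypothetical stand-in for glupy's "dl" handle."""
    def __init__(self, existing):
        self.existing = existing
        self.wound = []      # requests passed down to the next translator
        self.unwound = []    # (frame, op_ret, op_errno) reported upward

    def wind_lookup(self, frame, parent, name):
        self.wound.append((parent, name))
        found = (parent, name) in self.existing
        # The fake next translator completes immediately.
        lookup_cbk(frame, 0 if found else -1, 0 if found else errno.ENOENT)

    def unwind_lookup(self, frame, op_ret, op_errno):
        self.unwound.append((frame, op_ret, op_errno))

dl = FakeDL(existing={("/www/inc", "db.php")})
neg_cache = set()   # (parent, name) pairs known to be missing
requests = {}       # frame -> request context, for lookup_cbk

def lookup_fop(frame, parent, name):
    if (parent, name) in neg_cache:
        dl.unwind_lookup(frame, -1, errno.ENOENT)   # fail fast without winding
        return 0
    requests[frame] = (parent, name)                # stash context for the callback
    dl.wind_lookup(frame, parent, name)
    return 0

def lookup_cbk(frame, op_ret, op_errno):
    parent, name = requests.pop(frame)
    if op_ret != 0 and op_errno == errno.ENOENT:
        neg_cache.add((parent, name))               # remember the miss
    else:
        neg_cache.discard((parent, name))
    dl.unwind_lookup(frame, op_ret, op_errno)
    return 0

for frame, parent in enumerate(["/www/lib", "/www/lib", "/www/inc"], start=1):
    lookup_fop(frame, parent, "db.php")
print(dl.wound)  # [('/www/lib', 'db.php'), ('/www/inc', 'db.php')]
```

The second lookup in /www/lib never reaches the next translator: it is answered from the negative cache.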

There are a few other less relevant details of how this particular glupy
translator works, but that really is the meat of it. With less than a
hundred lines of Python code, including comments and empty lines, you can
add a significant piece of functionality to a real filesystem. But
how well does it really work? As it turns out, very well (see
Table 1). In a simple test, the Python version averaged 1.527 ms per
lookup: slower than the C-based version of the same thing (1.036 ms),
but still more than four times as fast as the baseline (6.898 ms).
Clearly, the fact that you're caching these results
matters more than what language you're using to do it.

Table 1. Results of Caching Failed-Lookup Requests (ms per lookup)

                 minimum  average  maximum  99th percentile
no caching         0.368    6.898   16.286            9.702
C version          0.379    1.036   18.503            2.180
glupy version      0.381    1.527   21.163            2.916
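The speedup figures quoted in the text follow directly from Table 1. The arithmetic, using the average and 99th-percentile columns, is:

```python
# Average and 99th-percentile times (ms per lookup), copied from Table 1.
table = {
    "no caching":    {"average": 6.898, "p99": 9.702},
    "C version":     {"average": 1.036, "p99": 2.180},
    "glupy version": {"average": 1.527, "p99": 2.916},
}
base = table["no caching"]
speedups = {}
for name in ("C version", "glupy version"):
    row = table[name]
    speedups[name] = (base["average"] / row["average"], base["p99"] / row["p99"])
    print(f"{name}: {speedups[name][0]:.2f}x faster on average, "
          f"{speedups[name][1]:.2f}x at the 99th percentile")
# C version: 6.66x faster on average, 4.45x at the 99th percentile
# glupy version: 4.52x faster on average, 3.33x at the 99th percentile
```

Both averages match the text's "nearly seven-fold" and "more than four times" descriptions.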

As promising as these results are, they're more of a beginning than
an end. Glupy is still a very young project, and much remains to
be done. Support needs to be added for a few dozen more operation
types and several data structures. There still are more ways that
GlusterFS calls into translators and utility functions that translators
themselves call. There are many ways the glupy interface could be made
more convenient, and there are undoubtedly performance or concurrency
issues still to be resolved. The most important thing is that the basic
infrastructure for doing all of these things already exists, and not
just for GlusterFS translators. If even a highly multithreaded and
asynchronous program like this can take advantage of all that Python
has to offer, so can just about any other program. Thanks to Python's
extension/embedding interface and ctypes module, a "best of both
worlds"
approach to developing complex software is more achievable than most
people think.