Tuesday, August 30, 2011

Wrapping C++ Libraries with Reflection — Status Report One Year Later

Well over a year ago, work was started on the cppyy module which lives in the
reflex-support branch.
Since then, work has progressed at a varying pace, including a recent
sprint in Düsseldorf last July.

Let's first take a step back and recap why we're interested in doing this,
given that it is perfectly possible to use C++ through generated bindings and
cpyext.
cppyy makes use of reflection information generated for the C++ classes of
interest, and has that reflection information available at run time.
Therefore, it is able to open up complex C++ types to the JIT in a manner
conceptually similar to the way simple types already are.
This means that it is possible to get rid of a lot of the marshalling layers
when making cross-language calls, resulting in much lower call overhead than
is possible when going through the CPython API, or other methods of wrapping.
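As a rough analogy for what type information buys you on a foreign call,
consider ctypes (which, like cppyy, routes calls through libffi): once the
signature is declared, arguments can be converted directly instead of passing
through generic marshalling layers. This is a minimal sketch, assuming a
Unix-like system where the C math library can be located:

```python
from ctypes import CDLL, c_double
from ctypes.util import find_library

# Load the C math library (assumes a Unix-like system).
libm = CDLL(find_library("m"))

# Declaring the signature plays the role of reflection information:
# the call can now go straight through libffi with known types.
libm.cos.restype = c_double
libm.cos.argtypes = [c_double]

print(libm.cos(0.0))  # -> 1.0
```

cppyy's point is that with reflection information available at run time, the
JIT can see through these conversions for complex C++ types as well, not just
for simple ones.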

There are two problems that need to be solved: C++ language constructs need to
be presented on the Python side in a natural way, and cross-language impedance
mismatches need to be minimized, with some hints from the user if need be.
For the former, the list of mapped features has grown to a set that is
sufficient to do real work.
There is now support for:

- builtin, pointer, and array types
- namespaces, classes, and inner classes
- global functions and global data
- static/instance data members and methods
- default arguments and object return by value
- single and multiple (virtual) inheritance
- templated classes
- basic STL support and pythonizations
- basic (non-global) operator mapping
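To illustrate what a "pythonization" means in this list, here is a
plain-Python sketch (not cppyy code, and the class name is made up): a
C++-style container that a raw binding would expose only through size() and
at() gets the usual Python protocols layered on top:

```python
# Plain-Python sketch (NOT cppyy code): a C++-style container with
# only size()/at(), plus "pythonizations" layered on top of it.
class VectorProxy:
    def __init__(self, data):
        self._data = list(data)

    # C++-style accessors, as a raw binding would expose them
    def size(self):
        return len(self._data)

    def at(self, i):
        return self._data[i]

    # pythonizations: the usual Python protocols on top
    def __len__(self):
        return self.size()

    def __getitem__(self, i):
        if not 0 <= i < self.size():
            raise IndexError(i)
        return self.at(i)

v = VectorProxy([1, 2, 3])
print(len(v), list(v))  # -> 3 [1, 2, 3]
```

With such protocols in place, wrapped STL containers support len(), indexing,
and iteration like native Python sequences.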

The second problem is harder and will always be an ongoing process.
But one of the more important issues was solved at the recent Düsseldorf
sprint: allowing the garbage collector to reclaim C++ objects that were
instantiated from the Python side.
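The mechanism can be pictured with a plain-Python sketch (this is not cppyy's
actual implementation; CppProxy and the deleter are hypothetical): the proxy
registers a finalizer, so that when the garbage collector reclaims the proxy,
the underlying C++ object is freed as well:

```python
import gc
import weakref

class CppProxy:
    """Hypothetical stand-in for a bound C++ instance owned by Python."""
    def __init__(self, handle, deleter):
        self._handle = handle
        # When this proxy is garbage-collected, free the C++ object;
        # weakref.finalize keeps the finalizer alive until it fires.
        weakref.finalize(self, deleter, handle)

freed = []
p = CppProxy(42, freed.append)  # freed.append stands in for operator delete
del p
gc.collect()
print(freed)  # -> [42]
```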

Performance has also improved, especially that of the nicer "pythonized"
interface that the user actually sees, although it is still about a factor
of 2.5 slower than the lower-level interface (which has gotten uglier, so
you really don't want to use that).
Most of this improvement is due to restructuring so that it plays nicer with
the JIT and libffi, both of which themselves have seen improvements.

Work is currently concentrated on the back-ends: a CINT back-end is underway
and an LLVM/Clang pre-compiled headers (PCH) back-end is planned.
The latter is needed before this code can be released into the wild, rather
than used only in high energy physics (HEP), as it will be easier to support.
Moreover, within HEP, Clang's PCH are foreseen to become the future format of
reflection information.

At the end of the Düsseldorf sprint, we tried a little code that did something
actually "useful," namely filling a histogram with some random values.
We did get it to work, but trying cppyy on a large class library showed
that a good warning system for things such as missing classes was sorely
needed.
That has been added since, and revisiting the histogram example later yielded
an interesting result: the pypy-c run takes 1.5x as long as the compiled,
optimized C++ code.
The run was timed start to finish, including the loading of the reflection
library and the JIT warm-up that is needed in the case of Python, but not for
the compiled C++ code.
However, in HEP, scientists run many short jobs while developing their
analysis codes, before submitting larger jobs on the GRID to run during lunch
time or overnight.
Thus, a more realistic comparison includes the compilation time needed for the
C++ code; with that included, the Python code needs only 55% of the time
required by C++.
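Those two figures also pin down the relative cost of the C++ compilation step;
a quick back-of-the-envelope check:

```python
# Take the optimized C++ run time as the unit of time.
cpp_run = 1.0
py_total = 1.5 * cpp_run      # pypy-c, start to finish (incl. warm-up)

# The Python total is 55% of (compile + run) for C++, so:
cpp_total = py_total / 0.55
cpp_compile = cpp_total - cpp_run

print(round(cpp_compile, 2))  # -> 1.73: compiling took about 1.7x
                              # as long as running the C++ code
```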

The choice of a programming language is often a personal one, and arguments
such as "C++ is hard to use" typically do not carry much weight with the
in-crowd that studies quantum field dynamics for fun.
However, getting the prompt back with your analysis results faster is a sure
winner. We hope that cppyy will soon have progressed far enough to be useful,
first to particle physicists and then for other uses of wrapping C++
libraries.

Wim Lavrijsen, Carl Friedrich Bolz, Armin Rigo

5 comments:

Nice result. Wrapping C++ code can be even more tiresome than wrapping C, especially with large code bases. This will be a very welcome tool.

This question has probably been answered before... but I ask anyway since I couldn't find the answer.

Can the JIT information be saved, so it does not need to be worked out again? Assuming none of the dependencies have changed (.py files, pypy itself, .so files, etc.). Maybe if position-independent code cannot be saved, then trace hints or some higher-level structure could be saved to inform the JIT about which traces to compile? That sounds like a solution to JIT warm-up for code that is used repeatedly.

The conclusion there was that it is too hard to be of benefit, because too many parts contain addresses or calculated values that were turned into constants.

For our (HEP) purposes, it would be of limited benefit: in the development cycle, the .py's would change all the time, and it is a safe assumption that the user codes that are being developed are the most "hot." If there is anything in the supporting code that is "hot" (most likely in the framework) it'd be in C/C++ at that point anyway.

Rather, I'd like to have an easy way of letting the user determine which portions of the code will be hot. Not having to run a hot loop 1000x in interpreted mode before the JIT kicks in is going to be more valuable in scientific codes, where the hot loops tend to be blatantly obvious.

This is great. I have been looking for just such a tool to wrap C++ numerical code.

I guess I have two questions:

1. Is there any documentation on how to use it?
2. It is very important for me to be able to translate between NumPy data structures and C++ data structures, so is there any plan to make this easy?

ad 1) it's not at the level of being usable in a production environment. I have two known issues to resolve and probably some more unknowns. I've posted a description on pypy-dev, and I'm helping a few patient, very friendly users along. But actual documentation would suggest a level of support that currently can't be offered, because all the current (and soon to disappear) caveats would need documenting as well.

ad 2) not sure what data translation you're thinking of, but in the CPython equivalent, support was added for the buffer interface and memoryview. Those, or something similar, will be there so that NumPy arrays etc. can be built from return values, from public data members, and passed into function calls as arguments. Those are not translations, but rather extraction of the data pointers (which is typically intended and the most efficient, to be sure).
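For a feel of what the buffer interface gives you (plain CPython here, nothing cppyy-specific): a memoryview exposes an object's underlying data buffer without copying, so writes through the view are visible in the original object:

```python
import array

# A typed buffer of doubles, standing in for data owned elsewhere.
a = array.array('d', [1.0, 2.0, 3.0])
m = memoryview(a)   # zero-copy view over a's underlying buffer

m[0] = 42.0         # write through the view...
print(a[0])         # -> 42.0: ...is visible in the original array
```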