Monday, June 25, 2012

Architecture of Cppyy

The cppyy module makes it possible to call into C++ from PyPy through the Reflex package. Work started about two years ago, with a follow-up sprint a year later. The module has now reached an acceptable level of maturity and initial documentation with setup instructions, as well as a list of the currently supported language features, are now available here. There is a sizable (non-PyPy) set of unit and application tests that is still being worked through, not all of them of general applicability, so development continues its current somewhat random walk towards full language coverage. However, if you find that cppyy by and large works for you except for certain specific features, feel free to ask for them to be given higher priority.

Cppyy handles bindings differently than what is typically found in other tools with a similar objective, so this update walks through some of these differences, and explains why choices were made as they are.

The most visible difference, is from the viewpoint of the Python programmer interacting with the module. The two canonical ways of making Python part of a larger environment, are to either embed or extend it. The latter is done with so-called extension modules, which are explicitly constructed to be very similar in their presentation to the Python programmer as normal Python modules. In cppyy, however, the external C++ world is presented from a single entrance point, the global C++ namespace (in the form of the variable cppyy.gbl). Thus, instead of importing a package that contains your C++ classes, usage looks like this (assuming class MyClass in the global namespace):

>>>> import cppyy
>>>> m = cppyy.gbl.MyClass()
>>>> # etc.

This is more natural than it appears at first: C++ classes and functions are, once compiled, represented by unique linker symbols, so it makes sense to give them their own unique place on the Python side as well. This organization allows pythonizations of C++ classes to propagate from one code to another, ensures that all normal Python introspection (such as issubclass and isinstance) works as expected in all cases, and that it is possible to represent C++ constructs such as typedefs simply by Python references. Achieving this unified presentation would clearly require a lot of internal administration to track all C++ entities if they each lived in their own, pre-built extension modules. So instead, cppyy generates the C++ bindings at run-time, which brings us to the next difference.

Then again, that is not really a difference: when writing or generating a Python extension module, the result is some C code that consists of calls into Python, which then gets compiled. However, it is not the bindings themselves that are compiled; it is the code that creates the bindings that gets compiled. In other words, any generated or hand-written extension module does exactly what cppyy does, except that they are much more specific in that the bound code is hard-wired with e.g. fixed strings and external function calls. The upshot is that in Python, where all objects are first-class and run-time constructs, there is no difference whatsoever between bindings generated at run-time, and bindings generated at ... well, run-time really. There is a difference in organization, though, which goes back to the first point of structuring the C++ class proxies in Python: given that a class will settle in a unique place once bound, instead of inside a module that has no meaning in the C++ world, it follows that it can also be uniquely located in the first place. In other words, cppyy can, and does, make use of a class loader to auto-load classes on-demand.

If at this point, this all reminds you of a bit ctypes, just with some extra bells and whistles, you would be quite right. In fact, internally cppyy makes heavy use of the RPython modules that form the guts of ctypes. The difficult part of ctypes, however, is the requirement to annotate functions and structures. That is not very pleasant in C, but in C++ there is a whole other level of complexity in that the C++ standard specifies many low-level details, that are required for dispatching calls and understanding object layout, as "implementation defined." Of course, in the case of Open Source compilers, getting at those details is doable, but having to reverse engineer closed-source compilers gets old rather quickly in more ways than one. More generally, these implementation defined details prevent a clean interface, i.e. without a further dependency on the compiler, into C++ like the one that the CFFI module provides for C. Still, once internal pointers have been followed, offsets have been calculated, this objects have been provided, etc., etc., the final dispatch into binary C++ is no different than that into C, and cppyy will therefore be able to make use of CFFI internally, like it does with ctypes today. This is especially relevant in the CLang/LLVM world, where stub functions are done away with. To get the required low-level details then, cppyy relies on a back-end, rather than getting it from the programmer, and this is where Reflex (together with the relevant C++ compiler) comes in, largely automating this tedious process.

There is nothing special about Reflex per se, other than that it is relatively lightweight, available, and has proven to be able to handle huge code bases. It was a known quantity when work on cppyy started, and given the number of moving parts in learning PyPy, that was a welcome relief. Reflex is based on gccxml, and can therefore handle pretty much any C or C++ code that you care to throw at it. It is also technically speaking obsolete as it will not support C++11, since gccxml won't, but its expected replacement, based on CLang/LLVM, is not quite there yet (we are looking at Q3 of this year). In cppyy, access to Reflex, or any back-end for that matter, is through a thin C API (see the schematic below): cppyy asks high level questions to the back-end, and receives low-level results, some of which are in the form of opaque handles. This ensures that cppyy is not tied to any specific back-end. In fact, currently it already supports another, CINT, but that back-end is of little interest outside of High Energy Physics (HEP). The Python side is always the same, however, so any Python code based on cppyy does not have to change if the back-end changes. To use the system, a back-end specific tool (genreflex for Reflex) is first run on a set of header files with a selection file for choosing the required classes. This produces a C++ file that must be compiled into a shared library, and a corresponding map file for the class loader. These shared libraries, with their map files alongside, can be put anywhere as long as they can be located through the standard paths for the dynamic loader. With that in place, the setup is ready, and the C++ classes are available to be used from cppyy.

So far, nothing that has been described is specific to PyPy. In fact, most of the technologies described have been used for a long time on CPython already, so why the need for a new, PyPy-specific, module? To get to that, it is important to first understand how a call is mediated between Python and C++. In Python, there is the concept of a PyObject, which has a reference count, a pointer to a type object, and some payload. There are APIs to extract the low-level information from the payload for use in the C++ call, and to repackage any results from the call. This marshalling is where the bulk of the time is spent when dispatching. To be absolutely precise, most C++ extension module generators produce slow dispatches because they don't handle overloads efficiently, but even in there, they still spend most of their time in the marshalling code, albeit in calls that fail before trying the next overload. In PyPy, speed is gained by having the JIT unbox objects into the payload only, allowing it to become part of compiled traces. If the same marshalling APIs were used, the JIT is forced to rebox the payload, hand it over through the API, only to have it unboxed again by the binding. Doing so is dreadfully inefficient. The objective of cppyy, then, is to keep all code transparent to the JIT until the absolute last possible moment, i.e. the call into C++ itself, therefore allowing it to (more or less) directly pass the payload it already has, with an absolute minimal amount of extra work. In the extreme case when the binding is not to a call, but to a data member of an object (or to a global variable), the memory address is delivered to the JIT and this results in direct access with no overhead. Note the interplay: cppyy in PyPy does not work like a binding in the CPython sense that is a back-and-forth between the interpreter and the extension. Instead, it does its work by being transparent to the JIT, allowing the JIT to dissolve the binding. And with that, we have made a full circle: if to work well with the JIT, and in so doing achieve the best performance, you can not have marshalling or do any other API-based driving, then the concept of compiled extension modules is out, and the better solution is in run-time generated bindings.

That leaves one final point. What if you do want to present an extension module-like interface to programmers that use your code? But of course, this is Python: everything consists of first-class objects, whose behavior can be changed on the fly. In CPython, you might hesitate to make such changes, as every overlay or indirection results in quite a bit of overhead. With PyPy, however, these layers are all optimized out of existences, making that a non-issue.

This posting laid out the reasoning behind the organization of cppyy. A follow-up is planned, to explain how C++ objects are handled and represented internally.

Wim Lavrijsen


Fernando Perez said...

Thanks for this excellent post; any chance you'll make it to Scipy'2012 in Austin? I still remember your talk at one of the very old Scipys at Caltech as one of the best we've had; it would be great to catch up on the implications of your continued work on this front since. With the recent progress on cython and numpy/numba, fresh ideas on the C++ front are a great complement.

Sebastien Binet said...


I know you are quite attached to details so I was surprised by:

Reflex is based on gccxml, and can therefore handle pretty much any C or C++ code that you care to throw at it

but that's not true: gccxml being an interesting and useful hack of the C++ frontend of GCC, it can only correctly parse the subset of C which is valid C++.

here are a few links:

I discovered it the hard way...

Anonymous said...

@Sebastien, GCC-XML must be able to parse the entirety of C, since it has to support "extern C" blocks, mustn't it?

Sebastien Binet said...

"extern C" is "just" modifying the symbol mangling mechanism of the identifiers inside the extern-C block.

just try this example from the link I posted earlier:

struct A { struct B { int a; } b; int c; };
struct B b; // ill-formed: b has incomplete type (*not* A::B)

even if you create a foo.h like so:

#ifdef __cplusplus
extern "C" {

struct A { struct B { int a; } b; int c; };
struct B b;
#ifdef __cplusplus

and compile some main.c/cxx (which just includes that header) with gcc/g++, you'll get:

$ gcc main.c
$ echo $?

$ g++ main.cxx
In file included from main.cxx:2:0:
foo.h:7:12: error: aggregate ‘B b’ has incomplete type and cannot be defined
zsh: exit 1 g++ main.cxx

gccxml is using the C++ parser, thus my first remark :}

Sebastien Binet said...

Also, as we are in the nitpicking and parsing department, any C++ keyword which isn't a C one, can be correctly used in a C file, making that file landing in the valid-C-which-isnt-in-the-C++-subset-of-C
(e.g.: class,new,this to name a few of the most popular types or identifiers one can find in C codebases)

Wim Lavrijsen said...

@Fernando: no, no travel for me anytime soon. If Py4Science is still going, though, I can always walk down the hill, of course. :)

I've seen Numba (Stefan brought it up on the pypy-dev list), but it appears to be focused on C. With LLVM, we are using the AST directly. I don't think you can drive C++ through llvm-py.

@Sebastien: the "details" that you are missing are in that "pretty much any" is not the same as "all." Worse, Reflex has a whole toolchain of gccxml, genreflex, C++ compiler, and finally the Reflex API. You lose information at every step along the way. It's one more reason for CLang/LLVM, but as said, that's for Q3/2012.

Note though that there are two kinds of C headers that one may encounter. Those that are in a pure C environment, and those for mixed C/C++ use (e.g. Python.h and the system headers). In the former case, no-one would drag in the dependency on a C++ compiler, just to use Reflex. Using e.g. CFFI is a much better option. In the other case, there is no problem either way.


Anonymous said...

On a similar note, what's the state of embedding PyPy into C++ (or does cppyy make that case fully obsolete?)?

Wim Lavrijsen said...

@anonymous: there was a recent thread on pypy-dev, showing a successful embedding:

If done through C++, you can use the Python C-API (through cpyext), but AFAIK, that doesn't play nicely with threads yet.


Matthias said...

From my past experience wrapping a C++ library to python is a whole lot more than just being able to call functions and having objects.

For example using a binding generator like SWIG you need to annotate your source, because the source alone does not have sufficient information to generate proper bindings (at least no bindings that feel python-like).

So I am wondering how Cppyy behaves in this area.

E.g. how does this play with templates? I will probably still need to define up-front which instantiations I need to be available in python?

How does it deal with object ownership? E.g. what happens if the C++ code decides to delete an object that python still points to? Or how are shared pointers dealt with?

How is type mapping handled? E.g. you might want to call functions taking MyString with "standard" python strings instead of having to construct MyString() objects first and then passing those.

Wim Lavrijsen said...

@Matthias: there are several follow-up posts planned to explain everything in detail, so just a few quick answers now.

Pythonizations are handled automatically based on signature, otherwise by allowing user defined pythonization functions.

Template instantiations are still needed in the Reflex world, but with CLang/LLVM, those can be generated by the backend (CINT can perform the instantiations automatically as well).

Object ownership can be handled heuristically if the C++ side behaves (this is e.g. the case for most of ROOT). If that's not the case, extra annotations per function or per object are needed. In addition, communication with the memory regulator (a tracker of all proxies on the python side) through a callback on both sides is possible.

Type mappings happen through custom converters that are to be coded up in either Python or C++. Standard mappings (e.g. the use of std::string in the way that you describe for MyString) have been added by default. Type mappings can also be done based on signature in some cases.

Not everything of the above is implemented in cppyy yet, but all have been solved before in PyROOT on CPython. It's just a matter of time to implement things for cppyy. The important point, however, is that none of this needs a separate language: most of it can be handled automatically, with a little work of the programmer in python proper or, worst case, with a C++ helper.


Anonymous said...

Hmm is anyone else experiencing problems with the pictures on this blog loading?
I'm trying to find out if its a problem on my end or if it's the
blog. Any feed-back would be greatly appreciated.

my site ... Splendyr REview - -